Open Persian LLM Alignment Leaderboard
- "headers": [
- "Model",
- "Safty",
- "Fairness",
- "Socail-norm",
- "GuardBench_fa",
- "ProhibiBench_fa",
- "SafeBench_fa",
- "FairBench_fa",
- "SocialBench_fa",
- "Advbench_fa",
- "DecodingTrust_fa",
- "Anthropic_fa",
- "Harmbench_fa",
- "Average",
- "Type",
- "Precision",
- "Hub License",
- "#Params (B)",
- "Available on the hub",
- "Model sha"
- "data": [
- [
- "<a target="_blank" href="https://huggingface.co/ava/model" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">ava/model</a>",
- 95.16,
- 87.63,
- 86.16,
- 85.89,
- 89.35,
- 94.02,
- 94.62,
- 90.67,
- 95.61,
- 87.67,
- 95.87,
- 94.3,
- 87.91,
- "",
- "float16",
- "?",
- 0,
- false,
- "revision on the hub"
- [
- "<a target="_blank" href="https://huggingface.co/Dorna2-Llama3.1-8B-Instruct/model" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Dorna2-Llama3.1-8B-Instruct/model</a>",
- 75.96,
- 77.73,
- 79,
- 79.22,
- 73.6,
- 63.54,
- 83.64,
- 99.38,
- 84.47,
- 76.49,
- 75.09,
- 74.25,
- 78,
- "",
- "float16",
- "?",
- 0,
- false,
- "revision on the hub"
- [
- "<a target="_blank" href="https://huggingface.co/Ministral-8B-Instruct-2410/model" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Ministral-8B-Instruct-2410/model</a>",
- 74.61,
- 79.23,
- 45.91,
- 43.37,
- 72.92,
- 65.83,
- 80.37,
- 92.5,
- 80.2,
- 79.46,
- 79.74,
- 53.04,
- 59.34,
- "",
- "float16",
- "?",
- 0,
- false,
- "revision on the hub"
- [
- "<a target="_blank" href="https://huggingface.co/Qwen2.5-3B-Instruct/model" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen2.5-3B-Instruct/model</a>",
- 69.75,
- 62.66,
- 47.77,
- 46.53,
- 61.92,
- 59.8,
- 60.09,
- 63.12,
- 76.84,
- 62.35,
- 63.2,
- 62.37,
- 55.29,
- "",
- "float16",
- "?",
- 0,
- false,
- "revision on the hub"
- [
- "<a target="_blank" href="https://huggingface.co/Qwen2.5-7B-Instruct/model" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">Qwen2.5-7B-Instruct/model</a>",
- 77.8,
- 71.16,
- 51.74,
- 50.08,
- 69.28,
- 70.93,
- 75.66,
- 83.75,
- 83.5,
- 71.03,
- 78.02,
- 68,
- 61.09,
- "",
- "float16",
- "?",
- 0,
- false,
- "revision on the hub"
- [
- "<a target="_blank" href="https://huggingface.co/aya-expanse-8b/model" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">aya-expanse-8b/model</a>",
- 89.89,
- 87.48,
- 85.55,
- 85.37,
- 85.17,
- 85.29,
- 93.87,
- 99.38,
- 93.96,
- 87.31,
- 93.49,
- 84.22,
- 86.36,
- "",
- "float16",
- "?",
- 0,
- false,
- "revision on the hub"
- [
- "<a target="_blank" href="https://huggingface.co/gita/model" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">gita/model</a>",
- 89.34,
- 81.64,
- 78.14,
- 77.36,
- 84.38,
- 84.22,
- 75.85,
- 76.25,
- 90.04,
- 81.91,
- 94.74,
- 95.33,
- 80.69,
- "",
- "float16",
- "?",
- 0,
- false,
- "revision on the hub"
- [
- "<a target="_blank" href="https://huggingface.co/gemma-2-2b-it/model" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">gemma-2-2b-it/model</a>",
- 94.67,
- 85.3,
- 43.75,
- 40.67,
- 87.42,
- 95.1,
- 92.99,
- 99.38,
- 95.1,
- 82.94,
- 93.71,
- 94.52,
- 63.2,
- "",
- "float16",
- "?",
- 0,
- false,
- "revision on the hub"
- [
- "<a target="_blank" href="https://huggingface.co/gemma-2-9b-it/model" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">gemma-2-9b-it/model</a>",
- 97.25,
- 89.72,
- 72.1,
- 70.3,
- 91.51,
- 96.1,
- 97.17,
- 10,
- 97.76,
- 89.58,
- 98.03,
- 98.22,
- 80.71,
- "",
- "float16",
- "?",
- 0,
- false,
- "revision on the hub"
- [
- "<a target="_blank" href="https://huggingface.co/sialk/model" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">sialk/model</a>",
- 80.21,
- 64.38,
- 81.56,
- 83.05,
- 83.13,
- 93.63,
- 94.86,
- 91.88,
- 78.96,
- 55.1,
- 89.05,
- 47.63,
- 76.71,
- "",
- "float16",
- "?",
- 0,
- false,
- "revision on the hub"
- [
- "<a target="_blank" href="https://huggingface.co/sialk_dpo/model" style="color: var(--link-text-color); text-decoration: underline;text-decoration-style: dotted;">sialk_dpo/model</a>",
- 58.2,
- 55.71,
- 78.72,
- 78.84,
- 77.99,
- 49.51,
- 79.16,
- 83.12,
- 50.23,
- 51.98,
- 69.05,
- 27.26,
- 69.1,
- "",
- "float16",
- "?",
- 0,
- false,
- "revision on the hub"
- [
- "metadata": null
Developed by MCILAB in collaboration with the Machine Learning Laboratory at Sharif University of Technology, this benchmark is based on the open-source ELAB, which presents a comprehensive evaluation framework for assessing the alignment of Persian Large Language Models (LLMs) with critical ethical dimensions, including safety, fairness, and social norms. Addressing gaps in existing LLM evaluation frameworks, this benchmark is specifically tailored to Persian linguistic and cultural contexts.
It combines three types of Persian-language benchmarks:
1. Translated datasets (adapted from established English benchmarks)
2. Synthetically generated data (newly created for Persian LLMs)
3. Naturally collected data (reflecting indigenous cultural nuances)
Key Datasets in the Benchmark
The benchmark integrates the following datasets to ensure a robust evaluation of Persian LLMs:
Translated Datasets
- Anthropic-fa
- AdvBench-fa
- HarmBench-fa
- DecodingTrust-fa
Newly Developed Persian Datasets
- ProhibiBench-fa: Evaluates harmful and prohibited content in Persian culture.
- SafeBench-fa: Assesses safety in generated outputs.
- FairBench-fa: Measures bias mitigation in Persian LLMs.
- SocialBench-fa: Evaluates adherence to culturally accepted behaviors.
Naturally Collected Persian Dataset
- GuardBench-fa: A large-scale dataset designed to align Persian LLMs with local cultural norms.
A Unified Framework for Persian LLM Evaluation
By combining these datasets, our work establishes a culturally grounded alignment evaluation framework, enabling systematic assessment across three key aspects:
- Safety: Avoiding harmful or toxic content.
- Fairness: Mitigating biases in model outputs.
- Social Norms: Ensuring culturally appropriate behavior.
This benchmark not only fills a critical gap in Persian LLM evaluation but also provides a standardized leaderboard to track progress in developing aligned, ethical, and culturally aware Persian language models.
Download Dataset
The full dataset is not publicly accessible; however, you can download a sample of 1,500 entries here. The distribution of this sample is as follows:
| Category | Share of sample |
|---|---|
| Fairness | 17% |
| Safety | 8.6% |
| Social norm | 74.4% |
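If the sample is published as a dataset on the Hub, a minimal sketch of loading it and checking this distribution could look like the following. The repository id and the `category` column name are placeholders, not the actual identifiers:

```python
from collections import Counter

from datasets import load_dataset

# Hypothetical repository id; replace with the actual sample dataset repo.
sample = load_dataset("mcilab/persian-alignment-sample", split="train")

# Count how many of the 1,500 entries fall into each alignment category.
counts = Counter(sample["category"])  # "category" is an assumed column name
total = sum(counts.values())
for name, count in counts.most_common():
    print(f"{name}: {count} ({100 * count / total:.1f}%)")
```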
Some good practices before submitting a model
1) Make sure you can load your model and tokenizer using AutoClasses:
```python
from transformers import AutoConfig, AutoModel, AutoTokenizer

revision = "main"  # or the specific branch/commit you plan to submit

config = AutoConfig.from_pretrained("your model name", revision=revision)
model = AutoModel.from_pretrained("your model name", revision=revision)
tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
```
If this step fails, follow the error messages to debug your model before submitting it. It's likely your model has been improperly uploaded.
Note: make sure your model is public!
Note: if your model needs `trust_remote_code=True`, we do not support this option yet, but we are working on adding it. Stay posted!
2) Convert your model weights to safetensors
It's a newer format for storing weights that is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the Extended Viewer!
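As a rough sketch of one way to do the conversion, assuming your checkpoint already loads with the AutoClasses above (recent versions of transformers save in safetensors format by default):

```python
from transformers import AutoModel

# Load the existing checkpoint (e.g., pytorch_model.bin)...
model = AutoModel.from_pretrained("your model name")

# ...and re-save it with safetensors serialization enabled,
# then upload the resulting files to your Hub repository.
model.save_pretrained("converted-model", safe_serialization=True)
```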
3) Make sure your model has an open license!
This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗
4) Fill out your model card
When we add extra information about models to the leaderboard, it will be automatically taken from the model card.
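For instance, a minimal sketch of the YAML metadata block at the top of a Hub model card (the front matter of the repository's README.md); the field values below are illustrative examples, not requirements:

```yaml
license: apache-2.0        # pick the license that actually applies to your model
language:
  - fa                     # Persian, since this leaderboard targets Persian LLMs
pipeline_tag: text-generation
base_model: meta-llama/Llama-3.1-8B   # hypothetical; set your real base model, if any
```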
In case of model failure
If your model is displayed in the FAILED category, its execution stopped.
Make sure you have followed the above steps first.
If everything is done, check that you can launch the EleutherAI Harness on your model locally (you can add `--limit` to limit the number of examples per task); a sketch of such a run follows.
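A minimal sketch of a local smoke test, assuming a recent lm-evaluation-harness install (`pip install lm-eval`); the task name below is a placeholder, since the leaderboard's exact task list is not given here:

```bash
# Check that the harness can load and run your model end to end;
# "hellaswag" stands in for whichever task you want to try.
lm_eval --model hf \
  --model_args pretrained=your-model-name,revision=main,dtype=float16 \
  --tasks hellaswag \
  --limit 10  # evaluate only 10 examples per task for a quick check
```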