Multilingual Q&A
Multilingual Massive Multitask Language Understanding (MMMLU). Tests knowledge and reasoning across 14 languages, exposing which models actually generalize beyond English.
Average MMMLU accuracy across 14 languages (knowledge and reasoning, not just translation). Higher is better.
Every model's overall and per-language accuracy across all 14 MMMLU language subsets. Bold cells mark the leader for each language.
| # | Model | Overall | FR-FR | PT-BR | BN-BD | JA-JP | DE-DE | YO-NG | ES-LA | ID-ID | ZH-CN | SW-KE | IT-IT | AR-XY | KO-KR | HI-IN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Interfaze | 90.9% | 91.9% | **93.0%** | **90.4%** | **91.9%** | **91.8%** | **83.9%** | **92.1%** | **91.5%** | 90.5% | **89.8%** | **92.4%** | **90.5%** | 90.1% | **92.6%** |
| 2 | Grok-4.3 | 89.7% | **92.1%** | 92.4% | 90.1% | 90.4% | 90.5% | 76.3% | 91.4% | 91.2% | **90.6%** | 88.0% | 91.9% | 90.2% | **90.7%** | 90.4% |
| 3 | Gemini-3-Flash | 88.7% | 90.4% | 91.5% | 87.4% | 90.5% | 89.5% | 79.3% | 90.8% | 90.5% | 89.3% | 86.3% | 90.1% | 88.6% | 88.0% | 89.9% |
| 4 | Claude-Sonnet-4.6 | 84.9% | 87.6% | 88.6% | 85.1% | 87.6% | 87.5% | 66.7% | 88.0% | 86.5% | 85.8% | 79.8% | 86.7% | 86.0% | 86.9% | 86.2% |
| 5 | GPT-5.4-Mini | 75.3% | 80.0% | 79.6% | 74.0% | 78.5% | 76.7% | 57.0% | 79.0% | 77.2% | 76.6% | 70.5% | 79.7% | 75.4% | 74.2% | 76.5% |
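The Overall column can be sanity-checked against the per-language columns. The sketch below assumes Overall is the unweighted mean of the 14 language scores; the page does not state this, but the published figures are consistent with it to within rounding. The helper names (`macro_average`, `leaders`, `spread`) are illustrative and not part of any benchmark tooling. The scores are copied from the table above.

```python
# Per-language accuracies (%) copied from the table, in column order.
LANGS = ["FR-FR", "PT-BR", "BN-BD", "JA-JP", "DE-DE", "YO-NG", "ES-LA",
         "ID-ID", "ZH-CN", "SW-KE", "IT-IT", "AR-XY", "KO-KR", "HI-IN"]

SCORES = {
    "Interfaze": [91.9, 93.0, 90.4, 91.9, 91.8, 83.9, 92.1,
                  91.5, 90.5, 89.8, 92.4, 90.5, 90.1, 92.6],
    "Grok-4.3": [92.1, 92.4, 90.1, 90.4, 90.5, 76.3, 91.4,
                 91.2, 90.6, 88.0, 91.9, 90.2, 90.7, 90.4],
    "Gemini-3-Flash": [90.4, 91.5, 87.4, 90.5, 89.5, 79.3, 90.8,
                       90.5, 89.3, 86.3, 90.1, 88.6, 88.0, 89.9],
    "Claude-Sonnet-4.6": [87.6, 88.6, 85.1, 87.6, 87.5, 66.7, 88.0,
                          86.5, 85.8, 79.8, 86.7, 86.0, 86.9, 86.2],
    "GPT-5.4-Mini": [80.0, 79.6, 74.0, 78.5, 76.7, 57.0, 79.0,
                     77.2, 76.6, 70.5, 79.7, 75.4, 74.2, 76.5],
}


def macro_average(values):
    """Unweighted mean across languages (assumed definition of Overall)."""
    return sum(values) / len(values)


def leaders(scores, langs):
    """Map each language to the model with the highest accuracy on it."""
    return {
        lang: max(scores, key=lambda model: scores[model][i])
        for i, lang in enumerate(langs)
    }


def spread(values):
    """Gap between a model's best and worst language, in percentage points."""
    return max(values) - min(values)


if __name__ == "__main__":
    for model, values in SCORES.items():
        print(f"{model}: mean={macro_average(values):.2f} "
              f"spread={spread(values):.1f}")
    for lang, model in leaders(SCORES, LANGS).items():
        print(f"{lang}: {model}")
```

The spread is a rough proxy for how evenly a model generalizes across languages. For every model in the table the lowest score is on YO-NG, so that column accounts for most of each gap.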
How each model performs on individual MMMLU language subsets. Each chart describes the language and any notable evaluation context.