Multilingual Q&A

MMMLU

Massively Multilingual MMLU: knowledge and reasoning questions across 14 languages, showing which models actually generalize beyond English.

Average MMLU accuracy across 14 languages (knowledge + reasoning, not just translation). Higher is better.
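As a rough sketch of how the headline number could be derived: the page says "average" without spelling out any weighting, so the snippet below assumes an unweighted mean over the 14 language subsets. The values are the Interfaze row from the per-language table; the function name `overall_accuracy` is purely illustrative. Because the per-language figures are already rounded to one decimal, a recomputed mean may differ from a published one in the last digit.

```python
from statistics import mean

# Per-language accuracies (%) for the Interfaze row of the per-language table.
interfaze_scores = {
    "FR-FR": 91.9, "PT-BR": 93.0, "BN-BD": 90.4, "JA-JP": 91.9,
    "DE-DE": 91.8, "YO-NG": 83.9, "ES-LA": 92.1, "ID-ID": 91.5,
    "ZH-CN": 90.5, "SW-KE": 89.8, "IT-IT": 92.4, "AR-XY": 90.5,
    "KO-KR": 90.1, "HI-IN": 92.6,
}


def overall_accuracy(per_language: dict[str, float]) -> float:
    """Unweighted mean over languages (an assumption; weighting is not documented)."""
    return mean(per_language.values())


overall = overall_accuracy(interfaze_scores)
print(f"{overall:.1f}%")  # 90.9%
```

Under this assumption the result matches the Overall value listed for Interfaze.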

Model rankings

Per-language breakdown

Every model's overall and per-language accuracy across all 14 MMMLU language subsets. An asterisk (*) marks the leader in each column.

#  Model              Overall  FR-FR    PT-BR    BN-BD    JA-JP    DE-DE    YO-NG    ES-LA    ID-ID    ZH-CN    SW-KE    IT-IT    AR-XY    KO-KR    HI-IN
1  Interfaze          90.9%*   91.9%    93.0%*   90.4%*   91.9%*   91.8%*   83.9%*   92.1%*   91.5%*   90.5%    89.8%*   92.4%*   90.5%*   90.1%    92.6%*
2  Grok-4.3           89.7%    92.1%*   92.4%    90.1%    90.4%    90.5%    76.3%    91.4%    91.2%    90.6%*   88.0%    91.9%    90.2%    90.7%*   90.4%
3  Gemini-3-Flash     88.7%    90.4%    91.5%    87.4%    90.5%    89.5%    79.3%    90.8%    90.5%    89.3%    86.3%    90.1%    88.6%    88.0%    89.9%
4  Claude-Sonnet-4.6  84.9%    87.6%    88.6%    85.1%    87.6%    87.5%    66.7%    88.0%    86.5%    85.8%    79.8%    86.7%    86.0%    86.9%    86.2%
5  GPT-5.4-Mini       75.3%    80.0%    79.6%    74.0%    78.5%    76.7%    57.0%    79.0%    77.2%    76.6%    70.5%    79.7%    75.4%    74.2%    76.5%
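
The per-language leaders can be recomputed directly from the figures above. The following is a minimal sketch, not the leaderboard's own code: the data structure and the helper name `leaders_by_language` are hypothetical, and the scores are copied from the table.

```python
LANGUAGES = [
    "FR-FR", "PT-BR", "BN-BD", "JA-JP", "DE-DE", "YO-NG", "ES-LA",
    "ID-ID", "ZH-CN", "SW-KE", "IT-IT", "AR-XY", "KO-KR", "HI-IN",
]

# Per-language accuracy (%) per model, in the same order as LANGUAGES.
SCORES = {
    "Interfaze":         [91.9, 93.0, 90.4, 91.9, 91.8, 83.9, 92.1,
                          91.5, 90.5, 89.8, 92.4, 90.5, 90.1, 92.6],
    "Grok-4.3":          [92.1, 92.4, 90.1, 90.4, 90.5, 76.3, 91.4,
                          91.2, 90.6, 88.0, 91.9, 90.2, 90.7, 90.4],
    "Gemini-3-Flash":    [90.4, 91.5, 87.4, 90.5, 89.5, 79.3, 90.8,
                          90.5, 89.3, 86.3, 90.1, 88.6, 88.0, 89.9],
    "Claude-Sonnet-4.6": [87.6, 88.6, 85.1, 87.6, 87.5, 66.7, 88.0,
                          86.5, 85.8, 79.8, 86.7, 86.0, 86.9, 86.2],
    "GPT-5.4-Mini":      [80.0, 79.6, 74.0, 78.5, 76.7, 57.0, 79.0,
                          77.2, 76.6, 70.5, 79.7, 75.4, 74.2, 76.5],
}


def leaders_by_language(scores, languages):
    """Map each language code to the model with the highest accuracy on it."""
    leaders = {}
    for i, lang in enumerate(languages):
        leaders[lang] = max(scores, key=lambda model: scores[model][i])
    return leaders


leaders = leaders_by_language(SCORES, LANGUAGES)
for lang, model in leaders.items():
    print(f"{lang}: {model} ({SCORES[model][LANGUAGES.index(lang)]:.1f}%)")
```

On these figures, Grok-4.3 leads FR-FR, ZH-CN, and KO-KR, and Interfaze leads the remaining 11 languages.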

Language details

How each model performs on individual MMMLU language subsets. Each chart describes the language and any notable evaluation context.