MMMLU

Name: MMMLU — AI model leaderboard
Creator: Interfaze
License: https://creativecommons.org/licenses/by/4.0/

Massively Multilingual MMLU: knowledge and reasoning across 14 languages, exposing which models actually generalize beyond English.

Average MMLU accuracy across 14 languages, testing knowledge and reasoning rather than translation alone. Higher is better.
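The page describes the overall score only as an "average". As a minimal sketch, assuming it is the unweighted mean of the 14 per-language accuracies, the Overall column can be recomputed from the per-language figures. For the five models in the per-language breakdown, this assumption reproduces the published Overall values to within rounding. The helper name overall_score is hypothetical.

```python
from statistics import fmean

# Per-language accuracies (%) for Interfaze, copied from the per-language table.
interfaze = {
    "FR-FR": 91.9, "PT-BR": 93.0, "BN-BD": 90.4, "JA-JP": 91.9,
    "DE-DE": 91.8, "YO-NG": 83.9, "ES-LA": 92.1, "ID-ID": 91.5,
    "ZH-CN": 90.5, "SW-KE": 89.8, "IT-IT": 92.4, "AR-XY": 90.5,
    "KO-KR": 90.1, "HI-IN": 92.6,
}


def overall_score(per_language: dict[str, float]) -> float:
    """Hypothetical helper: unweighted mean of the 14 per-language accuracies.

    The unweighted mean is an assumption, not a documented aggregation rule.
    """
    if len(per_language) != 14:
        raise ValueError("expected one score per MMMLU language subset")
    return fmean(per_language.values())


print(f"{overall_score(interfaze):.1f}%")  # prints 90.9%
```

Because every language subset receives equal weight here, a single weak language (such as YO-NG) pulls the overall score down by its gap divided by 14.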

Model rankings

Per-language breakdown

Every model's overall and per-language accuracy across all 14 MMMLU language subsets. The leader for each language is the model with the highest score in that column.

#	Model	Overall	FR-FR	PT-BR	BN-BD	JA-JP	DE-DE	YO-NG	ES-LA	ID-ID	ZH-CN	SW-KE	IT-IT	AR-XY	KO-KR	HI-IN
1	Interfaze	90.9%	91.9%	93.0%	90.4%	91.9%	91.8%	83.9%	92.1%	91.5%	90.5%	89.8%	92.4%	90.5%	90.1%	92.6%
2	Grok-4.3	89.7%	92.1%	92.4%	90.1%	90.4%	90.5%	76.3%	91.4%	91.2%	90.6%	88.0%	91.9%	90.2%	90.7%	90.4%
3	Gemini-3-Flash	88.7%	90.4%	91.5%	87.4%	90.5%	89.5%	79.3%	90.8%	90.5%	89.3%	86.3%	90.1%	88.6%	88.0%	89.9%
4	Claude-Sonnet-4.6	84.9%	87.6%	88.6%	85.1%	87.6%	87.5%	66.7%	88.0%	86.5%	85.8%	79.8%	86.7%	86.0%	86.9%	86.2%
5	GPT-5.4-Mini	75.3%	80.0%	79.6%	74.0%	78.5%	76.7%	57.0%	79.0%	77.2%	76.6%	70.5%	79.7%	75.4%	74.2%	76.5%
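The per-language leaders can be read off the table with a short sketch like the one below. The scores are copied from the table; the leaders helper is hypothetical, and it keeps ties by listing every model that shares the top score in a column (no ties occur in this table).

```python
# Per-language accuracies (%) from the table, in the table's column order.
LANGUAGES = ["FR-FR", "PT-BR", "BN-BD", "JA-JP", "DE-DE", "YO-NG", "ES-LA",
             "ID-ID", "ZH-CN", "SW-KE", "IT-IT", "AR-XY", "KO-KR", "HI-IN"]

SCORES = {
    "Interfaze":         [91.9, 93.0, 90.4, 91.9, 91.8, 83.9, 92.1,
                          91.5, 90.5, 89.8, 92.4, 90.5, 90.1, 92.6],
    "Grok-4.3":          [92.1, 92.4, 90.1, 90.4, 90.5, 76.3, 91.4,
                          91.2, 90.6, 88.0, 91.9, 90.2, 90.7, 90.4],
    "Gemini-3-Flash":    [90.4, 91.5, 87.4, 90.5, 89.5, 79.3, 90.8,
                          90.5, 89.3, 86.3, 90.1, 88.6, 88.0, 89.9],
    "Claude-Sonnet-4.6": [87.6, 88.6, 85.1, 87.6, 87.5, 66.7, 88.0,
                          86.5, 85.8, 79.8, 86.7, 86.0, 86.9, 86.2],
    "GPT-5.4-Mini":      [80.0, 79.6, 74.0, 78.5, 76.7, 57.0, 79.0,
                          77.2, 76.6, 70.5, 79.7, 75.4, 74.2, 76.5],
}


def leaders(scores: dict[str, list[float]],
            languages: list[str]) -> dict[str, list[str]]:
    """Hypothetical helper: map each language to the model(s) with the top score."""
    for model, row in scores.items():
        if len(row) != len(languages):
            raise ValueError(f"{model}: expected {len(languages)} scores")
    result = {}
    for i, lang in enumerate(languages):
        best = max(row[i] for row in scores.values())
        # Keep every model that matches the best score, so ties are not hidden.
        result[lang] = [m for m, row in scores.items() if row[i] == best]
    return result


# Show only the languages that Interfaze does not lead.
print({lang: ms for lang, ms in leaders(SCORES, LANGUAGES).items()
       if ms != ["Interfaze"]})
# prints {'FR-FR': ['Grok-4.3'], 'ZH-CN': ['Grok-4.3'], 'KO-KR': ['Grok-4.3']}
```

With these figures, Interfaze leads 11 of the 14 languages, and Grok-4.3 leads the remaining three.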

Language details

How each model performs on individual MMMLU language subsets. Each chart comes with a description of the language and any notable evaluation context.
