MMMU-Pro

Name: MMMU-Pro — AI model leaderboard
Creator: Interfaze
License: https://creativecommons.org/licenses/by/4.0/
Keywords: MMMU-Pro, Multimodal understanding, AI benchmark, model leaderboard, Interfaze

MMMU-Pro is a harder, filtered version of MMMU for college-level multimodal problems. It removes shortcut-prone questions to isolate vision-language reasoning over diagrams, charts, and figures.

Metric: accuracy on hard college-level multimodal questions (diagrams, charts, and figures plus text). Higher is better.

Model rankings

The headline MMMU-Pro score is the mean of two tracks: standard (text + image, n=1729) and vision-only (image-only, n=1730).

Overall scores

Each model's combined MMMU-Pro score is shown alongside the two track scores it averages.

#	Model	Overall	Standard	Vision-only
1	Interfaze	71.1%	72.5%	69.7%
2	Grok-4.3	68.7%	69.4%	67.9%
3	Gemini-3-Flash	67.6%	68.0%	67.2%
4	Claude-Sonnet-4.6	46.3%	47.1%	45.5%
5	GPT-5.4-Mini	40.4%	42.0%	38.8%
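
The averaging rule can be checked against the table above. The following is a minimal sketch, assuming an unweighted mean of the two published track scores, rounded half-up to one decimal place. The rounding mode and the `headline_score` helper are assumptions made for illustration; the leaderboard does not state how it rounds, and it may average unrounded track scores.

```python
from decimal import Decimal, ROUND_HALF_UP

# (standard, vision-only) track scores in percent, copied from the table above.
TRACK_SCORES = {
    "Interfaze": ("72.5", "69.7"),
    "Grok-4.3": ("69.4", "67.9"),
    "Gemini-3-Flash": ("68.0", "67.2"),
    "Claude-Sonnet-4.6": ("47.1", "45.5"),
    "GPT-5.4-Mini": ("42.0", "38.8"),
}


def headline_score(standard: str, vision_only: str) -> Decimal:
    """Unweighted mean of the two tracks, rounded half-up to 0.1 (assumed)."""
    mean = (Decimal(standard) + Decimal(vision_only)) / 2
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Recomputed overall score per model.
OVERALL = {
    model: headline_score(standard, vision_only)
    for model, (standard, vision_only) in TRACK_SCORES.items()
}

if __name__ == "__main__":
    for model, score in OVERALL.items():
        print(f"{model}: {score}%")
```

Decimal arithmetic is used instead of floats so that a midpoint such as 68.65 rounds predictably. Under these assumptions the recomputed values match the Overall column for all five models.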

MMMU-Pro standard

Standard MMMU-Pro track (n=1729). Models receive both the rendered question image and the surrounding text. The table reports per-subject accuracy across 30 college-level subjects.

#	Model	Overall	Accounting	Agriculture	Architecture & Engineering	Art	Art Theory	Basic Medical Science	Biology	Chemistry	Clinical Medicine	Computer Science	Design	Diagnostics & Lab Medicine	Economics	Electronics	Energy & Power	Finance	Geography	History	Literature	Manage	Marketing	Materials	Math	Mechanical Engineering	Music	Pharmacy	Physics	Psychology	Public Health	Sociology
1	Interfaze	72.5%	65.5%	68.3%	65.0%	90.6%	83.6%	82.7%	72.9%	61.7%	67.8%	65.0%	75.0%	53.3%	89.1%	90.0%	77.6%	86.4%	71.2%	76.8%	78.8%	70.0%	88.1%	73.3%	60.0%	52.5%	35.0%	68.4%	78.3%	71.7%	82.8%	79.6%
2	Grok-4.3	69.4%	82.8%	58.3%	66.7%	77.4%	81.8%	59.6%	64.4%	70.0%	61.0%	65.0%	80.0%	40.0%	91.5%	70.0%	56.9%	88.3%	73.1%	71.4%	82.7%	70.0%	86.4%	55.0%	70.0%	49.1%	31.7%	84.2%	78.3%	70.0%	74.1%	77.8%
3	Gemini-3-Flash	68.0%	63.8%	70.0%	48.3%	83.0%	85.5%	84.6%	76.3%	70.0%	72.9%	63.3%	75.0%	51.7%	84.7%	75.0%	44.8%	65.0%	75.0%	80.4%	80.8%	56.0%	69.5%	46.7%	63.3%	61.0%	28.3%	71.9%	71.7%	75.0%	79.3%	75.9%
4	Claude-Sonnet-4.6	47.1%	48.3%	57.1%	13.3%	83.9%	81.8%	51.9%	40.7%	51.7%	47.5%	45.0%	71.7%	40.0%	59.3%	16.7%	17.2%	31.7%	55.8%	69.6%	75.0%	38.0%	45.6%	21.7%	33.3%	15.3%	33.3%	59.7%	30.0%	56.7%	70.7%	64.8%
5	GPT-5.4-Mini	42.0%	27.6%	40.0%	21.7%	75.5%	65.5%	57.7%	40.7%	31.7%	47.5%	50.0%	70.0%	38.3%	30.5%	55.0%	22.4%	23.3%	36.5%	53.6%	71.2%	30.0%	43.2%	25.0%	26.7%	37.3%	26.7%	54.4%	31.7%	43.3%	36.2%	57.4%
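
Because the table is tab-separated, the per-subject leader can be recovered with a few lines of code. The sketch below is illustrative only: it uses a four-subject excerpt of the standard-track table above, and the `subject_leaders` helper is a hypothetical name, not part of any leaderboard tooling.

```python
import csv
import io

# Excerpt of the standard-track table (tab-separated), four subjects only.
TSV = (
    "Model\tAccounting\tArt\tMath\tMusic\n"
    "Interfaze\t65.5%\t90.6%\t60.0%\t35.0%\n"
    "Grok-4.3\t82.8%\t77.4%\t70.0%\t31.7%\n"
    "Gemini-3-Flash\t63.8%\t83.0%\t63.3%\t28.3%\n"
    "Claude-Sonnet-4.6\t48.3%\t83.9%\t33.3%\t33.3%\n"
    "GPT-5.4-Mini\t27.6%\t75.5%\t26.7%\t26.7%\n"
)


def subject_leaders(tsv_text: str) -> dict:
    """Map each subject column to the (model, score) pair with the top accuracy."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    subjects = [column for column in rows[0] if column != "Model"]
    leaders = {}
    for subject in subjects:
        # Strip the trailing "%" before comparing; ties go to the first row listed.
        best = max(rows, key=lambda row: float(row[subject].rstrip("%")))
        leaders[subject] = (best["Model"], best[subject])
    return leaders


LEADERS = subject_leaders(TSV)

if __name__ == "__main__":
    for subject, (model, score) in LEADERS.items():
        print(f"{subject}: {model} ({score})")
```

The same function works on the full table if the rank and Overall columns are dropped or excluded from `subjects` first.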

MMMU-Pro vision-only

Vision-only MMMU-Pro track (n=1730). The model sees only the rendered image, with no separate text context, so visual grounding becomes the bottleneck. The table covers the same 30 subjects as the standard track.

#	Model	Overall	Accounting	Agriculture	Architecture & Engineering	Art	Art Theory	Basic Medical Science	Biology	Chemistry	Clinical Medicine	Computer Science	Design	Diagnostics & Lab Medicine	Economics	Electronics	Energy & Power	Finance	Geography	History	Literature	Manage	Marketing	Materials	Math	Mechanical Engineering	Music	Pharmacy	Physics	Psychology	Public Health	Sociology
1	Interfaze	69.7%	62.3%	63.7%	45.3%	83.1%	89.3%	88.5%	69.8%	73.7%	78.3%	68.7%	75.3%	57.0%	90.1%	73.7%	53.7%	63.7%	73.2%	75.2%	82.8%	66.0%	73.2%	52.0%	70.3%	57.9%	43.7%	79.2%	75.3%	60.3%	76.1%	77.9%
2	Grok-4.3	67.9%	82.8%	53.3%	68.3%	67.9%	83.6%	57.7%	55.9%	70.0%	55.9%	71.7%	78.3%	40.0%	88.1%	68.3%	62.1%	83.3%	67.3%	71.4%	82.7%	66.0%	86.4%	45.0%	75.0%	54.2%	41.7%	73.7%	80.0%	58.3%	81.0%	70.4%
3	Gemini-3-Flash	67.2%	62.1%	66.7%	38.3%	81.1%	87.3%	80.8%	67.8%	68.3%	74.6%	66.7%	73.3%	56.7%	79.7%	76.7%	51.7%	65.0%	78.8%	73.2%	84.6%	56.0%	71.2%	41.7%	63.3%	54.2%	36.7%	75.4%	70.0%	68.3%	79.3%	74.1%
4	Claude-Sonnet-4.6	45.5%	50.0%	45.0%	18.3%	73.6%	85.5%	63.5%	50.8%	43.3%	52.5%	48.3%	68.3%	41.7%	59.3%	15.0%	19.0%	28.3%	51.9%	69.6%	75.0%	28.0%	40.7%	15.0%	31.7%	17.0%	21.7%	47.4%	36.7%	48.3%	63.8%	70.4%
5	GPT-5.4-Mini	38.8%	19.0%	28.3%	20.0%	52.8%	60.0%	48.1%	40.7%	35.0%	54.2%	35.0%	66.7%	36.7%	30.5%	46.7%	15.5%	16.7%	44.2%	57.1%	61.5%	30.0%	39.0%	21.7%	36.7%	35.6%	26.7%	52.6%	45.0%	36.7%	25.9%	53.7%
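
One way to gauge how much the missing text context costs each model is to subtract its vision-only overall score from its standard overall score. The sketch below does this with the Overall values from the two track tables; the gap is a derived quantity computed here for illustration, not a figure the leaderboard reports, and `vision_gap` is a hypothetical helper name.

```python
from decimal import Decimal

# (standard overall, vision-only overall) in percent, from the two track tables.
TRACK_OVERALL = {
    "Interfaze": ("72.5", "69.7"),
    "Grok-4.3": ("69.4", "67.9"),
    "Gemini-3-Flash": ("68.0", "67.2"),
    "Claude-Sonnet-4.6": ("47.1", "45.5"),
    "GPT-5.4-Mini": ("42.0", "38.8"),
}


def vision_gap(standard: str, vision_only: str) -> Decimal:
    """Accuracy lost, in percentage points, when the text context is removed."""
    return Decimal(standard) - Decimal(vision_only)


GAPS = {
    model: vision_gap(standard, vision_only)
    for model, (standard, vision_only) in TRACK_OVERALL.items()
}

if __name__ == "__main__":
    # List models from the largest drop to the smallest.
    for model, gap in sorted(GAPS.items(), key=lambda item: item[1], reverse=True):
        print(f"{model}: -{gap} points")
```

With these numbers every model scores lower on the vision-only track, with drops between 0.8 and 3.2 percentage points.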
