Interfaze

Multimodal understanding

MMMU-Pro

A harder, filtered version of MMMU for college-level multimodal problems. It removes shortcut-prone questions (ones that text-only models can answer without the image) to isolate genuine vision-language reasoning over diagrams, charts, and figures.

Accuracy on hard college-level multimodal questions that pair diagrams, charts, or figures with text. Higher is better.

Model rankings

The headline MMMU-Pro score is the mean of two tracks: standard (text + image, n=1729 questions) and vision-only (image only, n=1730 questions).

Overall scores

Each model's combined MMMU-Pro score alongside the two track scores it averages.

# | Model | Overall | Standard | Vision-only
--- | --- | --- | --- | ---
1 | Interfaze | 71.1% | 72.5% | 69.7%
2 | Grok-4.3 | 68.7% | 69.4% | 67.9%
3 | Gemini-3-Flash | 67.6% | 68.0% | 67.2%
4 | Claude-Sonnet-4.6 | 46.3% | 47.1% | 45.5%
5 | GPT-5.4-Mini | 40.4% | 42.0% | 38.8%
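The headline score can be reproduced from the two track columns. This is a minimal Python sketch that assumes an unweighted mean of the displayed (already rounded) track scores, rounded half-up to one decimal place; the site may instead average unrounded accuracies, which could shift a result by 0.1 point in edge cases. The names `headline`, `leaderboard`, and `TRACKS` are illustrative, not part of any Interfaze API.

```python
from decimal import Decimal, ROUND_HALF_UP

# Track scores (percent) copied from the overall table:
# (model, standard track, vision-only track).
TRACKS = [
    ("Interfaze", "72.5", "69.7"),
    ("Grok-4.3", "69.4", "67.9"),
    ("Gemini-3-Flash", "68.0", "67.2"),
    ("Claude-Sonnet-4.6", "47.1", "45.5"),
    ("GPT-5.4-Mini", "42.0", "38.8"),
]


def headline(standard: str, vision_only: str) -> Decimal:
    """Unweighted mean of the two track scores, rounded half-up to 0.1 points.

    Assumption: the mean is taken over the displayed, rounded track scores.
    """
    mean = (Decimal(standard) + Decimal(vision_only)) / 2
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def leaderboard(tracks):
    """Return (model, headline) pairs sorted from best to worst headline score."""
    rows = [(model, headline(std, vis)) for model, std, vis in tracks]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for rank, (model, score) in enumerate(leaderboard(TRACKS), start=1):
        print(f"{rank} {model} {score}%")
```

Running it lists the five models in the same order and with the same Overall values as the overall table. `Decimal` is used instead of `float` so that means ending in 5 (such as Grok-4.3's 68.65) round predictably rather than depending on binary floating-point representation.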

MMMU-Pro standard

Standard MMMU-Pro track (n=1729). Models receive both the rendered question image and the surrounding text. Scores are per-subject accuracy across 30 college-level subjects, plus the track's Overall score.

Subject | Interfaze | Grok-4.3 | Gemini-3-Flash | Claude-Sonnet-4.6 | GPT-5.4-Mini
--- | --- | --- | --- | --- | ---
Overall | 72.5% | 69.4% | 68.0% | 47.1% | 42.0%
Accounting | 65.5% | 82.8% | 63.8% | 48.3% | 27.6%
Agriculture | 68.3% | 58.3% | 70.0% | 57.1% | 40.0%
Architecture & Engineering | 65.0% | 66.7% | 48.3% | 13.3% | 21.7%
Art | 90.6% | 77.4% | 83.0% | 83.9% | 75.5%
Art Theory | 83.6% | 81.8% | 85.5% | 81.8% | 65.5%
Basic Medical Science | 82.7% | 59.6% | 84.6% | 51.9% | 57.7%
Biology | 72.9% | 64.4% | 76.3% | 40.7% | 40.7%
Chemistry | 61.7% | 70.0% | 70.0% | 51.7% | 31.7%
Clinical Medicine | 67.8% | 61.0% | 72.9% | 47.5% | 47.5%
Computer Science | 65.0% | 65.0% | 63.3% | 45.0% | 50.0%
Design | 75.0% | 80.0% | 75.0% | 71.7% | 70.0%
Diagnostics & Lab Medicine | 53.3% | 40.0% | 51.7% | 40.0% | 38.3%
Economics | 89.1% | 91.5% | 84.7% | 59.3% | 30.5%
Electronics | 90.0% | 70.0% | 75.0% | 16.7% | 55.0%
Energy & Power | 77.6% | 56.9% | 44.8% | 17.2% | 22.4%
Finance | 86.4% | 88.3% | 65.0% | 31.7% | 23.3%
Geography | 71.2% | 73.1% | 75.0% | 55.8% | 36.5%
History | 76.8% | 71.4% | 80.4% | 69.6% | 53.6%
Literature | 78.8% | 82.7% | 80.8% | 75.0% | 71.2%
Manage | 70.0% | 70.0% | 56.0% | 38.0% | 30.0%
Marketing | 88.1% | 86.4% | 69.5% | 45.6% | 43.2%
Materials | 73.3% | 55.0% | 46.7% | 21.7% | 25.0%
Math | 60.0% | 70.0% | 63.3% | 33.3% | 26.7%
Mechanical Engineering | 52.5% | 49.1% | 61.0% | 15.3% | 37.3%
Music | 35.0% | 31.7% | 28.3% | 33.3% | 26.7%
Pharmacy | 68.4% | 84.2% | 71.9% | 59.7% | 54.4%
Physics | 78.3% | 78.3% | 71.7% | 30.0% | 31.7%
Psychology | 71.7% | 70.0% | 75.0% | 56.7% | 43.3%
Public Health | 82.8% | 74.1% | 79.3% | 70.7% | 36.2%
Sociology | 79.6% | 77.8% | 75.9% | 64.8% | 57.4%
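A track's Overall score is not necessarily the plain average of its 30 subject scores. The unweighted mean of Interfaze's standard-track subject scores is 72.7%, while the reported Overall is 72.5%. Rounding each subject score to 0.1 moves their mean by at most 0.05 points, so rounding alone cannot explain the whole gap; one plausible reading, which the page does not confirm, is that Overall pools all questions (a micro-average), so subjects with more questions weigh more. The Python sketch below compares the two kinds of average; the helper names are illustrative, and the two-subject counts in the usage section are made up purely for demonstration.

```python
from decimal import Decimal

# Interfaze's 30 per-subject standard-track scores (percent), copied from the
# per-subject table in subject order (Accounting through Sociology).
INTERFAZE_STANDARD = [
    "65.5", "68.3", "65.0", "90.6", "83.6", "82.7", "72.9", "61.7", "67.8", "65.0",
    "75.0", "53.3", "89.1", "90.0", "77.6", "86.4", "71.2", "76.8", "78.8", "70.0",
    "88.1", "73.3", "60.0", "52.5", "35.0", "68.4", "78.3", "71.7", "82.8", "79.6",
]
REPORTED_OVERALL = Decimal("72.5")  # Interfaze's standard-track Overall


def macro_average(percentages):
    """Unweighted mean of per-subject accuracies: every subject counts equally."""
    values = [Decimal(p) for p in percentages]
    return (sum(values) / len(values)).quantize(Decimal("0.1"))


def micro_average(counts):
    """Pooled accuracy from (correct, total) pairs: every question counts equally,
    so subjects with more questions carry more weight."""
    correct = sum(c for c, _ in counts)
    total = sum(t for _, t in counts)
    return (Decimal(100) * correct / total).quantize(Decimal("0.1"))


if __name__ == "__main__":
    print("macro mean of subjects:", macro_average(INTERFAZE_STANDARD))  # 72.7
    print("reported overall:", REPORTED_OVERALL)                         # 72.5
    # Hypothetical counts: 90% on a 10-question subject, 50% on a 60-question one.
    print("toy macro:", macro_average(["90.0", "50.0"]))  # 70.0
    print("toy micro:", micro_average([(9, 10), (30, 60)]))  # 55.7
```

Per-subject question counts are not published on this page, so the micro-average cannot be recomputed from the table; the toy example only shows how far the two averages can diverge when subject sizes differ.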

MMMU-Pro vision-only

Vision-only MMMU-Pro track (n=1730). The model sees only the rendered image, with no separate text context, so visual grounding in the vision-language model (VLM) becomes the bottleneck. Scores cover the same 30 subjects as the standard track.

Subject | Interfaze | Grok-4.3 | Gemini-3-Flash | Claude-Sonnet-4.6 | GPT-5.4-Mini
--- | --- | --- | --- | --- | ---
Overall | 69.7% | 67.9% | 67.2% | 45.5% | 38.8%
Accounting | 62.3% | 82.8% | 62.1% | 50.0% | 19.0%
Agriculture | 63.7% | 53.3% | 66.7% | 45.0% | 28.3%
Architecture & Engineering | 45.3% | 68.3% | 38.3% | 18.3% | 20.0%
Art | 83.1% | 67.9% | 81.1% | 73.6% | 52.8%
Art Theory | 89.3% | 83.6% | 87.3% | 85.5% | 60.0%
Basic Medical Science | 88.5% | 57.7% | 80.8% | 63.5% | 48.1%
Biology | 69.8% | 55.9% | 67.8% | 50.8% | 40.7%
Chemistry | 73.7% | 70.0% | 68.3% | 43.3% | 35.0%
Clinical Medicine | 78.3% | 55.9% | 74.6% | 52.5% | 54.2%
Computer Science | 68.7% | 71.7% | 66.7% | 48.3% | 35.0%
Design | 75.3% | 78.3% | 73.3% | 68.3% | 66.7%
Diagnostics & Lab Medicine | 57.0% | 40.0% | 56.7% | 41.7% | 36.7%
Economics | 90.1% | 88.1% | 79.7% | 59.3% | 30.5%
Electronics | 73.7% | 68.3% | 76.7% | 15.0% | 46.7%
Energy & Power | 53.7% | 62.1% | 51.7% | 19.0% | 15.5%
Finance | 63.7% | 83.3% | 65.0% | 28.3% | 16.7%
Geography | 73.2% | 67.3% | 78.8% | 51.9% | 44.2%
History | 75.2% | 71.4% | 73.2% | 69.6% | 57.1%
Literature | 82.8% | 82.7% | 84.6% | 75.0% | 61.5%
Manage | 66.0% | 66.0% | 56.0% | 28.0% | 30.0%
Marketing | 73.2% | 86.4% | 71.2% | 40.7% | 39.0%
Materials | 52.0% | 45.0% | 41.7% | 15.0% | 21.7%
Math | 70.3% | 75.0% | 63.3% | 31.7% | 36.7%
Mechanical Engineering | 57.9% | 54.2% | 54.2% | 17.0% | 35.6%
Music | 43.7% | 41.7% | 36.7% | 21.7% | 26.7%
Pharmacy | 79.2% | 73.7% | 75.4% | 47.4% | 52.6%
Physics | 75.3% | 80.0% | 70.0% | 36.7% | 45.0%
Psychology | 60.3% | 58.3% | 68.3% | 48.3% | 36.7%
Public Health | 76.1% | 81.0% | 79.3% | 63.8% | 25.9%
Sociology | 77.9% | 70.4% | 74.1% | 70.4% | 53.7%
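One simple way to read the two tracks together is the per-model drop from standard to vision-only accuracy. The Python sketch below assumes the track-level scores from the overall table; `vision_only_drop` is a hypothetical helper name. The drop shows how much each model loses when the separate text context is removed, but it does not by itself isolate grounding failures from other causes, and with roughly 1,730 questions per track, differences of about one point may be within sampling noise.

```python
from decimal import Decimal

# Track-level scores (percent) from the overall table: (model, standard, vision-only).
TRACKS = [
    ("Interfaze", "72.5", "69.7"),
    ("Grok-4.3", "69.4", "67.9"),
    ("Gemini-3-Flash", "68.0", "67.2"),
    ("Claude-Sonnet-4.6", "47.1", "45.5"),
    ("GPT-5.4-Mini", "42.0", "38.8"),
]


def vision_only_drop(tracks):
    """Points lost when text context is removed (standard minus vision-only),
    sorted from largest drop to smallest."""
    drops = [(model, Decimal(std) - Decimal(vis)) for model, std, vis in tracks]
    return sorted(drops, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for model, drop in vision_only_drop(TRACKS):
        print(f"{model}: -{drop} points")
```

For these five models the drop ranges from 0.8 points (Gemini-3-Flash) to 3.2 points (GPT-5.4-Mini). The same subtraction can be applied subject by subject, although per-subject samples are much smaller and therefore noisier.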