GPQA Diamond

Name: GPQA Diamond — AI model leaderboard
Creator: Interfaze
License: https://creativecommons.org/licenses/by/4.0/
Keywords: GPQA Diamond, PhD-level problem solving, AI benchmark, model leaderboard, Interfaze

GPQA Diamond is the hardest subset of GPQA, with graduate-level questions in physics, chemistry, and biology. The questions are designed so that domain experts score about 65% and laypeople about 30%, even with web access.

Percent correct on PhD-level science questions in physics, chemistry, and biology. Higher is better.

Model rankings

Per-domain breakdown

Each model's overall score and per-domain accuracy in physics, chemistry, and biology. The leader in each column is Interfaze overall (92.4%), Gemini-3-Flash in physics (96.1%), Claude-Sonnet-4.6 in chemistry (89.0%), and GPT-5.4-Mini in biology (84.2%).

#	Model	Overall	Physics	Chemistry	Biology
1	Interfaze	92.4%	95.3%	88.2%	73.7%
2	Claude-Sonnet-4.6	89.9%	93.0%	89.0%	80.0%
3	Gemini-3-Flash	88.5%	96.1%	84.6%	73.4%
4	GPT-5.4-Mini	82.8%	90.7%	75.3%	84.2%
5	Grok-4.3	73.6%	79.1%	67.7%	77.8%
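
The table above can be checked programmatically. The following is a minimal sketch, not part of the leaderboard itself: it copies the rows above into a dictionary and finds the top-scoring model in each column. The names SCORES and leaders are hypothetical, chosen only for this illustration.

```python
# Scores copied by hand from the table above (percent correct).
# SCORES and leaders() are illustrative names, not part of any published API.
SCORES = {
    "Interfaze":         {"Overall": 92.4, "Physics": 95.3, "Chemistry": 88.2, "Biology": 73.7},
    "Claude-Sonnet-4.6": {"Overall": 89.9, "Physics": 93.0, "Chemistry": 89.0, "Biology": 80.0},
    "Gemini-3-Flash":    {"Overall": 88.5, "Physics": 96.1, "Chemistry": 84.6, "Biology": 73.4},
    "GPT-5.4-Mini":      {"Overall": 82.8, "Physics": 90.7, "Chemistry": 75.3, "Biology": 84.2},
    "Grok-4.3":          {"Overall": 73.6, "Physics": 79.1, "Chemistry": 67.7, "Biology": 77.8},
}


def leaders(scores):
    """Return {column: (model, score)} for the highest score in each column."""
    columns = list(next(iter(scores.values())).keys())
    result = {}
    for col in columns:
        # Pick the model whose score in this column is largest.
        best_model = max(scores, key=lambda model: scores[model][col])
        result[col] = (best_model, scores[best_model][col])
    return result


if __name__ == "__main__":
    for column, (model, score) in leaders(SCORES).items():
        print(f"{column}: {model} ({score}%)")
```

Note that the model with the best overall score does not lead any single domain in this table, so sorting by one column alone can give a different picture than the overall ranking.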

Domain details

How each model performs on individual GPQA Diamond science domains.
