
Native OCR

OCRBench V2

Reading text directly from images: multilingual scripts, low-quality scans, handwriting, structured layouts, charts, and screenshots. The benchmark that matters when the document never reaches a parser.

Model rankings

Composite OCR score across the eight OCRBench V2 task categories: the unweighted mean of the per-task accuracies (recognition, referring, spotting, extraction, parsing, calculation, understanding, reasoning). Higher is better.
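
A minimal sketch of that computation, assuming the composite is the plain mean of the eight per-task scores (which matches the Avg column in the table below); the `composite` helper and task keys are illustrative, not part of any released OCRBench V2 tooling:

```python
# Composite OCRBench V2 score: the unweighted mean of the eight
# per-task accuracies, each reported as a percentage.
TASKS = ["recognition", "referring", "spotting", "extraction",
         "parsing", "calculation", "understanding", "reasoning"]

def composite(scores: dict[str, float]) -> float:
    """Average the eight per-task scores (each in 0-100)."""
    return sum(scores[t] for t in TASKS) / len(TASKS)

# Interfaze's per-task scores, copied from the table below.
interfaze = dict(zip(TASKS, [73.8, 72.8, 59.2, 91.6, 47.3, 70.5, 77.3, 72.8]))
print(f"{composite(interfaze):.1f}%")  # -> 70.7%
```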

Per-task breakdown

Every model's average and per-task accuracy across the eight OCRBench V2 task categories. The highest score in each column marks the leader for that task; here Interfaze leads every task. A sketch for re-ranking the rows follows the table.

# | Model             | Avg   | Recog | Refer | Spot  | Extract | Parse | Calc  | Understand | Reason
1 | Interfaze         | 70.7% | 73.8% | 72.8% | 59.2% | 91.6%   | 47.3% | 70.5% | 77.3%      | 72.8%
2 | Gemini-3-Flash    | 55.8% | 67.7% | 52.4% |  7.2% | 89.9%   | 41.7% | 48.9% | 77.2%      | 61.2%
3 | Grok-4.3          | 54.7% | 65.7% | 49.5% |  3.5% | 88.3%   | 42.1% | 56.7% | 75.2%      | 56.3%
4 | Claude-Sonnet-4.6 | 54.7% | 71.4% | 48.1% | 13.9% | 84.6%   | 44.0% | 41.0% | 74.8%      | 59.8%
5 | GPT-5.4-Mini      | 52.7% | 60.0% | 52.8% | 10.4% | 82.9%   | 43.8% | 39.5% | 71.8%      | 60.7%
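
To re-rank the table by a single task or find a per-task leader programmatically, here is a minimal Python sketch; the rows are copied from the table above, and the field names are my own shorthand, not a leaderboard API:

```python
# Leaderboard rows as plain dicts; scores copied from the table above.
ROWS = [
    {"model": "Interfaze",         "recog": 73.8, "refer": 72.8, "spot": 59.2,
     "extract": 91.6, "parse": 47.3, "calc": 70.5, "understand": 77.3, "reason": 72.8},
    {"model": "Gemini-3-Flash",    "recog": 67.7, "refer": 52.4, "spot":  7.2,
     "extract": 89.9, "parse": 41.7, "calc": 48.9, "understand": 77.2, "reason": 61.2},
    {"model": "Grok-4.3",          "recog": 65.7, "refer": 49.5, "spot":  3.5,
     "extract": 88.3, "parse": 42.1, "calc": 56.7, "understand": 75.2, "reason": 56.3},
    {"model": "Claude-Sonnet-4.6", "recog": 71.4, "refer": 48.1, "spot": 13.9,
     "extract": 84.6, "parse": 44.0, "calc": 41.0, "understand": 74.8, "reason": 59.8},
    {"model": "GPT-5.4-Mini",      "recog": 60.0, "refer": 52.8, "spot": 10.4,
     "extract": 82.9, "parse": 43.8, "calc": 39.5, "understand": 71.8, "reason": 60.7},
]

def sort_by(task: str) -> list[dict]:
    """Re-rank models by a single task, best first."""
    return sorted(ROWS, key=lambda r: r[task], reverse=True)

def leader(task: str) -> str:
    """Model with the top score on the given task."""
    return max(ROWS, key=lambda r: r[task])["model"]

print([r["model"] for r in sort_by("calc")][:3])  # Interfaze, Grok-4.3, Gemini-3-Flash
print(leader("understand"))                       # Interfaze (77.3 vs Gemini's 77.2)
```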

Task details

How each model performs on the individual OCRBench V2 task categories. Each chart is accompanied by a description of what the task measures.