Native OCR
Reading text directly from images: multilingual scripts, low-quality scans, handwriting, structured layouts, charts, and screenshots. The benchmark that matters when the document never reaches a parser.
Composite OCR score across 5 sub-tasks (text recognition, VQA, key-info extraction, formulas, tables). Higher is better.
Every model's average and per-task accuracy across the eight OCRBench V2 task categories, sorted by average. Bold cells mark the leader for each task.
| # | Model | Avg | Recog | Refer | Spot | Extract | Parse | Calc | Understand | Reason |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Interfaze | **70.7%** | **73.8%** | **72.8%** | **59.2%** | **91.6%** | **47.3%** | **70.5%** | **77.3%** | **72.8%** |
| 2 | Gemini-3-Flash | 55.8% | 67.7% | 52.4% | 7.2% | 89.9% | 41.7% | 48.9% | 77.2% | 61.2% |
| 3 | Grok-4.3 | 54.7% | 65.7% | 49.5% | 3.5% | 88.3% | 42.1% | 56.7% | 75.2% | 56.3% |
| 4 | Claude-Sonnet-4.6 | 54.7% | 71.4% | 48.1% | 13.9% | 84.6% | 44.0% | 41.0% | 74.8% | 59.8% |
| 5 | GPT-5.4-Mini | 52.7% | 60.0% | 52.8% | 10.4% | 82.9% | 43.8% | 39.5% | 71.8% | 60.7% |
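The Avg column behaves like an unweighted mean of the eight per-task scores. Below is a minimal Python sketch that recomputes and re-ranks the leaderboard from the table above; the equal-weights assumption is ours, not something OCRBench V2 documents here.

```python
# Per-task scores transcribed from the table above, in column order:
# Recog, Refer, Spot, Extract, Parse, Calc, Understand, Reason.
SCORES = {
    "Interfaze":         [73.8, 72.8, 59.2, 91.6, 47.3, 70.5, 77.3, 72.8],
    "Gemini-3-Flash":    [67.7, 52.4,  7.2, 89.9, 41.7, 48.9, 77.2, 61.2],
    "Grok-4.3":          [65.7, 49.5,  3.5, 88.3, 42.1, 56.7, 75.2, 56.3],
    "Claude-Sonnet-4.6": [71.4, 48.1, 13.9, 84.6, 44.0, 41.0, 74.8, 59.8],
    "GPT-5.4-Mini":      [60.0, 52.8, 10.4, 82.9, 43.8, 39.5, 71.8, 60.7],
}

# Assumption: Avg is the unweighted mean of the eight task scores,
# rounded to one decimal to match the table's precision.
averages = {model: round(sum(s) / len(s), 1) for model, s in SCORES.items()}

# Rank models by recomputed average, highest first. Python's sort is
# stable, so models tied after rounding keep their table order.
ranked = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
for rank, (model, avg) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: {avg}%")
```

The recomputed means reproduce the Avg column exactly, with Grok-4.3 and Claude-Sonnet-4.6 tied at 54.7%, which supports the equal-weights reading.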
How each model performs on each individual OCRBench V2 task category. Each chart's caption describes what the task measures.