Interfaze


Complex document processing

olmOCR-bench

End-to-end document understanding on long, layout-rich PDFs with tables, footnotes, equations, headers, and multi-column flows. Tests whether the model preserves reading order, not just characters.

Mean accuracy on long, layout-rich PDFs, graded against the original document, including reading order. Higher is better.
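Benchmarks in this style are typically graded as many small pass/fail checks against the ground-truth document, e.g. "this snippet appears in the output" and "snippet A appears before snippet B", with accuracy being the fraction of checks that pass. A minimal sketch of that idea (function names and checks are illustrative, not the benchmark's actual harness):

```python
def text_present(output: str, anchor: str) -> bool:
    """Pass if a ground-truth snippet survives OCR at all."""
    return anchor in output

def reading_order(output: str, first: str, second: str) -> bool:
    """Pass if `first` precedes `second`, i.e. reading order is preserved."""
    i, j = output.find(first), output.find(second)
    return i != -1 and j != -1 and i < j

def mean_accuracy(output: str, checks: list) -> float:
    """Score = fraction of pass/fail checks that pass."""
    return sum(check(output) for check in checks) / len(checks)

# A two-column page read column-by-column vs. straight across the page:
good = "Column one text. Column two text."
bad = "Column two text. Column one text."
checks = [
    lambda o: text_present(o, "Column one text"),
    lambda o: reading_order(o, "Column one", "Column two"),
]
print(mean_accuracy(good, checks))  # 1.0
print(mean_accuracy(bad, checks))   # 0.5 -- all characters present, order wrong
```

The second output shows why this grading "tests whether the model preserves reading order, not just characters": the bad transcript contains every snippet but still loses half the score.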

Model rankings

Includes general-purpose LLMs and purpose-built OCR systems.

Specialized OCR models

Beyond general-purpose LLMs, Interfaze also outperforms purpose-built OCR systems on the same benchmark — the models you'd reach for if you were going to wire up a dedicated document pipeline.

Per-task breakdown

Every model's overall score and per-task accuracy across the eight olmOCR-bench task categories. Bold cells mark the leader for each task.

| # | Model | Overall | ArXiv | Old Scans Math | Tables | Old Scans | Headers | Multi-Column | Long Tiny Text | Base |
|---|-------|---------|-------|----------------|--------|-----------|---------|--------------|----------------|------|
| 1 | Interfaze | 85.7% | **87.2%** | **88.9%** | 86.4% | **53.1%** | 91.7% | 83.6% | **94.8%** | 99.8% |
| 2 | Chandra OCR 2 | 84.3% | 86.5% | 83.0% | 87.9% | 49.2% | 92.5% | 81.3% | 93.9% | **99.9%** |
| 3 | olmOCR v0.4.0 | 82.4% | 83.0% | 82.3% | 84.9% | 47.7% | 96.1% | **83.7%** | 81.9% | 99.7% |
| 4 | Grok-4.3 | 81.9% | 79.6% | 77.5% | 81.5% | 47.3% | 95.8% | 81.6% | 92.1% | 99.6% |
| 5 | GPT-5.4-Mini | 80.1% | 79.1% | 78.6% | 81.1% | 43.9% | 90.9% | 79.4% | 87.6% | **99.9%** |
| 6 | PaddleOCR-VL* | 80.0% | 85.7% | 71.0% | 84.1% | 37.8% | **97.0%** | 79.9% | 85.7% | 98.5% |
| 7 | Reducto | 76.2% | 68.7% | 68.8% | **92.5%** | 45.8% | 79.6% | 68.0% | 86.5% | 99.5% |
| 8 | DeepSeek-OCR | 75.7% | 77.2% | 73.6% | 80.2% | 33.3% | 96.1% | 66.4% | 79.4% | 99.8% |
| 9 | Gemini-3-Flash | 75.3% | 78.6% | 70.2% | 79.8% | 33.9% | 93.4% | 73.1% | 79.5% | 94.0% |
| 10 | Claude-Sonnet-4.6 | 73.9% | 76.4% | 63.5% | 73.2% | 31.7% | 92.5% | 69.8% | 76.1% | 98.0% |
| 11 | Mistral OCR | 72.0% | 77.2% | 67.5% | 60.6% | 29.3% | 93.6% | 71.3% | 77.1% | 99.4% |

* Self-reported score from the model's own announcement.
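Judging by the numbers, the Overall column appears to be the unweighted mean of the eight per-task scores (an assumption inferred from the data, not a documented formula). A quick check against the top row:

```python
# Interfaze's per-task scores, copied from the table above, in column order:
# ArXiv, Old Scans Math, Tables, Old Scans, Headers, Multi-Column,
# Long Tiny Text, Base
tasks = [87.2, 88.9, 86.4, 53.1, 91.7, 83.6, 94.8, 99.8]

# Unweighted mean across the eight task categories
overall = sum(tasks) / len(tasks)
print(round(overall, 1))  # 85.7 -- matches the reported Overall score
```

Note how one weak category (Old Scans at 53.1%) drags every model's overall score well below its typical per-task accuracy; under an unweighted mean, each category counts equally regardless of how many documents it contains.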

Task details

How each model performs on individual olmOCR-bench task categories. Each chart describes what the task measures.