LightOnOCR 1B 1025
LightOnOCR 1B 1025 by lightonai, an image-to-text model with OCR capabilities. Understand and compare OCR features, benchmarks, and capabilities.
Comparison
| Feature | LightOnOCR 1B 1025 | Interfaze |
|---|---|---|
| Input Modalities | image | image, text, audio, video, document |
| Native OCR | Yes | Yes |
| Long Document Processing | No | Yes |
| Language Support | 9 partial | 162+ |
| Native Speech-to-Text | No | Yes |
| Native Object Detection | No | Yes |
| Guardrail Controls | No | Yes |
| Context Input Size | unknown | 1M |
| Tool Calling | No | Tool calling supported + built in browser, code execution and web search |
OCR Capabilities
| Feature | LightOnOCR 1B 1025 | Interfaze |
|---|---|---|
| Text Bounding Boxes | No | Yes |
| Confidence Scores | No | Yes |
| Dense Image Processing | No | Yes |
| Low Quality Images | No | Yes |
| Handwritten Text | Yes | Yes |
| Charts, Tables & Equations | Yes | Yes |
Scaling
| Feature | LightOnOCR 1B 1025 | Interfaze |
|---|---|---|
| Scaling | Self-hosted/Provider-hosted with quantization | Unlimited |
[!NOTE] LightOnOCR-2 is now available and state-of-the-art on OlmOCR-bench, with new image detection variants! Check it out here: lightonai/LightOnOCR-2-1B
Full BF16 version of the model. We recommend this variant for inference and further fine-tuning.
LightOnOCR-1B is a compact, end-to-end vision-language model for Optical Character Recognition (OCR) and document understanding. It achieves state-of-the-art accuracy in its weight class while being several times faster and cheaper than larger general-purpose VLMs.
[Paper](https://arxiv.org/pdf/2601.14251) | Read the full blog post | Try the demo | Finetuning notebook
Highlights
- Speed: 5× faster than dots.ocr, 2× faster than PaddleOCR-VL-0.9B, 1.73× faster than DeepSeekOCR
- Efficiency: Processes 5.71 pages/s on a single H100 (~493k pages/day) for <$0.01 per 1,000 pages
- End-to-End: Fully differentiable, no external OCR pipeline
- Versatile: Handles tables, receipts, forms, multi-column layouts, and math notation
- Compact variants: 32k and 16k vocab options for European languages
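The pages-per-day figure in the Efficiency highlight follows directly from the quoted throughput. A minimal sketch of that arithmetic is shown here; the helper names and the 1M-page corpus are hypothetical, and sustained throughput on a real workload may differ from the quoted rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pages_per_day(pages_per_second: float) -> float:
    """Convert a sustained pages/s rate into pages per day."""
    return pages_per_second * SECONDS_PER_DAY


def hours_to_process(num_pages: int, pages_per_second: float) -> float:
    """Wall-clock hours needed to OCR num_pages at a constant rate."""
    return num_pages / pages_per_second / 3600


if __name__ == "__main__":
    rate = 5.71  # pages/s on a single H100, as quoted in the highlights
    print(f"{pages_per_day(rate):,.0f} pages/day")
    # Hypothetical corpus size, for illustration only
    print(f"{hours_to_process(1_000_000, rate):.1f} h for 1M pages")
```

At 5.71 pages/s this gives roughly 493k pages per day, matching the figure in the list.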
Model Overview
LightOnOCR combines a Vision Transformer encoder (Pixtral-based) with a lightweight text decoder (Qwen3-based) distilled from high-quality open VLMs. It is optimized for document parsing tasks, producing accurate, layout-aware text extraction from high-resolution pages.
Benchmarks
All benchmarks were evaluated using vLLM on OlmOCR-bench.
Installation
vLLM
[2025/11/24] LightOnOCR is now officially supported in vLLM v0.11.1
```bash
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install vllm==0.11.2
uv pip install pypdfium2 pillow requests
```
Start Server
```bash
vllm serve lightonai/LightOnOCR-1B-1025 \
    --limit-mm-per-prompt '{"image": 1}' --mm-processor-cache-gb 0 --no-enable-prefix-caching
```
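Loading the model can take a while, so a client script may want to wait until the server responds before sending pages. A minimal sketch, assuming the default `http://localhost:8000` address and the `/v1/models` route of vLLM's OpenAI-compatible server; `probe_models` and `wait_for_server` are hypothetical helper names, not part of vLLM.

```python
import time
import urllib.error
import urllib.request


def probe_models(base_url: str, timeout_s: float = 2.0) -> bool:
    """Return True if GET {base_url}/v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: server not ready yet
        return False


def wait_for_server(
    base_url: str = "http://localhost:8000",
    max_wait_s: float = 300.0,
    interval_s: float = 2.0,
    probe=probe_models,
    sleep=time.sleep,
) -> bool:
    """Poll the server until it responds or max_wait_s has elapsed."""
    deadline = time.monotonic() + max_wait_s
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_for_server()` before the first request returns `True` once the server is reachable and `False` if the wait budget runs out. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.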
PDF Inference
```python
import base64
import io

import pypdfium2 as pdfium
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "lightonai/LightOnOCR-1B-1025"

# Download the PDF and open it from memory
pdf_url = "https://arxiv.org/pdf/2412.13663"
pdf_data = requests.get(pdf_url).content
pdf = pdfium.PdfDocument(pdf_data)

# Render the first page to a PIL image (scale=2.77 is about 200 DPI)
page = pdf[0]
pil_image = page.render(scale=2.77).to_pil()

# Encode the rendered page as a base64 PNG
buffer = io.BytesIO()
pil_image.save(buffer, format="PNG")
image_base64 = base64.b64encode(buffer.getvalue()).decode("utf-8")

# Build an OpenAI-style chat request that contains only the page image
payload = {
    "model": MODEL,
    "messages": [{
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_base64}"},
        }],
    }],
    "max_tokens": 4096,
    "temperature": 0.2,
    "top_p": 0.9,
}

response = requests.post(ENDPOINT, json=payload)
response.raise_for_status()  # fail early on HTTP errors
text = response.json()["choices"][0]["message"]["content"]
print(text)
```
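The single-page request above extends naturally to a whole document by sending one request per page. The following is a minimal sketch rather than an official client: `build_payload`, `post_json`, and `ocr_pages` are hypothetical helpers, the worker count is an arbitrary assumption, and the sampling parameters are copied from the example above. It uses only the standard library for the HTTP call, and it expects the pages to be already rendered to PNG bytes.

```python
import base64
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "lightonai/LightOnOCR-1B-1025"


def build_payload(png_bytes: bytes, model: str = MODEL, max_tokens: int = 4096) -> dict:
    """Wrap one PNG-encoded page in an OpenAI-style chat request."""
    image_base64 = base64.b64encode(png_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_base64}"},
            }],
        }],
        "max_tokens": max_tokens,
        "temperature": 0.2,
        "top_p": 0.9,
    }


def post_json(payload: dict, endpoint: str = ENDPOINT, timeout_s: float = 300.0) -> str:
    """POST one payload and return the generated text."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


def ocr_pages(page_images, send=post_json, max_workers: int = 8) -> list:
    """OCR pages concurrently; results come back in page order."""
    payloads = [build_payload(png) for png in page_images]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order
        return list(pool.map(send, payloads))
```

`page_images` can be produced by rendering every page of `pdf` to PNG bytes in the same way the example above renders `pdf[0]`. Sending requests concurrently lets the server batch them, while the `send` parameter makes it possible to swap in a different transport.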
Transformers
Note: LightOnOCR-2 requires transformers installed from source (not yet in a stable release).
```bash
uv pip install git+https://github.com/huggingface/transformers
```
```python
import torch
from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor

# Pick the best available device; use float32 on MPS and bfloat16 elsewhere
device = "mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float32 if device == "mps" else torch.bfloat16

model = LightOnOcrForConditionalGeneration.from_pretrained("lightonai/LightOnOCR-2-1B-base", torch_dtype=dtype).to(device)
processor = LightOnOcrProcessor.from_pretrained("lightonai/LightOnOCR-2-1B-base")

url = "https://huggingface.co/datasets/hf-internal-testing/fixtures_ocr/resolve/main/SROIE-receipt.jpeg"
conversation = [{"role": "user", "content": [{"type": "image", "url": url}]}]

# Apply the chat template, tokenize the prompt, and preprocess the image in one call
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
# Move tensors to the device; cast only floating-point tensors to the model dtype
inputs = {k: v.to(device=device, dtype=dtype) if v.is_floating_point() else v.to(device) for k, v in inputs.items()}

output_ids = model.generate(**inputs, max_new_tokens=1024)
# Drop the prompt tokens and decode only the newly generated ones
generated_ids = output_ids[0, inputs["input_ids"].shape[1]:]
output_text = processor.decode(generated_ids, skip_special_tokens=True)
print(output_text)
```
Rendering and Preprocessing Tips
- Render PDFs to PNG or JPEG at a target longest dimension of 1540px
- Maintain aspect ratio to preserve text geometry
- Use one image per page; batching is supported by vLLM
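The first two tips can be turned into a small calculation. The sketch below assumes that page sizes are given in PDF points and that a render scale of 1 corresponds to 72 DPI (one pixel per point), which is how pypdfium2's `render(scale=...)` is documented to behave; both function names are hypothetical.

```python
def scale_for_longest_side(width_pt: float, height_pt: float, target_px: int = 1540) -> float:
    """Render scale that makes the longest page side about target_px pixels."""
    longest = max(width_pt, height_pt)
    if longest <= 0:
        raise ValueError("page dimensions must be positive")
    return target_px / longest


def fit_longest_side(width_px: int, height_px: int, target_px: int = 1540) -> tuple[int, int]:
    """New (width, height) for an existing image, preserving aspect ratio."""
    ratio = target_px / max(width_px, height_px)
    return round(width_px * ratio), round(height_px * ratio)


if __name__ == "__main__":
    # US Letter page: 612 x 792 points
    print(f"render scale: {scale_for_longest_side(612, 792):.3f}")
    # Existing scan that needs to be resized before OCR
    print(fit_longest_side(3080, 2000))
```

With pypdfium2, the page size would typically come from `page.get_size()` and the result would be passed as `page.render(scale=...)`. For an already rasterized image, the tuple from `fit_longest_side` can be passed to Pillow's `Image.resize`.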
Variants
| Variant | Description |
|---|---|
| LightOnOCR-1B-1025 | Full multilingual model (default) |
| LightOnOCR-1B-32k | Fastest pruned-vocabulary version (32k tokens) optimized for European languages |
| LightOnOCR-1B-16k | Most compact variant with smallest vocabulary |
Fine-tuning
Transformers integration is coming soon for training and inference.
LightOnOCR is fully differentiable and supports:
- LoRA fine-tuning
- Domain adaptation (receipts, scientific articles, forms, etc.)
- Multilingual fine-tuning with task-specific corpora
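As a rough illustration of why LoRA is a lightweight option: LoRA keeps a weight matrix of shape d_out × d_in frozen and trains two low-rank factors instead, which adds r · (d_in + d_out) parameters per adapted matrix. A minimal sketch of that count follows; the layer size and rank are hypothetical values chosen for illustration and are not taken from the LightOnOCR configuration.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d_in, d_out, rank = 1024, 1024, 16  # hypothetical layer size and rank
    dense = full_params(d_in, d_out)
    lora = lora_trainable_params(d_in, d_out, rank)
    print(f"dense: {dense:,}  lora: {lora:,}  ratio: {lora / dense:.2%}")
```

For this hypothetical layer, the adapter trains about 3% of the parameters that full fine-tuning of the same matrix would update.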
Finetuning notebook
Data
Trained on a diverse large-scale PDF corpus covering:
- Scientific papers, books, receipts, invoices, tables, forms, and handwritten text
- Multiple languages (Latin alphabet dominant)
- Real and synthetic document scans
The dataset will be released under an open license.
License
Apache License 2.0
Citation
@misc{lightonocr2025,
title = {LightOnOCR-1B: End-to-End and Efficient Domain-Specific Vision-Language Models for OCR},
author = {Said Taghadouini and Baptiste Aubertin and Adrien Cavaillès},
year = {2025},
howpublished = {\url{https://huggingface.co/blog/lightonai/lightonocr}}
}