GLM OCR 4bit

GLM OCR 4bit is an image-to-text model from mlx-community with OCR and multimodal capabilities. Understand and compare its OCR and multimodal features, benchmarks, and capabilities.

Comparison

Feature	GLM OCR 4bit	Interfaze
Input Modalities	image, document	image, text, audio, video, document
Native OCR	Yes	Yes
Long Document Processing	No	Yes
Language Support	8 partial	162+
Native Speech-to-Text	No	Yes
Native Object Detection	No	Yes
Guardrail Controls	No	Yes
Context Input Size	unknown	1M
Tool Calling	No	Tool calling supported + built in browser, code execution and web search

OCR Capabilities

Feature	GLM OCR 4bit	Interfaze
Text Bounding Boxes	No	Yes
Confidence Scores	No	Yes
Dense Image Processing	No	Yes
Low Quality Images	No	Yes
Handwritten Text	No	Yes
Charts, Tables & Equations	No	Yes

Scaling

Feature	GLM OCR 4bit	Interfaze
Scaling	Self-hosted/Provider-hosted with quantization	Unlimited


This model was converted to MLX format from zai-org/GLM-OCR using mlx-vlm version 0.3.10. Refer to the original model card for more details on the model.

Use with mlx

pip install -U mlx-vlm

python -m mlx_vlm.generate --model mlx-community/GLM-OCR-4bit --max-tokens 100 --temperature 0.0 --prompt "Describe this image." --image <path_to_image>
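
The same command can also be driven from a Python script, for example to process several images in a loop. The following is a minimal sketch, assuming mlx-vlm is installed as shown above. The helper names `build_generate_command` and `run_ocr` are hypothetical and are not part of mlx-vlm; the sketch only uses the flags that appear in the command above.

```python
import subprocess
import sys

# Model identifier taken from the command above.
MODEL_ID = "mlx-community/GLM-OCR-4bit"


def build_generate_command(image_path, prompt="Describe this image.",
                           max_tokens=100, temperature=0.0):
    """Build the argument list for the mlx_vlm.generate command shown above.

    Hypothetical helper: it only assembles the CLI arguments, using the
    same defaults as the documented command.
    """
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", MODEL_ID,
        "--max-tokens", str(max_tokens),
        "--temperature", str(temperature),
        "--prompt", prompt,
        "--image", str(image_path),
    ]


def run_ocr(image_path, **options):
    """Run the command and return its raw standard output as a string.

    Hypothetical helper: raises subprocess.CalledProcessError if the
    command exits with a non-zero status.
    """
    result = subprocess.run(
        build_generate_command(image_path, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `run_ocr("scan.png", max_tokens=200)` would run the model on a local file named `scan.png` (a placeholder) and return whatever the command prints. The exact layout of that output is not described here, so any parsing of it is left to the caller.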
