GLM OCR 4bit

GLM OCR 4bit by mlx-community is an image-to-text model with OCR and multimodal capabilities. Compare its OCR and multimodal features, benchmarks, and capabilities.

Comparison

| Feature | GLM OCR 4bit | Interfaze |
| --- | --- | --- |
| Input Modalities | image, document | image, text, audio, video, document |
| Native OCR | Yes | Yes |
| Long Document Processing | No | Yes |
| Language Support | 8 partial | 162+ |
| Native Speech-to-Text | No | Yes |
| Native Object Detection | No | Yes |
| Guardrail Controls | No | Yes |
| Context Input Size | unknown | 1M |
| Tool Calling | No | Tool calling supported + built in browser, code execution and web search |

OCR Capabilities

| Feature | GLM OCR 4bit | Interfaze |
| --- | --- | --- |
| Text Bounding Boxes | No | Yes |
| Confidence Scores | No | Yes |
| Dense Image Processing | No | Yes |
| Low Quality Images | No | Yes |
| Handwritten Text | No | Yes |
| Charts, Tables & Equations | No | Yes |

Scaling

| Feature | GLM OCR 4bit | Interfaze |
| --- | --- | --- |
| Scaling | Self-hosted/Provider-hosted with quantization | Unlimited |

View model card on Hugging Face

This model was converted to MLX format from zai-org/GLM-OCR using mlx-vlm version 0.3.10. Refer to the original model card for more details on the model.

Use with mlx

pip install -U mlx-vlm
python -m mlx_vlm.generate --model mlx-community/GLM-OCR-4bit --max-tokens 100 --temperature 0.0 --prompt "Describe this image." --image <path_to_image>
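
To run the same command over several images, one option is a small Python wrapper around the CLI. The sketch below is only an illustration: it reuses the flags shown in the command above and nothing else, and the names `build_generate_command` and `run_on_folder` are hypothetical helpers, not part of mlx-vlm. It assumes mlx-vlm is installed in the same Python environment.

```python
import subprocess
import sys
from pathlib import Path

MODEL = "mlx-community/GLM-OCR-4bit"  # model ID taken from the command above


def build_generate_command(image_path, prompt="Describe this image.", max_tokens=100):
    """Return the argument list for one mlx_vlm.generate call.

    Only the flags that appear in the command above are used.
    """
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", MODEL,
        "--max-tokens", str(max_tokens),
        "--temperature", "0.0",
        "--prompt", prompt,
        "--image", str(image_path),
    ]


def run_on_folder(folder, pattern="*.png"):
    """Run the CLI once per matching image and return {filename: raw stdout}.

    Hypothetical helper: it needs mlx-vlm installed, and it returns whatever
    the CLI prints without trying to parse it.
    """
    outputs = {}
    for image in sorted(Path(folder).glob(pattern)):
        result = subprocess.run(
            build_generate_command(image),
            capture_output=True,
            text=True,
            check=True,  # raise if the CLI exits with an error
        )
        outputs[image.name] = result.stdout
    return outputs
```

Each call starts a new process, so the model is loaded again for every image. That keeps the sketch simple but is slow for large batches.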
