GLM OCR 4bit
GLM OCR 4bit by mlx-community is an image-to-text model with OCR and multimodal capabilities. The tables below compare its OCR features, multimodal features, and overall capabilities with Interfaze.
Comparison
| Feature | GLM OCR 4bit | Interfaze |
|---|---|---|
| Input Modalities | image, document | image, text, audio, video, document |
| Native OCR | Yes | Yes |
| Long Document Processing | No | Yes |
| Language Support | 8 (partial) | 162+ |
| Native Speech-to-Text | No | Yes |
| Native Object Detection | No | Yes |
| Guardrail Controls | No | Yes |
| Context Input Size | Unknown | 1M |
| Tool Calling | No | Yes, plus built-in browser, code execution, and web search |
OCR Capabilities
| Feature | GLM OCR 4bit | Interfaze |
|---|---|---|
| Text Bounding Boxes | No | Yes |
| Confidence Scores | No | Yes |
| Dense Image Processing | No | Yes |
| Low Quality Images | No | Yes |
| Handwritten Text | No | Yes |
| Charts, Tables & Equations | No | Yes |
Scaling
| Feature | GLM OCR 4bit | Interfaze |
|---|---|---|
| Scaling | Self-hosted/Provider-hosted with quantization | Unlimited |
Hugging Face Model Card
This model was converted to MLX format from zai-org/GLM-OCR using mlx-vlm version 0.3.10.
Refer to the original model card for more details on the model.
Use with MLX
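```shell
# Install or upgrade mlx-vlm
pip install -U mlx-vlm
# Generate text from one image; replace <path_to_image> with your file
python -m mlx_vlm.generate --model mlx-community/GLM-OCR-4bit --max-tokens 100 --temperature 0.0 --prompt "Describe this image." --image <path_to_image>
```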
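The command above processes one image per call. To OCR a whole folder, one option is to wrap that same CLI call from Python. The sketch below is a minimal example that reuses only the model ID and flags shown above. The helpers `build_generate_cmd`, `list_images`, and `ocr_folder`, and the set of accepted file suffixes, are hypothetical and not part of mlx-vlm. It shells out via `subprocess` instead of using an in-process Python API, so it still needs mlx-vlm installed on an Apple silicon Mac.

```python
import subprocess
import sys
from pathlib import Path

# Model ID and flag values copied from the mlx_vlm.generate command above.
MODEL_ID = "mlx-community/GLM-OCR-4bit"
# Assumption: adjust this set to the image formats you actually need.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_generate_cmd(image: Path, prompt: str = "Describe this image.",
                       max_tokens: int = 100, temperature: float = 0.0) -> list[str]:
    """Return the argv for one `python -m mlx_vlm.generate` call (hypothetical helper)."""
    return [
        sys.executable, "-m", "mlx_vlm.generate",  # same interpreter as this script
        "--model", MODEL_ID,
        "--max-tokens", str(max_tokens),
        "--temperature", str(temperature),
        "--prompt", prompt,
        "--image", str(image),
    ]


def list_images(folder: Path) -> list[Path]:
    """Collect image files directly inside `folder` (non-recursive), sorted by path."""
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)


def ocr_folder(folder: Path, out_dir: Path, **kwargs) -> None:
    """Run the CLI once per image and save its raw stdout as <image stem>.txt."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for image in list_images(folder):
        result = subprocess.run(build_generate_cmd(image, **kwargs),
                                capture_output=True, text=True, check=True)
        # The CLI may print more than the transcription, so this keeps the raw output.
        (out_dir / f"{image.stem}.txt").write_text(result.stdout)
```

For example, `ocr_folder(Path("scans"), Path("ocr_out"))` would write one `.txt` file per image. Starting a new process per image reloads the model each time, which keeps the sketch simple but is slow for large batches. The prompt defaults to the one from the command above; the original model card may recommend task-specific prompts.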