# Viki Clip Models
Viki Clip Models by jnurik, an image-to-text model. Understand and compare its features, benchmarks, and capabilities.
## Comparison
| Feature | Viki Clip Models | Interfaze |
|---|---|---|
| Input Modalities | image | image, text, audio, video, document |
| Native OCR | No | Yes |
| Long Document Processing | No | Yes |
| Language Support | unknown | 162+ |
| Native Speech-to-Text | No | Yes |
| Native Object Detection | No | Yes |
| Guardrail Controls | No | Yes |
| Context Input Size | unknown | 1M |
| Tool Calling | No | Tool calling supported, plus built-in browser, code execution, and web search |
## Scaling
| Feature | Viki Clip Models | Interfaze |
|---|---|---|
| Scaling | Self-hosted/Provider-hosted with quantization | Unlimited |
View model card on Hugging Face
Live App: WikiLens Space
This repository contains the weights and FAISS indexes for the WikiLens app.
- Base Model: openai/clip-vit-base-patch32
- Training Data: 120k Wikipedia photo-article pairs
- Methods: Zero-shot, Frozen Encoders, DoRA fine-tuning
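The zero-shot method listed above can be sketched in a few lines: CLIP scores an image against one text embedding per candidate label by cosine similarity, scales by a learned temperature, and softmaxes into probabilities. The sketch below uses random vectors in place of real CLIP encoder outputs; the function name and the logit scale of 100 (CLIP's published value is about e^4.6 ≈ 100) are illustrative, not from this repo.

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs, logit_scale=100.0):
    """CLIP-style zero-shot scoring: cosine similarity between an image
    embedding and each candidate label's text embedding, temperature-scaled
    and softmaxed into per-label probabilities."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = logit_scale * (txt @ img)
    e = np.exp(logits - logits.max())  # stable softmax
    return e / e.sum()

rng = np.random.default_rng(1)
image = rng.standard_normal(512)          # stand-in for an image embedding
labels = rng.standard_normal((3, 512))    # stand-ins for label text embeddings
labels[1] = image + 0.1 * rng.standard_normal(512)  # make label 1 near the image
probs = zero_shot_scores(image, labels)
print(int(probs.argmax()))  # → 1
```

Frozen-encoder and DoRA fine-tuning keep this same scoring rule but adapt the embeddings: the former trains only a head on top of fixed encoders, while DoRA updates the encoders through low-rank plus magnitude decomposition of the weights.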