
Viki Clip Models

Viki Clip Models by jnurik is an image-to-text model. Understand and compare its features, benchmarks, and capabilities.

Comparison

| Feature | Viki Clip Models | Interfaze |
|---|---|---|
| Input Modalities | image | image, text, audio, video, document |
| Native OCR | No | Yes |
| Long Document Processing | No | Yes |
| Language Support | unknown | 162+ |
| Native Speech-to-Text | No | Yes |
| Native Object Detection | No | Yes |
| Guardrail Controls | No | Yes |
| Context Input Size | unknown | 1M |
| Tool Calling | No | Supported, plus built-in browser, code execution, and web search |

Scaling

| Feature | Viki Clip Models | Interfaze |
|---|---|---|
| Scaling | Self-hosted/Provider-hosted with quantization | Unlimited |

View model card on Hugging Face

🚀 Live App: WikiLens Space

This repository contains the weights and FAISS indexes for the WikiLens application.

  • Base Model: openai/clip-vit-base-patch32
  • Training Data: 120k Wikipedia photo-article pairs.
  • Methods: Zero-shot, Frozen Encoders, DoRA fine-tuning.
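
The model card does not describe how WikiLens queries its indexes, so the following is only a minimal sketch of the general idea behind embedding-based lookup: an image embedding is compared against stored article embeddings by cosine similarity, and the closest matches are returned. It uses plain Python instead of CLIP or FAISS, and the function names, article titles, and 3-dimensional vectors are hypothetical stand-ins (real CLIP ViT-B/32 embeddings have 512 dimensions).

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query, index, k=2):
    """Return the k (title, score) pairs with the highest cosine similarity."""
    q = normalize(query)
    scored = [
        (title, sum(a * b for a, b in zip(q, normalize(vec))))
        for title, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "index": article title -> embedding (assumption, not real data).
ARTICLE_INDEX = {
    "Eiffel Tower": [0.9, 0.1, 0.0],
    "Golden Gate Bridge": [0.1, 0.9, 0.1],
    "Mount Everest": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a query photo.
image_embedding = [0.8, 0.2, 0.1]

results = top_k(image_embedding, ARTICLE_INDEX, k=2)
for title, score in results:
    print(f"{title}: {score:.3f}")
```

A brute-force scan like this is fine for a handful of vectors; a library such as FAISS exists to make the same nearest-neighbor search practical over a much larger collection, such as the 120k pairs mentioned above.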
