Object detection (NL prompts)

RefCOCO

Visual grounding: given a free-form natural-language description, the model must return the exact bounding box of the object referenced — not classify it, but locate it.

Acc@0.5: the fraction of predicted bounding boxes whose intersection-over-union (IoU) with the ground-truth box is ≥ 0.5. Higher is better.
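As a rough illustration of the metric, here is a minimal Python sketch of IoU and Acc@0.5. It assumes boxes are given as (x_min, y_min, x_max, y_max) tuples and that there is exactly one prediction per referring expression. The function names `iou` and `acc_at_iou` are hypothetical, and this is not the leaderboard's actual evaluation code.

```python
from typing import Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes yield IoU 0 rather than a division error.
    return inter / union if union > 0 else 0.0


def acc_at_iou(preds: Sequence[Box], truths: Sequence[Box],
               threshold: float = 0.5) -> float:
    """Fraction of predictions whose IoU with the paired ground truth is >= threshold."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou(p, t) >= threshold for p, t in zip(preds, truths))
    return hits / len(preds)
```

For example, a prediction that matches its ground truth exactly counts as a hit, while one shifted so that only a third of the union overlaps (IoU ≈ 0.33) counts as a miss, so a set with one of each would score 50%.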

Model rankings

Every model evaluated on RefCOCO, ranked highest to lowest.

#  Model              Score
1  Interfaze          82.1%
2  Claude-Sonnet-4.6  75.5%
3  Gemini-3-Flash     75.2%
4  GPT-5.4-Mini       67.0%
5  Grok-4.3           25.0%