Interfaze

Text-to-SQL

Spider-2.0-Lite

Natural-language-to-SQL generation on real, warehouse-scale schemas. The Lite track focuses on multi-step queries against a single database, where the model must pick the right tables, joins, and filters.

Metric: execution accuracy, the fraction of generated SQL queries that return the correct result when run against the live database. Higher is better.
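To make the metric concrete, here is a minimal Python sketch of execution-accuracy scoring against a SQLite database. It assumes that a generated query counts as correct when it returns the same multiset of rows as a reference ("gold") query, ignoring row order, and that a query which fails to execute counts as wrong. The function names (`results_match`, `execution_accuracy`), the toy `orders` table, and this comparison rule are illustrative assumptions, not the Spider 2.0 evaluator, which runs against warehouse databases and may use different matching rules.

```python
import sqlite3
from collections import Counter
from typing import Iterable


def results_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Hypothetical check: True if both queries return the same multiset of rows.

    Assumption: row order is ignored and duplicate rows are counted.
    This is a simplification, not the official benchmark comparison.
    """
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A generated query that fails to execute is scored as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Counter compares rows as a multiset, so ORDER BY differences do not matter.
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: Iterable[tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs whose results match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = sum(results_match(conn, predicted, gold) for predicted, gold in pairs)
    return correct / len(pairs)


# Toy database standing in for the live warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'EU', 10.0), (2, 'US', 25.0), (3, 'EU', 5.0);
    """
)

pairs = [
    # Same rows, different order: counted as correct.
    ("SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region DESC",
     "SELECT region, SUM(amount) FROM orders GROUP BY region"),
    # Wrong filter (> instead of >=): drops row 3, counted as incorrect.
    ("SELECT id FROM orders WHERE amount > 5",
     "SELECT id FROM orders WHERE amount >= 5"),
    # References a nonexistent column, fails to execute: counted as incorrect.
    ("SELECT total FROM orders",
     "SELECT amount FROM orders"),
]

print(f"{execution_accuracy(conn, pairs):.1%}")  # 1 of 3 correct: prints 33.3%
```

Ignoring row order is a design choice: it avoids penalizing a query whose only difference is an ORDER BY the question never asked for, but it would wrongly accept a query when the question does require a specific ordering.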

Model rankings

Scores

Every model evaluated on Spider-2.0-Lite, ranked highest to lowest.

Rank  Model                Score (execution accuracy)
1     Interfaze            52.9%
2     Claude-Sonnet-4.6    49.6%
3     Gemini-3.5-Flash     46.7%
4     Grok-4.3             45.9%
5     Gemini-3-Flash       45.2%
6     GPT-5.4-Mini         26.7%