
Text-to-SQL

Spider-2.0-Lite

Natural-language to SQL on real warehouse-scale schemas. The lite track focuses on multi-step queries against a single database, where the model has to pick the right tables, joins, and filters.

Execution accuracy: the fraction of generated SQL queries that return the correct result when run against the live database. Higher is better.
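As a rough illustration of how an execution-accuracy score can be computed, the sketch below runs each predicted query and its reference query against the same database and counts the pairs whose result sets match. It is not the benchmark's actual harness: the in-memory SQLite database, the `orders` table, the helper names `run_query` and `execution_accuracy`, and the order-insensitive row comparison are all assumptions made for the example, and the real evaluation may compare results differently.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Run a query and return its rows as a multiset, or None if it fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match.

    Assumption: rows are compared as multisets, so row order is ignored.
    A predicted query that raises an error counts as incorrect.
    """
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = run_query(conn, gold_sql)
        predicted = run_query(conn, predicted_sql)
        if gold is not None and predicted == gold:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database standing in for a real warehouse schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'ada', 10.0), (2, 'bob', 25.0), (3, 'ada', 5.0);
    """
)

pairs = [
    # Same rows, different SQL text and ordering: counted as correct.
    ("SELECT customer, SUM(total) FROM orders GROUP BY customer",
     "SELECT customer, SUM(total) AS s FROM orders GROUP BY customer ORDER BY customer"),
    # Wrong filter, so the rows differ: incorrect.
    ("SELECT customer FROM orders WHERE total > 20",
     "SELECT customer FROM orders WHERE total >= 10"),
    # Misspelled column, so the query fails: incorrect.
    ("SELECT custmer FROM orders",
     "SELECT customer FROM orders"),
    # Different expression, same result: correct.
    ("SELECT COUNT(*) FROM orders",
     "SELECT COUNT(id) FROM orders"),
]

score = execution_accuracy(conn, pairs)
print(f"{score:.1%}")  # prints "50.0%"
```

Comparing executed results rather than SQL text means a query that is written differently from the reference still scores as correct, as long as it returns the same rows.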

Model rankings

Scores

Every model evaluated on Spider-2.0-Lite, ranked highest to lowest.

#   Model               Score
1   Interfaze           52.9%
2   Claude-Sonnet-4.6   49.6%
3   Grok-4.3            45.9%
4   Gemini-3-Flash      45.2%
5   GPT-5.4-Mini        26.7%