Compatible with the Chat Completions API, so it works with every AI SDK or framework out of the box
OpenAI SDK
Vercel AI SDK
LangChain SDK
import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

// Point the OpenAI client at the Interfaze endpoint.
const interfaze = new OpenAI({
  baseURL: "https://api.interfaze.ai/v1",
  apiKey: "<your-api-key>",
});

// Fields to extract from the ID image.
const IDSchema = z.object({
  first_name: z.string().describe("First name on the ID"),
  last_name: z.string().describe("Last name on the ID"),
  dob: z.string().describe("Date of birth on the ID"),
  driver_licence_number: z.string().describe("Driver licence number on the ID"),
});

// Send the prompt and the image, and request output that matches the schema.
const response = await interfaze.chat.completions.create({
  model: "interfaze-beta",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Extract the details from this ID" },
        {
          type: "image_url",
          image_url: {
            url: "https://r2public.jigsawstack.com/interfaze/examples/id.jpg",
          },
        },
      ],
    },
  ],
  response_format: zodResponseFormat(IDSchema, "id_schema"),
});

// The structured result is returned as a JSON string.
console.log(response.choices[0].message.content);
| Benchmark | Interfaze | Gemini-3-Flash | Gemini-3.5-Flash | Claude-Sonnet-4.6 | GPT-5.4-Mini | Grok-4.3 |
|---|---|---|---|---|---|---|
| OCRBench V2 (Native OCR) | **70.7%** | 55.8% | 63.9% | 54.7% | 52.7% | 54.7% |
| olmOCR (Complex document processing) | **85.7%** | 75.3% | 82.3% | 73.9% | 80.1% | 81.9% |
| RefCOCO (Object detection, NL prompts) | **82.1%** | 75.2% | 80.9% | 75.5% | 67.0% | 25.0% |
| VoxPopuli-Cleaned-AA (ASR, speech recognition) | **2.4%** | 4.0% | 4.0% | — | — | — |
| SOB Value Acc (Structured output) | **80.5%** | 77.3% | 80.2% | 77.9% | 75.1% | 78.4% |
| Spider-2.0-Lite (Text-to-SQL) | **52.9%** | 45.2% | 46.7% | 49.6% | 26.7% | 45.9% |
| GPQA Diamond (PhD-level problem solving) | **92.4%** | 88.5% | 91.4% | 89.9% | 82.8% | 73.6% |
| MMMLU (Multilingual Q&A) | **90.9%** | 88.7% | 88.1% | 84.9% | 75.3% | 89.7% |
| MMMU-Pro (Multimodal understanding) | **71.1%** | 67.6% | 64.2% | 46.3% | 40.4% | 68.7% |
*Bold cells mark the leader on each benchmark. '—' means the model doesn't support that modality.
Data you can verify and build rule-based systems on, with confidence scores, bounding boxes, and more

{
  "first_name": {
    "value": "WESTON COLE",
    "confidence": 0.99,
    "bounds": {
      "top_left": { "x": 866, "y": 701 },
      "bottom_right": { "x": 992, "y": 737 }
    }
  },
  "last_name": {
    "value": "BAILEY",
    "confidence": 1.0,
    "bounds": {
      "top_left": { "x": 861, "y": 739 },
      "bottom_right": { "x": 991, "y": 774 }
    }
  },
  "age": {
    "value": 61,
    "note": "Derived from date of birth 05/01/1965 as of 2026-06-28",
    "confidence": 0.98,
    "bounds": {
      "top_left": { "x": 865, "y": 1008 },
      "bottom_right": { "x": 1063, "y": 1044 }
    }
  },
  "eye_color": {
    "value": "BLU",
    "confidence": 1.0,
    "bounds": {
      "top_left": { "x": 1030, "y": 1078 },
      "bottom_right": { "x": 1095, "y": 1111 }
    }
  }
}
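As one way to build a rule on top of output like the JSON above, the sketch below flags any field whose confidence falls under a threshold and computes the pixel size of a field's bounding box. The type names, the helper names (`fieldsNeedingReview`, `boxSize`), and the 0.99 threshold are illustrative assumptions, not part of the Interfaze API.

```typescript
// Shape of one extracted field, mirroring the sample JSON (types are assumed).
interface Point { x: number; y: number }
interface ExtractedField {
  value: string | number;
  confidence: number;
  bounds: { top_left: Point; bottom_right: Point };
  note?: string;
}
type Extraction = Record<string, ExtractedField>;

// Assumed threshold for illustration; tune it for your own use case.
const MIN_CONFIDENCE = 0.99;

// Return the names of fields whose confidence is below the threshold.
function fieldsNeedingReview(doc: Extraction, minConfidence = MIN_CONFIDENCE): string[] {
  return Object.entries(doc)
    .filter(([, field]) => field.confidence < minConfidence)
    .map(([name]) => name);
}

// Width and height, in pixels, of the region a value was read from.
function boxSize(field: ExtractedField): { width: number; height: number } {
  const { top_left, bottom_right } = field.bounds;
  return { width: bottom_right.x - top_left.x, height: bottom_right.y - top_left.y };
}

// Values copied from the sample response above.
const sample = {
  first_name: {
    value: "WESTON COLE",
    confidence: 0.99,
    bounds: { top_left: { x: 866, y: 701 }, bottom_right: { x: 992, y: 737 } },
  },
  last_name: {
    value: "BAILEY",
    confidence: 1.0,
    bounds: { top_left: { x: 861, y: 739 }, bottom_right: { x: 991, y: 774 } },
  },
  age: {
    value: 61,
    confidence: 0.98,
    bounds: { top_left: { x: 865, y: 1008 }, bottom_right: { x: 1063, y: 1044 } },
  },
  eye_color: {
    value: "BLU",
    confidence: 1.0,
    bounds: { top_left: { x: 1030, y: 1078 }, bottom_right: { x: 1095, y: 1111 } },
  },
};

console.log(fieldsNeedingReview(sample)); // [ 'age' ]
console.log(boxSize(sample.first_name)); // { width: 126, height: 36 }
```

Flagged fields could be routed to manual review, while the bounding box lets a reviewer jump straight to the relevant region of the image.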
Extract and understand text, audio, and images in 100+ languages
zh: 英国每天饮用约100–160百万杯茶,有98%的茶饮者在茶中加入牛奶。
hi: यूके हर दिन लगभग 100–160 मिलियन कप चाय पीता है, और 98% चाय पीने वाले अपनी चाय में दूध मिलाते हैं।
es: El Reino Unido bebe alrededor de 100–160 millones de tazas de té cada día, y el 98 % de los consumidores de té añade leche a su té.
fr: Le Royaume-Uni boit environ 100–160 millions de tasses de thé chaque jour, et 98 % des buveurs de thé ajoutent du lait à leur thé.
de: Das Vereinigte Königreich trinkt etwa 100–160 Millionen Tassen Tee pro Tag, und 98 % der Teetrinker fügen ihrem Tee Milch hinzu.
it: Il Regno Unito beve circa 100–160 milioni di tazze di tè ogni giorno e il 98% degli amanti del tè aggiunge latte al proprio tè.
ja: イギリスでは毎日約100~160百万杯の紅茶が飲まれており、紅茶を飲む人の98%が紅茶に牛乳を加えます。
ko: 영국에서는 매일 약 1억 ~ 1억 6천만 잔의 차를 마시며, 차를 마시는 사람의 98%가 차에 우유를 넣습니다.
Compute with sandboxes and browse the web with headless browsers

Fully configurable guardrails for text and images
S1: Violent Crimes
S2: Non-Violent Crimes
S3: Sex-Related Crimes
S4: Child Sexual Exploitation
S5: Defamation
S6: Specialized Advice
S7: Privacy
S8: Intellectual Property
S9: Indiscriminate Weapons
S10: Hate
S11: Suicide & Self-Harm
S12: Sexual Content
S12_IMAGE: Sexual Content (Image)
S13: Elections
S14: Code Interpreter Abuse
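Application code that handles these categories may find a typed lookup useful for turning codes into readable labels and catching typos in configuration. The sketch below is a local helper only: how Interfaze accepts guardrail settings or reports flagged categories is not shown on this page, so the plain array of code strings used here is an assumption, and `describeCodes` is a hypothetical name.

```typescript
// Category codes and labels copied from the list above.
const GUARDRAIL_CATEGORIES = {
  S1: "Violent Crimes",
  S2: "Non-Violent Crimes",
  S3: "Sex-Related Crimes",
  S4: "Child Sexual Exploitation",
  S5: "Defamation",
  S6: "Specialized Advice",
  S7: "Privacy",
  S8: "Intellectual Property",
  S9: "Indiscriminate Weapons",
  S10: "Hate",
  S11: "Suicide & Self-Harm",
  S12: "Sexual Content",
  S12_IMAGE: "Sexual Content (Image)",
  S13: "Elections",
  S14: "Code Interpreter Abuse",
} as const;

type GuardrailCode = keyof typeof GUARDRAIL_CATEGORIES;

// Type guard: true only for codes that appear in the table above.
function isGuardrailCode(code: string): code is GuardrailCode {
  return Object.prototype.hasOwnProperty.call(GUARDRAIL_CATEGORIES, code);
}

// Hypothetical helper: label each code, marking anything unrecognized.
function describeCodes(codes: string[]): string[] {
  return codes.map((code) =>
    isGuardrailCode(code)
      ? `${code}: ${GUARDRAIL_CATEGORIES[code]}`
      : `${code}: unknown category`
  );
}

console.log(describeCodes(["S7", "S12_IMAGE", "S99"]));
// [ 'S7: Privacy', 'S12_IMAGE: Sexual Content (Image)', 'S99: unknown category' ]
```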
The architecture combines a suite of small, specialized models, supported by custom tools and infrastructure, and automatically routes each task to the model best suited to it, prioritizing accuracy and speed.

Context window: 1M tokens
Max output tokens: 32k tokens
Input modalities: Text, Images, Audio, File, Video
Reasoning: Available
Input tokens: $1.50 / MTok
Output tokens: $3.50 / MTok
Caching: Included
Observability & Logging: Coming soon
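To make the per-token prices concrete, here is a rough cost estimate for a single request, using the listed rates of $1.50 per million input tokens and $3.50 per million output tokens. It is a back-of-the-envelope sketch: the function name is made up for illustration, and it ignores any effect caching may have on billing, which this page does not detail.

```typescript
// Listed prices in USD per million tokens.
const INPUT_USD_PER_MTOK = 1.5;
const OUTPUT_USD_PER_MTOK = 3.5;

// Estimated cost in USD for a request with the given token counts.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1_000_000) * INPUT_USD_PER_MTOK;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_USD_PER_MTOK;
  return inputCost + outputCost;
}

// 1,000,000 input tokens and 100,000 output tokens: 1.50 + 0.35 USD.
console.log(estimateCostUsd(1_000_000, 100_000).toFixed(2)); // "1.85"
```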
Have more questions? Talk to a founder.
We are a team of ML, software, and infrastructure engineers convinced that a specialized hybrid model architecture can do a lot more than pure transformer models. Our goal is to make AI available in every dev workflow with no human in the loop.