Interfaze's architecture lets you programmatically run individual parts of the model or its built-in tools without activating the full model, which makes these calls significantly faster and cheaper.
| Task Name | Description |
|---|---|
| ocr | Optical character recognition on images and documents |
| object_detection | Detect objects in images |
| gui_detection | Detect GUI elements in images |
| web_search | Web search |
| scraper | Extract structured data from web pages |
| speech_to_text | Speech to text transcription |
| translate | Translation |
To run a task, set the system prompt to <task>task_name</task> and set the response format to any or empty schema.
Example of system prompt:
<task>web_search</task>
OpenAI SDK example:
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

// Assumes `interfaze` is an OpenAI SDK client already configured for the Interfaze API.
const response = await interfaze.chat.completions.create({
  model: "interfaze-beta",
  messages: [
    {
      role: "system",
      // The task tag selects which task to run
      content: "<task>speech_to_text</task>",
    },
    {
      role: "user",
      content: [
        { type: "text", text: "Transcribe the audio file https://r2public.jigsawstack.com/interfaze/examples/stt_long_audio_sample_3.mp3" },
      ],
    },
  ],
  // Empty (any) schema, as required when running a task
  response_format: zodResponseFormat(z.any(), "empty_schema"),
});
console.log(response.choices[0].message.content);
Output
The output contains the name of the task and the raw result. The result schema is different depending on the task.
The output is truncated for this example.
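Every example output on this page wraps the result in an object with a name and a result field, so a small helper can unpack it. The sketch below is a minimal illustration and not part of any SDK: `parseTaskOutput` is a hypothetical name, and it assumes the message content arrives as a JSON string in the shape shown in the examples.

```typescript
// Hypothetical helper, not part of any SDK. Assumes the message content is a
// JSON string shaped like { "object": { "name": ..., "result": ... } }.
function parseTaskOutput<T = unknown>(content: string | null): { name: string; result: T } {
  if (content === null) {
    throw new Error("Response has no message content");
  }
  const parsed = JSON.parse(content) as { object?: { name?: unknown; result?: unknown } } | null;
  const envelope = parsed?.object;
  if (!envelope || typeof envelope.name !== "string") {
    throw new Error("Unexpected task output shape");
  }
  // The result schema differs per task, so the caller supplies the type.
  return { name: envelope.name, result: envelope.result as T };
}

// Sample payload: a subset of the fields from the translate example on this page.
const sampleContent = JSON.stringify({
  object: { name: "translate", result: { source_language: "en", target_language: "fr" } },
});
const parsedSample = parseTaskOutput<{ source_language: string; target_language: string }>(sampleContent);
console.log(parsedSample.name, parsedSample.result.target_language); // prints "translate fr"
```

In a real call, the argument would be `response.choices[0].message.content` from the request above.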
Each task uses the same structure shown above: set the system prompt to <task>task_name</task> and pass an empty (any) response format. The user message carries the input for the task.
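Because only the task name and the user input change between tasks, the shared part of the request can be factored into a helper. This is a sketch under stated assumptions: `buildTaskMessages`, `TASKS`, and the type names are hypothetical, and the task list is copied from the table at the top of this page.

```typescript
// Task names copied from the table at the top of this page.
const TASKS = [
  "ocr",
  "object_detection",
  "gui_detection",
  "web_search",
  "scraper",
  "speech_to_text",
  "translate",
] as const;
type TaskName = (typeof TASKS)[number];

// User input is either plain text or a list of content parts, as in the examples.
type UserContent =
  | string
  | Array<{ type: "text"; text: string } | { type: "image_url"; image_url: { url: string } }>;

// Hypothetical helper: builds the model and messages fields shared by every task request.
function buildTaskMessages(task: TaskName, input: UserContent) {
  return {
    model: "interfaze-beta",
    messages: [
      { role: "system" as const, content: `<task>${task}</task>` },
      { role: "user" as const, content: input },
    ],
  };
}

const searchRequest = buildTaskMessages("web_search", "GLP-1 research paper");
console.log(searchRequest.messages[0].content); // prints "<task>web_search</task>"
```

The returned object can be spread into the `create` call next to `response_format: zodResponseFormat(z.any(), "empty_schema")`. Typing the task name as a union means a misspelled task is caught at compile time.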
OCR (ocr)
Extract text from images, scanned documents and PDFs.
OpenAI SDK example:
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
const response = await interfaze.chat.completions.create({
  model: "interfaze-beta",
  messages: [
    { role: "system", content: "<task>ocr</task>" },
    {
      role: "user",
      content: [
        { type: "text", text: "Extract all text from this ID" },
        {
          type: "image_url",
          image_url: { url: "https://r2public.jigsawstack.com/interfaze/examples/id.jpg" },
        },
      ],
    },
  ],
  response_format: zodResponseFormat(z.any(), "empty_schema"),
});
console.log(response.choices[0].message.content);
Output
{
"object": {
"name": "ocr",
"result": {
"extracted_text": "California\nUSA\nDRIVER LICENSE\nDL Y4067081\nCLASS C\nEXP 09/12/2027\nEN MUÑOZ ESTRADA\nFN IVÁN ICHET\n14223 BELGATE ST\nBALDWIN PARK CA 91706\nDOB 09/12/1987\nSEX M HAIR BLK EYES BLK\nHGT 5-02\" WGT 185lb",
"sections": [
{
"text": "DRIVER LICENSE",
"lines": [
{
"text": "DRIVER LICENSE",
"bounds": {
"top_left": { "x": 63, "y": 89 },
"top_right": { "x": 268, "y": 89 },
"bottom_right": { "x": 268, "y": 129 },
"bottom_left": { "x": 63, "y": 129 },
"width": 205,
"height": 40
},
"average_confidence": 0.99
}
]
}
],
"language": "en"
}
}
}
The output is truncated for this example.
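The OCR result nests lines inside sections, and each line carries an average_confidence value, so one natural follow-up is to flag lines that may need manual review. The sketch below assumes the result shape matches the example output above; the type names, `lowConfidenceLines`, and the thresholds are hypothetical choices for illustration.

```typescript
// Types inferred from the example output above; only the fields used here are declared.
interface OcrLine {
  text: string;
  average_confidence: number;
}
interface OcrSection {
  text: string;
  lines: OcrLine[];
}
interface OcrResult {
  sections: OcrSection[];
}

// Hypothetical helper: collect every line whose confidence is below the threshold.
function lowConfidenceLines(result: OcrResult, threshold: number): OcrLine[] {
  const flagged: OcrLine[] = [];
  for (const section of result.sections) {
    for (const line of section.lines) {
      if (line.average_confidence < threshold) {
        flagged.push(line);
      }
    }
  }
  return flagged;
}

// Sample data: the single section shown in the example output.
const ocrSample: OcrResult = {
  sections: [
    {
      text: "DRIVER LICENSE",
      lines: [{ text: "DRIVER LICENSE", average_confidence: 0.99 }],
    },
  ],
};
console.log(lowConfidenceLines(ocrSample, 0.95).length); // prints 0
```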
Object detection (object_detection)
Detect objects in images and return their bounding boxes.
OpenAI SDK example:
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
const response = await interfaze.chat.completions.create({
  model: "interfaze-beta",
  messages: [
    { role: "system", content: "<task>object_detection</task>" },
    {
      role: "user",
      content: [
        { type: "text", text: "Get the position of the crane in the image and any text" },
        {
          type: "image_url",
          image_url: { url: "https://r2public.jigsawstack.com/interfaze/examples/construction.png" },
        },
      ],
    },
  ],
  response_format: zodResponseFormat(z.any(), "empty_schema"),
});
console.log(response.choices[0].message.content);
Output
{
"object": {
"name": "object_detection",
"result": {
"detected_objects": [
{
"bounds": {
"top_left": { "x": 630, "y": 139 },
"top_right": { "x": 769, "y": 139 },
"bottom_left": { "x": 630, "y": 225 },
"bottom_right": { "x": 769, "y": 225 },
"width": 139,
"height": 86
},
"label": "crane"
}
],
"gui_elements": [
{
"type": "text",
"bounds": {
"top_left": { "x": 1140, "y": 722 },
"top_right": { "x": 1232, "y": 722 },
"bottom_left": { "x": 1140, "y": 752 },
"bottom_right": { "x": 1232, "y": 752 },
"width": 92,
"height": 30
},
"interactivity": false,
"content": "tower"
}
]
}
}
}
GUI detection (gui_detection)
Detect interactive UI elements in screenshots, useful for computer-use and agent workflows.
OpenAI SDK example:
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
const response = await interfaze.chat.completions.create({
  model: "interfaze-beta",
  messages: [
    { role: "system", content: "<task>gui_detection</task>" },
    {
      role: "user",
      content: [
        { type: "text", text: "Detect all interactive UI elements on this screen" },
        {
          type: "image_url",
          image_url: { url: "https://r2public.jigsawstack.com/interfaze/examples/computer_use.jpg" },
        },
      ],
    },
  ],
  response_format: zodResponseFormat(z.any(), "empty_schema"),
});
console.log(response.choices[0].message.content);
Output
{
"object": {
"name": "gui_detection",
"result": {
"gui_elements": [
{
"type": "button",
"top_left_x": 1120,
"top_left_y": 18,
"bottom_right_x": 1192,
"bottom_right_y": 44
},
{
"type": "input",
"top_left_x": 312,
"top_left_y": 12,
"bottom_right_x": 692,
"bottom_right_y": 42
},
{
"type": "link",
"top_left_x": 72,
"top_left_y": 64,
"bottom_right_x": 116,
"bottom_right_y": 88
},
{
"type": "dropdown",
"top_left_x": 720,
"top_left_y": 64,
"bottom_right_x": 820,
"bottom_right_y": 90
}
]
}
}
}
Web search (web_search)
Search the web and return ranked results with titles, descriptions and URLs.
OpenAI SDK example:
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
const response = await interfaze.chat.completions.create({
  model: "interfaze-beta",
  messages: [
    { role: "system", content: "<task>web_search</task>" },
    { role: "user", content: "GLP-1 research paper" },
  ],
  response_format: zodResponseFormat(z.any(), "empty_schema"),
});
console.log(response.choices[0].message.content);
Output
{
"object": {
"name": "web_search",
"result": [
{
"title": "Glucagon-like peptide 1 (GLP-1) - PubMed",
"description": "The glucagon-like peptide-1 (GLP-1) is a multifaceted hormone with broad pharmacological potential.",
"content": "The glucagon-like peptide-1 (GLP-1) is a multifaceted hormone with broad pharmacological potential.",
"url": "https://pubmed.ncbi.nlm.nih.gov/31767182/"
},
{
"title": "Mapping the effectiveness and risks of GLP-1 receptor agonists - PubMed",
"description": "Glucagon-like peptide 1 receptor agonists (GLP-1RAs) are increasingly being used to treat diabetes and obesity.",
"content": "Glucagon-like peptide 1 receptor agonists (GLP-1RAs) are increasingly being used to treat diabetes and obesity.",
"url": "https://pubmed.ncbi.nlm.nih.gov/39833406/"
}
]
}
}
Scraper (scraper)
Extract structured content from a URL.
OpenAI SDK example:
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
const response = await interfaze.chat.completions.create({
  model: "interfaze-beta",
  messages: [
    { role: "system", content: "<task>scraper</task>" },
    { role: "user", content: "Extract post titles and points from https://news.ycombinator.com" },
  ],
  response_format: zodResponseFormat(z.any(), "empty_schema"),
});
console.log(response.choices[0].message.content);
Output
{
"object": {
"name": "ai_scraper",
"result": {
"scraped_content": {
"title": ["Google releases Gemma 4 open models", "Tailscale's new macOS home", "Cursor 3", "Artemis II's toilet is a moon mission milestone"],
"points": ["962 points", "238 points", "221 points", "67 points"]
},
"scraped_elements": [
{
"selector": "#hnmain .hnname a",
"results": [
{
"html": "Hacker News",
"text": "Hacker News",
"attributes": [{ "name": "href", "value": "news" }]
}
],
"key": "title"
}
]
}
}
}
The output is truncated for this example.
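In the example output, scraped_content holds one array per requested field (title and points). Turning those columns into one record per post is often more convenient. The sketch below assumes the arrays are aligned by index, which the example suggests but this page does not state; `zipScrapedColumns` is a hypothetical name.

```typescript
// Hypothetical helper: turn column arrays into one record per row.
// Assumes the arrays are aligned by index; rows stop at the shortest column.
function zipScrapedColumns(content: Record<string, string[]>): Array<Record<string, string>> {
  const keys = Object.keys(content);
  const rowCount = keys.length === 0 ? 0 : Math.min(...keys.map((key) => content[key].length));
  const rows: Array<Record<string, string>> = [];
  for (let i = 0; i < rowCount; i++) {
    const row: Record<string, string> = {};
    for (const key of keys) {
      row[key] = content[key][i];
    }
    rows.push(row);
  }
  return rows;
}

// Sample data copied from the scraped_content field in the example output.
const scrapedSample = {
  title: [
    "Google releases Gemma 4 open models",
    "Tailscale's new macOS home",
    "Cursor 3",
    "Artemis II's toilet is a moon mission milestone",
  ],
  points: ["962 points", "238 points", "221 points", "67 points"],
};
const posts = zipScrapedColumns(scrapedSample);
console.log(posts.length); // prints 4
```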
Translate (translate)
Translate text between languages with context-aware accuracy.
OpenAI SDK example:
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
const response = await interfaze.chat.completions.create({
  model: "interfaze-beta",
  messages: [
    { role: "system", content: "<task>translate</task>" },
    {
      role: "user",
      content:
        "Translate the following text into French: 'The UK drinks about 100–160 million cups of tea every day, and 98% of tea drinkers add milk to their tea.'",
    },
  ],
  response_format: zodResponseFormat(z.any(), "empty_schema"),
});
console.log(response.choices[0].message.content);
Output
{
"object": {
"name": "translate",
"result": {
"translated_text": "Le Royaume-Uni boit environ 100–160 millions de tasses de thé chaque jour, et 98 % des buveurs de thé ajoutent du lait à leur thé.",
"source_language": "en",
"target_language": "fr"
}
}
}
The <task> tag is parsed from the system message (the first match). Only one task can be run at a time.
<task> tag, it will result in a 400 status code error.
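The notes above say that only the first <task> tag in the system message is used and that some requests are rejected with a 400 status code. A client-side pre-check can catch obvious mistakes before a request is sent. The sketch below is an illustration under several assumptions: the function names are hypothetical, the regular expression only approximates whatever parsing the server performs, and treating an unknown task name as an error is an assumption rather than documented behavior.

```typescript
// Task names copied from the table at the top of this page.
const KNOWN_TASKS: readonly string[] = [
  "ocr",
  "object_detection",
  "gui_detection",
  "web_search",
  "scraper",
  "speech_to_text",
  "translate",
];

// Return the content of the first <task> tag, or null when there is none.
function firstTaskTag(systemMessage: string): string | null {
  const match = /<task>(.*?)<\/task>/.exec(systemMessage);
  return match ? match[1] : null;
}

type TaskCheck = { ok: true; task: string } | { ok: false; reason: string };

// Hypothetical pre-check: flags a missing tag or a task name not in the table.
function checkSystemMessage(systemMessage: string): TaskCheck {
  const task = firstTaskTag(systemMessage);
  if (task === null) {
    return { ok: false, reason: "no <task> tag found" };
  }
  if (KNOWN_TASKS.indexOf(task) === -1) {
    return { ok: false, reason: `unknown task "${task}"` };
  }
  return { ok: true, task };
}

// Only the first tag counts, so the second one here is ignored.
console.log(firstTaskTag("<task>ocr</task><task>translate</task>")); // prints "ocr"
```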