Run Tasks

Interfaze's architecture lets you programmatically run individual parts of the model or built-in tools without activating the full model, which makes these calls significantly faster and cheaper.

Available tasks

Task Name         Description
ocr               Optical character recognition on images and documents
object_detection  Detect objects in images
gui_detection     Detect GUI elements in images
web_search        Web search
scraper           Extract structured data from web pages
speech_to_text    Speech-to-text transcription
translate         Translation

Limits

  • Only one task can be run at a time.
  • The structured output is fixed for the task and cannot be customized.

How to run a task

  • The system prompt has to contain the task name in the format <task>task_name</task>.
  • The structured output response format has to be an any type or an empty schema.

Example of system prompt:

<task>web_search</task>
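
A small helper can keep the tag format and the list of task names in one place. The sketch below is illustrative only: `TASKS`, `TaskName`, `taskSystemPrompt` and `isTaskName` are hypothetical names, not part of the Interfaze API, and the task list is copied from the "Available tasks" table.

```typescript
// Task names copied from the "Available tasks" table.
const TASKS = [
	"ocr",
	"object_detection",
	"gui_detection",
	"web_search",
	"scraper",
	"speech_to_text",
	"translate",
] as const;

type TaskName = (typeof TASKS)[number];

// Hypothetical helper: wraps a task name in the <task>...</task> tag format.
function taskSystemPrompt(task: TaskName): string {
	return `<task>${task}</task>`;
}

// Runtime guard for task names that arrive as plain strings (for example from config).
function isTaskName(value: string): value is TaskName {
	return (TASKS as readonly string[]).indexOf(value) !== -1;
}

console.log(taskSystemPrompt("web_search")); // prints "<task>web_search</task>"
```

Typing the task name as a union means a misspelled task is caught at compile time rather than by the API.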

Example of running a task

OpenAI SDK

import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

// Assumes `interfaze` is an OpenAI SDK client already configured for the Interfaze API.
const response = await interfaze.chat.completions.create({
	model: "interfaze-beta",
	messages: [
		{
			role: "system",
			content: "<task>speech_to_text</task>",
		},
		{
			role: "user",
			content: [
				{ type: "text", text: "Transcribe the audio file https://r2public.jigsawstack.com/interfaze/examples/stt_long_audio_sample_3.mp3" },
			],
		},
	],
	response_format: zodResponseFormat(z.any(), "empty_schema"),
});

console.log(response.choices[0].message.content);

Output

  • The output is always a structured output with the name of the task and the raw result.
  • The result schema differs from task to task.
  • The result is the raw output of the specific model layer or tool.
  • Each task returns a consistent structure on every run.

The output is truncated for this example.
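
Because every task returns the same outer structure, the response can be unwrapped with one small function before any task-specific handling. This is a minimal sketch that assumes the message content is a JSON string shaped like the example outputs on this page (`{ "object": { "name": ..., "result": ... } }`); `TaskOutput` and `parseTaskOutput` are hypothetical names.

```typescript
// Envelope shape inferred from the example outputs on this page
// (an assumption, not a published type).
interface TaskOutput {
	name: string;
	result: unknown;
}

// Hypothetical helper: parses the message content and validates the envelope.
function parseTaskOutput(content: string | null): TaskOutput {
	if (content === null) {
		throw new Error("Task response has no content");
	}
	const parsed: unknown = JSON.parse(content);
	const inner = (parsed as { object?: unknown } | null)?.object;
	if (typeof inner !== "object" || inner === null) {
		throw new Error("Unexpected task output shape");
	}
	const { name, result } = inner as { name?: unknown; result?: unknown };
	if (typeof name !== "string" || result === undefined) {
		throw new Error("Unexpected task output shape");
	}
	return { name, result };
}

const sampleContent = '{"object":{"name":"translate","result":{"translated_text":"Bonjour"}}}';
const taskOutput = parseTaskOutput(sampleContent);
console.log(taskOutput.name); // prints "translate"
```

The `result` field is left as `unknown` on purpose, since its schema depends on the task.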

Examples for other tasks

Each task uses the same structure shown above — set the system prompt to <task>task_name</task> and pass an empty (any) response format. The user message carries the input for the task.

OCR (ocr)

Extract text from images, scanned documents and PDFs.

OpenAI SDK

import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const response = await interfaze.chat.completions.create({
	model: "interfaze-beta",
	messages: [
		{ role: "system", content: "<task>ocr</task>" },
		{
			role: "user",
			content: [
				{ type: "text", text: "Extract all text from this ID" },
				{
					type: "image_url",
					image_url: { url: "https://r2public.jigsawstack.com/interfaze/examples/id.jpg" },
				},
			],
		},
	],
	response_format: zodResponseFormat(z.any(), "empty_schema"),
});

console.log(response.choices[0].message.content);

Output

{
  "object": {
    "name": "ocr",
    "result": {
      "extracted_text": "California\nUSA\nDRIVER LICENSE\nDL Y4067081\nCLASS C\nEXP 09/12/2027\nEN MUÑOZ ESTRADA\nFN IVÁN ICHET\n14223 BELGATE ST\nBALDWIN PARK CA 91706\nDOB 09/12/1987\nSEX M HAIR BLK EYES BLK\nHGT 5-02\" WGT 185lb",
      "sections": [
        {
          "text": "DRIVER LICENSE",
          "lines": [
            {
              "text": "DRIVER LICENSE",
              "bounds": {
                "top_left": { "x": 63, "y": 89 },
                "top_right": { "x": 268, "y": 89 },
                "bottom_right": { "x": 268, "y": 129 },
                "bottom_left": { "x": 63, "y": 129 },
                "width": 205,
                "height": 40
              },
              "average_confidence": 0.99
            }
          ]
        }
      ],
      "language": "en"
    }
  }
}

The output is truncated for this example.
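
The per-line `average_confidence` value makes it possible to flag text that may need manual review. The sketch below assumes the result shape shown in the example output; the interfaces and `lowConfidenceLines` are hypothetical names, and the sample data is a trimmed copy of that output.

```typescript
// Types inferred from the example OCR output above (an assumption).
interface Corner { x: number; y: number }
interface OcrLine {
	text: string;
	bounds: {
		top_left: Corner; top_right: Corner;
		bottom_right: Corner; bottom_left: Corner;
		width: number; height: number;
	};
	average_confidence: number;
}
interface OcrResult {
	extracted_text: string;
	sections: { text: string; lines: OcrLine[] }[];
	language: string;
}

// Hypothetical helper: collects every line whose confidence is below the threshold.
function lowConfidenceLines(result: OcrResult, threshold: number): OcrLine[] {
	const flagged: OcrLine[] = [];
	for (const section of result.sections) {
		for (const line of section.lines) {
			if (line.average_confidence < threshold) flagged.push(line);
		}
	}
	return flagged;
}

// Trimmed sample based on the example output.
const ocrSample: OcrResult = {
	extracted_text: "DRIVER LICENSE",
	sections: [{
		text: "DRIVER LICENSE",
		lines: [{
			text: "DRIVER LICENSE",
			bounds: {
				top_left: { x: 63, y: 89 }, top_right: { x: 268, y: 89 },
				bottom_right: { x: 268, y: 129 }, bottom_left: { x: 63, y: 129 },
				width: 205, height: 40,
			},
			average_confidence: 0.99,
		}],
	}],
	language: "en",
};

console.log(lowConfidenceLines(ocrSample, 0.9).length); // prints 0
```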

Object detection (object_detection)

Detect objects in images and return their bounding boxes.

OpenAI SDK

import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const response = await interfaze.chat.completions.create({
	model: "interfaze-beta",
	messages: [
		{ role: "system", content: "<task>object_detection</task>" },
		{
			role: "user",
			content: [
				{ type: "text", text: "Get the position of the crane in the image and any text" },
				{
					type: "image_url",
					image_url: { url: "https://r2public.jigsawstack.com/interfaze/examples/construction.png" },
				},
			],
		},
	],
	response_format: zodResponseFormat(z.any(), "empty_schema"),
});

console.log(response.choices[0].message.content);

Output

{
  "object": {
    "name": "object_detection",
    "result": {
      "detected_objects": [
        {
          "bounds": {
            "top_left": { "x": 630, "y": 139 },
            "top_right": { "x": 769, "y": 139 },
            "bottom_left": { "x": 630, "y": 225 },
            "bottom_right": { "x": 769, "y": 225 },
            "width": 139,
            "height": 86
          },
          "label": "crane"
        }
      ],
      "gui_elements": [
        {
          "type": "text",
          "bounds": {
            "top_left": { "x": 1140, "y": 722 },
            "top_right": { "x": 1232, "y": 722 },
            "bottom_left": { "x": 1140, "y": 752 },
            "bottom_right": { "x": 1232, "y": 752 },
            "width": 92,
            "height": 30
          },
          "interactivity": false,
          "content": "tower"
        }
      ]
    }
  }
}
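
A common follow-up step is to reduce each bounding box to a single point, for example to place a marker on the image. The sketch below assumes the `bounds` shape shown in the example output; `Corner`, `Bounds`, `DetectedObject` and `centerOf` are hypothetical names.

```typescript
// Types inferred from the example object_detection output above (an assumption).
interface Corner { x: number; y: number }
interface Bounds {
	top_left: Corner; top_right: Corner;
	bottom_left: Corner; bottom_right: Corner;
	width: number; height: number;
}
interface DetectedObject { label: string; bounds: Bounds }

// Hypothetical helper: center of a box from its top-left corner and size.
function centerOf(bounds: Bounds): Corner {
	return {
		x: bounds.top_left.x + bounds.width / 2,
		y: bounds.top_left.y + bounds.height / 2,
	};
}

// The "crane" object from the example output.
const crane: DetectedObject = {
	label: "crane",
	bounds: {
		top_left: { x: 630, y: 139 }, top_right: { x: 769, y: 139 },
		bottom_left: { x: 630, y: 225 }, bottom_right: { x: 769, y: 225 },
		width: 139, height: 86,
	},
};

console.log(centerOf(crane.bounds)); // prints { x: 699.5, y: 182 }
```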

GUI detection (gui_detection)

Detect interactive UI elements in screenshots, which is useful for computer-use and agent workflows.

OpenAI SDK

import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const response = await interfaze.chat.completions.create({
	model: "interfaze-beta",
	messages: [
		{ role: "system", content: "<task>gui_detection</task>" },
		{
			role: "user",
			content: [
				{ type: "text", text: "Detect all interactive UI elements on this screen" },
				{
					type: "image_url",
					image_url: { url: "https://r2public.jigsawstack.com/interfaze/examples/computer_use.jpg" },
				},
			],
		},
	],
	response_format: zodResponseFormat(z.any(), "empty_schema"),
});

console.log(response.choices[0].message.content);

Output

{
  "object": {
    "name": "gui_detection",
    "result": {
      "gui_elements": [
        {
          "type": "button",
          "top_left_x": 1120,
          "top_left_y": 18,
          "bottom_right_x": 1192,
          "bottom_right_y": 44
        },
        {
          "type": "input",
          "top_left_x": 312,
          "top_left_y": 12,
          "bottom_right_x": 692,
          "bottom_right_y": 42
        },
        {
          "type": "link",
          "top_left_x": 72,
          "top_left_y": 64,
          "bottom_right_x": 116,
          "bottom_right_y": 88
        },
        {
          "type": "dropdown",
          "top_left_x": 720,
          "top_left_y": 64,
          "bottom_right_x": 820,
          "bottom_right_y": 90
        }
      ]
    }
  }
}
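
For computer-use workflows, an agent typically needs a click target rather than a box. The sketch below assumes the flat coordinate fields shown in the example output; `GuiElement`, `elementsOfType` and `clickPoint` are hypothetical names.

```typescript
// Type inferred from the example gui_detection output above (an assumption).
interface GuiElement {
	type: string;
	top_left_x: number;
	top_left_y: number;
	bottom_right_x: number;
	bottom_right_y: number;
}

// Hypothetical helper: keeps only elements of one type, such as "button".
function elementsOfType(elements: GuiElement[], type: string): GuiElement[] {
	return elements.filter((element) => element.type === type);
}

// Hypothetical helper: midpoint of the box, rounded to whole pixels.
function clickPoint(element: GuiElement): { x: number; y: number } {
	return {
		x: Math.round((element.top_left_x + element.bottom_right_x) / 2),
		y: Math.round((element.top_left_y + element.bottom_right_y) / 2),
	};
}

// Two elements taken from the example output.
const guiSample: GuiElement[] = [
	{ type: "button", top_left_x: 1120, top_left_y: 18, bottom_right_x: 1192, bottom_right_y: 44 },
	{ type: "input", top_left_x: 312, top_left_y: 12, bottom_right_x: 692, bottom_right_y: 42 },
];

const firstButton = elementsOfType(guiSample, "button")[0];
console.log(clickPoint(firstButton)); // prints { x: 1156, y: 31 }
```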

Web search (web_search)

Search the web and return ranked results with titles, descriptions and URLs.

OpenAI SDK

import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const response = await interfaze.chat.completions.create({
	model: "interfaze-beta",
	messages: [
		{ role: "system", content: "<task>web_search</task>" },
		{ role: "user", content: "GLP-1 research paper" },
	],
	response_format: zodResponseFormat(z.any(), "empty_schema"),
});

console.log(response.choices[0].message.content);

Output

{
  "object": {
    "name": "web_search",
    "result": [
      {
        "title": "Glucagon-like peptide 1 (GLP-1) - PubMed",
        "description": "The glucagon-like peptide-1 (GLP-1) is a multifaceted hormone with broad pharmacological potential.",
        "content": "The glucagon-like peptide-1 (GLP-1) is a multifaceted hormone with broad pharmacological potential.",
        "url": "https://pubmed.ncbi.nlm.nih.gov/31767182/"
      },
      {
        "title": "Mapping the effectiveness and risks of GLP-1 receptor agonists - PubMed",
        "description": "Glucagon-like peptide 1 receptor agonists (GLP-1RAs) are increasingly being used to treat diabetes and obesity.",
        "content": "Glucagon-like peptide 1 receptor agonists (GLP-1RAs) are increasingly being used to treat diabetes and obesity.",
        "url": "https://pubmed.ncbi.nlm.nih.gov/39833406/"
      }
    ]
  }
}
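
Since the result is a plain array, it can be turned into a numbered source list with a single map. The sketch below assumes the result item shape shown in the example output; `SearchResult` and `formatResults` are hypothetical names.

```typescript
// Type inferred from the example web_search output above (an assumption).
interface SearchResult {
	title: string;
	description: string;
	content: string;
	url: string;
}

// Hypothetical helper: renders results as a numbered "title (url)" list.
function formatResults(results: SearchResult[]): string {
	return results
		.map((item, index) => `${index + 1}. ${item.title} (${item.url})`)
		.join("\n");
}

// First result from the example output.
const searchSample: SearchResult[] = [
	{
		title: "Glucagon-like peptide 1 (GLP-1) - PubMed",
		description: "The glucagon-like peptide-1 (GLP-1) is a multifaceted hormone with broad pharmacological potential.",
		content: "The glucagon-like peptide-1 (GLP-1) is a multifaceted hormone with broad pharmacological potential.",
		url: "https://pubmed.ncbi.nlm.nih.gov/31767182/",
	},
];

console.log(formatResults(searchSample));
// prints "1. Glucagon-like peptide 1 (GLP-1) - PubMed (https://pubmed.ncbi.nlm.nih.gov/31767182/)"
```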

Scraper (scraper)

Extract structured content from a URL.

OpenAI SDK

import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const response = await interfaze.chat.completions.create({
	model: "interfaze-beta",
	messages: [
		{ role: "system", content: "<task>scraper</task>" },
		{ role: "user", content: "Extract post titles and points from https://news.ycombinator.com" },
	],
	response_format: zodResponseFormat(z.any(), "empty_schema"),
});

console.log(response.choices[0].message.content);

Output

{
  "object": {
    "name": "ai_scraper",
    "result": {
      "scraped_content": {
        "title": ["Google releases Gemma 4 open models", "Tailscale's new macOS home", "Cursor 3", "Artemis II's toilet is a moon mission milestone"],
        "points": ["962 points", "238 points", "221 points", "67 points"]
      },
      "scraped_elements": [
        {
          "selector": "#hnmain .hnname a",
          "results": [
            {
              "html": "Hacker News",
              "text": "Hacker News",
              "attributes": [{ "name": "href", "value": "news" }]
            }
          ],
          "key": "title"
        }
      ]
    }
  }
}

The output is truncated for this example.
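
In the example output, `scraped_content` holds one array per requested field, so the values need to be zipped by index to get one record per post. The sketch below assumes that column-oriented shape and that the arrays line up by position, which the example suggests but does not guarantee; `zipColumns` is a hypothetical name.

```typescript
// Hypothetical helper: turns { field: values[] } columns into one record per row.
// Rows are cut to the shortest column so every record has all fields.
function zipColumns(columns: Record<string, string[]>): Record<string, string>[] {
	const keys = Object.keys(columns);
	if (keys.length === 0) return [];
	const rowCount = Math.min(...keys.map((key) => columns[key].length));
	const rows: Record<string, string>[] = [];
	for (let i = 0; i < rowCount; i++) {
		const row: Record<string, string> = {};
		for (const key of keys) {
			row[key] = columns[key][i];
		}
		rows.push(row);
	}
	return rows;
}

// scraped_content from the example output.
const scrapedContent: Record<string, string[]> = {
	title: ["Google releases Gemma 4 open models", "Tailscale's new macOS home", "Cursor 3", "Artemis II's toilet is a moon mission milestone"],
	points: ["962 points", "238 points", "221 points", "67 points"],
};

const posts = zipColumns(scrapedContent);
console.log(posts[0].title); // prints "Google releases Gemma 4 open models"
console.log(parseInt(posts[0].points, 10)); // prints 962
```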

Translate (translate)

Translate text between languages with context-aware accuracy.

OpenAI SDK

import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const response = await interfaze.chat.completions.create({
	model: "interfaze-beta",
	messages: [
		{ role: "system", content: "<task>translate</task>" },
		{
			role: "user",
			content:
				"Translate the following text into French: 'The UK drinks about 100–160 million cups of tea every day, and 98% of tea drinkers add milk to their tea.'",
		},
	],
	response_format: zodResponseFormat(z.any(), "empty_schema"),
});

console.log(response.choices[0].message.content);

Output

{
  "object": {
    "name": "translate",
    "result": {
      "translated_text": "Le Royaume-Uni boit environ 100–160 millions de tasses de thé chaque jour, et 98 % des buveurs de thé ajoutent du lait à leur thé.",
      "source_language": "en",
      "target_language": "fr"
    }
  }
}

Common issues faced

  • Only the first <task> tag in the system message is parsed, so only one task can run per request.
  • If a non-empty schema is provided alongside a <task> tag, the request fails with a 400 status code.
  • The response is the raw task output, not a natural language summary, so plan your downstream processing accordingly.
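
The first issue can be caught on the client before a request is sent. The sketch below is a local pre-flight check, not something the API provides: `TaskPromptCheck` and `checkTaskPrompt` are hypothetical names, and the tag pattern is an assumption based on the <task>task_name</task> format described on this page.

```typescript
interface TaskPromptCheck {
	task: string | null;
	warnings: string[];
}

// Hypothetical pre-flight check: finds <task> tags in a system prompt and
// reports the one that would be used (the first), plus any warnings.
function checkTaskPrompt(systemPrompt: string): TaskPromptCheck {
	const tags = systemPrompt.match(/<task>[^<]*<\/task>/g) || [];
	const warnings: string[] = [];
	if (tags.length === 0) {
		warnings.push("No <task> tag found in the system prompt.");
		return { task: null, warnings };
	}
	if (tags.length > 1) {
		warnings.push(`Found ${tags.length} <task> tags; only the first is parsed.`);
	}
	// Strip the opening and closing tag to get the task name.
	const task = tags[0].slice("<task>".length, -"</task>".length).trim();
	return { task, warnings };
}

const promptCheck = checkTaskPrompt("<task>ocr</task><task>translate</task>");
console.log(promptCheck.task); // prints "ocr"
console.log(promptCheck.warnings.length); // prints 1
```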
