Chat Completion API Reference

Base URL

https://api.interfaze.ai/v1

Authentication

All requests must be authenticated with an API key passed in the Authorization header as a Bearer token.

Authorization: Bearer <your-api-key>

Create and manage API keys from the dashboard.

Create chat completion

Creates a model response for the given chat conversation.

POST https://api.interfaze.ai/v1/chat/completions

Request headers

Header	Required	Description
`Authorization`	Yes	`Bearer <your-api-key>`
`Content-Type`	Yes	`application/json`
`x-show-additional-info`	No	Set to `true` to include precontext inline at the start of a streamed response. Defaults to `false`.

Example request

cURL

curl https://api.interfaze.ai/v1/chat/completions \
  -H "Authorization: Bearer $INTERFAZE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "interfaze-beta",
    "messages": [
      { "role": "user", "content": "Who is the founder of Interfaze?" }
    ]
  }'
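The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_chat_request` and `send` are made up for this example, and the API key is assumed to come from the caller.

```python
import json
import urllib.request

BASE_URL = "https://api.interfaze.ai/v1"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /chat/completions."""
    body = {
        "model": "interfaze-beta",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `send(build_chat_request(api_key, "Who is the founder of Interfaze?"))` performs the network call; building the request separately keeps the payload easy to inspect and test.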

Body parameters

Summary of fields accepted on the request body. Each parameter is fully documented below.

Parameter	Type	Required	Default
`model`	string	Yes	—
`messages`	array	Yes	—
`stream`	boolean	No	`false`
`response_format`	object	No	`{"type": "text"}`
`tools`	array	No	—
`tool_choice`	string \| object	No	`"auto"`
`reasoning_effort`	string	No	off
`max_tokens`	integer	No	`32000`
`temperature`	number	No	`1`
`top_p`	number	No	`1`

`model`

Type: string · Required

The id of the model to use. Currently the only supported value is interfaze-beta.

{ "model": "interfaze-beta" }

`messages`

Type: array · Required

A list of messages making up the conversation so far. Each item is a message object.

Field	Type	Required	Description
`role`	string	Yes	One of `system`, `user`, `assistant`, or `tool`.
`content`	string \| array	Yes	Either a plain string or an array of content parts for multimodal input.
`name`	string	No	Optional name of the participant.
`tool_calls`	array	Assistant only	Tool calls the model wants the caller to execute. See Function calling.
`tool_call_id`	string	Tool messages only	The id of the tool call this message is responding to.

{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Who is the founder of Interfaze?" }
  ]
}

Content parts

When content is an array, each item is a typed part. Mix and match to send multimodal input (text + images + files in a single message).

Text part

Field	Type	Required	Description
`type`	string	Yes	Must be `"text"`.
`text`	string	Yes	The text content.

{
  "type": "text",
  "text": "Extract the details from this ID"
}

Image part

Field	Type	Required	Description
`type`	string	Yes	Must be `"image_url"`.
`image_url`	object	Yes	Wrapper object containing the image source.
`image_url.url`	string	Yes	Publicly accessible URL or base64 data URL (`data:image/jpeg;base64,...`).

{
  "type": "image_url",
  "image_url": {
    "url": "https://r2public.jigsawstack.com/interfaze/examples/id.jpg"
  }
}

File part

For PDFs, audio, video, and other documents.

Field	Type	Required	Description
`type`	string	Yes	Must be `"file"`.
`file`	object	Yes	Wrapper object containing the file source.
`file.filename`	string	Yes	Filename including extension. Used for MIME inference.
`file.file_data`	string	Yes	Publicly accessible URL or base64 data URL (`data:<mime>;base64,...`).

{
  "type": "file",
  "file": {
    "filename": "report.pdf",
    "file_data": "https://example.com/report.pdf"
  }
}

See Handling files for size limits and SDK-specific helpers.
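As a rough sketch, the three part types above can be assembled with small helper functions. The helper names (`text_part`, `image_part`, `file_part`, `to_data_url`) are hypothetical and exist only for this example; the field names come from the tables above.

```python
import base64
import mimetypes


def text_part(text: str) -> dict:
    return {"type": "text", "text": text}


def image_part(url: str) -> dict:
    # url may be a public URL or a base64 data URL.
    return {"type": "image_url", "image_url": {"url": url}}


def file_part(filename: str, file_data: str) -> dict:
    # file_data may be a public URL or a base64 data URL.
    return {"type": "file", "file": {"filename": filename, "file_data": file_data}}


def to_data_url(filename: str, raw: bytes) -> str:
    """Encode raw bytes as a base64 data URL, guessing the MIME type from the filename."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"  # fallback when the extension is unknown
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# One user message mixing text, an image URL, and a file URL.
message = {
    "role": "user",
    "content": [
        text_part("Extract the details from this ID"),
        image_part("https://r2public.jigsawstack.com/interfaze/examples/id.jpg"),
        file_part("report.pdf", "https://example.com/report.pdf"),
    ],
}
```

For local files, `file_part("report.pdf", to_data_url("report.pdf", raw_bytes))` produces the base64 variant instead of a URL.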

`stream`

Type: boolean · Default: false

If true, partial message deltas are sent as server-sent events as they are generated. The stream terminates with data: [DONE]. See the Streaming section below for the chunk format and the Streaming guide for SDK examples.

{ "stream": true }

`response_format`

Type: object · Default: plain text

Constrains the model output to a specific format. Follows the OpenAI structured output specification.

Field	Type	Required	Description
`type`	string	Yes	`"text"` (default) or `"json_schema"`.
`json_schema`	object	When `type` is `"json_schema"`	The JSON schema configuration.

When type is "json_schema", json_schema accepts:

Field	Type	Required	Description
`name`	string	Yes	Identifier for the schema (e.g. `"id_schema"`).
`schema`	object	Yes	A valid JSON Schema definition describing the output shape.
`strict`	boolean	No	Whether to strictly enforce the schema. Defaults to `false`.

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "id_schema",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "first_name": { "type": "string" },
          "last_name": { "type": "string" }
        },
        "required": ["first_name", "last_name"],
        "additionalProperties": false
      }
    }
  }
}

See Structured Outputs for full examples.
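A minimal Python sketch of building this `response_format` and decoding the result. It assumes, as in the example response further down this page, that the structured output arrives as a JSON string in `choices[0].message.content`; the helper names and the sample response are made up for illustration.

```python
import json

ID_SCHEMA = {
    "type": "object",
    "properties": {
        "first_name": {"type": "string"},
        "last_name": {"type": "string"},
    },
    "required": ["first_name", "last_name"],
    "additionalProperties": False,
}


def json_schema_format(name: str, schema: dict, strict: bool = False) -> dict:
    """Wrap a JSON Schema in the response_format envelope described above."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": strict, "schema": schema},
    }


def parse_structured_content(response: dict) -> dict:
    """Decode the JSON string the model returns in the first choice."""
    return json.loads(response["choices"][0]["message"]["content"])


request_body = {
    "model": "interfaze-beta",
    "messages": [{"role": "user", "content": "Extract the name: Ada Lovelace"}],
    "response_format": json_schema_format("id_schema", ID_SCHEMA, strict=True),
}
```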

`tools`

Type: array · Optional

A list of tools (functions) the model may call. Follows the OpenAI function calling schema.

Each tool object:

Field	Type	Required	Description
`type`	string	Yes	Always `"function"`.
`function`	object	Yes	The function definition.
`function.name`	string	Yes	Unique function name (snake_case recommended).
`function.description`	string	No	What the function does. Helps the model decide when to call it.
`function.parameters`	object	No	JSON Schema describing the function's arguments.

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_horoscope",
        "description": "Get today's horoscope for an astrological sign.",
        "parameters": {
          "type": "object",
          "properties": {
            "sign": {
              "type": "string",
              "description": "An astrological sign like Taurus or Aquarius"
            }
          },
          "required": ["sign"]
        }
      }
    }
  ]
}

See Function calling for the full multi-turn flow.
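The caller-side half of the flow might look like the sketch below. It assumes each entry in `tool_calls` has the OpenAI shape (`id`, `function.name`, and `function.arguments` as a JSON string), which this page implies but does not spell out. `get_horoscope` and `TOOL_REGISTRY` are hypothetical local stand-ins.

```python
import json
from typing import Callable


def get_horoscope(sign: str) -> str:
    # Hypothetical local implementation of the tool declared above.
    return f"{sign}: a good day to read the docs."


# Maps tool names declared in the request to local Python callables.
TOOL_REGISTRY: dict[str, Callable[..., str]] = {"get_horoscope": get_horoscope}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each requested tool call and build the matching tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        handler = TOOL_REGISTRY[function["name"]]
        # Assumption: arguments arrive as a JSON-encoded string.
        arguments = json.loads(function.get("arguments") or "{}")
        results.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "content": handler(**arguments),
            }
        )
    return results
```

The returned messages are appended to the conversation after the assistant message, and the whole list is sent in the next request so the model can produce its final answer.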

`tool_choice`

Type: string | object · Default: "auto"

Controls which (if any) tool the model calls.

Value	Behavior
`"auto"`	The model decides whether and which tool to call.
`"none"`	The model will not call any tool.
`"required"`	The model must call at least one tool.
`{"type": "function", "function": {"name": "<name>"}}`	Forces the model to call the specified function.

{
  "tool_choice": { "type": "function", "function": { "name": "get_horoscope" } }
}

`reasoning_effort`

Type: string · Default: off

Enables extended reasoning. The model spends more compute and thinking tokens before producing a final answer.

Value	Description
`"low"`	Light reasoning pass.
`"medium"`	Moderate reasoning.
`"high"`	Deep reasoning. Recommended for math, science, complex agents.

When set, the response contains a reasoning field with the model's thinking trace. In streaming mode, reasoning tokens stream first inside <think>...</think> tags.

{ "reasoning_effort": "high" }

See Reasoning for details.

`max_tokens`

Type: integer · Default: 32000

The maximum number of tokens to generate in the completion. The hard upper bound is 32,000 tokens. Input tokens plus max_tokens cannot exceed the 1M-token context window.

{ "max_tokens": 1024 }
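The two limits can be checked client-side before sending a request. This is an illustrative sketch: it assumes "1M" means exactly 1,000,000 tokens and that you already have an estimate of the input token count. `clamp_max_tokens` is a made-up helper name.

```python
MAX_OUTPUT_TOKENS = 32_000   # hard upper bound stated above
CONTEXT_WINDOW = 1_000_000   # assumption: "1M" taken as exactly 1,000,000 tokens


def clamp_max_tokens(requested: int, input_tokens: int) -> int:
    """Return the largest max_tokens value that respects both limits."""
    if requested < 1:
        raise ValueError("max_tokens must be at least 1")
    remaining = CONTEXT_WINDOW - input_tokens
    if remaining < 1:
        raise ValueError("input already fills the context window")
    return min(requested, MAX_OUTPUT_TOKENS, remaining)
```

For example, with 990,000 input tokens only 10,000 tokens of output fit, even though the per-request cap is 32,000.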

`temperature`

Type: number · Default: 1

Sampling temperature between 0 and 2. Higher values produce more random output, lower values make the output more focused and deterministic. Use 0 together with a seed for the most reproducible results.

{ "temperature": 0.2 }

`top_p`

Type: number · Default: 1

Nucleus sampling. The model samples only from the smallest set of most-likely tokens whose cumulative probability reaches top_p. For example, 0.1 means only the tokens making up the top 10% of probability mass are considered. As a rule, adjust either temperature or top_p, not both.

{ "top_p": 0.9 }
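To make the cutoff concrete, here is a toy illustration of how a nucleus is selected from a next-token distribution. It shows the general technique only and is not Interfaze's implementation; the token probabilities are invented.

```python
def nucleus(probs: dict[str, float], top_p: float) -> list[str]:
    """Return the smallest set of most-likely tokens whose cumulative probability reaches top_p."""
    kept: list[str] = []
    cumulative = 0.0
    # Walk tokens from most to least likely, stopping once top_p is covered.
    for token, probability in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append(token)
        cumulative += probability
        if cumulative >= top_p:
            break
    return kept


# Invented distribution over four candidate tokens.
toy_distribution = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
```

With this distribution, `top_p=0.9` keeps `the`, `a`, and `an` (cumulative 0.95), while `top_p=0.1` keeps only `the`.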

System prompt extensions

Interfaze recognizes two special XML-style directives inside the system message that change how the model behaves.

`<task>` — run a single built-in task

Run a specialized part of the model directly without invoking the full LLM. This is faster and cheaper, and it returns a fixed precontext schema. The response_format must be an open/empty JSON schema (any type).

<task>ocr</task>

Task	Description
`ocr`	Optical character recognition on images and documents
`object_detection`	Detect objects in images
`gui_detection`	Detect GUI elements in images
`web_search`	Web search
`scraper`	Extract structured data from web pages
`speech_to_text`	Speech to text transcription
`translate`	Translation

See Run Tasks.

`<guard>` — content safety guardrails

Block or flag unsafe content matching one or more safety codes.

<guard>S1, S2, S3, S10, S12_IMAGE</guard>

Supported codes: S1, S1_IMAGE, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12, S12_IMAGE, S13, S14, S15_IMAGE, ALL.

See Guardrails for the full list and behavior.
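Both directives are plain strings placed in the system message, so they can be generated and validated client-side. The sketch below checks values against the task and code lists on this page. The helper names are hypothetical, and it does not cover the `response_format` requirement for tasks (see Run Tasks for the exact shape).

```python
SUPPORTED_TASKS = (
    "ocr", "object_detection", "gui_detection", "web_search",
    "scraper", "speech_to_text", "translate",
)

SUPPORTED_GUARD_CODES = (
    "S1", "S1_IMAGE", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9",
    "S10", "S11", "S12", "S12_IMAGE", "S13", "S14", "S15_IMAGE", "ALL",
)


def task_directive(task: str) -> str:
    """Build a <task> directive, rejecting task names not listed above."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    return f"<task>{task}</task>"


def guard_directive(codes: list[str]) -> str:
    """Build a <guard> directive, rejecting codes not listed above."""
    unknown = [code for code in codes if code not in SUPPORTED_GUARD_CODES]
    if unknown:
        raise ValueError(f"unsupported guard codes: {unknown}")
    return f"<guard>{', '.join(codes)}</guard>"


# A system message carrying a guard directive.
system_message = {
    "role": "system",
    "content": guard_directive(["S1", "S2", "S3", "S10", "S12_IMAGE"]),
}
```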

Response object

A successful non-streaming response returns a chat completion object that follows the OpenAI shape, with Interfaze-specific top-level additions: precontext, vcache, and, when reasoning_effort is set, reasoning.

Field	Type	Description
`id`	string	Unique identifier for the completion.
`object`	string	Always `chat.completion`.
`model`	string	The model used.
`choices`	array	A list of completion choices. Each contains an `index`, `message`, and `finish_reason`.
`usage`	object	Token usage: `prompt_tokens`, `completion_tokens`, `total_tokens`.
`reasoning`	string	Present when `reasoning_effort` is set. The model's thinking trace.
`precontext`	array	Raw outputs from any internal tasks the model ran (OCR, STT, web search, etc.). See precontext.
`vcache`	boolean	Whether the response was served from the model's verified cache.

Example response

{
  "id": "interfaze-1775270750639",
  "object": "chat.completion",
  "model": "interfaze-beta",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"name\":\"Yoeven D Khemlani\"}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10512,
    "completion_tokens": 7056,
    "total_tokens": 17568
  },
  "vcache": false,
  "precontext": [
    {
      "name": "web_search",
      "result": [
        {
          "title": "Interfaze | Y Combinator",
          "url": "https://www.ycombinator.com/companies/interfaze",
          "description": "AI model built for deterministic developer tasks."
        }
      ]
    }
  ]
}
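A short sketch of pulling the commonly used fields out of a response like the one above. `summarize_response` is a made-up helper. It treats `precontext` and `vcache` as optional so it also works on responses where they are absent.

```python
def summarize_response(response: dict) -> dict:
    """Collect the answer, token usage, and Interfaze-specific fields in one place."""
    choice = response["choices"][0]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
        # Names of the internal tasks whose raw output is in precontext.
        "tasks_run": [entry["name"] for entry in response.get("precontext") or []],
        "from_cache": bool(response.get("vcache", False)),
    }
```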

Streaming

Set "stream": true in the request body to receive deltas as server-sent events. Each chunk follows the OpenAI streaming format:

data: {"id":"...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}

data: {"id":"...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}

data: [DONE]
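A minimal parser for this event stream might look as follows. It takes any iterable of decoded text lines (for example, lines read from an HTTP response body), so it is independent of the HTTP client used. `iter_content_deltas` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from server-sent event lines until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content
```

Joining the yielded fragments for the two chunks shown above reconstructs the text `Hello`.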

Streaming with precontext

By default, streamed responses do not include precontext. To opt in, add the header:

x-show-additional-info: true

When enabled, a single chunk containing the precontext is emitted before the main response begins. It is wrapped in XML tags inside the content delta so you can parse it from the stream:

<precontext>
{
  "name": "ocr",
  "result": { ... }
}
</precontext>
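One way to separate the precontext from the answer is to buffer the streamed content and extract the tagged block. The sketch assumes the text between the tags is valid JSON. The function name and the sample text in the usage note are invented.

```python
import json
import re

PRECONTEXT_RE = re.compile(r"<precontext>(.*?)</precontext>", re.DOTALL)


def split_precontext(text: str) -> tuple[list, str]:
    """Return (parsed precontext blocks, remaining answer text)."""
    # Assumption: each tagged block contains valid JSON.
    blocks = [json.loads(raw) for raw in PRECONTEXT_RE.findall(text)]
    remainder = PRECONTEXT_RE.sub("", text).lstrip()
    return blocks, remainder
```

For buffered text such as `<precontext>{"name": "ocr", "result": {"text": "hi"}}</precontext>The ID says hi.`, this returns the parsed object in a list and the answer text `The ID says hi.`.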

Streaming with reasoning

When reasoning_effort is set and stream is true, reasoning tokens are delivered first, wrapped in <think> tags:

<think>
Thinking step 1...
Thinking step 2...
</think>

Final answer begins here...

See Reasoning and Streaming for more.
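Once the streamed content has been accumulated, the thinking trace can be separated from the final answer with a small helper. This is a sketch under the assumption that at most one `<think>` block appears, at the start of the output. `split_reasoning` is a made-up name.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning trace, final answer) from accumulated stream text."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Remove the tagged block and keep everything around it as the answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer
```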

Errors

Errors follow the OpenAI error shape and use standard HTTP status codes.

{
  "error": {
    "message": "Invalid API key provided.",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

Status	Type	Meaning
400	`invalid_request_error`	The request body is malformed or missing required fields.
401	`authentication_error`	The API key is missing, invalid, or revoked.
402	`insufficient_quota`	Your account has insufficient credits. Top up from the dashboard.
413	`payload_too_large`	Request body or input file exceeds size limits. See Limits.
429	`rate_limit_error`	You exceeded the rate limit. Retry with exponential backoff.
500	`internal_error`	Server-side failure. Safe to retry.
503	`service_unavailable`	Temporary capacity issue. Retry with backoff.
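The retry guidance in the table can be sketched as exponential backoff with jitter. The retryable set below (429, 500, 503) is taken from the table; the base delay, cap, attempt count, and the `send` callable returning a `(status, body)` pair are assumptions made for this example.

```python
import random
import time
from typing import Callable

RETRYABLE_STATUSES = {429, 500, 503}


def should_retry(status: int) -> bool:
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: a random delay up to base * 2**attempt, capped."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(
    send: Callable[[], tuple[int, dict]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, dict]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if not should_retry(status) or attempt == max_attempts - 1:
            break
        sleep(backoff_delay(attempt))
    return status, body
```

Errors such as 400, 401, 402, and 413 are returned immediately, since retrying an unchanged request would not help.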
