Precontext

When Interfaze processes a request, specialized parts of the model handle tasks like OCR, speech-to-text, and translation before the final response is generated. The raw output from these tasks is returned as precontext, giving you structured metadata such as confidence scores, bounding boxes, and timestamps.

Use precontext to validate accuracy, build conditional logic around confidence thresholds, and inspect the intermediate data behind the final answer.

Supported tasks

precontext is available when the model performs any of these tasks:

Task | name in precontext | What it returns
--- | --- | ---
OCR | ocr | Extracted text, lines, words, bounding boxes, confidence scores
Object Detection | object_detection | Detected objects with labels, bounding boxes, confidence scores
Speech-to-Text | stt | Transcribed text with timestamped chunks
Translation | translate | Translated text, source/target languages
Web Search | web_search | Search results with sources
Scraping | scraper | Structured data extracted from web pages
Code Sandboxing | code_sandbox | Execution output
Guardrails | guardrails | Safety analysis results

If you only need the precontext data without a final model response, check out run tasks.

Response structure

The precontext field lives inside the chat completion response body. It is an array of objects, where each object represents one task the model performed.

{
  "precontext": [
    {
      "name": "ocr",
      "result": { ... }
    },
    {
      "name": "translate",
      "result": { ... }
    }
  ]
}
  • name: The task that produced this result (e.g., ocr, stt, translate).
  • result: The structured metadata for that task. Each task type has a fixed, predictable schema.
  • Multiple entries: If the model performs the same task more than once, precontext will contain multiple items with the same name.
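Because the same task name can appear more than once, it can help to group entries by name instead of relying on array positions. The sketch below assumes only the documented name and result fields; PrecontextEntry and groupByTask are hypothetical names for illustration, not part of any Interfaze SDK.

```typescript
// Hypothetical type mirroring the documented shape of one precontext entry.
interface PrecontextEntry {
  name: string; // e.g. "ocr", "stt", "translate"
  result: Record<string, unknown>; // task-specific schema
}

// Groups entries by task name. A task that ran several times maps to
// several results, kept in the order they appear in the array.
function groupByTask(
  precontext: PrecontextEntry[] | undefined,
): Map<string, Record<string, unknown>[]> {
  const groups = new Map<string, Record<string, unknown>[]>();
  for (const entry of precontext ?? []) {
    const list = groups.get(entry.name) ?? [];
    list.push(entry.result);
    groups.set(entry.name, list);
  }
  return groups;
}

// Example: two OCR passes and one translation.
const grouped = groupByTask([
  { name: "ocr", result: { extracted_text: "page 1" } },
  { name: "translate", result: { translated_text: "hola" } },
  { name: "ocr", result: { extracted_text: "page 2" } },
]);
console.log(grouped.get("ocr")?.length); // 2
```

Returning an array per name keeps the "multiple entries" case explicit, so callers never silently drop a second OCR or translation pass.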

OCR precontext example

When the model performs OCR (e.g., reading a driver's license), the precontext includes the full extracted text along with per-line and per-word bounding boxes and confidence scores.

{
  "precontext": [
    {
      "name": "ocr",
      "result": {
        "extracted_text": "California\nUSA\nDRIVER LICENSE\nDL Y4067081\n...",
        "sections": [
          {
            "text": "California\nUSA\nDRIVER LICENSE\n...",
            "lines": [
              {
                "text": "California",
                "bounds": {
                  "top_left": { "x": 63, "y": 89 },
                  "top_right": { "x": 268, "y": 89 },
                  "bottom_right": { "x": 268, "y": 129 },
                  "bottom_left": { "x": 63, "y": 129 },
                  "width": 205,
                  "height": 40
                },
                "average_confidence": 0.99,
                "words": [
                  {
                    "text": "California",
                    "bounds": {
                      "top_left": { "x": 64, "y": 90 },
                      "top_right": { "x": 267, "y": 89 },
                      "bottom_right": { "x": 267, "y": 130 },
                      "bottom_left": { "x": 63, "y": 130 },
                      "width": 203.5,
                      "height": 40.5
                    },
                    "confidence": 0.99
                  }
                ]
              },
              {
                "text": "DL Y4067081",
                "bounds": { "...": "..." },
                "average_confidence": 0.93,
                "words": [
                  { "text": "DL", "confidence": 0.92, "bounds": { "...": "..." } },
                  { "text": "Y4067081", "confidence": 0.95, "bounds": { "...": "..." } }
                ]
              }
            ]
          }
        ],
        "width": 698,
        "height": 525
      }
    }
  ]
}

Each OCR result contains:

  • extracted_text: The full text extracted from the image.
  • sections: An array of text sections found in the image.
  • lines: Each line includes bounds (bounding box coordinates), average_confidence, and a words array.
  • words: Each word includes its own bounds and confidence score (0 to 1).
  • width/height: The dimensions of the source image in pixels.
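A minimal sketch of confidence-threshold logic on top of this schema: it walks sections, lines, and words, flags words below a threshold, and scales each flagged word's top-left corner by the image width/height so the position is resolution-independent. The interfaces declare only the fields shown above; lowConfidenceWords, box, and the 0.95 default threshold are hypothetical choices, and the coordinates for the "DL Y4067081" line are made up because the example above elides them.

```typescript
// Types for the OCR fields used here, following the example result above.
interface Point { x: number; y: number }
interface Bounds {
  top_left: Point; top_right: Point; bottom_right: Point; bottom_left: Point;
  width: number; height: number;
}
interface OcrWord { text: string; bounds: Bounds; confidence: number }
interface OcrLine { text: string; bounds: Bounds; average_confidence: number; words: OcrWord[] }
interface OcrResult {
  extracted_text: string;
  sections: { text: string; lines: OcrLine[] }[];
  width: number;
  height: number;
}

// Returns words whose confidence is below `threshold`, with the word's
// top-left corner normalized to the 0..1 range of the source image.
function lowConfidenceWords(ocr: OcrResult, threshold = 0.95) {
  const flagged: { text: string; confidence: number; x: number; y: number }[] = [];
  for (const section of ocr.sections) {
    for (const line of section.lines) {
      for (const word of line.words) {
        if (word.confidence < threshold) {
          flagged.push({
            text: word.text,
            confidence: word.confidence,
            x: word.bounds.top_left.x / ocr.width,
            y: word.bounds.top_left.y / ocr.height,
          });
        }
      }
    }
  }
  return flagged;
}

// Hypothetical helper: builds an axis-aligned Bounds from a corner and a size.
function box(x: number, y: number, width: number, height: number): Bounds {
  return {
    top_left: { x, y },
    top_right: { x: x + width, y },
    bottom_right: { x: x + width, y: y + height },
    bottom_left: { x, y: y + height },
    width,
    height,
  };
}

// Example input modeled on the driver's license result above.
const ocr: OcrResult = {
  extracted_text: "California\nDL Y4067081",
  width: 698,
  height: 525,
  sections: [
    {
      text: "California\nDL Y4067081",
      lines: [
        {
          text: "California",
          bounds: box(63, 89, 205, 40),
          average_confidence: 0.99,
          words: [{ text: "California", bounds: box(64, 90, 203.5, 40.5), confidence: 0.99 }],
        },
        {
          text: "DL Y4067081",
          bounds: box(60, 300, 260, 40), // made-up coordinates
          average_confidence: 0.93,
          words: [
            { text: "DL", bounds: box(60, 300, 50, 40), confidence: 0.92 },
            { text: "Y4067081", bounds: box(120, 300, 200, 40), confidence: 0.95 },
          ],
        },
      ],
    },
  ],
};

console.log(lowConfidenceWords(ocr)); // flags only "DL" (0.92 < 0.95)
```

Flagged words could then be routed to manual review or a re-scan, while results above the threshold pass straight through.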

STT & translation precontext example

This example transcribes an audio file and translates the result to Chinese. The precontext array contains two entries: one for stt and one for translate.

Note: The LangChain SDK does not provide an official way to access raw API responses, so precontext is not available when using it.

OpenAI SDK

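import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

// OpenAI-compatible client pointed at Interfaze.
// Assumption: INTERFAZE_BASE_URL and INTERFAZE_API_KEY hold your endpoint and API key.
const interfaze = new OpenAI({
	baseURL: process.env.INTERFAZE_BASE_URL,
	apiKey: process.env.INTERFAZE_API_KEY,
});

// Structured output the model should return.
const STTSchema = z.object({
	translated_text: z.string().describe("translated text"),
	original_language_code: z.string(),
	translated_language_code: z.string(),
});

const response = await interfaze.chat.completions.create({
	model: "interfaze-beta",
	messages: [
		{
			role: "user",
			content: "Transcribe the audio file and translate it to Chinese https://r2public.jigsawstack.com/interfaze/examples/stt_medical_short.mp4",
		},
	],
	response_format: zodResponseFormat(STTSchema, "stt_schema"),
});

// The structured output arrives as a JSON string in the message content.
console.log(response.choices[0].message.content);

// precontext is an extra field that the OpenAI SDK types do not declare.
//@ts-expect-error precontext is not typed
const precontext: { name: string; result: unknown }[] | undefined = response.precontext;
// Look entries up by name rather than by position in the array.
console.log("STT Results:", precontext?.find((entry) => entry.name === "stt")?.result);
console.log("Translation Results:", precontext?.find((entry) => entry.name === "translate")?.result);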

Response

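With the Vercel AI SDK, the result contains your structured output in object and the raw precontext data in response.body.precontext. With the OpenAI SDK, as in the code above, the same array is read from response.precontext.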

{
  "object": {
    "translated_text": "我刚开始服用一轮阿莫西林,想问一下把它和我目前的螺内酯处方一起服用是否安全。",
    "original_language_code": "en",
    "translated_language_code": "zh"
  },
  "response": {
    "id": "interfaze-1775088833045",
    "modelId": "interfaze-beta",
    "body": {
      "precontext": [
        {
          "name": "stt",
          "result": {
            "text": "I just started a round of amoxicillin and I wanted to ask if it was safe to take that with my current spironolactone prescription.",
            "chunks": [
              {
                "timestamp": [0.28, 4],
                "text": "I just started a round of amoxicillin and I wanted to ask"
              },
              {
                "timestamp": [4, 7.72],
                "text": "if it was safe to take that with my current spironolactone prescription."
              }
            ]
          }
        },
        {
          "name": "translate",
          "result": {
            "translated_text": "我刚开始服用一轮阿莫西林,想问一下把它和我目前的螺内酯处方一起服用是否安全。",
            "source_language": "auto-detected",
            "target_language": "zh",
            "batch_size": 1
          }
        }
      ]
    }
  }
}

The stt result provides the transcribed text plus timestamped chunks, where each timestamp is a [start, end] pair in seconds. The translate result includes the translated text, the source language (reported here as auto-detected), the target language, and a batch_size field.
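The timestamped chunks are enough to build simple captions or to look up what was said at a given moment. This sketch reuses the stt result from the response above; SttResult, formatTime, toTimedLines, and chunkAt are hypothetical names, and treating each range as start-inclusive, end-exclusive is an assumption made so adjacent chunks do not overlap.

```typescript
// Shape of the stt result shown above.
interface SttChunk { timestamp: [number, number]; text: string }
interface SttResult { text: string; chunks: SttChunk[] }

// Formats seconds as mm:ss.ss, rounding to hundredths first so that
// values like 59.999 roll over to the next minute correctly.
function formatTime(seconds: number): string {
  const total = Math.round(seconds * 100) / 100;
  const minutes = Math.floor(total / 60);
  const secs = total - minutes * 60;
  return `${String(minutes).padStart(2, "0")}:${secs.toFixed(2).padStart(5, "0")}`;
}

// Turns timestamped chunks into "[start - end] text" caption lines.
function toTimedLines(stt: SttResult): string[] {
  return stt.chunks.map(
    ({ timestamp: [start, end], text }) => `[${formatTime(start)} - ${formatTime(end)}] ${text}`,
  );
}

// Returns the chunk spoken at time `t` (seconds), treating ranges as [start, end).
function chunkAt(stt: SttResult, t: number): SttChunk | undefined {
  return stt.chunks.find(({ timestamp: [start, end] }) => t >= start && t < end);
}

const stt: SttResult = {
  text: "I just started a round of amoxicillin and I wanted to ask if it was safe to take that with my current spironolactone prescription.",
  chunks: [
    { timestamp: [0.28, 4], text: "I just started a round of amoxicillin and I wanted to ask" },
    { timestamp: [4, 7.72], text: "if it was safe to take that with my current spironolactone prescription." },
  ],
};

console.log(toTimedLines(stt).join("\n"));
// [00:00.28 - 00:04.00] I just started a round of amoxicillin and I wanted to ask
// [00:04.00 - 00:07.72] if it was safe to take that with my current spironolactone prescription.
```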

Streaming

By default, streaming responses do not include precontext. To enable it, add the x-show-additional-info header:

headers: {
  "x-show-additional-info": "true"
}

With this header, the precontext data is sent as a single chunk before the main response begins streaming. It is wrapped in XML tags:

<precontext>
{
  "name": "process_name",
  "result": { ... }
}
</precontext>

Parse the <precontext> block from the stream before processing the rest of the response tokens.
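One way to do this is to buffer text until the closing tag arrives, parse the JSON between the tags, and pass everything after it through as normal output. This is a sketch under assumptions: it assumes the tagged block arrives through the same text deltas as the response (for example choices[0].delta.content with the OpenAI SDK), and it accepts either a single object, as shown above, or an array. splitPrecontext and consume are hypothetical helpers.

```typescript
interface PrecontextEntry { name: string; result: unknown }

const OPEN = "<precontext>";
const CLOSE = "</precontext>";

// Separates the <precontext>...</precontext> block from the rest of the text.
// Returns null while the block (or its opening tag) is still incomplete.
function splitPrecontext(
  buffered: string,
): { precontext: PrecontextEntry[]; rest: string } | null {
  const start = buffered.indexOf(OPEN);
  if (start === -1) {
    // Nothing yet, or a partial opening tag such as "<prec" may still be arriving.
    const head = buffered.trimStart();
    if (head.length < OPEN.length && OPEN.startsWith(head)) return null;
    return { precontext: [], rest: buffered }; // no precontext in this stream
  }
  const end = buffered.indexOf(CLOSE, start + OPEN.length);
  if (end === -1) return null; // keep buffering until the closing tag arrives
  const parsed: unknown = JSON.parse(buffered.slice(start + OPEN.length, end).trim());
  const precontext = (Array.isArray(parsed) ? parsed : [parsed]) as PrecontextEntry[];
  return { precontext, rest: buffered.slice(0, start) + buffered.slice(end + CLOSE.length) };
}

// Consumes any async iterable of text deltas and returns the precontext
// plus the concatenated model output.
async function consume(deltas: AsyncIterable<string>) {
  let buffer = "";
  let precontext: PrecontextEntry[] | undefined;
  let output = "";
  for await (const delta of deltas) {
    if (precontext) {
      output += delta; // precontext already handled; pass tokens through
      continue;
    }
    buffer += delta;
    const split = splitPrecontext(buffer);
    if (split) {
      precontext = split.precontext;
      output += split.rest;
    }
  }
  if (!precontext) output += buffer; // stream ended mid-buffer; keep the text
  return { precontext: precontext ?? [], output };
}
```

Buffering only until the first complete block keeps latency low: once the precontext is parsed, every later token is forwarded immediately.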

Limits

  • The LangChain SDK does not provide an official way to access raw API responses, so the precontext field is not natively available.

Only need precontext data?

If you don't need a final model response and only want the raw task output, use run tasks. It runs a single task without the full model, making it faster and cheaper.