When Interfaze processes a request, specialized parts of the models handle tasks like OCR, speech-to-text, and translation before generating the final response. The raw output of these tasks is returned as precontext, giving you structured metadata such as confidence scores, bounding boxes, and timestamps.
Use precontext to validate accuracy, build conditional logic around confidence thresholds, and dig deeper into the data.
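As a rough sketch of confidence-based logic, the snippet below flags OCR words that fall under a threshold. It assumes the OCR result shape shown in the OCR example further down this page. The `lowConfidenceWords` helper, the trimmed types, and the 0.95 threshold are illustrative choices, not part of the Interfaze API.

```typescript
// Types cover only the OCR fields used here; the real result carries more.
interface OcrWord { text: string; confidence: number }
interface OcrLine { text: string; average_confidence: number; words: OcrWord[] }
interface OcrResult { sections: { lines: OcrLine[] }[] }

// Hypothetical helper: collect every word whose confidence is below the threshold.
function lowConfidenceWords(ocr: OcrResult, threshold: number): OcrWord[] {
  const flaggedWords: OcrWord[] = [];
  for (const section of ocr.sections) {
    for (const line of section.lines) {
      for (const word of line.words) {
        if (word.confidence < threshold) {
          flaggedWords.push(word);
        }
      }
    }
  }
  return flaggedWords;
}

// Sample data trimmed from the OCR example on this page (bounds omitted).
const sampleOcr: OcrResult = {
  sections: [
    {
      lines: [
        {
          text: "California",
          average_confidence: 0.99,
          words: [{ text: "California", confidence: 0.99 }],
        },
        {
          text: "DL Y4067081",
          average_confidence: 0.93,
          words: [
            { text: "DL", confidence: 0.92 },
            { text: "Y4067081", confidence: 0.95 },
          ],
        },
      ],
    },
  ],
};

const flagged = lowConfidenceWords(sampleOcr, 0.95);
const needsReview = flagged.length > 0;
console.log(
  needsReview
    ? `Review needed: ${flagged.map((word) => word.text).join(", ")}`
    : "All words passed"
);
```

A stricter or looser threshold will suit different documents, so treat the value as something to tune against your own data.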
precontext is available when the model performs any of these tasks:
| Task | name in precontext | What it returns |
|---|---|---|
| OCR | ocr | Extracted text, lines, words, bounding boxes, confidence scores |
| Object Detection | object_detection | Detected objects with labels, bounding boxes, confidence scores |
| Speech-to-Text | stt | Transcribed text with timestamped chunks |
| Translation | translate | Translated text, source/target languages |
| Web Search | web_search | Search results with sources |
| Scraping | scraper | Extracted structured data from web pages |
| Code Sandboxing | code_sandbox | Execution output |
| Guardrails | guardrails | Safety analysis results |
If you only need the precontext data without a final model response, check out run tasks.
The precontext field lives inside the chat completion response body. It is an array of objects, where each object represents one task the model performed.
{
"precontext": [
{
"name": "ocr",
"result": { ... }
},
{
"name": "translate",
"result": { ... }
}
]
}

- name: The task that produced this result (e.g., ocr, stt, translate).
- result: The structured metadata for that task. Each task type has a fixed, predictable schema.
- precontext can contain multiple items with the same name.

When the model performs OCR (e.g., reading a driver's license), the precontext includes the full extracted text along with per-line and per-word bounding boxes and confidence scores.
{
"precontext": [
{
"name": "ocr",
"result": {
"extracted_text": "California\nUSA\nDRIVER LICENSE\nDL Y4067081\n...",
"sections": [
{
"text": "California\nUSA\nDRIVER LICENSE\n...",
"lines": [
{
"text": "California",
"bounds": {
"top_left": { "x": 63, "y": 89 },
"top_right": { "x": 268, "y": 89 },
"bottom_right": { "x": 268, "y": 129 },
"bottom_left": { "x": 63, "y": 129 },
"width": 205,
"height": 40
},
"average_confidence": 0.99,
"words": [
{
"text": "California",
"bounds": {
"top_left": { "x": 64, "y": 90 },
"top_right": { "x": 267, "y": 89 },
"bottom_right": { "x": 267, "y": 130 },
"bottom_left": { "x": 63, "y": 130 },
"width": 203.5,
"height": 40.5
},
"confidence": 0.99
}
]
},
{
"text": "DL Y4067081",
"bounds": { "...": "..." },
"average_confidence": 0.93,
"words": [
{ "text": "DL", "confidence": 0.92, "bounds": { "...": "..." } },
{ "text": "Y4067081", "confidence": 0.95, "bounds": { "...": "..." } }
]
}
]
}
],
"width": 698,
"height": 525
}
}
]
}

Each OCR result contains:

- extracted_text: The full text extracted from the image.
- sections: An array of text sections found in the image.
- lines: Each line includes bounds (bounding box coordinates), average_confidence, and a words array.
- words: Each word includes its own bounds and confidence score (0 to 1).
- width/height: The dimensions of the source image in pixels.

This example transcribes an audio file and translates the result to Chinese. The precontext array contains two entries: one for stt and one for translate.
Note: The LangChain SDK does not provide an official way to access raw API responses, so precontext is not available with that SDK.
OpenAI SDK
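import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

// `interfaze` is assumed to be an OpenAI SDK client already configured
// for Interfaze (client setup is not shown here).

// Schema for the structured output of the final response.
const STTSchema = z.object({
  translated_text: z.string().describe("translated text"),
  original_language_code: z.string(),
  translated_language_code: z.string(),
});

const response = await interfaze.chat.completions.create({
  model: "interfaze-beta",
  messages: [
    {
      role: "user",
      content: "Transcribe the audio file and translate it to chinese https://r2public.jigsawstack.com/interfaze/examples/stt_medical_short.mp4",
    },
  ],
  response_format: zodResponseFormat(STTSchema, "stt_schema"),
});

// The structured output is returned as the message content.
console.log(response.choices[0].message.content);

// precontext is an extra top-level field that the OpenAI SDK types do not declare.
//@ts-expect-error precontext is not typed
const precontext = response.precontext;
console.log("STT Results:", precontext?.[0]?.result);

With the OpenAI SDK, precontext sits at the top level of the response, as read above. The output below has the shape of a Vercel AI SDK result, where your structured output is in object and the raw precontext data is in response.body.precontext: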
{
"object": {
"translated_text": "我刚开始服用一轮阿莫西林,想问一下把它和我目前的螺内酯处方一起服用是否安全。",
"original_language_code": "en",
"translated_language_code": "zh"
},
"response": {
"id": "interfaze-1775088833045",
"modelId": "interfaze-beta",
"body": {
"precontext": [
{
"name": "stt",
"result": {
"text": "I just started a round of amoxicillin and I wanted to ask if it was safe to take that with my current spironolactone prescription.",
"chunks": [
{
"timestamp": [0.28, 4],
"text": "I just started a round of amoxicillin and I wanted to ask"
},
{
"timestamp": [4, 7.72],
"text": "if it was safe to take that with my current spironolactone prescription."
}
]
}
},
{
"name": "translate",
"result": {
"translated_text": "我刚开始服用一轮阿莫西林,想问一下把它和我目前的螺内酯处方一起服用是否安全。",
"source_language": "auto-detected",
"target_language": "zh",
"batch_size": 1
}
}
]
}
}
}

The stt result provides the transcribed text with timestamped chunks (start and end times in seconds). The translate result includes the translated text along with detected source and target languages.
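To illustrate how these two entries might be consumed, here is a minimal sketch that picks results out of the precontext array by name and renders the stt chunks as caption lines. The `resultsFor` and `toCaptionLines` helpers and the type names are hypothetical. The sample data is copied from the response above.

```typescript
interface PrecontextItem { name: string; result: unknown }
interface SttChunk { timestamp: [number, number]; text: string }
interface SttResult { text: string; chunks: SttChunk[] }

// Hypothetical helper: return every result produced by the given task name.
// It returns an array because the same name can appear more than once.
function resultsFor<T>(items: PrecontextItem[], taskName: string): T[] {
  return items
    .filter((item) => item.name === taskName)
    .map((item) => item.result as T);
}

// Hypothetical helper: render each chunk as "[start-end] text".
function toCaptionLines(stt: SttResult): string[] {
  return stt.chunks.map((chunk) => {
    const start = chunk.timestamp[0].toFixed(2);
    const end = chunk.timestamp[1].toFixed(2);
    return `[${start}s-${end}s] ${chunk.text}`;
  });
}

// Sample data copied from the stt + translate response shown above.
const samplePrecontext: PrecontextItem[] = [
  {
    name: "stt",
    result: {
      text: "I just started a round of amoxicillin and I wanted to ask if it was safe to take that with my current spironolactone prescription.",
      chunks: [
        { timestamp: [0.28, 4], text: "I just started a round of amoxicillin and I wanted to ask" },
        { timestamp: [4, 7.72], text: "if it was safe to take that with my current spironolactone prescription." },
      ],
    },
  },
  {
    name: "translate",
    result: {
      translated_text: "我刚开始服用一轮阿莫西林,想问一下把它和我目前的螺内酯处方一起服用是否安全。",
      source_language: "auto-detected",
      target_language: "zh",
      batch_size: 1,
    },
  },
];

const sttResults = resultsFor<SttResult>(samplePrecontext, "stt");
const captions = toCaptionLines(sttResults[0]);
console.log(captions.join("\n"));
```

The `as T` cast trusts the documented schema for each task. If you prefer runtime checks, you could validate each result with a schema library before using it.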
By default, streaming responses do not include precontext. To enable it, add the x-show-additional-info header:
headers: {
"x-show-additional-info": "true"
}

With this header, the precontext data is sent as a single chunk before the main response begins streaming. It is wrapped in XML tags:
<precontext>
{
"name": "process_name",
"result": { ... }
}
</precontext>

Parse the <precontext> block from the stream before processing the rest of the response tokens.
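One possible way to do that parsing is sketched below. It assumes you have already accumulated the streamed text into a string (how you read the stream depends on your SDK). The `splitPrecontext` helper is hypothetical, the sample stream text is made up for illustration, and accepting either a single object or an array inside the tags is a defensive assumption rather than documented behavior.

```typescript
interface PrecontextItem { name: string; result: unknown }

// Hypothetical helper: separate <precontext> blocks from the rest of the text.
function splitPrecontext(streamText: string): { items: PrecontextItem[]; rest: string } {
  const items: PrecontextItem[] = [];
  const blockPattern = /<precontext>([\s\S]*?)<\/precontext>/g;

  // Remove each block from the text while collecting its parsed JSON body.
  const rest = streamText.replace(blockPattern, (_match: string, body: string) => {
    const parsed = JSON.parse(body);
    // Assumption: a block holds either one item or an array of items.
    if (Array.isArray(parsed)) {
      items.push(...parsed);
    } else {
      items.push(parsed);
    }
    return "";
  });

  return { items, rest: rest.trim() };
}

// Made-up sample: a precontext chunk followed by the start of the model response.
const sampleStream =
  '<precontext>\n{"name":"stt","result":{"text":"hello"}}\n</precontext>The audio says hello.';

const parsedStream = splitPrecontext(sampleStream);
console.log(parsedStream.items[0].name); // stt
console.log(parsedStream.rest); // The audio says hello.
```

Because the pattern only matches complete blocks, call the helper once the closing tag has arrived. Until then the partial block stays in the remaining text.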
precontext field is not natively available.

If you don't need a final model response and only want the raw task output, use run tasks. It runs a single task without the full model, making it faster and cheaper.