Get Started
Examples
Concepts
Resources
Integrations
copy markdown
Transcribe and diarize audio files of multiple speakers and languages at blazing fast speeds.
OpenAI SDK
Vercel AI SDK
LangChain SDK
JSON output
Running STT as a single task with <task>speech_to_text</task> in the system message makes it significantly faster and cheaper with a fixed structured output that's pre-defined.
Learn more about running a task.
OpenAI SDK
Vercel AI SDK
LangChain SDK
Note how the URL is passed in the prompt instead of in the file object. This is another way to pass files to the model which has a marginal speed increase.
JSON output
Translate any audio or text to over 100+ languages while maintaining the original meaning and context.
OpenAI SDK
Vercel AI SDK
LangChain SDK
JSON output
You can reference the precontext to get the raw results from the model for both the STT and translation processes.
Automatically de-noise low-quality enhance audio for better transcription.
OpenAI SDK
Vercel AI SDK
LangChain SDK
JSON output
OpenAI SDK
Vercel AI SDK
LangChain SDK
JSON output
To get the best performance with long audio file is to use run task with the <task>speech_to_text</task> in the system prompt, this only activates a part of the model used for audio.
OpenAI SDK
Vercel AI SDK
LangChain SDK
This took 50s to transcribe a 1hr and 35min audio file.
JSON output
The output is truncated for this example.
Check out how to perform speaker diarization here.