
A weakly supervised pre-trained version of the Whisper model, optimized for high-speed Automatic Speech Recognition (ASR) and speech translation. By significantly reducing the number of decoder layers to 4 while maintaining the robust large-v3 encoder architecture, this 'Turbo' variant offers an 8.8x speedup compared to large-v3 with minimal degradation in Word Error Rate (WER). It is specifically designed as a high-efficiency alternative for low-latency production environments.
This document specifies the API for transcribing audio files with the hosted Whisper large-v3-turbo model. The API follows an OpenAI-compatible schema. This "Turbo" variant is a weakly supervised pre-trained model featuring a 4-layer decoder, offering 8.8x faster inference than the standard large-v3 while maintaining near-identical accuracy.
| Method | URL | Summary |
|---|---|---|
| POST | /v1/audio/transcriptions | Transcribe audio/video using the Whisper model. |
The API utilizes Bearer Token authentication. A valid API_KEY must be included in the header for all requests.
| Header | Example | Description |
|---|---|---|
| Authorization | Bearer YOUR_API_KEY | Your server-provided API Key. |
| x-request-id | UUID_string | Optional. Unique identifier for tracking. Generated by server if omitted. |
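
As a rough illustration of the headers listed above, the following Python sketch assembles them into a dictionary. The helper name `build_headers` is hypothetical and not part of the API; only the header names and the `Bearer` prefix come from this document.

```python
import uuid


def build_headers(api_key, request_id=None):
    """Return the request headers described above (hypothetical helper)."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    headers = {"Authorization": f"Bearer {api_key}"}
    # x-request-id is optional; the server generates one if it is omitted.
    if request_id is not None:
        headers["x-request-id"] = request_id
    return headers


# Example: attach a client-generated UUID so the request can be traced later.
headers = build_headers("YOUR_API_KEY", request_id=str(uuid.uuid4()))
```

Supplying your own `x-request-id` is mainly useful when you want to correlate client-side logs with server-side ones; leaving it out is equally valid.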
Requests must be sent as multipart/form-data.
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File / URL | Yes | The audio file object or a direct URL to an audio/video file (mp3, mp4, mpeg, mpga, m4a, wav, webm, aac). |
| model | string | Yes | Use "openai/whisper-large-v3-turbo" for high-speed ASR. |
| response_format | string | No | Formats: json, text, srt, vtt, verbose_json. Default: json. |
| temperature | number | No | 0.0 to 1.0. Controls randomness. Higher values increase variability. Default: 0.0. |
| language | string | No | ISO-639-1 code (e.g., en, zh, ja) to improve transcription accuracy. |
| prompt | string | No | Optional text to guide the model's style or vocabulary. |
| timestamp_granularities | array | No | Only for verbose_json. Can include word or segment. |
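
The parameters above can be combined into a multipart/form-data request along the following lines. This is a minimal sketch using only the Python standard library, under several assumptions: the host `https://api.example.com` is a placeholder (this document gives only the path), the helper names are hypothetical, and sending the array as repeated `timestamp_granularities[]` fields follows the convention of OpenAI-style clients rather than anything stated here. It covers the file-upload case only, not the URL form of `file`.

```python
import urllib.request
import uuid

# Placeholder host (assumption): the document specifies only the path.
BASE_URL = "https://api.example.com"
ENDPOINT = "/v1/audio/transcriptions"
FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}
GRANULARITIES = {"word", "segment"}


def build_fields(model="openai/whisper-large-v3-turbo", response_format="json",
                 temperature=0.0, language=None, prompt=None,
                 timestamp_granularities=None):
    """Validate the documented parameters and return (name, value) pairs."""
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if timestamp_granularities and response_format != "verbose_json":
        raise ValueError("timestamp_granularities requires verbose_json")
    fields = [("model", model),
              ("response_format", response_format),
              ("temperature", str(temperature))]
    if language is not None:
        fields.append(("language", language))
    if prompt is not None:
        fields.append(("prompt", prompt))
    for granularity in timestamp_granularities or []:
        if granularity not in GRANULARITIES:
            raise ValueError(f"unknown granularity: {granularity}")
        # Assumed wire format: one repeated field per array element.
        fields.append(("timestamp_granularities[]", granularity))
    return fields


def encode_multipart(fields, filename, file_bytes, boundary=None):
    """Encode text fields plus one file part; return (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields:
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"\r\n\r\n')
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_header = (f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="file"; '
                   f'filename="{filename}"\r\n'
                   "Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, filename, file_bytes, **params):
    """Assemble the POST request without sending it."""
    body, content_type = encode_multipart(build_fields(**params),
                                          filename, file_bytes)
    return urllib.request.Request(
        BASE_URL + ENDPOINT, data=body, method="POST",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": content_type})
```

To send the request, pass the result to `urllib.request.urlopen` after replacing the placeholder host and key, for example `build_request("YOUR_API_KEY", "clip.wav", audio_bytes, language="en")`. Validating parameters on the client before upload avoids transferring a large audio file only to have the request rejected.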