
A fine-tuned version of the Whisper large-v3 model designed for near real-time Automatic Speech Recognition (ASR) and speech translation. By reducing the number of decoder layers while maintaining the robust encoder architecture, this "Turbo" variant offers a significant speedup (up to 8x faster) with minimal degradation in Word Error Rate (WER). Ideal for low-latency production
This document specifies the API for transcribing audio files using a hosted Whisper model, mimicking the OpenAI Whisper API. The endpoint handles multipart form data for both file uploads and URL-based audio transcription.
| Method | URL | Summary |
|---|---|---|
| POST | /v1/audio/transcriptions | Transcribe audio using the Whisper model. |
The API uses Bearer Token authentication via the Authorization header. If an API_KEY is set on the server, a valid token must be provided.
| Header | Example | Description |
|---|---|---|
| Authorization | Bearer YOUR_API_KEY | The API_KEY provided by the server. |
| x-request-id | UUID_string | An optional unique identifier for the request, for logging and tracking. If not provided, the server will generate one. |
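As a rough illustration of the header rules above, the following Python sketch assembles the two headers on the client side. The helper name `build_headers` is hypothetical, and generating a client-side UUID when no request ID is supplied is an assumption of this sketch; the server would generate one anyway.

```python
import uuid
from typing import Dict, Optional


def build_headers(api_key: Optional[str], request_id: Optional[str] = None) -> Dict[str, str]:
    """Build the request headers for the transcription endpoint (illustrative helper)."""
    headers: Dict[str, str] = {}
    if api_key:
        # Bearer token authentication; required only when the server has an API_KEY set.
        headers["Authorization"] = f"Bearer {api_key}"
    # x-request-id is optional. Sending our own UUID is an assumption of this sketch;
    # it makes it easier to correlate client logs with server logs.
    headers["x-request-id"] = request_id or str(uuid.uuid4())
    return headers
```

Calling `build_headers("YOUR_API_KEY")` yields an `Authorization` value of `Bearer YOUR_API_KEY` plus a freshly generated `x-request-id`.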
The request must be of type multipart/form-data. It requires a file or a URL and several optional parameters.
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File or string | Yes | The audio file to transcribe. It can be a direct file upload or a URL to an audio file. |
| model | string | Yes | The name of the transcription model. Must match the model name served by the API (e.g., "openai/whisper-large-v3-turbo"). |
| response_format | string | No | The format of the response. Supported formats are json, text, srt, vtt, and verbose_json. Defaults to json. |
| temperature | number | No | A value from 0.0 to 1.0 that controls randomness. Defaults to 0.0. |
| language | string | No | The language of the audio to assist with transcription. |
| prompt | string | No | An optional initial prompt to guide the model. |
| condition_on_previous_text | boolean | No | Whether to condition the transcription on previous text. Defaults to True. |
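The parameter constraints above can be checked on the client before a request is sent. The sketch below is one way to do that in Python; the helper name `build_form_fields` is hypothetical, and encoding the boolean as the strings "true"/"false" is an assumption, since the page does not say how the server parses it.

```python
from typing import Dict, Optional

# Response formats listed in the parameter table.
SUPPORTED_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}


def build_form_fields(
    model: str,
    response_format: str = "json",
    temperature: float = 0.0,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    condition_on_previous_text: bool = True,
) -> Dict[str, str]:
    """Validate the non-file parameters and return them as form fields."""
    if not model:
        raise ValueError("model is required")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported response_format: {response_format!r}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
        # Assumption: the server accepts lowercase "true"/"false" for booleans.
        "condition_on_previous_text": "true" if condition_on_previous_text else "false",
    }
    # Optional parameters are sent only when provided.
    if language:
        fields["language"] = language
    if prompt:
        fields["prompt"] = prompt
    return fields
```

Catching an unsupported response_format or an out-of-range temperature locally avoids a round trip that would likely end in a 400 response.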
Example request with a file upload:
curl --location --request POST 'https://hk-01.bytecompute.ai/v1/audio/transcriptions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--form 'file=@"path/to/file.aac"' \
--form 'model="openai/whisper-large-v3-turbo"' \
--form 'response_format="verbose_json"'
Example request with an audio URL:
curl --raw -s \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "model=openai/whisper-large-v3-turbo" \
-F "file=https://cdn-global.hellotalk8.com/ht-global-1312929133/mmnt/2/250418/1/148239271/0/0/10e8a25a44ad37efd36155c1f447b0b5.aac" \
-F "response_format=verbose_json" \
"https://hk-01.bytecompute.ai/v1/audio/transcriptions"
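For clients that cannot shell out to curl, a URL-based request like the one above can be approximated with the Python standard library. This is a minimal sketch, not a reference client: the helper names `encode_multipart` and `build_url_transcription_request` are hypothetical, and the request is only constructed here, not sent.

```python
import urllib.request
import uuid
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str]) -> Tuple[bytes, str]:
    """Encode plain text fields as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing boundary
    return "".join(parts).encode("utf-8"), f"multipart/form-data; boundary={boundary}"


def build_url_transcription_request(
    base_url: str, api_key: str, audio_url: str, model: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request that passes the audio as a URL string."""
    fields = {
        "model": model,
        "file": audio_url,  # a URL string in the file field, as in the curl example
        "response_format": "verbose_json",
    }
    body, content_type = encode_multipart(fields)
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/audio/transcriptions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to perform the call. Uploading a local file would additionally need a part with a `filename` attribute and the raw audio bytes, which this sketch leaves out.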
200 OK
json format: Returns a JSON object with a single text field. The example below is a verbose_json response, which also includes segments, duration, and usage.
{
"task":"transcribe",
"language":"english",
"text":"You know the rules and so do I. A full commitment's what I'm thinking of",
"segments":[
{
"id":0,
"seek":0,
"start":0.4,
"end":5.3,
"text":"You know the rules and so do I",
"tokens":[
50385,
...
],
"temperature":0.0,
"avg_logprob":-0.18997467888726127,
"compression_ratio":1.054945054945055,
"no_speech_prob":1.1973037145063259e-11
},
...
],
"duration":18.900000000000002,
"usage":{
"type":"duration",
"seconds":18.900000000000002
}
}
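To show how a verbose_json response like the one above might be consumed, the following Python sketch turns the segments array into timestamped lines. The function name `summarize_segments` is hypothetical, and `SAMPLE` is a trimmed copy of the example response with the elided fields left out.

```python
import json
from typing import List


def summarize_segments(payload: str) -> List[str]:
    """Return one '[start - end] text' line per segment of a verbose_json response."""
    data = json.loads(payload)
    lines = []
    for seg in data.get("segments", []):
        # start and end are offsets in seconds from the beginning of the audio.
        lines.append(f"[{seg['start']:.2f}s - {seg['end']:.2f}s] {seg['text'].strip()}")
    return lines


# Trimmed copy of the example response; tokens and later segments are omitted.
SAMPLE = json.dumps({
    "task": "transcribe",
    "language": "english",
    "text": "You know the rules and so do I. A full commitment's what I'm thinking of",
    "segments": [
        {"id": 0, "seek": 0, "start": 0.4, "end": 5.3,
         "text": "You know the rules and so do I"},
    ],
})

for line in summarize_segments(SAMPLE):
    print(line)
```

A plain json response has no segments key, so the function simply returns an empty list in that case.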
| Status Code | Description | Detail |
|---|---|---|
| 400 Bad Request | The request is malformed or invalid. | "No header part in the request", "No file or url in the request", "Invalid 'model'", or "Unsupported response_format". |
| 401 Unauthorized | The API key is missing or invalid. | "Authorization header is missing or invalid." or "Invalid API key." |
| 500 Internal Server Error | A server-side issue occurred during transcription. | |
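A client might map these status codes to a small set of outcomes before deciding what to do next. The sketch below is one possible mapping in Python; the function name `classify_status`, the category labels, and the choice to treat only server errors as retryable are assumptions of this example, not behavior the API specifies.

```python
from typing import Tuple


def classify_status(status: int) -> Tuple[str, bool]:
    """Map an HTTP status code to (category, retryable)."""
    if status == 200:
        return ("ok", False)
    if status == 400:
        # Malformed request: fix the file, model, or response_format before resending.
        return ("bad_request", False)
    if status == 401:
        # Missing or invalid API key: retrying with the same key will not help.
        return ("unauthorized", False)
    if status >= 500:
        # Assumption: a server-side failure may be transient, so a retry is reasonable.
        return ("server_error", True)
    return ("unexpected", False)
```

For example, `classify_status(500)` returns `("server_error", True)`, while `classify_status(401)` returns `("unauthorized", False)`.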