
Whisper large-v3 is a pre-trained model for automatic speech recognition (ASR) and speech translation. It uses a Transformer encoder-decoder architecture designed to stay accurate across a wide range of languages and audio conditions.
This document specifies the API for transcribing audio files using the hosted Whisper large-v3 model. This endpoint mimics the OpenAI Whisper API while extending support for URL-based transcriptions.
| Method | URL | Summary |
|---|---|---|
| POST | /v1/audio/transcriptions | Transcribe audio using the Whisper large-v3 model. |
The API uses Bearer Token authentication via the Authorization header. If an API_KEY is set on the server, a valid token must be provided.
| Header | Example | Description |
|---|---|---|
| Authorization | Bearer YOUR_API_KEY | The API_KEY provided by the server. |
| x-request-id | UUID_string | An optional unique identifier for request tracking. |
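
The headers above could be assembled on the client roughly as follows. This is a minimal sketch: the helper name `build_headers` is hypothetical, `YOUR_API_KEY` is a placeholder, and generating a random UUID when no request ID is supplied is an assumption rather than documented behavior.

```python
import uuid


def build_headers(api_key, request_id=None):
    """Return the request headers described in the header table.

    Illustrative helper only; it is not part of the API itself.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        # Bearer token authentication via the Authorization header.
        "Authorization": f"Bearer {api_key}",
        # Optional tracking identifier. Assumption: fall back to a random UUID.
        "x-request-id": request_id or str(uuid.uuid4()),
    }


headers = build_headers("YOUR_API_KEY")
```

Passing an explicit `request_id` makes it easier to match a client-side log entry with a specific request.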
The request body must be sent as multipart/form-data. It requires `file` (an uploaded audio file or a URL to one) and `model`; the remaining parameters are optional.
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File or string | Yes | The audio file to transcribe, given either as a direct file upload or as a URL to an audio file. |
| model | string | Yes | Use "openai/whisper-large-v3". |
| response_format | string | No | json (default), text, srt, vtt, or verbose_json. |
| temperature | number | No | 0.0 to 1.0. Defaults to 0.0. |
| language | string | No | ISO-639-1 code. |
| prompt | string | No | Text to guide the model's style or continue a previous segment. |
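
Checking parameters on the client before sending can avoid a 400 round trip. The sketch below is one possible approach, not the server's own validation logic: the function name `build_form_fields` is hypothetical, and the checks simply mirror the constraints listed in the table (allowed formats, temperature range, two-letter ISO-639-1 code).

```python
ALLOWED_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}


def build_form_fields(model="openai/whisper-large-v3", response_format="json",
                      temperature=0.0, language=None, prompt=None):
    """Validate the optional parameters and return them as text form fields."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported response_format: {response_format!r}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    if language is not None:
        # ISO-639-1 codes are two letters, e.g. "en" or "de".
        if len(language) != 2 or not language.isalpha():
            raise ValueError("language must be a two-letter ISO-639-1 code")
        fields["language"] = language.lower()
    if prompt:
        fields["prompt"] = prompt
    return fields


fields = build_form_fields(response_format="verbose_json", language="en")
```

The `file` field is left out here on purpose, since it is either a binary upload or a URL string and is handled separately from the plain text fields.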
Example request with a local file upload:

```bash
curl --location --request POST 'https://us-01.bytecompute.ai/v1/audio/transcriptions' \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --form 'file=@"path/to/file.aac"' \
  --form 'model="openai/whisper-large-v3"' \
  --form 'response_format="verbose_json"'
```
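
Both request styles in this section, a local upload and a URL passed in the `file` field, can be approximated in Python with only the standard library. This is a sketch under assumptions: `encode_multipart` and `build_request` are hypothetical helpers, and treating any `http://` or `https://` string as a URL is a client-side heuristic, not documented server behavior.

```python
import mimetypes
import os
import urllib.request
import uuid

ENDPOINT = "https://us-01.bytecompute.ai/v1/audio/transcriptions"


def encode_multipart(fields, file_path=None):
    """Encode text fields (and optionally one file) as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode("utf-8")
        )
    if file_path is not None:
        filename = os.path.basename(file_path)
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        with open(file_path, "rb") as fh:
            data = fh.read()
        head = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n'
        ).encode("utf-8")
        parts.append(head + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, source, response_format="verbose_json"):
    """Build (but do not send) a transcription request for a path or URL."""
    fields = {"model": "openai/whisper-large-v3",
              "response_format": response_format}
    if source.startswith(("http://", "https://")):
        fields["file"] = source  # URL sent as a plain string field
        body, content_type = encode_multipart(fields)
    else:
        body, content_type = encode_multipart(fields, file_path=source)
    return urllib.request.Request(
        ENDPOINT, data=body, method="POST",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": content_type},
    )


# Sending needs a real key and network access, so it is left commented out:
# with urllib.request.urlopen(build_request("YOUR_API_KEY", "path/to/file.aac")) as resp:
#     print(resp.read().decode("utf-8"))
```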
Example request with an audio URL passed in the `file` field:

```bash
curl --raw -s \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "model=openai/whisper-large-v3" \
  -F "file=https://cdn-global.hellotalk8.com/ht-global-1312929133/mmnt/2/250418/1/148239271/0/0/10e8a25a44ad37efd36155c1f447b0b5.aac" \
  -F "response_format=verbose_json" \
  "https://us-01.bytecompute.ai/v1/audio/transcriptions"
```
200 OK
`json` format: returns a JSON object with a single `text` field.

`verbose_json` format (as requested in the example requests): returns the text together with segment-level detail, for example:

```json
{
  "task": "transcribe",
  "language": "english",
  "text": "You know the rules and so do I. A full commitment's what I'm thinking of",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.4,
      "end": 5.3,
      "text": "You know the rules and so do I",
      "tokens": [
        50385,
        ...
      ],
      "temperature": 0.0,
      "avg_logprob": -0.18997467888726127,
      "compression_ratio": 1.054945054945055,
      "no_speech_prob": 1.1973037145063259e-11
    },
    ...
  ],
  "duration": 18.900000000000002,
  "usage": {
    "type": "duration",
    "seconds": 18.900000000000002
  }
}
```
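
A response with segments can be turned into timestamped lines on the client. The sketch below works on a trimmed copy of the example response (only the fields it reads are kept); `format_timestamp` and `segment_lines` are hypothetical helper names, and the `HH:MM:SS,mmm` layout is chosen to resemble SRT timestamps rather than taken from the API.

```python
import json

# Trimmed copy of the example response; elided fields are simply omitted.
SAMPLE = json.dumps({
    "task": "transcribe",
    "language": "english",
    "segments": [
        {"id": 0, "start": 0.4, "end": 5.3,
         "text": "You know the rules and so do I"},
    ],
    "duration": 18.900000000000002,
})


def format_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segment_lines(response_text):
    """Return one 'start --> end  text' line per segment in the response."""
    payload = json.loads(response_text)
    return [
        f"{format_timestamp(seg['start'])} --> "
        f"{format_timestamp(seg['end'])}  {seg['text'].strip()}"
        for seg in payload.get("segments", [])
    ]


for line in segment_lines(SAMPLE):
    print(line)
```

Using `payload.get("segments", [])` keeps the helper usable with the plain `json` format, which has no `segments` field and therefore yields an empty list.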
| Status Code | Description | Detail |
|---|---|---|
| 400 Bad Request | The request is malformed or invalid. | "No header part in the request", "No file or url in the request", "Invalid 'model'", or "Unsupported response_format". |
| 401 Unauthorized | The API key is missing or invalid. | "Authorization header is missing or invalid." or "Invalid API key." |
| 500 Internal Server Error | A server-side issue occurred during transcription. | |
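
One way a client might react to these status codes is sketched below. The action names and the helpers `next_action` and `backoff_delays` are hypothetical, and retrying a 500 with exponential backoff is an assumption about sensible client behavior, not something the API prescribes.

```python
def next_action(status_code):
    """Map a documented status code to a suggested client action."""
    if status_code == 200:
        return "use_result"
    if status_code == 400:
        return "fix_request"      # malformed request: do not retry unchanged
    if status_code == 401:
        return "fix_credentials"  # missing or invalid API key
    if status_code == 500:
        return "retry_later"      # assumption: server-side errors may be transient
    return "unexpected"


def backoff_delays(attempts, base=1.0):
    """Exponential backoff delays in seconds: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(attempts)]


print(next_action(500), backoff_delays(3))
```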