
openai/whisper-large-v3-turbo

A fine-tuned version of the Whisper large-v3 model designed for near-real-time Automatic Speech Recognition (ASR) and speech translation. By reducing the number of decoder layers while keeping the robust encoder architecture, this "Turbo" variant offers a significant speedup (up to 8x faster) with minimal degradation in Word Error Rate (WER). Ideal for low-latency production use.

$0.00045 per minute

API Documentation: Whisper Audio Transcription 🎙️

This document specifies the API for transcribing audio files using a hosted Whisper model, mimicking the OpenAI Whisper API. The endpoint handles multipart form data for both file uploads and URL-based audio transcription.


Endpoint

| Method | URL | Summary |
|--------|-----|---------|
| POST | `/v1/audio/transcriptions` | Transcribe audio using the Whisper model. |

Authentication

The API uses Bearer Token authentication via the Authorization header. If an API_KEY is set on the server, a valid token must be provided.

| Header | Example | Description |
|--------|---------|-------------|
| `Authorization` | `Bearer YOUR_API_KEY` | The API key provided by the server. |
| `x-request-id` | UUID string | An optional unique identifier for the request, used for logging and tracking. If not provided, the server generates one. |
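Assembling these headers in a client is straightforward; a minimal Python sketch, where `YOUR_API_KEY` is a placeholder for the key issued by the server:

```python
import uuid

API_KEY = "YOUR_API_KEY"  # placeholder; substitute the key issued by the server

headers = {
    "Authorization": f"Bearer {API_KEY}",
    # Optional: supply your own request id for log correlation;
    # the server generates one if this header is omitted.
    "x-request-id": str(uuid.uuid4()),
}
```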

Request

The request must be of type multipart/form-data. It requires a file or a URL and several optional parameters.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `file` | File or string | Yes | The audio to transcribe: either a direct file upload or a URL pointing to an audio file. |
| `model` | string | Yes | The name of the transcription model. Must match the model served by the API (e.g., `openai/whisper-large-v3-turbo`). |
| `response_format` | string | No | The format of the response: `json`, `text`, `srt`, `vtt`, or `verbose_json`. Defaults to `json`. |
| `temperature` | number | No | A value from 0.0 to 1.0 that controls sampling randomness. Defaults to 0.0. |
| `language` | string | No | The language of the audio, used to assist transcription. |
| `prompt` | string | No | An optional initial prompt to guide the model. |
| `condition_on_previous_text` | boolean | No | Whether to condition the transcription on previously decoded text. Defaults to `true`. |

Example Requests

1. Transcribing a File Upload

```bash
curl --location --request POST 'https://hk-01.bytecompute.ai/v1/audio/transcriptions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--form 'file=@"path/to/file.aac"' \
--form 'model="openai/whisper-large-v3-turbo"' \
--form 'response_format="verbose_json"'
```

2. Transcribing a URL

```bash
curl -s \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -F "model=openai/whisper-large-v3-turbo" \
     -F "file=https://cdn-global.hellotalk8.com/ht-global-1312929133/mmnt/2/250418/1/148239271/0/0/10e8a25a44ad37efd36155c1f447b0b5.aac" \
     -F "response_format=verbose_json" \
     "https://hk-01.bytecompute.ai/v1/audio/transcriptions"
```
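The URL-based curl call above can be reproduced from Python's standard library by hand-encoding the `multipart/form-data` body. This is a minimal sketch, not a full client: it covers only the URL-string form of `file` (an upload would add a filename and `Content-Type` to its part), and the host is taken from the examples above.

```python
import urllib.request
import uuid

BASE_URL = "https://hk-01.bytecompute.ai"  # host from the curl examples above

def build_multipart(fields):
    """Encode plain string form fields as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'
        )
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"

def transcription_request(audio_url, api_key, response_format="json"):
    """Build (but do not send) a URL-based transcription request."""
    body, content_type = build_multipart({
        "model": "openai/whisper-large-v3-turbo",
        "file": audio_url,  # a URL string is accepted for `file`
        "response_format": response_format,
    })
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )

# To send: resp = urllib.request.urlopen(transcription_request(url, API_KEY))
```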

Response

Successful Responses

  • 200 OK
    • `json` format: returns a JSON object with a single `text` field.
    • `verbose_json` format: also returns the task, detected language, timestamped segments, duration, and usage, as in this example:

```json
{
  "task": "transcribe",
  "language": "english",
  "text": "You know the rules and so do I. A full commitment's what I'm thinking of",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.4,
      "end": 5.3,
      "text": "You know the rules and so do I",
      "tokens": [
        50385,
        ...
      ],
      "temperature": 0.0,
      "avg_logprob": -0.18997467888726127,
      "compression_ratio": 1.054945054945055,
      "no_speech_prob": 1.1973037145063259e-11
    },
    ...
  ],
  "duration": 18.900000000000002,
  "usage": {
    "type": "duration",
    "seconds": 18.900000000000002
  }
}
```
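Because each `verbose_json` segment carries `start`/`end` timestamps, converting a response to another supported format such as SRT is a small exercise; a minimal sketch using the field names shown in the response above:

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, e.g. 0.4 -> '00:00:00,400'."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render verbose_json segments as an SRT subtitle document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```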

Error Responses

| Status Code | Description | Detail |
|-------------|-------------|--------|
| 400 Bad Request | The request is malformed or invalid. | "No header part in the request", "No file or url in the request", "Invalid 'model'", or "Unsupported response_format". |
| 401 Unauthorized | The API key is missing or invalid. | "Authorization header is missing or invalid." or "Invalid API key." |
| 500 Internal Server Error | A server-side issue occurred during transcription. | |
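One practical consequence of this table: 400 and 401 are caller errors that a retry will not fix, while 500 may be transient. A client-side retry policy could therefore be sketched as (hypothetical helper, not part of the API):

```python
def should_retry(status_code):
    """Return True if a request with this status code is worth retrying.

    400/401 indicate a malformed request or bad credentials, so retrying
    the same request cannot succeed; 5xx failures may be transient.
    """
    if status_code in (400, 401):
        return False
    return status_code >= 500
```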
