openai/whisper-large-v3

Whisper Large V3

Whisper large-v3 is a pre-trained model for automatic speech recognition (ASR) and speech translation. It uses a Transformer encoder-decoder architecture and is designed for accurate transcription across a wide range of languages and audio conditions.

$0.0015/MINUTE
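
As a rough illustration of the per-minute rate, the sketch below estimates the cost of a clip from its duration in seconds. It assumes billing is prorated linearly with no rounding or minimum charge, which this page does not confirm; `estimate_cost` is a hypothetical helper name.

```python
PRICE_PER_MINUTE_USD = 0.0015  # rate listed on this page


def estimate_cost(duration_seconds: float) -> float:
    """Estimate the charge in USD, assuming linear proration (an assumption)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 60.0 * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    # Estimated cost of one hour of audio under the assumption above.
    print(f"${estimate_cost(3600):.4f}")
```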

API Documentation: Whisper Audio Transcription 🎙️

This document specifies the API for transcribing audio files using the hosted Whisper large-v3 model. This endpoint mimics the OpenAI Whisper API while extending support for URL-based transcriptions.


Endpoint

| Method | URL | Summary |
| --- | --- | --- |
| POST | /v1/audio/transcriptions | Transcribe audio using the Whisper large-v3 model. |

Authentication

The API uses Bearer Token authentication via the Authorization header. If an API_KEY is set on the server, a valid token must be provided.

| Header | Example | Description |
| --- | --- | --- |
| Authorization | Bearer YOUR_API_KEY | The API_KEY provided by the server. |
| x-request-id | UUID_string | An optional unique identifier for request tracking. |
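
The headers above could be assembled as in the following minimal sketch. `build_headers` is a hypothetical helper, and generating a UUID4 when no request ID is supplied is a choice made here, not something the API requires.

```python
import uuid
from typing import Dict, Optional


def build_headers(api_key: Optional[str], request_id: Optional[str] = None) -> Dict[str, str]:
    """Build the HTTP headers for a transcription request."""
    headers: Dict[str, str] = {}
    if api_key:
        # Bearer token in the Authorization header, as described above.
        headers["Authorization"] = f"Bearer {api_key}"
    # x-request-id is optional; this sketch always sends one for tracking.
    headers["x-request-id"] = request_id or str(uuid.uuid4())
    return headers
```

Sending the same `x-request-id` that appears in your own logs makes it easier to correlate a failed request with the server side.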

Request

The request must be of type multipart/form-data. It must include the audio, either as an uploaded file or as a URL, in the file field, along with the model field. All other parameters are optional.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | File or string | Yes | The audio to transcribe, either a direct file upload or a URL pointing to an audio file. |
| model | string | Yes | Use "openai/whisper-large-v3". |
| response_format | string | No | One of json (default), text, srt, vtt, or verbose_json. |
| temperature | number | No | Sampling temperature from 0.0 to 1.0. Defaults to 0.0. |
| language | string | No | ISO-639-1 language code. |
| prompt | string | No | Text to guide the model's style or continue a previous segment. |
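
Checking the optional parameters on the client before sending can avoid a round trip that ends in a 400. The sketch below mirrors the constraints in the table; `build_form_fields` is a hypothetical helper, and the two-letter check on `language` is only a shape check, not a lookup against the ISO-639-1 list.

```python
from typing import Dict, Optional

ALLOWED_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}


def build_form_fields(
    response_format: str = "json",
    temperature: float = 0.0,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
) -> Dict[str, str]:
    """Validate the optional parameters and return them as text form fields."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported response_format: {response_format!r}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    fields = {
        "model": "openai/whisper-large-v3",
        "response_format": response_format,
        "temperature": str(temperature),
    }
    if language is not None:
        # ISO-639-1 codes are two ASCII letters, for example "en".
        if len(language) != 2 or not (language.isascii() and language.isalpha()):
            raise ValueError("language must be a two-letter ISO-639-1 code")
        fields["language"] = language.lower()
    if prompt:
        fields["prompt"] = prompt
    return fields
```

The `file` field is left out on purpose, since it is either a URL string or an uploaded file and is attached when the request body is built.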

Example Requests

1. Transcribing a File Upload

```bash
curl --location --request POST 'https://us-01.bytecompute.ai/v1/audio/transcriptions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--form 'file=@"path/to/file.aac"' \
--form 'model="openai/whisper-large-v3"' \
--form 'response_format="verbose_json"'
```

2. Transcribing a URL

```bash
curl --raw -s \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -F "model=openai/whisper-large-v3" \
     -F "file=https://cdn-global.hellotalk8.com/ht-global-1312929133/mmnt/2/250418/1/148239271/0/0/10e8a25a44ad37efd36155c1f447b0b5.aac" \
     -F "response_format=verbose_json" \
     "https://us-01.bytecompute.ai/v1/audio/transcriptions"
```

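The two curl requests above can be approximated in Python with only the standard library. This is a minimal sketch, not an official client: `encode_multipart` and `transcribe` are hypothetical helper names, the uploaded file's Content-Type is assumed to be application/octet-stream, and `transcribe` assumes a JSON-based response_format (json or verbose_json).

```python
import json
import urllib.request
import uuid
from typing import Dict, Optional, Tuple

API_URL = "https://us-01.bytecompute.ai/v1/audio/transcriptions"


def encode_multipart(
    fields: Dict[str, str],
    file_name: Optional[str] = None,
    file_bytes: Optional[bytes] = None,
) -> Tuple[bytes, str]:
    """Encode text fields, and optionally one uploaded file, as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    if file_bytes is not None:
        # The upload goes in the "file" field, like curl's file=@path form.
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{file_name or "audio"}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(
    api_key: str,
    fields: Dict[str, str],
    file_name: Optional[str] = None,
    file_bytes: Optional[bytes] = None,
) -> dict:
    """POST the form to the endpoint and parse the body as JSON."""
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For a URL, pass it as the plain `file` field, for example `transcribe("YOUR_API_KEY", {"model": "openai/whisper-large-v3", "file": audio_url, "response_format": "verbose_json"})`. For an upload, leave `file` out of the fields and pass `file_name` and `file_bytes` instead.
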
Response

Successful Responses

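  • 200 OK
    • json format: Returns a JSON object with a single text field.
    • verbose_json format: Returns the text together with task, language, segments, duration, and usage. The example below shows this format.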
```json
{
  "task":"transcribe",
  "language":"english",
  "text":"You know the rules and so do I. A full commitment's what I'm thinking of",
  "segments":[
    {
      "id":0,
      "seek":0,
      "start":0.4,
      "end":5.3,
      "text":"You know the rules and so do I",
      "tokens":[
        50385,
        ...
      ],
      "temperature":0.0,
      "avg_logprob":-0.18997467888726127,
      "compression_ratio":1.054945054945055,
      "no_speech_prob":1.1973037145063259e-11
    },
    ...
  ],
  "duration":18.900000000000002,
  "usage":{
    "type":"duration",
    "seconds":18.900000000000002
  }
}
```
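
Once a verbose_json response like the one above has been parsed into a dictionary, the segment timings can be read directly. The sketch below turns each segment into a start time, end time, and text, with times formatted as HH:MM:SS,mmm. `srt_timestamp` and `segment_lines` are hypothetical helpers, and the sample dictionary is a trimmed copy of the example above, not real output captured from the service.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segment_lines(response: dict) -> List[Tuple[str, str, str]]:
    """Return (start, end, text) for each segment of a verbose_json response."""
    return [
        (srt_timestamp(seg["start"]), srt_timestamp(seg["end"]), seg["text"].strip())
        for seg in response.get("segments", [])
    ]


# Trimmed copy of the example response, keeping only the fields used here.
sample = {
    "text": "You know the rules and so do I. A full commitment's what I'm thinking of",
    "segments": [
        {"id": 0, "start": 0.4, "end": 5.3, "text": "You know the rules and so do I"}
    ],
    "duration": 18.900000000000002,
}

if __name__ == "__main__":
    for start, end, text in segment_lines(sample):
        print(f"{start} --> {end}  {text}")
```

If only subtitles are needed, requesting response_format srt or vtt avoids this client-side step.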

Error Responses

| Status Code | Description | Detail |
| --- | --- | --- |
| 400 Bad Request | The request is malformed or invalid. | "No header part in the request", "No file or url in the request", "Invalid 'model'", or "Unsupported response_format". |
| 401 Unauthorized | The API key is missing or invalid. | "Authorization header is missing or invalid." or "Invalid API key." |
| 500 Internal Server Error | A server-side issue occurred during transcription. | |
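
A client might map these status codes to a single exception type, as in the sketch below. `ApiError` and `raise_for_status` are hypothetical names. Treating 500 as retryable is an assumption made here, and the response body is kept as raw text because its exact shape is not specified above.

```python
class ApiError(Exception):
    """Raised when the transcription endpoint returns a non-200 status."""

    def __init__(self, status: int, detail: str, retryable: bool) -> None:
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail
        self.retryable = retryable


# Status codes from the table above: (reason, whether a retry may help).
_KNOWN_ERRORS = {
    400: ("Bad Request", False),            # fix the request before resending
    401: ("Unauthorized", False),           # fix the API key before resending
    500: ("Internal Server Error", True),   # assumption: may succeed on retry
}


def raise_for_status(status: int, body: str = "") -> None:
    """Return silently on 200; otherwise raise ApiError with the body as detail."""
    if status == 200:
        return
    reason, retryable = _KNOWN_ERRORS.get(status, ("Unexpected status", False))
    raise ApiError(status, body or reason, retryable)
```

Callers can then check `error.retryable` to decide between resending the request and reporting the problem to the user.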
