openai/whisper-large-v3 - bytecompute.ai

API Documentation: Whisper Audio Transcription 🎙️

This document specifies the API for transcribing audio files with the hosted Whisper large-v3 model. The endpoint mirrors the OpenAI Whisper API and adds support for transcribing audio from a URL.

Endpoint

Method	URL	Summary
`POST`	`/v1/audio/transcriptions`	Transcribe audio using the Whisper large-v3 model.

Authentication

The API uses Bearer Token authentication via the Authorization header. If an API_KEY is set on the server, a valid token must be provided.

Header	Example	Description
`Authorization`	`Bearer YOUR_API_KEY`	The `API_KEY` provided by the server.
`x-request-id`	`UUID_string`	An optional unique identifier for request tracking.
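
As an illustration of the header table above, the following Python sketch builds the two headers. The helper name `build_headers` is hypothetical, and generating a UUID when no request ID is supplied is an assumption made for convenience, not documented server behaviour.

```python
import uuid


def build_headers(api_key=None, request_id=None):
    """Return the HTTP headers described in the table above."""
    headers = {}
    if api_key:
        # Bearer token; only required when the server has API_KEY set.
        headers["Authorization"] = f"Bearer {api_key}"
    # x-request-id is optional; fall back to a fresh UUID (assumption).
    headers["x-request-id"] = request_id or str(uuid.uuid4())
    return headers
```

Sending your own `x-request-id` can make it easier to match a client-side log entry to a specific request.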

Request

The request must be sent as `multipart/form-data`. It must include either an uploaded file or a URL in the `file` field, plus the `model` field; the remaining parameters are optional.

Parameter	Type	Required	Description
`file`	`File` or `string`	Yes	The audio file to transcribe. It can be a direct file upload or a URL to an audio file.
`model`	`string`	Yes	Use `"openai/whisper-large-v3"`.
`response_format`	`string`	No	`json` (default), `text`, `srt`, `vtt`, or `verbose_json`.
`temperature`	`number`	No	`0.0` to `1.0`. Defaults to `0.0`.
`language`	`string`	No	ISO 639-1 code of the spoken language, for example `en`.
`prompt`	`string`	No	Text to guide the model's style or continue a previous segment.
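
The parameter table can be mirrored in a small client-side check before a request is sent. This is a minimal sketch: `build_form_fields` is a hypothetical helper, and validating on the client is an assumption about how a caller might want to fail early; the server performs its own validation regardless.

```python
ALLOWED_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}
MODEL = "openai/whisper-large-v3"


def build_form_fields(file_url, response_format="json", temperature=0.0,
                      language=None, prompt=None):
    """Build the text form fields for a URL-based transcription request."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported response_format: {response_format!r}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    fields = {
        "file": file_url,  # a URL string; file uploads are sent as a file part instead
        "model": MODEL,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    # Optional parameters are only sent when the caller provides them.
    if language:
        fields["language"] = language
    if prompt:
        fields["prompt"] = prompt
    return fields
```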

Example Requests

1. Transcribing a File Upload

curl --location --request POST 'https://us-01.bytecompute.ai/v1/audio/transcriptions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--form 'file=@"path/to/file.aac"' \
--form 'model="openai/whisper-large-v3"' \
--form 'response_format="verbose_json"'
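
For callers who prefer Python to curl, the same upload might look roughly like the sketch below, which uses only the standard library. The function names `encode_multipart` and `transcribe_file` are hypothetical, and sending the file part as `application/octet-stream` is an assumption; the server may expect or ignore a more specific content type.

```python
import json
import urllib.request
import uuid
from pathlib import Path

API_URL = "https://us-01.bytecompute.ai/v1/audio/transcriptions"


def encode_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The audio itself goes in the "file" field, as in the curl example.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe_file(path, api_key):
    """POST a local audio file and return the parsed verbose_json response."""
    audio = Path(path)
    body, content_type = encode_multipart(
        {"model": "openai/whisper-large-v3", "response_format": "verbose_json"},
        audio.name,
        audio.read_bytes(),
    )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `transcribe_file("path/to/file.aac", "YOUR_API_KEY")` performs the request. JSON parsing only applies to the `json` and `verbose_json` formats.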

2. Transcribing a URL

curl --raw -s \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -F "model=openai/whisper-large-v3" \
     -F "file=https://cdn-global.hellotalk8.com/ht-global-1312929133/mmnt/2/250418/1/148239271/0/0/10e8a25a44ad37efd36155c1f447b0b5.aac" \
     -F "response_format=verbose_json" \
     "https://us-01.bytecompute.ai/v1/audio/transcriptions"

Response

Successful Responses

200 OK
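- `json` format (default): returns a JSON object with a single `text` field. The example below shows a `verbose_json` response, which also includes per-segment details.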

{
  "task":"transcribe",
  "language":"english",
  "text":"You know the rules and so do I. A full commitment's what I'm thinking of",
  "segments":[
    {
      "id":0,
      "seek":0,
      "start":0.4,
      "end":5.3,
      "text":"You know the rules and so do I",
      "tokens":[
        50385,
        ...
      ],
      "temperature":0.0,
      "avg_logprob":-0.18997467888726127,
      "compression_ratio":1.054945054945055,
      "no_speech_prob":1.1973037145063259e-11
    },
    ...
  ],
  "duration":18.900000000000002,
  "usage":{
    "type":"duration",
    "seconds":18.900000000000002
  }
}
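
To show how the `segments` array in a `verbose_json` response might be consumed, here is a short Python sketch that turns each segment into a timestamped line. The helper name `segment_lines` and the output layout are illustrative choices, not part of the API.

```python
import json


def segment_lines(response_text):
    """Format each segment of a verbose_json response as '[start - end] text'."""
    data = json.loads(response_text)
    lines = []
    # "segments" is absent in the plain json format, so default to an empty list.
    for segment in data.get("segments", []):
        start, end = segment["start"], segment["end"]
        lines.append(f"[{start:.1f}s - {end:.1f}s] {segment['text'].strip()}")
    return lines
```

Applied to the first segment of the example above, this yields `[0.4s - 5.3s] You know the rules and so do I`.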

Error Responses

Status Code	Description	Detail
`400 Bad Request`	The request is malformed or invalid.	"No header part in the request", "No file or url in the request", "Invalid 'model'", or "Unsupported response_format".
`401 Unauthorized`	The API key is missing or invalid.	"Authorization header is missing or invalid." or "Invalid API key."
`500 Internal Server Error`	A server-side issue occurred during transcription.
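
A client might use the status codes in this table to decide whether repeating a request is worthwhile. The sketch below is one possible policy, not documented behaviour: the helper names are hypothetical, the `detail` string is assumed to have been extracted from the error response by the caller, and treating 5xx responses as retryable is an assumption.

```python
HINTS = {
    400: "Fix the request: check the file or URL, model, and response_format.",
    401: "Check the Authorization header and API key.",
    500: "Server-side failure during transcription; try again later.",
}


def is_retryable(status_code):
    """Return True when repeating the same request might succeed."""
    # 400 and 401 describe problems with the request itself, so retrying
    # unchanged will not help; 5xx errors may be transient (assumption).
    return status_code >= 500


def explain(status_code, detail=""):
    """Combine the server's detail message with a short hint for the caller."""
    hint = HINTS.get(status_code, "Unexpected status code.")
    if detail:
        return f"{status_code}: {detail} ({hint})"
    return f"{status_code}: {hint}"
```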
