API Documentation: Whisper Audio Transcription 🎙️

This document specifies the API for transcribing audio files with the hosted Whisper large-v3-turbo model. The API follows the OpenAI-compatible schema. The "Turbo" variant was trained with large-scale weak supervision and uses a 4-layer decoder, offering 8.8x faster inference than the standard large-v3 while maintaining near-identical accuracy.

Endpoint

Method	URL	Summary
`POST`	`/v1/audio/transcriptions`	Transcribe audio/video using the Whisper model.

Authentication

The API uses Bearer Token authentication. A valid API key must be sent in the `Authorization` header of every request.

Header	Example	Description
`Authorization`	`Bearer YOUR_API_KEY`	Your server-provided API Key.
`x-request-id`	`UUID_string`	Optional. Unique identifier for tracking the request. The server generates one if omitted.
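
The header table above can be sketched in code as follows. This is a minimal illustration, not an official client: `build_headers` and `new_request_id` are hypothetical helper names, and generating the request ID on the client side is an assumption made here for log correlation (the server creates one when the header is absent).

```python
import uuid
from typing import Dict, Optional


def build_headers(api_key: str, request_id: Optional[str] = None) -> Dict[str, str]:
    """Return the HTTP headers described in the Authentication table.

    `build_headers` is a hypothetical helper, not part of the API itself.
    """
    if not api_key:
        raise ValueError("api_key must not be empty")
    headers = {"Authorization": f"Bearer {api_key}"}
    if request_id is not None:
        # Optional header: the server generates an ID when it is absent.
        headers["x-request-id"] = request_id
    return headers


def new_request_id() -> str:
    """Create a client-side UUID string to correlate logs with one request."""
    return str(uuid.uuid4())
```

Calling `build_headers("YOUR_API_KEY", new_request_id())` yields both headers; omitting the second argument sends only `Authorization`.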

Request Parameters

Requests must be sent as multipart/form-data.

Parameter	Type	Required	Description
`file`	`File` / `URL`	Yes	The audio file object or a direct URL to an audio/video file (mp3, mp4, mpeg, mpga, m4a, wav, webm, aac).
`model`	`string`	Yes	Use `"openai/whisper-large-v3-turbo"` for high-speed ASR.
`response_format`	`string`	No	Formats: `json`, `text`, `srt`, `vtt`, `verbose_json`. Default: `json`.
`temperature`	`number`	No	Sampling temperature between `0.0` and `1.0`. Higher values increase variability. Default: `0.0`.
`language`	`string`	No	ISO 639-1 code of the spoken language (e.g., `en`, `zh`, `ja`). Supplying it can improve transcription accuracy.
`prompt`	`string`	No	Optional text to guide the model's style or vocabulary.
`timestamp_granularities`	`array`	No	Only applies when `response_format` is `verbose_json`. Accepted values: `word`, `segment`.
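
As a rough sketch of how the parameters above fit together, the following standard-library code validates the form fields, encodes a multipart/form-data body for a local file upload, and builds the `POST` request. Several things here are assumptions rather than documented behavior: the helper names are hypothetical, the base URL is a placeholder, the file part is labeled `application/octet-stream`, and array values are sent as repeated `timestamp_granularities` fields. Passing `file` as a URL is not covered.

```python
import urllib.request
import uuid
from typing import List, Optional, Tuple

ALLOWED_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}
ALLOWED_GRANULARITIES = {"word", "segment"}


def build_form_fields(
    model: str = "openai/whisper-large-v3-turbo",
    response_format: str = "json",
    temperature: float = 0.0,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    timestamp_granularities: Optional[List[str]] = None,
) -> List[Tuple[str, str]]:
    """Validate the non-file parameters and return (name, value) pairs."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    fields = [
        ("model", model),
        ("response_format", response_format),
        ("temperature", str(temperature)),
    ]
    if language is not None:
        fields.append(("language", language))
    if prompt is not None:
        fields.append(("prompt", prompt))
    if timestamp_granularities:
        if response_format != "verbose_json":
            raise ValueError("timestamp_granularities requires verbose_json")
        for value in timestamp_granularities:
            if value not in ALLOWED_GRANULARITIES:
                raise ValueError(f"unsupported granularity: {value!r}")
            # Assumption: array items are sent as repeated form fields.
            fields.append(("timestamp_granularities", value))
    return fields


def encode_multipart(
    fields: List[Tuple[str, str]],
    filename: str,
    file_bytes: bytes,
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Encode text fields plus one file part; return (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields:
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(part.encode("utf-8"))
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(
    base_url: str, api_key: str, filename: str, file_bytes: bytes, **params
) -> urllib.request.Request:
    """Assemble the POST request without sending it."""
    body, content_type = encode_multipart(
        build_form_fields(**params), filename, file_bytes
    )
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
```

A request built this way could be sent with `urllib.request.urlopen(req)`, with `base_url` replaced by the real host of your deployment. Validating on the client is a design choice that surfaces parameter mistakes before any audio is uploaded.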
