Higgs Audio V3 — Speech to Text

Overview

Higgs Audio V3 is a multilingual speech recognition model designed for high-accuracy transcription and speech translation.

It converts spoken audio into text across a wide range of languages and supports automatic language detection.

Automatic Speech Recognition (ASR): Converts audio into text with high accuracy.
Multilingual Support: Supports 90+ languages.
Automatic Language Detection: Detects the spoken language when none is specified.
Speech Translation (AST): Translates speech into another language.

To use the Higgs Audio V3 STT model, send a POST request to the `/v1/audio/transcriptions` endpoint with the following parameters.

Parameter	Type	Required	Description
`file`	`binary`	Yes	The audio file to transcribe (mp3, wav, flac, or m4a). Maximum size: 25 MB.
`model`	`string`	Yes	Use `"bosonai-higgs-audio-v3-stt"`.
`language`	`string`	No	ISO 639-1 code of the spoken language (for example, `en`). If omitted, the language is detected automatically.
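
The parameters above can be assembled into a request roughly as follows. This is a minimal sketch using only the Python standard library, not an official client. Several details are assumptions that this page does not specify: the base URL (`https://api.example.com` is a placeholder), the bearer-token `Authorization` header, the multipart/form-data encoding of the upload, a JSON response body, and the reading of "25 MB" as 25 * 1024 * 1024 bytes. The helper names `build_request` and `transcribe` are hypothetical.

```python
import json
import os
import urllib.request
import uuid

# Assumptions, not taken from the documentation: the host, the auth scheme,
# the multipart encoding, and the JSON response format.
BASE_URL = "https://api.example.com"  # hypothetical placeholder host
MODEL = "bosonai-higgs-audio-v3-stt"  # model name from the parameter table
MAX_BYTES = 25 * 1024 * 1024  # 25 MB limit, assumed to mean MiB


def build_request(audio, filename, api_key, language=None):
    """Build a multipart POST for /v1/audio/transcriptions without sending it."""
    if len(audio) > MAX_BYTES:
        raise ValueError("audio exceeds the 25 MB limit")

    fields = {"model": MODEL}
    if language is not None:
        fields["language"] = language  # ISO 639-1 code; omit to auto-detect

    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields (model, and language when given).
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The binary audio payload goes in the "file" field.
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'
    ).encode("utf-8")
    parts.append(file_header + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        BASE_URL + "/v1/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path, api_key, language=None):
    """Send the audio file at `path` and return the parsed response."""
    with open(path, "rb") as f:
        audio = f.read()
    req = build_request(audio, os.path.basename(path), api_key, language)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))  # assumes JSON
```

Calling `transcribe("meeting.wav", api_key)` leaves out `language`, so the service would detect the language itself. Passing `language="en"` sets it explicitly.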