Higgs-Audio-v3-Speech-to-Text is a high-performance automatic speech recognition (ASR) model developed by BosonAI. Built on a 1.7B-parameter architecture, it delivers accurate transcription across 60+ languages through an OpenAI Whisper-compatible API.
Higgs Audio V3 is a multilingual speech recognition model designed for high-accuracy transcription and speech translation.
It converts spoken audio into text across a wide range of languages and supports automatic language detection.
- Automatic Speech Recognition (ASR): Converts audio into text with high accuracy.
- Multilingual Support: Supports 90+ languages.
- Automatic Language Detection: Detects the spoken language when it is not specified.
- Speech Translation (AST): Translates speech into another language.
To use the Higgs-Audio v3 STT model, send a POST request to the /v1/audio/transcriptions endpoint with the following parameters.
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | binary | Yes | The audio file to transcribe (mp3, wav, flac, or m4a). Max 25MB. |
| model | string | Yes | Use "bosonai-higgs-audio-v3-stt". |
| language | string | No | ISO 639-1 language code. If omitted, the language is auto-detected. |
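The request described above can be sketched in Python as follows. This is a minimal illustration, not an official client: the base URL, the Bearer-token `Authorization` header, the multipart form encoding, and the JSON response are all assumptions based on the stated Whisper compatibility, and the helper names (`build_form_fields`, `check_audio_file`, `transcribe`) are hypothetical. It also assumes the third-party `requests` package is installed and reads "25MB" as 25 MiB.

```python
from pathlib import Path

MODEL_ID = "bosonai-higgs-audio-v3-stt"
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a"}
# Assumption: "Max 25MB" is interpreted here as 25 MiB.
MAX_BYTES = 25 * 1024 * 1024


def build_form_fields(language=None):
    """Return the non-file form fields for a transcription request."""
    fields = {"model": MODEL_ID}
    if language is not None:
        # ISO 639-1 code such as "en"; leave unset to rely on auto-detection.
        fields["language"] = language
    return fields


def check_audio_file(path):
    """Validate extension and size locally before uploading."""
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {p.suffix!r}")
    if p.stat().st_size > MAX_BYTES:
        raise ValueError("audio file exceeds the 25MB limit")
    return p


def transcribe(path, base_url, api_key, language=None):
    """POST the file to /v1/audio/transcriptions and return the parsed JSON.

    base_url and api_key are placeholders supplied by the caller; the auth
    scheme and response shape are assumptions, not documented behavior.
    """
    import requests  # third-party; imported lazily so the helpers work without it

    p = check_audio_file(path)
    with p.open("rb") as audio:
        response = requests.post(
            f"{base_url.rstrip('/')}/v1/audio/transcriptions",
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": (p.name, audio)},
            data=build_form_fields(language),
            timeout=120,
        )
    response.raise_for_status()
    return response.json()
```

A call might look like `transcribe("meeting.wav", base_url, api_key, language="en")`, where `base_url` and `api_key` come from your account settings. Omitting `language` leaves detection to the model.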
