Higgs Audio V2.5 (TTS) API Documentation

Higgs Audio V2.5 is a 1B parameter autoregressive audio transformer distilled from the 3B V2 model. Featuring the DualFFN architecture and trained on 1M+ hours of AudioWeb data with GRPO alignment, it delivers human-like naturalness and low-latency speech synthesis.

This endpoint follows the OpenAI-compatible /v1/chat/completions schema, outputting high-fidelity 24kHz audio via a unified 25 FPS tokenizer.

Billing

$0.045 / audio minute

Billed based on the total duration of generated audio.

Parameters Detail

Request Body (JSON)

Parameter	Type	Required	Description
`model`	`string`	Yes	Must be `"boson/higgs-audio-v2.5"`.
`messages`	`array`	Yes	Chat-formatted input. The model generates audio based on the last `user` message.
`modalities`	`array`	Yes	Must include `["text", "audio"]` to enable TTS mode.
`audio`	`object`	No	Specifies audio details such as `voice` (e.g., `alloy`, `echo`) and `format` (`wav`, `mp3`).
`temperature`	`float`	No	Controls randomness in prosody and naturalness. Recommended: `1.0`.
`top_p`	`float`	No	Nucleus sampling threshold. Recommended: `0.95` for optimal acoustic stability.
`max_completion_tokens`	`integer`	No	Limits the total tokens generated (indirectly limits audio duration).
`extra_body`	`object`	No	Model-specific parameters, such as `{"top_k": 50}` for decoding efficiency.
`stop`	`array`	No	Stop sequences, e.g., `["<

Response Fields (verbose_json)

Top-level Object

Field	Type	Description
`id`	`string`	Unique identifier for the request.
`object`	`string`	Always `"chat.completion"`.
`created`	`integer`	Unix timestamp of the request.
`model`	`string`	The exact model version executed.
`choices`	`array`	List of generated outputs (typically contains one item).
`usage`	`object`	Statistics including `prompt_tokens` and `completion_tokens`.

choices[].message.audio

Field	Type	Description
`data`	`string`	Base64-encoded audio data. Default sampling rate is 24kHz.
`id`	`string`	Unique ID for the audio resource.
`expires_at`	`integer`	Expiration timestamp for the audio data (if cached).
`transcript`	`string`	The normalized text content corresponding to the generated audio.

Higgs Audio V2.5

Input

Output