boson-audio-multimodal-checkpoint-1200 Robot

Higgs Audio V2.5

Higgs Audio V2.5 is a 1B parameter autoregressive audio transformer distilled from the 3B V2 model, featuring the DualFFN architecture for efficient acoustic token modeling. It uses a unified audio tokenizer running at 25 FPS with 12 codebooks at 2000 bps, outputting 24kHz audio. Trained on 1M+ hours of audio data (AudioWeb dataset) with GRPO alignment for human-like naturalness.

$0.045/MINUTE

Input

Template

Output

Generate speech to see the audio player.

Higgs Audio V2.5 (TTS) API Documentation

Higgs Audio V2.5 is a 1B parameter autoregressive audio transformer distilled from the 3B V2 model. Featuring the DualFFN architecture and trained on 1M+ hours of AudioWeb data with GRPO alignment, it delivers human-like naturalness and low-latency speech synthesis.

This endpoint follows the OpenAI-compatible /v1/chat/completions schema, outputting high-fidelity 24kHz audio via a unified 25 FPS tokenizer.


Billing

$0.045 / audio minute

Billed based on the total duration of generated audio.


Parameters Detail

Request Body (JSON)

Parameter Type Required Description
model string Yes Must be "boson/higgs-audio-v2.5".
messages array Yes Chat-formatted input. The model generates audio based on the last user message.
modalities array Yes Must include ["text", "audio"] to enable TTS mode.
audio object No Specifies audio details such as voice (e.g., alloy, echo) and format (wav, mp3).
temperature float No Controls randomness in prosody and naturalness. Recommended: 1.0.
top_p float No Nucleus sampling threshold. Recommended: 0.95 for optimal acoustic stability.
max_completion_tokens integer No Limits the total tokens generated (indirectly limits audio duration).
extra_body object No Model-specific parameters, such as {"top_k": 50} for decoding efficiency.
stop array No Stop sequences, e.g., `["<

Response Fields (verbose_json)

Top-level Object

Field Type Description
id string Unique identifier for the request.
object string Always "chat.completion".
created integer Unix timestamp of the request.
model string The exact model version executed.
choices array List of generated outputs (typically contains one item).
usage object Statistics including prompt_tokens and completion_tokens.

choices[].message.audio

Field Type Description
data string Base64-encoded audio data. Default sampling rate is 24kHz.
id string Unique ID for the audio resource.
expires_at integer Expiration timestamp for the audio data (if cached).
transcript string The normalized text content corresponding to the generated audio.

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.

Contact Sales