
Higgs Audio V2.5 is a 1B-parameter autoregressive audio transformer distilled from the 3B V2 model. It uses the DualFFN architecture for efficient acoustic token modeling and a unified audio tokenizer that runs at 25 frames per second with 12 codebooks at 2000 bps and outputs 24 kHz audio. It was trained on more than 10M hours of audio (the AudioVerse dataset) and aligned with GRPO for naturalness.
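To make the tokenizer figures concrete, the following sketch works through the arithmetic implied by the frame rate and bitrate quoted above. The function name `tokenizer_budget` is hypothetical, and the numbers describe only the compressed token stream, not the size of the 24 kHz audio the API returns.

```python
# Figures quoted in the model description.
FRAMES_PER_SECOND = 25
BITRATE_BPS = 2000


def tokenizer_budget(duration_seconds: int) -> dict:
    """Frames and bits the tokenizer would use for a clip of this length."""
    return {
        "frames": FRAMES_PER_SECOND * duration_seconds,
        "bits": BITRATE_BPS * duration_seconds,
        "bytes": BITRATE_BPS * duration_seconds // 8,
    }


# Bits carried by a single frame across all codebooks: 2000 / 25.
bits_per_frame = BITRATE_BPS / FRAMES_PER_SECOND

one_minute = tokenizer_budget(60)
print(one_minute, bits_per_frame)
```

One minute of speech therefore corresponds to 1500 tokenizer frames and about 15 kB of token data at the stated bitrate.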
Higgs Audio V2.5 is a high-performance Text-to-Speech (TTS) model designed for low-latency audio generation and seamless integration with OpenAI-compatible APIs. It enables developers to generate speech directly from text using the /v1/chat/completions endpoint.
Pricing: $0.045 per minute of generated audio. Billing is based on the total duration of the synthesized audio output.
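As a rough illustration of the pricing, the sketch below estimates the charge for a clip of a given length. It assumes the rate is prorated linearly by the second. The page does not say how partial minutes are rounded, so treat the result as an estimate. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Dollars per minute of generated audio, from the pricing stated above.
RATE_PER_AUDIO_MINUTE = Decimal("0.045")


def estimate_cost(duration_seconds: float) -> Decimal:
    """Estimate the charge, assuming linear per-second proration."""
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    return (minutes * RATE_PER_AUDIO_MINUTE).quantize(Decimal("0.000001"))


print(estimate_cost(60))   # one full minute
print(estimate_cost(8.5))  # a short sentence
```

`Decimal` is used instead of floats so that small per-request amounts do not accumulate binary rounding error when summed.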
Example request:

curl --location --request POST 'https://us-01.bytecompute.ai/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data-raw '{
"model": "boson-audio-multimodal-checkpoint-1200",
"messages": [
{
"role": "user",
"content": "Welcome to ByteCompute! Explore our AI models, run them on powerful GPUs, and start creating your own AI projects today."
}
],
"modalities": ["text", "audio"],
"max_completion_tokens": 500,
"temperature": 1.0,
"top_p": 0.95,
"extra_body": {"top_k": 50},
"stop": ["<|eot_id|>", "<|end_of_text|>", "<|audio_eos|>"]
}'

Request body:

{
"model": "boson-audio-multimodal-checkpoint-1200",
"messages": [
{
"role": "user",
"content": "Welcome to ByteCompute! Explore our AI models, run them on powerful GPUs, and start creating your own AI projects today."
}
],
"modalities": ["text", "audio"],
"max_completion_tokens": 500,
"temperature": 1.0,
"top_p": 0.95,
"extra_body": {
"top_k": 50
},
"stop": ["<|eot_id|>", "<|end_of_text|>", "<|audio_eos|>"]
}
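
The same request can be assembled from Python. This is a minimal sketch using only the standard library. It mirrors the curl example field for field, including passing `extra_body` as a literal JSON key. The helper name `build_tts_request` and the placeholder key are assumptions, and the request is only built here, not sent.

```python
import json
import urllib.request

API_URL = "https://us-01.bytecompute.ai/v1/chat/completions"
MODEL = "boson-audio-multimodal-checkpoint-1200"


def build_tts_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": text}],
        "modalities": ["text", "audio"],
        "max_completion_tokens": 500,
        "temperature": 1.0,
        "top_p": 0.95,
        "extra_body": {"top_k": 50},
        "stop": ["<|eot_id|>", "<|end_of_text|>", "<|audio_eos|>"],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_tts_request("Welcome to ByteCompute!", api_key="YOUR_API_KEY")
# To send it and parse the reply:
#   with urllib.request.urlopen(request) as resp:
#       response = json.load(resp)
```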

Request parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Must be "boson-audio-multimodal-checkpoint-1200" |
| messages | array | Yes | Chat-style input text to convert into speech |
| modalities | array | No | Output types; must include "audio" for TTS |
| max_completion_tokens | integer | No | Maximum number of tokens to generate |
| temperature | float | No | Controls randomness of generation |
| top_p | float | No | Nucleus sampling threshold |
| extra_body | object | No | Additional model-specific parameters (e.g., top_k) |
| stop | array | No | Stop sequences that terminate generation |

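
The constraints in the parameter table can be checked on the client before a request is sent. The following is a sketch of such a check, not part of the API. The function name `check_payload` is hypothetical, and treating an empty `messages` array as invalid is an assumption.

```python
REQUIRED_MODEL = "boson-audio-multimodal-checkpoint-1200"


def check_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    if payload.get("model") != REQUIRED_MODEL:
        problems.append(f'model must be "{REQUIRED_MODEL}"')
    messages = payload.get("messages")
    # Assumption: an empty messages array is treated as invalid.
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty array")
    # modalities is optional, but when present it must include "audio" for TTS.
    if "modalities" in payload and "audio" not in payload["modalities"]:
        problems.append('modalities must include "audio"')
    return problems


ok = {
    "model": REQUIRED_MODEL,
    "messages": [{"role": "user", "content": "Hello"}],
    "modalities": ["text", "audio"],
}
print(check_payload(ok))
print(check_payload({"model": "other", "modalities": ["text"]}))
```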

Example response:

{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"allowed_token_ids": null,
"audio": {
"data": "UklGRnIiAwBX......",
"expires_at": 0,
"id": "audio-80f2d7a5edb34bb88705b6ff914689de",
"transcript": ""
},
"bad_words": [],
"content": "",
"mm_token_ids": null,
"reasoning_content": null,
"role": "assistant",
"tool_calls": []
},
"stop_reason": 1025,
"token_ids": null
}
],
"created": 1773887861,
"id": "chatcmpl-c398928c30e644c0b662b1bb7c030684",
"kv_transfer_params": null,
"model": "boson-audio-multimodal-checkpoint-1200",
"object": "chat.completion",
"prompt_logprobs": null,
"prompt_token_ids": null,
"service_tier": null,
"system_fingerprint": null,
"usage": {
"completion_tokens": 117,
"prompt_tokens": 16,
"prompt_tokens_details": null,
"total_tokens": 133
}
}

Top-level response fields:

| Field | Type | Description |
|---|---|---|
| id | string | Unique request ID |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp |
| model | string | Model name |
| choices | array | List of generated outputs |
| usage | object | Token usage statistics |

Fields of each choices entry:

| Field | Type | Description |
|---|---|---|
| index | integer | Output index |
| finish_reason | string | Reason for completion (e.g., "stop") |
| stop_reason | integer | Internal stop code |

Fields of message:

| Field | Type | Description |
|---|---|---|
| role | string | Always "assistant" |
| content | string | Empty for TTS |
| audio | object | Generated audio payload |

Fields of message.audio:

| Field | Type | Description |
|---|---|---|
| data | string | Base64-encoded audio data |
| id | string | Audio resource ID |
| expires_at | integer | Expiration timestamp |
| transcript | string | Optional transcript text |

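The generated audio is returned in message.audio.data as Base64-encoded WAV data. It can be played in a browser by embedding it in a data URI:

<audio controls src="data:audio/wav;base64,UklGRnIiAwBX..."></audio>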
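
Outside the browser, the audio payload described above can be decoded and measured with the standard library. This sketch assumes the payload is a PCM WAV file that Python's `wave` module can parse. That is consistent with the `audio/wav` data URI and the `UklGR` ("RIFF") prefix in the example, but it is not guaranteed by the page. The function name `decode_audio` is hypothetical.

```python
import base64
import io
import wave


def decode_audio(response: dict) -> tuple[bytes, float]:
    """Return (wav_bytes, duration_seconds) from a chat completion response."""
    audio = response["choices"][0]["message"]["audio"]
    wav_bytes = base64.b64decode(audio["data"])
    # Read the WAV header to get the clip length, the quantity billing is based on.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    return wav_bytes, duration


# Usage, given a parsed JSON response:
#   wav_bytes, seconds = decode_audio(response)
#   with open("speech.wav", "wb") as f:
#       f.write(wav_bytes)
```

Reading the duration from the WAV header gives a local estimate of billable audio length without relying on token counts in `usage`.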

The content field is empty because this is a pure TTS response.

The transcript field may be empty or contain normalized text, depending on the implementation.

Higgs Audio V2.5 provides a minimal and efficient TTS interface using an OpenAI-compatible /v1/chat/completions API. It supports extended generation controls such as sampling parameters and stop sequences, and returns audio data directly in Base64 format for easy integration with web and mobile applications.