boson-audio-multimodal-checkpoint-1200

Higgs Audio V2.5

Higgs Audio V2.5 is a 1B-parameter autoregressive audio transformer distilled from the 3B V2 model. It uses the DualFFN architecture for efficient acoustic token modeling and a unified audio tokenizer that runs at 25 frames per second with 12 codebooks (2,000 bps) and outputs 24 kHz audio. The model was trained on more than 10 million hours of audio (the AudioVerse dataset) and aligned with GRPO for naturalness.

Audio generation: $0.045/audio min


Higgs Audio V2.5 (TTS)

Higgs Audio V2.5 is a high-performance Text-to-Speech (TTS) model designed for low-latency audio generation and seamless integration with OpenAI-compatible APIs. It enables developers to generate speech directly from text using the /v1/chat/completions endpoint.


Billing

$0.045/audio min

Billing is based on the total duration of the synthesized audio output, at the per-minute rate above.
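
As a rough illustration of the billing rule, the following Python sketch estimates the charge for a returned clip from its duration. It assumes pro-rata billing with no rounding or minimum charge (this page does not specify either) and that the audio is PCM WAV readable by the standard-library wave module. The function names are hypothetical.

```python
import io
import wave

# Rate quoted in the Billing section of this page.
PRICE_PER_AUDIO_MINUTE_USD = 0.045


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an in-memory PCM WAV file in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimate_cost_usd(duration_seconds: float) -> float:
    """Estimate the charge, assuming pro-rata billing (an assumption)."""
    return duration_seconds / 60.0 * PRICE_PER_AUDIO_MINUTE_USD
```

Under these assumptions, estimate_cost_usd(wav_duration_seconds(audio_bytes)) gives an approximate per-request cost. The actual invoice may differ if the provider rounds durations.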


Quick Start

bash
curl --location --request POST 'https://us-01.bytecompute.ai/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data-raw '{
    "model": "boson-audio-multimodal-checkpoint-1200",
    "messages": [
        {
            "role": "user",
            "content": "Welcome to ByteCompute! Explore our AI models, run them on powerful GPUs, and start creating your own AI projects today."
        }
    ],
    "modalities": ["text", "audio"],
    "max_completion_tokens": 500,
    "temperature": 1.0,
    "top_p": 0.95,
    "extra_body": {"top_k": 50},
    "stop": ["<|eot_id|>", "<|end_of_text|>", "<|audio_eos|>"]
}'
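
The same request can be sketched in Python using only the standard library. The helper names build_request and synthesize are illustrative and not part of any ByteCompute SDK; the payload simply mirrors the curl example above.

```python
import json
import urllib.request

API_URL = "https://us-01.bytecompute.ai/v1/chat/completions"
MODEL = "boson-audio-multimodal-checkpoint-1200"


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a TTS request mirroring the curl example."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": text}],
        "modalities": ["text", "audio"],
        "max_completion_tokens": 500,
        "temperature": 1.0,
        "top_p": 0.95,
        "extra_body": {"top_k": 50},
        "stop": ["<|eot_id|>", "<|end_of_text|>", "<|audio_eos|>"],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def synthesize(text: str, api_key: str, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_request(text, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling synthesize("Hello", api_key) performs the network call; build_request is kept separate so the payload can be inspected or logged without sending anything.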

Request Parameters

Basic Request Example

json
{
  "model": "boson-audio-multimodal-checkpoint-1200",
  "messages": [
    {
      "role": "user",
      "content": "Welcome to ByteCompute! Explore our AI models, run them on powerful GPUs, and start creating your own AI projects today."
    }
  ],
  "modalities": ["text", "audio"],
  "max_completion_tokens": 500,
  "temperature": 1.0,
  "top_p": 0.95,
  "extra_body": {
    "top_k": 50
  },
  "stop": ["<|eot_id|>", "<|end_of_text|>", "<|audio_eos|>"]
}

Supported Parameters

Parameter              Type     Required  Description
model                  string   Yes       Must be "boson-audio-multimodal-checkpoint-1200"
messages               array    Yes       Chat-style input text to convert into speech
modalities             array    No        Output types, must include "audio" for TTS
max_completion_tokens  integer  No        Maximum tokens to generate
temperature            float    No        Controls randomness of generation
top_p                  float    No        Nucleus sampling threshold
extra_body             object   No        Additional model-specific parameters (e.g., top_k)
stop                   array    No        Stop sequences for generation termination

Response Format

Standard Response

json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "allowed_token_ids": null,
        "audio": {
          "data": "UklGRnIiAwBX......",
          "expires_at": 0,
          "id": "audio-80f2d7a5edb34bb88705b6ff914689de",
          "transcript": ""
        },
        "bad_words": [],
        "content": "",
        "mm_token_ids": null,
        "reasoning_content": null,
        "role": "assistant",
        "tool_calls": []
      },
      "stop_reason": 1025,
      "token_ids": null
    }
  ],
  "created": 1773887861,
  "id": "chatcmpl-c398928c30e644c0b662b1bb7c030684",
  "kv_transfer_params": null,
  "model": "boson-audio-multimodal-checkpoint-1200",
  "object": "chat.completion",
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 117,
    "prompt_tokens": 16,
    "prompt_tokens_details": null,
    "total_tokens": 133
  }
}

Response Fields

Top-level

Field    Type     Description
id       string   Unique request ID
object   string   Always "chat.completion"
created  integer  Unix timestamp
model    string   Model name
choices  array    List of generated outputs
usage    object   Token usage statistics

choices[]

Field          Type     Description
index          integer  Output index
finish_reason  string   Reason for completion (stop)
stop_reason    integer  Internal stop code

message

Field    Type    Description
role     string  Always "assistant"
content  string  Empty for TTS
audio    object  Generated audio payload

audio

Field       Type     Description
data        string   Base64-encoded audio data
id          string   Audio resource ID
expires_at  integer  Expiration timestamp
transcript  string   Optional transcript text
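
To make the field layout concrete, here is a minimal Python sketch that pulls the audio payload out of a parsed response and writes it to disk. The sample payload begins with "UklGR", which is the base64 encoding of a RIFF header, so the sketch assumes WAV output. extract_audio_bytes and save_audio are hypothetical helper names.

```python
import base64
from pathlib import Path


def extract_audio_bytes(response: dict) -> bytes:
    """Decode choices[0].message.audio.data from a parsed response."""
    message = response["choices"][0]["message"]
    audio = message.get("audio")
    if not audio or not audio.get("data"):
        raise ValueError("response contains no audio payload")
    return base64.b64decode(audio["data"])


def save_audio(response: dict, path: str) -> int:
    """Write the decoded audio to `path` and return the byte count."""
    return Path(path).write_bytes(extract_audio_bytes(response))
```

For example, save_audio(response, "speech.wav") would store the clip next to the script. Only the first choice is read here, which matches the single-output response shown above.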

Usage Notes

  • Audio is returned as base64-encoded data in message.audio.data
  • To play audio in browsers:
html
<audio controls src="data:audio/wav;base64,UklGRnIiAwBX..."></audio>
  • The content field is empty because this is a pure TTS response
  • The transcript field may be empty or contain normalized text, depending on the implementation
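
For server-side rendering, the data URI used in the HTML snippet above can be assembled in Python. This sketch assumes the payload is WAV, as the audio/wav MIME type in that snippet suggests. audio_data_uri and audio_tag are illustrative names.

```python
import html


def audio_data_uri(b64_audio: str, mime_type: str = "audio/wav") -> str:
    """Wrap base64 audio (message.audio.data) in a data URI."""
    return f"data:{mime_type};base64,{b64_audio}"


def audio_tag(b64_audio: str) -> str:
    """Return an <audio> element that plays the clip inline."""
    src = html.escape(audio_data_uri(b64_audio), quote=True)
    return f'<audio controls src="{src}"></audio>'
```

Data URIs embed the entire clip in the page, so for long clips it may be preferable to save the audio to a file and reference it by URL instead.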

Summary

Higgs Audio V2.5 provides a minimal and efficient TTS interface using an OpenAI-compatible /v1/chat/completions API. It supports extended generation controls such as sampling parameters and stop sequences, while returning audio data directly in base64 format for easy integration with web and mobile applications.
