
The latest flagship MoE (Mixture-of-Experts) model from the Qwen team. With a total of 122B parameters and only 10B active per token, it strikes a strong balance between reasoning throughput and model capacity. This build uses FP8 quantization, significantly reducing VRAM requirements and leveraging hardware acceleration on modern GPUs (H100/L40S) for high-performance inference on complex logic and coding tasks.
Qwen3.5-122B-A10B is a high-efficiency Sparse Mixture-of-Experts (MoE) model. With 122 billion total parameters and 10 billion active parameters, it provides the intelligence of a large-scale model with the inference speed of a much smaller one. The FP8 precision is optimized for NVIDIA H100/L40S and Blackwell architectures, ensuring low-latency responses for complex reasoning, long-context understanding, and multi-turn conversations.
Billed per 1M tokens (Input + Output).
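Since billing counts input and output tokens together per million, a quick estimate helps when sizing workloads. The sketch below shows the arithmetic; the rate constant is a placeholder for illustration, not a published price.

```python
# Cost estimate for per-1M-token billing (input + output combined).
# RATE_PER_MILLION is a hypothetical rate, not an actual published price.
RATE_PER_MILLION = 0.50  # USD per 1M tokens (assumed for illustration)

def estimate_cost(input_tokens: int, output_tokens: int,
                  rate_per_million: float = RATE_PER_MILLION) -> float:
    """Return the estimated charge in USD for one request."""
    total = input_tokens + output_tokens
    return total / 1_000_000 * rate_per_million

# A 12K-token prompt with a 3K-token completion bills as 15K tokens.
print(estimate_cost(12_000, 3_000))
```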
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Must be Qwen3.5-122B-A10B-FP8. |
| messages | array | Yes | Standard chat message objects. |
| max_tokens | integer | No | Maximum tokens to generate (up to 128K context). |
| temperature | float | No | Controls randomness (0.0 - 2.0). |
| top_p | float | No | Nucleus sampling threshold. Default: 0.9. |
| presence_penalty | float | No | Penalizes tokens that have already appeared, reducing repetition. |
| stream | boolean | No | Whether to stream tokens in real time. |
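The parameters above can be assembled into a chat-completions-style JSON body. The helper below is a minimal sketch assuming an OpenAI-compatible request shape; it only includes optional parameters that are explicitly set, so server-side defaults apply otherwise. The function name `build_request` is illustrative, not part of any official SDK.

```python
import json

def build_request(messages, max_tokens=None, temperature=None,
                  top_p=None, presence_penalty=None, stream=False):
    """Assemble a chat request body, omitting unset optional parameters."""
    body = {"model": "Qwen3.5-122B-A10B-FP8", "messages": messages}
    optional = {"max_tokens": max_tokens, "temperature": temperature,
                "top_p": top_p, "presence_penalty": presence_penalty}
    body.update({k: v for k, v in optional.items() if v is not None})
    if stream:
        body["stream"] = True
    return body

payload = build_request(
    [{"role": "user", "content": "Explain FP8 quantization in one paragraph."}],
    temperature=0.7, max_tokens=1024,
)
print(json.dumps(payload, indent=2))
```

Send the resulting body as JSON to your provider's chat completions endpoint (URL and auth header depend on the platform).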
| Scenario | Recommended Params | Purpose |
|---|---|---|
| Logical Reasoning | temperature: 0.2, top_p: 0.95 | Best for math, coding, and logical step-by-step analysis. |
| Creative Writing | temperature: 0.9, presence_penalty: 0.4 | Encourages diverse vocabulary and more engaging storytelling. |
| Document Summarization | temperature: 0.3, max_tokens: 4096 | Ensures concise and accurate extraction from long texts. |
| Chat & Assistance | temperature: 0.7, stream: true | Provides a natural, responsive conversational experience. |
| Code Generation | temperature: 0.1, top_p: 1.0 | Minimizes syntax errors and keeps code structure near-deterministic. |
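The scenario presets above map naturally to a small lookup table in client code. This is a sketch of that pattern; the preset keys and the `params_for` helper are illustrative names, not part of any official SDK.

```python
# Scenario presets from the table above (illustrative lookup, not an official API).
PRESETS = {
    "logical_reasoning": {"temperature": 0.2, "top_p": 0.95},
    "creative_writing": {"temperature": 0.9, "presence_penalty": 0.4},
    "summarization": {"temperature": 0.3, "max_tokens": 4096},
    "chat": {"temperature": 0.7, "stream": True},
    "code_generation": {"temperature": 0.1, "top_p": 1.0},
}

def params_for(scenario: str, **overrides):
    """Return a preset's sampling params, with caller overrides taking priority."""
    base = dict(PRESETS[scenario])
    base.update(overrides)
    return base

# Code generation preset, with a caller-supplied output cap.
print(params_for("code_generation", max_tokens=2048))
```

Merging presets with per-request overrides keeps the table as a single source of truth while still letting callers tune individual fields.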