
The flagship Mixture-of-Experts (MoE) model from the Qwen3.5 series, featuring 122B total and 10B active parameters. This unified vision-language foundation model excels in multimodal reasoning, complex coding, and native 'thinking mode' tasks. Utilizing fine-grained FP8 quantization, it offers exceptional throughput and a reduced VRAM footprint on H100/L40S GPUs, while supporting a massive 262K context window for long-horizon agentic applications.
Qwen3.5-122B-A10B-FP8 is the flagship Sparse Mixture-of-Experts (MoE) multimodal foundation model from the Qwen3.5 series. Combining 122 billion total parameters with 10 billion active parameters per token, it delivers elite-level reasoning capacity with the inference efficiency of a much smaller model.
This version is optimized with fine-grained FP8 quantization, engineered for NVIDIA Blackwell, Hopper (H100), and Ada Lovelace (L40S) hardware to maximize throughput in complex multimodal reasoning and long-context applications.
Parameters for the `/v1/chat/completions` endpoint:

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Must be `"Qwen3.5-122B-A10B-FP8"`. |
| `messages` | array | Yes | Array of message objects. Supports text and image inputs. |
| `max_tokens` | integer | No | Maximum tokens to generate. Suggested range: 1 to 8,192. |
| `enable_thinking` | boolean | No | Default: `true`. Enables the `<think>` reasoning block. |
| `temperature` | float | No | Controls randomness. 0.2 for logic, 0.7 for general chat. |
| `top_p` | float | No | Nucleus sampling threshold. Default: 0.95. |
| `stream` | boolean | No | Whether to stream tokens in real time via SSE. |
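A minimal request sketch follows, assuming an OpenAI-compatible HTTP API. The base URL, the Bearer authentication scheme, and the `choices[0].message.content` response shape are illustrative assumptions, not details confirmed on this page; substitute your provider's actual values.

```python
# Minimal sketch of a /v1/chat/completions request, assuming an
# OpenAI-compatible server. BASE_URL and API_KEY are placeholders.
import requests

BASE_URL = "https://api.example.com"  # placeholder; use your provider's URL
API_KEY = "YOUR_API_KEY"              # placeholder credential

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "Qwen3.5-122B-A10B-FP8",
        "messages": [
            {"role": "user", "content": "Prove that sqrt(2) is irrational."}
        ],
        "enable_thinking": True,  # default true; emits a <think> block
        "temperature": 0.2,       # low temperature suits logic-heavy tasks
        "max_tokens": 2048,
    },
    timeout=120,
)
resp.raise_for_status()
# Assumes the OpenAI-style response layout.
print(resp.json()["choices"][0]["message"]["content"])
```

Setting `"stream": True` in the payload would return tokens incrementally over SSE instead of a single JSON body.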
| Scenario | Recommended Params | Purpose |
|---|---|---|
| Deep Reasoning | `enable_thinking: true`, `temperature: 0.2` | Best for complex math, logic, and multi-step coding problems. |
| Multimodal Analysis | `messages` with image input, `temperature: 0.4` | Ideal for document OCR, chart reasoning, and visual QA. |
| Long Doc Synthesis | `max_tokens: 4096`+, `top_p: 0.9` | Leverages the 262K context for accurate, long-form extraction. |
| Global Translation | `temperature: 0.3` | Utilizes the 201-language support for nuanced, culturally aware translation. |
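The Multimodal Analysis row above implies image-bearing messages. The sketch below assumes the widely used OpenAI-style `image_url` content-part schema; verify the exact message format against your provider's documentation before relying on it.

```python
# Sketch of a Multimodal Analysis call (chart reasoning / visual QA),
# assuming OpenAI-style multimodal message parts. Endpoint and key are
# placeholders, as in the previous example.
import requests

payload = {
    "model": "Qwen3.5-122B-A10B-FP8",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the trend in this chart."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
    "temperature": 0.4,  # recommended above for visual QA
}

resp = requests.post(
    "https://api.example.com/v1/chat/completions",     # placeholder URL
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # placeholder key
    json=payload,
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```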