Qwen3.5-122B-A10B-FP8

Qwen3.5 122B

The flagship Mixture-of-Experts (MoE) model from the Qwen3.5 series, featuring 122B total and 10B active parameters. This unified vision-language foundation excels in multimodal reasoning, complex coding, and native 'thinking mode' tasks. Utilizing fine-grained FP8 quantization, it offers exceptional throughput and reduced VRAM footprint on H100/L40S GPUs, while supporting a massive 262K context window for long-horizon agentic applications.

INPUT $0.400 / 1M TOKENS · OUTPUT $3.200 / 1M TOKENS


Qwen3.5-122B-A10B-FP8 API Documentation

Qwen3.5-122B-A10B-FP8 is the flagship Sparse Mixture-of-Experts (MoE) multimodal foundation model from the Qwen3.5 series. By integrating 122 billion total parameters with 10 billion active parameters per token, it delivers elite-level reasoning capacity with the inference efficiency of a much smaller model.

This version is optimized with fine-grained FP8 quantization, specifically engineered for NVIDIA Blackwell and Hopper (H100/L40S) architectures to maximize throughput in complex multimodal reasoning and long-context applications.


Key Capabilities

  • Unified Vision-Language: Natively processes both text and high-resolution images within a single transformer foundation.
  • Native Thinking Mode: Features an internal reasoning loop (Chain-of-Thought) for superior performance in math, coding, and complex logic.
  • Massive 262K Context: Natively supports up to 262,144 tokens, extensible to 1.01 million tokens for massive document or codebase analysis.
  • Global Linguistic Coverage: Advanced optimization for 201+ languages and dialects, including industry-leading CJK and English support.
  • FP8 Efficiency: Significant VRAM reduction and hardware-accelerated throughput on modern data center GPUs.
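As a sketch of how an image can be paired with a text prompt in a vision-language request, the snippet below builds a `messages` array using the widely adopted OpenAI-compatible content-part convention (`image_url` with a base64 data URL). The exact multimodal schema here is an assumption based on that convention, not a confirmed detail of this API; verify it against the provider's request format.

```python
import base64

def build_multimodal_messages(image_path: str, question: str) -> list:
    """Build an OpenAI-style messages array pairing one image with a text question.

    Assumption: the API accepts "image_url" content parts with a base64 data
    URL, as in the common OpenAI-compatible convention.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
                {"type": "text", "text": question},
            ],
        }
    ]
```

The resulting list can be passed directly as the `messages` field of a chat-completions request.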

Request Parameters (/v1/chat/completions)

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model` | string | Yes | Must be `"Qwen3.5-122B-A10B-FP8"`. |
| `messages` | array | Yes | Array of message objects. Supports text and image inputs. |
| `max_tokens` | integer | No | Maximum tokens to generate. Suggested range: 1 to 8,192. |
| `enable_thinking` | boolean | No | Default: `true`. Enables the `<think>` reasoning block. |
| `temperature` | float | No | Controls randomness. Suggested: 0.2 for logic, 0.7 for general chat. |
| `top_p` | float | No | Nucleus sampling threshold. Default: 0.95. |
| `stream` | boolean | No | Whether to stream tokens in real time via SSE. |
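The parameters above can be assembled into a request body as shown in this minimal standard-library sketch. The base URL and `YOUR_API_KEY` are placeholders, not real endpoint details; substitute your provider's values.

```python
import json
from urllib import request

API_BASE = "https://api.example.com"  # placeholder: your provider's base URL
API_KEY = "YOUR_API_KEY"              # placeholder credential

def build_chat_payload(messages, *, max_tokens=1024, enable_thinking=True,
                       temperature=0.7, top_p=0.95, stream=False):
    """Assemble a /v1/chat/completions request body from the documented parameters."""
    return {
        "model": "Qwen3.5-122B-A10B-FP8",
        "messages": messages,
        "max_tokens": max_tokens,
        "enable_thinking": enable_thinking,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }

def chat(messages, **params):
    """POST the payload and return the parsed JSON response (non-streaming)."""
    body = json.dumps(build_chat_payload(messages, **params)).encode("utf-8")
    req = request.Request(
        f"{API_BASE}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

For streaming responses, set `stream=True` and consume the SSE event stream instead of calling `json.load` on the whole body.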

Optimization Scenarios

| Scenario | Recommended Params | Purpose |
|----------|--------------------|---------|
| Deep Reasoning | `enable_thinking: true`, `temperature: 0.2` | Best for complex math, logic, and multi-step coding problems. |
| Multimodal Analysis | `messages` with image input, `temperature: 0.4` | Ideal for document OCR, chart reasoning, and visual QA. |
| Long Doc Synthesis | `max_tokens: 4096+`, `top_p: 0.9` | Leverages the 262K context for accurate long-form extraction. |
| Global Translation | `temperature: 0.3` | Uses the 201+-language support for nuanced, culturally aware translation. |
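These recommendations can be captured as reusable presets. The sketch below is illustrative only: the preset names are our own labels, not part of the API, and the merged dict is a plain `/v1/chat/completions` request body.

```python
# Parameter presets mirroring the optimization scenarios above.
# The scenario keys are hypothetical labels for this example, not API values.
SCENARIO_PRESETS = {
    "deep_reasoning": {"enable_thinking": True, "temperature": 0.2},
    "multimodal_analysis": {"temperature": 0.4},
    "long_doc_synthesis": {"max_tokens": 4096, "top_p": 0.9},
    "global_translation": {"temperature": 0.3},
}

def payload_for(scenario: str, messages: list) -> dict:
    """Merge a scenario preset into a base chat-completions request body."""
    payload = {"model": "Qwen3.5-122B-A10B-FP8", "messages": messages}
    payload.update(SCENARIO_PRESETS[scenario])
    return payload
```

For example, `payload_for("deep_reasoning", msgs)` yields a body with thinking mode on and a low temperature, matching the first row of the table.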
