Qwen3.5-122B-A10B-FP8

The latest flagship MoE (Mixture-of-Experts) model from the Qwen team. With a total of 122B parameters and 10B active parameters per token, it strikes an elite balance between reasoning throughput and model capacity. This build utilizes FP8 quantization, significantly reducing VRAM requirements and leveraging hardware acceleration on modern GPUs (H100/L40S) for high-performance inference in complex logic and coding tasks.

$0.4/M input tokens; $3.2/M output tokens


Qwen3.5-122B-A10B-FP8

Qwen3.5-122B-A10B is a high-efficiency Sparse Mixture-of-Experts (MoE) model. With 122 billion total parameters and 10 billion active parameters, it provides the intelligence of a large-scale model with the inference speed of a much smaller one. The FP8 precision is optimized for NVIDIA H100/L40S and Blackwell architectures, ensuring low-latency responses for complex reasoning, long-context understanding, and multi-turn conversations.

Key Capabilities

  • Efficient Intelligence: Outperforms many dense models while using a fraction of the compute per token.
  • Extended Context: Natively supports up to 128,000 tokens, perfect for analyzing large codebases or multiple documents.
  • Multilingual Excellence: Advanced optimization for 50+ languages, especially strong in CJK (Chinese, Japanese, Korean) and English.
  • FP8 Performance: 2x faster throughput compared to BF16 versions on supported hardware.

Billing

Billed per 1M tokens (Input + Output).
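At the listed rates ($0.4 per million input tokens, $3.2 per million output tokens), the cost of a request can be estimated as below. The helper name is illustrative, not part of the API:

```python
# Per-token rates derived from the listed per-million prices.
INPUT_RATE = 0.4 / 1_000_000   # $ per input token
OUTPUT_RATE = 3.2 / 1_000_000  # $ per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 10K-token prompt with a 2K-token completion
cost = estimate_cost(10_000, 2_000)  # ≈ $0.0104
```

Note that output tokens are billed at 8x the input rate, so long completions dominate the bill.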

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Must be Qwen3.5-122B-A10B-FP8. |
| messages | array | Yes | Standard chat message objects. |
| max_tokens | integer | No | Maximum number of tokens to generate (context window up to 128K). |
| temperature | float | No | Controls randomness (0.0–2.0). |
| top_p | float | No | Nucleus sampling threshold. Default: 0.9. |
| presence_penalty | float | No | Penalizes tokens that already appear in the text, reducing repetition. |
| stream | boolean | No | Whether to stream tokens as they are generated. |
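Assuming an OpenAI-compatible chat-completions interface (the exact endpoint URL is not documented on this page, so the helper below only assembles the request body), a minimal request using these parameters might look like:

```python
import json

def build_request(messages, *, max_tokens=1024, temperature=0.7,
                  top_p=0.9, stream=False):
    """Assemble a chat-completion request body for this model.

    Only `model` and `messages` are required; the rest are optional
    and shown here with common defaults.
    """
    return {
        "model": "Qwen3.5-122B-A10B-FP8",
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }

payload = build_request([{"role": "user", "content": "Explain FP8 quantization."}])
body = json.dumps(payload)  # POST this JSON to the provider's chat endpoint
```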

Recommended Settings by Scenario (Qwen3.5 Optimization)

| Scenario | Recommended Params | Purpose |
| --- | --- | --- |
| Logical Reasoning | temperature: 0.2, top_p: 0.95 | Best for math, coding, and step-by-step logical analysis. |
| Creative Writing | temperature: 0.9, presence_penalty: 0.4 | Encourages diverse vocabulary and more engaging storytelling. |
| Document Summarization | temperature: 0.3, max_tokens: 4096 | Ensures concise, accurate extraction from long texts. |
| Chat & Assistance | temperature: 0.7, stream: true | Provides a natural, responsive conversational experience. |
| Code Generation | temperature: 0.1, top_p: 1.0 | Minimizes syntax errors and favors deterministic code structure. |
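The presets above can be kept in a small lookup table and merged into a request body. The dictionary below simply transcribes the table; the names and helper are illustrative, not an official API surface:

```python
# Recommended sampling presets, transcribed from the scenario table.
SCENARIO_PRESETS = {
    "logical_reasoning": {"temperature": 0.2, "top_p": 0.95},
    "creative_writing": {"temperature": 0.9, "presence_penalty": 0.4},
    "summarization": {"temperature": 0.3, "max_tokens": 4096},
    "chat": {"temperature": 0.7, "stream": True},
    "code_generation": {"temperature": 0.1, "top_p": 1.0},
}

def apply_preset(payload: dict, scenario: str) -> dict:
    """Return a copy of `payload` with the scenario's recommended params merged in."""
    return {**payload, **SCENARIO_PRESETS[scenario]}

request = apply_preset(
    {"model": "Qwen3.5-122B-A10B-FP8", "messages": []},
    "code_generation",
)
```

Merging after the base payload means a preset always wins over a conflicting default, which keeps the scenario settings authoritative.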

Parameters Summary

  • model: Must be "Qwen3.5-122B-A10B-FP8".
  • messages: Array of message objects (role & content).
  • temperature: Default 0.7.
  • top_p: Default 0.9.
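When `stream` is true, OpenAI-compatible endpoints typically deliver deltas as server-sent events with `data: {...}` lines terminated by `data: [DONE]`. This page does not specify the wire format, so the parser below is a sketch under that assumption:

```python
import json

def iter_stream_content(lines):
    """Yield text deltas from SSE lines of a streamed chat completion."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and keep-alive comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Example with two synthetic chunks:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(iter_stream_content(sample))  # "Hello"
```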
