Qwen3-VL-235B-A22B-Instruct-AWQ

A massive-scale Vision-Language (VL) MoE model designed for complex multimodal instruction following. With 235B total parameters and 22B active parameters per token, it delivers top-tier performance in image understanding, document parsing, and visual reasoning. Quantized with AWQ (Activation-aware Weight Quantization) to 4-bit weights, it enables large-scale, high-throughput multimodal deployment.

$0.071/M input tokens; $0.10/M output tokens

Qwen3-VL-235B-A22B-Instruct-AWQ is the state-of-the-art (SOTA) multimodal MoE model from the Qwen3 series. Optimized with AWQ 4-bit quantization, it delivers high-performance visual reasoning and text generation on a more accessible hardware footprint. It supports ultra-long contexts of up to 256K tokens, enabling it to analyze massive documents, hours-long videos (with second-level temporal localization), and complex multi-image dialogues.

Key Capabilities

  • Unified Vision-Language Mastery: Seamlessly integrates text, image, and video understanding.
  • MoE Efficiency: 235B total parameters with only 22B activated per token, balancing power and speed.
  • Advanced OCR & Grounding: Supports 32+ languages for OCR and provides precise 2D/3D object grounding coordinates.
  • Agentic Interaction: Capable of acting as a visual agent for PC/Mobile GUI automation.
  • Long-form Video Reasoning: Uses Interleaved-MRoPE for robust temporal modeling across long video sequences.
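For the 2D grounding capability above, Qwen-VL-family models typically return bounding boxes as JSON with absolute pixel coordinates. The exact reply format can vary by deployment, so the shape used here is an assumption for illustration; a minimal parsing sketch:

```python
import json

# Hypothetical model reply: a JSON list of grounded objects with absolute
# pixel coordinates, similar to the format documented for earlier Qwen-VL
# releases. Treat the field names ("bbox_2d", "label") as assumptions.
reply = '[{"bbox_2d": [120, 80, 340, 400], "label": "person"}]'

def parse_boxes(text: str):
    """Parse a grounding reply into (label, (x1, y1, x2, y2)) pairs."""
    return [(d["label"], tuple(d["bbox_2d"])) for d in json.loads(text)]

print(parse_boxes(reply))  # [('person', (120, 80, 340, 400))]
```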

Quantization Details

  • Type: AWQ (Activation-aware Weight Quantization) 4-bit.
  • Benefits: ~3x reduction in VRAM usage and ~3x speedup in throughput compared to FP16.
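A back-of-envelope estimate shows where the memory saving comes from. The raw weight compression is 4x (16-bit to 4-bit); the end-to-end saving lands closer to ~3x once KV-cache, activations, and quantization scales/zeros are counted, which this sketch ignores:

```python
# Weight-memory-only estimate for a 235B-parameter model.
# Real deployments add KV-cache, activations, and AWQ scale/zero-point
# tensors, so the overall VRAM saving is closer to ~3x than the raw 4x.
PARAMS = 235e9

fp16_gb = PARAMS * 2 / 1e9    # FP16: 2 bytes per weight
awq4_gb = PARAMS * 0.5 / 1e9  # AWQ 4-bit: 0.5 bytes per weight

print(f"FP16 weights: {fp16_gb:.0f} GB")      # 470 GB
print(f"AWQ 4-bit weights: {awq4_gb:.1f} GB") # 117.5 GB
```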

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | `Qwen3-VL-235B-A22B-Instruct-AWQ` |
| messages | array | Yes | Supports `text`, `image_url`, and `video_url` content types. |
| max_tokens | integer | No | Maximum response length; supports up to 262,144 tokens. |
| temperature | float | No | Recommended: 0.6 (for reasoning/thinking mode). |
| top_p | float | No | Recommended: 0.95. |
| repetition_penalty | float | No | Recommended: 1.05–1.1 to prevent MoE repetition loops. |
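The parameters above can be assembled into an OpenAI-compatible chat-completions payload. This is a sketch of the payload shape only; the endpoint URL, authentication, and the image URL are deployment-specific and hypothetical here:

```python
import json

# Sketch of a request body using the recommended sampling settings from
# the table above. The image URL is a placeholder; send this payload to
# your deployment's OpenAI-compatible /chat/completions endpoint.
payload = {
    "model": "Qwen3-VL-235B-A22B-Instruct-AWQ",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
                {"type": "text", "text": "Summarize this chart."},
            ],
        }
    ],
    "max_tokens": 1024,
    "temperature": 0.6,
    "top_p": 0.95,
    "repetition_penalty": 1.05,
}

print(json.dumps(payload, indent=2))
```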
