Qwen3-VL-235B-A22B-Instruct-AWQ

A massive-scale Vision-Language (VL) MoE model designed for complex multimodal instruction following. With 235B total parameters and 22B active parameters per token, it delivers top-tier performance in image understanding, document parsing, and visual reasoning. Quantized with AWQ (Activation-aware Weight Quantization) to 4-bit weights, it enables large-scale, high-throughput multimodal deployment.

$0.071/M input tokens; $0.10/M output tokens

Qwen3-VL-235B-A22B-Instruct-AWQ is the state-of-the-art (SOTA) multimodal MoE model from the Qwen3 series. Optimized with AWQ 4-bit quantization, it delivers high-performance visual reasoning and text generation on a more accessible hardware footprint. It supports ultra-long contexts of up to 256K tokens, enabling it to analyze massive documents, hours-long videos (with second-level temporal localization), and complex multi-image dialogues.

Key Capabilities

  • Unified Vision-Language Mastery: Seamlessly integrates text, image, and video understanding.
  • MoE Efficiency: 235B total parameters with only 22B activated per token, balancing power and speed.
  • Advanced OCR & Grounding: Supports 32+ languages for OCR and provides precise 2D/3D object grounding coordinates.
  • Agentic Interaction: Capable of acting as a visual agent for PC/Mobile GUI automation.
  • Long-form Video Reasoning: Uses Interleaved-MRoPE for robust temporal modeling across long video sequences.
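For the 2D grounding capability above, Qwen-VL-family models typically return bounding boxes as JSON with absolute pixel coordinates. The exact reply format can vary by deployment, so the shape used here is an assumption for illustration; a minimal parsing sketch:

```python
import json

# Hypothetical model reply: a JSON list of grounded objects with absolute
# pixel coordinates, similar to the format documented for earlier Qwen-VL
# releases. Treat the field names ("bbox_2d", "label") as assumptions.
reply = '[{"bbox_2d": [120, 80, 340, 400], "label": "person"}]'

def parse_boxes(text: str):
    """Parse a grounding reply into (label, (x1, y1, x2, y2)) pairs."""
    return [(d["label"], tuple(d["bbox_2d"])) for d in json.loads(text)]

print(parse_boxes(reply))  # [('person', (120, 80, 340, 400))]
```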

Quantization Details

  • Type: AWQ (Activation-aware Weight Quantization) 4-bit.
  • Benefits: ~3x reduction in VRAM usage and ~3x speedup in throughput compared to FP16.
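A back-of-envelope estimate shows where the memory saving comes from. The raw weight compression is 4x (16-bit to 4-bit); the end-to-end saving lands closer to ~3x once KV-cache, activations, and quantization scales/zeros are counted, which this sketch ignores:

```python
# Weight-memory-only estimate for a 235B-parameter model.
# Real deployments add KV-cache, activations, and AWQ scale/zero-point
# tensors, so the overall VRAM saving is closer to ~3x than the raw 4x.
PARAMS = 235e9

fp16_gb = PARAMS * 2 / 1e9    # FP16: 2 bytes per weight
awq4_gb = PARAMS * 0.5 / 1e9  # AWQ 4-bit: 0.5 bytes per weight

print(f"FP16 weights: {fp16_gb:.0f} GB")      # 470 GB
print(f"AWQ 4-bit weights: {awq4_gb:.1f} GB") # 117.5 GB
```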

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | `Qwen3-VL-235B-A22B-Instruct-AWQ` |
| messages | array | Yes | Supports `text`, `image_url`, and `video_url` content types. |
| max_tokens | integer | No | Maximum response length; supports up to 262,144 tokens. |
| temperature | float | No | Recommended: 0.6 (for reasoning/thinking mode). |
| top_p | float | No | Recommended: 0.95. |
| repetition_penalty | float | No | Recommended: 1.05–1.1 to prevent MoE repetition loops. |
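The parameters above can be assembled into an OpenAI-compatible chat-completions payload. This is a sketch of the payload shape only; the endpoint URL, authentication, and the image URL are deployment-specific and hypothetical here:

```python
import json

# Sketch of a request body using the recommended sampling settings from
# the table above. The image URL is a placeholder; send this payload to
# your deployment's OpenAI-compatible /chat/completions endpoint.
payload = {
    "model": "Qwen3-VL-235B-A22B-Instruct-AWQ",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
                {"type": "text", "text": "Summarize this chart."},
            ],
        }
    ],
    "max_tokens": 1024,
    "temperature": 0.6,
    "top_p": 0.95,
    "repetition_penalty": 1.05,
}

print(json.dumps(payload, indent=2))
```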
