
Qwen3.5-27B-FP8

A lean, high-intelligence dense model optimized for complex instruction following and precise API calling. With 27B parameters compressed via FP8 quantization, it offers a superior logic-to-memory ratio, enabling elite-level performance on consumer-grade or mid-tier enterprise GPUs.

$0.30 / M input tokens · $2.40 / M output tokens

Qwen3.5-27B-FP8 Documentation

The Qwen3.5-27B-FP8 is a dense, multimodal foundation model released by Alibaba Cloud in February 2026. This specific variant utilizes FP8 (8-bit Floating Point) quantization, allowing the 27-billion-parameter model to achieve high-density reasoning while maintaining low latency and a smaller memory footprint compared to its BF16 counterpart.

Key Features

  • Dense Power: Unlike its MoE siblings, it activates all 27B parameters for every token, yielding industry-leading scores on SWE-bench (72.4%).
  • Hybrid Attention: Uses a Gated DeltaNet (linear + full attention) architecture for efficient long-context processing.
  • 256K Context Window: Native support for 262,144 tokens, extensible up to 1M tokens.
  • Native Multimodal: Seamlessly processes text and high-resolution images within the same context.
  • Quantization Mastery: FP8 precision offers ~1.5x throughput increase on NVIDIA Blackwell and Hopper architectures.
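The memory advantage of the FP8 variant follows from simple arithmetic: FP8 stores one byte per parameter versus two bytes for BF16. A back-of-the-envelope sketch (weights only; KV cache and activation memory are excluded):

```python
# Approximate weight-storage footprint of a 27B-parameter model
# at different precisions. Weights only: KV cache, activations,
# and runtime overhead are not counted.
PARAMS = 27e9  # 27 billion parameters


def weight_memory_gb(bytes_per_param: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return PARAMS * bytes_per_param / 1e9


print(f"BF16: {weight_memory_gb(2):.0f} GB")  # 2 bytes/param -> 54 GB
print(f"FP8:  {weight_memory_gb(1):.0f} GB")  # 1 byte/param  -> 27 GB
```

The halved footprint is what brings the model within reach of a single high-memory consumer or mid-tier enterprise GPU.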

Request Parameters

The following parameters must be included in the body of your POST request to the /v1/chat/completions endpoint.

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Use "Qwen3.5-27B-FP8". |
| messages | array | Yes | A list of message objects (role: system/user/assistant; content: string or array). |
| max_tokens | integer | No | Limits the length of the generated response. Default: 4096. Max: 262,144. |
| temperature | float | No | Controls randomness. 0.6 is recommended for standard tasks; 1.0 for reasoning. |
| stream | boolean | No | If true, tokens are sent as Server-Sent Events (SSE) as they are generated. |
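A minimal request body can be assembled as follows. This is a sketch, not an official client: the `build_chat_request` helper, the `BASE_URL`, and the `API_KEY` placeholder are illustrative assumptions; only the field names and the endpoint path come from the table above.

```python
import json


def build_chat_request(prompt: str,
                       system: str = "You are a helpful assistant.",
                       max_tokens: int = 4096,
                       temperature: float = 0.6,
                       stream: bool = False) -> dict:
    """Assemble a /v1/chat/completions request body for Qwen3.5-27B-FP8."""
    return {
        "model": "Qwen3.5-27B-FP8",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": stream,
    }


body = build_chat_request("Summarize FP8 quantization in one sentence.")
print(json.dumps(body, indent=2))

# POST this body to <BASE_URL>/v1/chat/completions with your API key, e.g.:
#   requests.post(f"{BASE_URL}/v1/chat/completions",
#                 headers={"Authorization": f"Bearer {API_KEY}"},
#                 json=body)
```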

Optional Parameters

These parameters allow for fine-grained control over the model's behavior and output quality.

| Parameter | Type | Default | Description |
|---|---|---|---|
| top_p | float | 0.95 | Nucleus sampling: restricts the next-token choice to the smallest set of tokens whose cumulative probability reaches top_p. |
| top_k | integer | 40 | Restricts the choice to the K most likely tokens, suppressing rare-token repetition loops. |
| presence_penalty | float | 0.0 | Range: -2.0 to 2.0. Positive values increase the likelihood of introducing new topics. |
| stop | string/array | null | Up to 4 sequences at which the API stops generating further tokens. |
| enable_thinking | boolean | false | (Specific to reasoning providers) Enables the <think> reasoning block. |
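When `stream` is true, the response body arrives as Server-Sent Events. A minimal parsing sketch, assuming the widely used OpenAI-compatible `chat.completion.chunk` shape (each `data:` line carries a JSON chunk with `choices[0].delta.content`, terminated by a `data: [DONE]` sentinel) — the exact chunk schema for this provider is an assumption:

```python
import json


def iter_sse_content(lines):
    """Yield content deltas from OpenAI-style SSE 'data:' lines.

    Skips non-data lines (comments, blank keep-alives) and stops at
    the 'data: [DONE]' sentinel.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]


# Simulated stream; real lines would come from the HTTP response body.
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    '',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
]
print("".join(iter_sse_content(sample)))  # -> Hello
```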

Model Summary

| Metric | Specification |
|---|---|
| Supported Modalities | Text, Image |
| Default Context | 262,144 tokens |
| Output Type | Text, Markdown, Structured JSON |
| Language Support | 201+ languages |

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.

Contact Sales