
A lean, high-intelligence dense model optimized for complex instruction following and precise API calling. With its 27B parameters quantized to FP8, it offers a superior logic-to-memory ratio, enabling elite-level performance on consumer-grade or mid-tier enterprise GPUs.
Qwen3.5-27B-FP8 is a dense, multimodal foundation model released by Alibaba Cloud in February 2026. This variant uses FP8 (8-bit floating point) quantization, allowing the 27-billion-parameter model to retain high-density reasoning while delivering lower latency and a smaller memory footprint than its BF16 counterpart.
The following parameters are sent in the body of your POST request to the /v1/chat/completions endpoint; the Required column indicates which must be present.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Use `"Qwen3.5-27B-FP8"`. |
| `messages` | array | Yes | A list of message objects (`role`: system/user/assistant; `content`: string/array). |
| `max_tokens` | integer | No | Limits the length of the generated response. Default: 4096. Max: 262,144. |
| `temperature` | float | No | Controls randomness. 0.6 is recommended for standard tasks; 1.0 for reasoning. |
| `stream` | boolean | No | If true, tokens are sent as Server-Sent Events (SSE) as they are generated. |
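The required and optional fields above can be assembled into a request body as follows. This is a minimal sketch: it builds only the JSON payload, which you would then POST to /v1/chat/completions with your provider's base URL and authentication header (both omitted here).

```python
import json

# Minimal request body for /v1/chat/completions.
# Endpoint host and auth header are placeholders; substitute your provider's values.
payload = {
    "model": "Qwen3.5-27B-FP8",  # required
    "messages": [                # required: system/user/assistant turns
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize FP8 quantization in one sentence."},
    ],
    "max_tokens": 512,           # optional; default 4096, max 262,144
    "temperature": 0.6,          # recommended for standard tasks
    "stream": False,             # set True for Server-Sent Events
}

body = json.dumps(payload)
print(body)
```

Setting `"stream": True` changes the response from a single JSON object to an SSE stream of incremental token chunks.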
These parameters allow for fine-grained control over the model's behavior and output quality.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `top_p` | float | 0.95 | Nucleus sampling. Restricts the next-token choice to the smallest set of tokens whose cumulative probability reaches 95%. |
| `top_k` | integer | 40 | Limits the choice to the K most likely tokens. Reduces rare-word "loops." |
| `presence_penalty` | float | 0.0 | Range: -2.0 to 2.0. Higher values increase the likelihood of introducing new topics. |
| `stop` | string/array | null | Up to 4 sequences at which the API will stop generating further tokens. |
| `enable_thinking` | boolean | false | (Specific to reasoning providers) Enables the `<think>` reasoning block. |
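The advanced parameters combine with the required fields in the same request body. A sketch, using the names from the table above; `enable_thinking` is provider-specific, so remove it if your endpoint rejects unknown fields:

```python
import json

# Request body with optional sampling controls added.
# `enable_thinking` is provider-specific and may be ignored or rejected elsewhere.
payload = {
    "model": "Qwen3.5-27B-FP8",
    "messages": [{"role": "user", "content": "List three uses of FP8 inference."}],
    "top_p": 0.95,             # nucleus sampling threshold
    "top_k": 40,               # restrict to the 40 most likely tokens
    "presence_penalty": 0.6,   # nudge the model toward new topics
    "stop": ["\n\n###"],       # up to 4 stop sequences
    "enable_thinking": True,   # emits a <think> block on supporting providers
}

# Sanity checks against the documented ranges.
assert -2.0 <= payload["presence_penalty"] <= 2.0
assert len(payload["stop"]) <= 4
print(json.dumps(payload, indent=2))
```

Tuning `top_p` and `top_k` together narrows the sampling pool from both directions; most workloads adjust one at a time.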
| Metric | Specification |
|---|---|
| Supported Modalities | Text, Image |
| Default Context | 262,144 Tokens |
| Output Type | Text, Markdown, Structured JSON |
| Language Support | 201+ Languages |
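Since the model accepts image input and `content` may be an array, a multimodal turn can be expressed as a list of content parts. The part schema below (`"type": "image_url"`) follows the common OpenAI-style convention and is an assumption; confirm the exact field names with your provider.

```python
# Multimodal message sketch: one text part plus one image part.
# The content-part schema is the OpenAI-style convention, assumed here,
# not confirmed for every Qwen3.5-27B-FP8 host.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this chart."},
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
    ],
}
payload = {"model": "Qwen3.5-27B-FP8", "messages": [message]}
print(len(message["content"]))  # 2 content parts: one text, one image
```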