
Serverless models

Chat models

In the table below, models marked "Turbo" are quantized to FP8 and models marked "Lite" are quantized to INT4; all other models run at full precision (FP16).

If you're not sure which chat model to use, we currently recommend Llama 3.3 70B Turbo (meta-llama/Llama-3.3-70B-Instruct-Turbo) to get started.
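
For example, a minimal chat completion against that model might look like the sketch below, which assumes the `together` Python SDK (`pip install together`) and a `TOGETHER_API_KEY` environment variable:

```python
from together import Together

# The client reads TOGETHER_API_KEY from the environment by default.
client = Together()

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What is serverless inference?"}],
)
print(response.choices[0].message.content)
```

Any model below can be queried the same way by swapping in its API model string.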

| Organization | Model Name | API Model String | Context length | Quantization |
|---|---|---|---|---|
| Moonshot | Kimi K2 Instruct | moonshotai/Kimi-K2-Instruct | 128000 | FP8 |
| Z.ai | GLM 4.5 Air | zai-org/GLM-4.5-Air-FP8 | 131072 | FP8 |
| Qwen | Qwen3 235B-A22B Thinking 2507 | Qwen/Qwen3-235B-A22B-Thinking-2507 | 262144 | FP8 |
| Qwen | Qwen3-Coder 480B-A35B Instruct | Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 | 256000 | FP8 |
| Qwen | Qwen3 235B-A22B Instruct 2507 | Qwen/Qwen3-235B-A22B-Instruct-2507-tput | 262144 | FP8 |
| DeepSeek | DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1 | 163839 | FP8 |
| DeepSeek | DeepSeek-V3-0324 | deepseek-ai/DeepSeek-V3 | 163839 | FP8 |
| Meta | Llama 4 Maverick (17Bx128E) | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 1048576 | FP8 |
| Meta | Llama 4 Scout (17Bx16E) | meta-llama/Llama-4-Scout-17B-16E-Instruct | 1048576 | FP16 |
| Meta | Llama 3.3 70B Instruct Turbo | meta-llama/Llama-3.3-70B-Instruct-Turbo | 131072 | FP8 |
| Perplexity AI | Perplexity AI R1-1776 | perplexity-ai/r1-1776 | 163840 | FP16 |
| Mistral AI | Magistral Small 2506 API | mistralai/Magistral-Small-2506 | 40960 | BF16 |
| DeepSeek | DeepSeek-R1-0528 Throughput | deepseek-ai/DeepSeek-R1-0528-tput | 163839 | FP8 |
| DeepSeek | DeepSeek R1 Distill Llama 70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 131072 | FP16 |
| DeepSeek | DeepSeek R1 Distill Qwen 1.5B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B* | 131072 | FP16 |
| DeepSeek | DeepSeek R1 Distill Qwen 14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 131072 | FP16 |
| Marin Community | Marin 8B Instruct | marin-community/marin-8b-instruct | 4096 | FP16 |
| Mistral AI | Mistral Small 3 Instruct (24B) | mistralai/Mistral-Small-24B-Instruct-2501 | 32768 | FP16 |
| Meta | Llama 3.1 8B Instruct Turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 131072 | FP8 |
| Meta | Llama 3.3 70B Instruct Turbo (Free)** | meta-llama/Llama-3.3-70B-Instruct-Turbo-Free | 131072 | FP8 |
| Nvidia | Llama 3.1 Nemotron 70B | nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | 32768 | FP16 |
| Qwen | Qwen 2.5 7B Instruct Turbo | Qwen/Qwen2.5-7B-Instruct-Turbo | 32768 | FP8 |
| Qwen | Qwen 2.5 72B Instruct Turbo | Qwen/Qwen2.5-72B-Instruct-Turbo | 32768 | FP8 |
| Qwen | Qwen2.5 Vision Language 72B Instruct | Qwen/Qwen2.5-VL-72B-Instruct | 32768 | FP8 |
| Qwen | Qwen 2.5 Coder 32B Instruct | Qwen/Qwen2.5-Coder-32B-Instruct | 32768 | FP16 |
| Qwen | QwQ-32B | Qwen/QwQ-32B | 32768 | FP16 |
| Qwen | Qwen 2 Instruct (72B) | Qwen/Qwen2-72B-Instruct | 32768 | FP16 |
| Qwen | Qwen2 VL 72B Instruct | Qwen/Qwen2-VL-72B-Instruct | 32768 | FP16 |
| Qwen | Qwen3 235B A22B Throughput | Qwen/Qwen3-235B-A22B-fp8-tput | 40960 | FP8 |
| Arcee | Arcee AI Virtuoso Medium | arcee-ai/virtuoso-medium-v2 | 128000 | - |
| Arcee | Arcee AI Coder-Large | arcee-ai/coder-large | 32768 | - |
| Arcee | Arcee AI Virtuoso-Large | arcee-ai/virtuoso-large | 128000 | - |
| Arcee | Arcee AI Maestro | arcee-ai/maestro-reasoning | 128000 | - |
| Arcee | Arcee AI Caller | arcee-ai/caller | 32768 | - |
| Arcee | Arcee AI Blitz | arcee-ai/arcee-blitz | 32768 | - |
| Meta | Llama 3.1 405B Instruct Turbo | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 130815 | FP8 |
| Meta | Llama 3.2 3B Instruct Turbo | meta-llama/Llama-3.2-3B-Instruct-Turbo | 131072 | FP16 |
| Meta | Llama 3 8B Instruct Lite | meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8192 | INT4 |
| Meta | Llama 3 8B Instruct Reference | meta-llama/Llama-3-8b-chat-hf* | 8192 | FP16 |
| Meta | Llama 3 70B Instruct Reference | meta-llama/Llama-3-70b-chat-hf | 8192 | FP16 |
| Google | Gemma 2 27B | google/gemma-2-27b-it | 8192 | FP16 |
| Google | Gemma Instruct (2B) | google/gemma-2b-it* | 8192 | FP16 |
| Google | Gemma 3N E4B Instruct | google/gemma-3n-E4B-it | 32768 | FP8 |
| Gryphe | MythoMax-L2 (13B) | Gryphe/MythoMax-L2-13b* | 4096 | FP16 |
| Mistral AI | Mistral (7B) Instruct | mistralai/Mistral-7B-Instruct-v0.1 | 8192 | FP16 |
| Mistral AI | Mistral (7B) Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | FP16 |
| Mistral AI | Mistral (7B) Instruct v0.3 | mistralai/Mistral-7B-Instruct-v0.3 | 32768 | FP16 |
| NousResearch | Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO* | 32768 | FP16 |

** The Free version of Llama 3.3 70B Instruct Turbo has a reduced rate limit of 6 requests/minute for users on the free tier and 10 requests/minute for all other Build Tiers.

Image models

Use our Images endpoint to query image models.
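
For example, generating an image with FLUX.1 [schnell] might look like the sketch below, again assuming the `together` Python SDK and a `TOGETHER_API_KEY` environment variable:

```python
from together import Together

# The client reads TOGETHER_API_KEY from the environment by default.
client = Together()

response = client.images.generate(
    model="black-forest-labs/FLUX.1-schnell",
    prompt="A snowy cabin at dusk, warm light in the windows",
    steps=4,  # matches the default step count listed in the table below
)
# The generated image is returned as a hosted URL.
print(response.data[0].url)
```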

| Organization | Model Name | Model String for API | Default steps |
|---|---|---|---|
| Black Forest Labs | FLUX.1 [schnell] (free)* | black-forest-labs/FLUX.1-schnell-Free | N/A |
| Black Forest Labs | FLUX.1 [schnell] (Turbo) | black-forest-labs/FLUX.1-schnell | 4 |
| Black Forest Labs | FLUX.1 Dev | black-forest-labs/FLUX.1-dev | 28 |
| Black Forest Labs | FLUX.1 Canny | black-forest-labs/FLUX.1-canny* | 28 |
| Black Forest Labs | FLUX.1 Depth | black-forest-labs/FLUX.1-depth* | 28 |
| Black Forest Labs | FLUX.1 Redux | black-forest-labs/FLUX.1-redux* | 28 |
| Black Forest Labs | FLUX1.1 [pro] | black-forest-labs/FLUX.1.1-pro | - |
| Black Forest Labs | FLUX.1 [pro] | black-forest-labs/FLUX.1-pro | 28 |
| Black Forest Labs | FLUX.1 Kontext [pro] | black-forest-labs/FLUX.1-kontext-pro | 28 |
| Black Forest Labs | FLUX.1 Kontext [max] | black-forest-labs/FLUX.1-kontext-max | 28 |
| Black Forest Labs | FLUX.1 Kontext [dev] | black-forest-labs/FLUX.1-kontext-dev | 28 |
| Black Forest Labs | FLUX.1 Krea [dev] | black-forest-labs/FLUX.1-krea-dev | 28 |

Note: Due to high demand, FLUX.1 [schnell] (free) has a model-specific rate limit of 10 img/min. FLUX.1 [pro], FLUX1.1 [pro], FLUX.1 Kontext [pro], and FLUX.1 Kontext [max] are limited to users at Build Tier 2 and above. Flux models can only be used with credits; users cannot call Flux models with a zero or negative balance.

* The free model has reduced rate limits and performance compared to our paid Turbo endpoint for FLUX.1 [schnell], black-forest-labs/FLUX.1-schnell.