
Serverless models

Chat models

In the table below, models marked "Turbo" are quantized to FP8 and models marked "Lite" are quantized to INT4; all other models run at full precision (FP16).

If you're not sure which chat model to use, we currently recommend Llama 3.3 70B Turbo (meta-llama/Llama-3.3-70B-Instruct-Turbo) to get started.
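
For example, a minimal chat completion against that model might look like the sketch below, which assumes the `together` Python SDK (`pip install together`) and a `TOGETHER_API_KEY` environment variable:

```python
from together import Together

# The client reads TOGETHER_API_KEY from the environment by default.
client = Together()

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What is serverless inference?"}],
)
print(response.choices[0].message.content)
```

Any model below can be queried the same way by swapping in its API model string.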

| Organization | Model Name | API Model String | Context length | Quantization |
|---|---|---|---|---|
| Moonshot | Kimi K2 Instruct | moonshotai/Kimi-K2-Instruct | 128000 | FP8 |
| Z.ai | GLM 4.5 Air | zai-org/GLM-4.5-Air-FP8 | 131072 | FP8 |
| Qwen | Qwen3 235B-A22B Thinking 2507 | Qwen/Qwen3-235B-A22B-Thinking-2507 | 262144 | FP8 |
| Qwen | Qwen3-Coder 480B-A35B Instruct | Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 | 256000 | FP8 |
| Qwen | Qwen3 235B-A22B Instruct 2507 | Qwen/Qwen3-235B-A22B-Instruct-2507-tput | 262144 | FP8 |
| DeepSeek | DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1 | 163839 | FP8 |
| DeepSeek | DeepSeek-V3-0324 | deepseek-ai/DeepSeek-V3 | 163839 | FP8 |
| Meta | Llama 4 Maverick (17Bx128E) | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 1048576 | FP8 |
| Meta | Llama 4 Scout (17Bx16E) | meta-llama/Llama-4-Scout-17B-16E-Instruct | 1048576 | FP16 |
| Meta | Llama 3.3 70B Instruct Turbo | meta-llama/Llama-3.3-70B-Instruct-Turbo | 131072 | FP8 |
| Perplexity AI | Perplexity AI R1-1776 | perplexity-ai/r1-1776 | 163840 | FP16 |
| Mistral AI | Magistral Small 2506 API | mistralai/Magistral-Small-2506 | 40960 | BF16 |
| DeepSeek | DeepSeek-R1-0528 Throughput | deepseek-ai/DeepSeek-R1-0528-tput | 163839 | FP8 |
| DeepSeek | DeepSeek R1 Distill Llama 70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 131072 | FP16 |
| DeepSeek | DeepSeek R1 Distill Qwen 1.5B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B* | 131072 | FP16 |
| DeepSeek | DeepSeek R1 Distill Qwen 14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 131072 | FP16 |
| Marin Community | Marin 8B Instruct | marin-community/marin-8b-instruct | 4096 | FP16 |
| Mistral AI | Mistral Small 3 Instruct (24B) | mistralai/Mistral-Small-24B-Instruct-2501 | 32768 | FP16 |
| Meta | Llama 3.1 8B Instruct Turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 131072 | FP8 |
| Meta | Llama 3.3 70B Instruct Turbo (Free)** | meta-llama/Llama-3.3-70B-Instruct-Turbo-Free | 131072 | FP8 |
| Nvidia | Llama 3.1 Nemotron 70B | nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | 32768 | FP16 |
| Qwen | Qwen 2.5 7B Instruct Turbo | Qwen/Qwen2.5-7B-Instruct-Turbo | 32768 | FP8 |
| Qwen | Qwen 2.5 72B Instruct Turbo | Qwen/Qwen2.5-72B-Instruct-Turbo | 32768 | FP8 |
| Qwen | Qwen2.5 Vision Language 72B Instruct | Qwen/Qwen2.5-VL-72B-Instruct | 32768 | FP8 |
| Qwen | Qwen 2.5 Coder 32B Instruct | Qwen/Qwen2.5-Coder-32B-Instruct | 32768 | FP16 |
| Qwen | QwQ-32B | Qwen/QwQ-32B | 32768 | FP16 |
| Qwen | Qwen 2 Instruct (72B) | Qwen/Qwen2-72B-Instruct | 32768 | FP16 |
| Qwen | Qwen2 VL 72B Instruct | Qwen/Qwen2-VL-72B-Instruct | 32768 | FP16 |
| Qwen | Qwen3 235B A22B Throughput | Qwen/Qwen3-235B-A22B-fp8-tput | 40960 | FP8 |
| Arcee | Arcee AI Virtuoso Medium | arcee-ai/virtuoso-medium-v2 | 128000 | - |
| Arcee | Arcee AI Coder-Large | arcee-ai/coder-large | 32768 | - |
| Arcee | Arcee AI Virtuoso-Large | arcee-ai/virtuoso-large | 128000 | - |
| Arcee | Arcee AI Maestro | arcee-ai/maestro-reasoning | 128000 | - |
| Arcee | Arcee AI Caller | arcee-ai/caller | 32768 | - |
| Arcee | Arcee AI Blitz | arcee-ai/arcee-blitz | 32768 | - |
| Meta | Llama 3.1 405B Instruct Turbo | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 130815 | FP8 |
| Meta | Llama 3.2 3B Instruct Turbo | meta-llama/Llama-3.2-3B-Instruct-Turbo | 131072 | FP16 |
| Meta | Llama 3 8B Instruct Lite | meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8192 | INT4 |
| Meta | Llama 3 8B Instruct Reference | meta-llama/Llama-3-8b-chat-hf* | 8192 | FP16 |
| Meta | Llama 3 70B Instruct Reference | meta-llama/Llama-3-70b-chat-hf | 8192 | FP16 |
| Google | Gemma 2 27B | google/gemma-2-27b-it | 8192 | FP16 |
| Google | Gemma Instruct (2B) | google/gemma-2b-it* | 8192 | FP16 |
| Google | Gemma 3N E4B Instruct | google/gemma-3n-E4B-it | 32768 | FP8 |
| Gryphe | MythoMax-L2 (13B) | Gryphe/MythoMax-L2-13b* | 4096 | FP16 |
| Mistral AI | Mistral (7B) Instruct | mistralai/Mistral-7B-Instruct-v0.1 | 8192 | FP16 |
| Mistral AI | Mistral (7B) Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | FP16 |
| Mistral AI | Mistral (7B) Instruct v0.3 | mistralai/Mistral-7B-Instruct-v0.3 | 32768 | FP16 |
| NousResearch | Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO* | 32768 | FP16 |

** The Free version of Llama 3.3 70B Instruct Turbo has a reduced rate limit of 6 requests/minute for users on the free tier and 10 requests/minute for all other Build Tiers.

Image models

Use our Images endpoint to query image models.
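
For example, generating an image with FLUX.1 [schnell] might look like the sketch below, again assuming the `together` Python SDK and a `TOGETHER_API_KEY` environment variable:

```python
from together import Together

# The client reads TOGETHER_API_KEY from the environment by default.
client = Together()

response = client.images.generate(
    model="black-forest-labs/FLUX.1-schnell",
    prompt="A snowy cabin at dusk, warm light in the windows",
    steps=4,  # matches the default step count listed in the table below
)
# The generated image is returned as a hosted URL.
print(response.data[0].url)
```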

| Organization | Model Name | Model String for API | Default steps |
|---|---|---|---|
| Black Forest Labs | FLUX.1 [schnell] (free)* | black-forest-labs/FLUX.1-schnell-Free | N/A |
| Black Forest Labs | FLUX.1 [schnell] (Turbo) | black-forest-labs/FLUX.1-schnell | 4 |
| Black Forest Labs | FLUX.1 Dev | black-forest-labs/FLUX.1-dev | 28 |
| Black Forest Labs | FLUX.1 Canny | black-forest-labs/FLUX.1-canny* | 28 |
| Black Forest Labs | FLUX.1 Depth | black-forest-labs/FLUX.1-depth* | 28 |
| Black Forest Labs | FLUX.1 Redux | black-forest-labs/FLUX.1-redux* | 28 |
| Black Forest Labs | FLUX1.1 [pro] | black-forest-labs/FLUX.1.1-pro | - |
| Black Forest Labs | FLUX.1 [pro] | black-forest-labs/FLUX.1-pro | 28 |
| Black Forest Labs | FLUX.1 Kontext [pro] | black-forest-labs/FLUX.1-kontext-pro | 28 |
| Black Forest Labs | FLUX.1 Kontext [max] | black-forest-labs/FLUX.1-kontext-max | 28 |
| Black Forest Labs | FLUX.1 Kontext [dev] | black-forest-labs/FLUX.1-kontext-dev | 28 |
| Black Forest Labs | FLUX.1 Krea [dev] | black-forest-labs/FLUX.1-krea-dev | 28 |

Note: Due to high demand, FLUX.1 [schnell] (free) has a model-specific rate limit of 10 img/min. FLUX.1 [pro], FLUX1.1 [pro], FLUX.1 Kontext [pro], and FLUX.1 Kontext [max] are limited to users at Build Tier 2 and above. Flux models can only be used with credits; users cannot call Flux models with a zero or negative balance.

* The free model has reduced rate limits and performance compared to our paid Turbo endpoint for FLUX.1 [schnell], black-forest-labs/FLUX.1-schnell.