# Serverless models

## Chat models
In the table below, models marked "Turbo" are quantized to FP8 and models marked "Lite" are quantized to INT4. All other models are served at full precision (FP16, or BF16 where noted).
If you're not sure which chat model to use, we currently recommend Llama 3.3 70B Instruct Turbo (meta-llama/Llama-3.3-70B-Instruct-Turbo) to get started; a quick-start example follows the table.
| Organization | Model Name | API Model String | Context length | Quantization |
|---|---|---|---|---|
| Moonshot | Kimi K2 Instruct | moonshotai/Kimi-K2-Instruct | 128000 | FP8 |
| Z.ai | GLM 4.5 Air | zai-org/GLM-4.5-Air-FP8 | 131072 | FP8 |
| Qwen | Qwen3 235B-A22B Thinking 2507 | Qwen/Qwen3-235B-A22B-Thinking-2507 | 262144 | FP8 |
| Qwen | Qwen3-Coder 480B-A35B Instruct | Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 | 256000 | FP8 |
| Qwen | Qwen3 235B-A22B Instruct 2507 | Qwen/Qwen3-235B-A22B-Instruct-2507-tput | 262144 | FP8 |
| DeepSeek | DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1 | 163839 | FP8 |
| DeepSeek | DeepSeek-V3-0324 | deepseek-ai/DeepSeek-V3 | 163839 | FP8 |
| Meta | Llama 4 Maverick (17Bx128E) | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 1048576 | FP8 |
| Meta | Llama 4 Scout (17Bx16E) | meta-llama/Llama-4-Scout-17B-16E-Instruct | 1048576 | FP16 |
| Meta | Llama 3.3 70B Instruct Turbo | meta-llama/Llama-3.3-70B-Instruct-Turbo | 131072 | FP8 |
| Perplexity AI | Perplexity AI R1-1776 | perplexity-ai/r1-1776 | 163840 | FP16 |
| Mistral AI | Magistral Small 2506 API | mistralai/Magistral-Small-2506 | 40960 | BF16 |
| DeepSeek | DeepSeek-R1-0528 Throughput | deepseek-ai/DeepSeek-R1-0528-tput | 163839 | FP8 |
| DeepSeek | DeepSeek R1 Distill Llama 70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 131072 | FP16 |
| DeepSeek | DeepSeek R1 Distill Qwen 1.5B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B* | 131072 | FP16 |
| DeepSeek | DeepSeek R1 Distill Qwen 14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 131072 | FP16 |
| Marin Community | Marin 8B Instruct | marin-community/marin-8b-instruct | 4096 | FP16 |
| Mistral AI | Mistral Small 3 Instruct (24B) | mistralai/Mistral-Small-24B-Instruct-2501 | 32768 | FP16 |
| Meta | Llama 3.1 8B Instruct Turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 131072 | FP8 |
| Meta | Llama 3.3 70B Instruct Turbo (Free)** | meta-llama/Llama-3.3-70B-Instruct-Turbo-Free | 131072 | FP8 |
| Nvidia | Llama 3.1 Nemotron 70B | nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | 32768 | FP16 |
| Qwen | Qwen 2.5 7B Instruct Turbo | Qwen/Qwen2.5-7B-Instruct-Turbo | 32768 | FP8 |
| Qwen | Qwen 2.5 72B Instruct Turbo | Qwen/Qwen2.5-72B-Instruct-Turbo | 32768 | FP8 |
| Qwen | Qwen2.5 Vision Language 72B Instruct | Qwen/Qwen2.5-VL-72B-Instruct | 32768 | FP8 |
| Qwen | Qwen 2.5 Coder 32B Instruct | Qwen/Qwen2.5-Coder-32B-Instruct | 32768 | FP16 |
| Qwen | QwQ-32B | Qwen/QwQ-32B | 32768 | FP16 |
| Qwen | Qwen 2 Instruct (72B) | Qwen/Qwen2-72B-Instruct | 32768 | FP16 |
| Qwen | Qwen2 VL 72B Instruct | Qwen/Qwen2-VL-72B-Instruct | 32768 | FP16 |
| Qwen | Qwen3 235B A22B Throughput | Qwen/Qwen3-235B-A22B-fp8-tput | 40960 | FP8 |
| Arcee | Arcee AI Virtuoso Medium | arcee-ai/virtuoso-medium-v2 | 128000 | - |
| Arcee | Arcee AI Coder-Large | arcee-ai/coder-large | 32768 | - |
| Arcee | Arcee AI Virtuoso-Large | arcee-ai/virtuoso-large | 128000 | - |
| Arcee | Arcee AI Maestro | arcee-ai/maestro-reasoning | 128000 | - |
| Arcee | Arcee AI Caller | arcee-ai/caller | 32768 | - |
| Arcee | Arcee AI Blitz | arcee-ai/arcee-blitz | 32768 | - |
| Meta | Llama 3.1 405B Instruct Turbo | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 130815 | FP8 |
| Meta | Llama 3.2 3B Instruct Turbo | meta-llama/Llama-3.2-3B-Instruct-Turbo | 131072 | FP16 |
| Meta | Llama 3 8B Instruct Lite | meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8192 | INT4 |
| Meta | Llama 3 8B Instruct Reference | meta-llama/Llama-3-8b-chat-hf* | 8192 | FP16 |
| Meta | Llama 3 70B Instruct Reference | meta-llama/Llama-3-70b-chat-hf | 8192 | FP16 |
| Google | Gemma 2 27B | google/gemma-2-27b-it | 8192 | FP16 |
| Google | Gemma Instruct (2B) | google/gemma-2b-it* | 8192 | FP16 |
| Google | Gemma 3N E4B Instruct | google/gemma-3n-E4B-it | 32768 | FP8 |
| Gryphe | MythoMax-L2 (13B) | Gryphe/MythoMax-L2-13b* | 4096 | FP16 |
| Mistral AI | Mistral (7B) Instruct | mistralai/Mistral-7B-Instruct-v0.1 | 8192 | FP16 |
| Mistral AI | Mistral (7B) Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | FP16 |
| Mistral AI | Mistral (7B) Instruct v0.3 | mistralai/Mistral-7B-Instruct-v0.3 | 32768 | FP16 |
| NousResearch | Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO* | 32768 | FP16 |
** The Free version of Llama 3.3 70B Instruct Turbo has a reduced rate limit of 6 requests/minute for users on the free tier and 10 requests/minute for all other build tiers.
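As a quick start, here's a minimal sketch of a chat completion request to the recommended model via the OpenAI-compatible REST endpoint. It assumes your API key is in the `TOGETHER_API_KEY` environment variable; the prompt and `max_tokens` value are illustrative.

```python
import os
import requests

# Minimal chat completion sketch against the OpenAI-compatible endpoint.
# Assumes TOGETHER_API_KEY is set in your environment.
resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
        "messages": [{"role": "user", "content": "What is serverless inference?"}],
        "max_tokens": 256,
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

To switch models, swap in any API model string from the table above; context length and quantization vary per model as listed.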
## Image models
Use our Images endpoint to generate images with these models; a request sketch follows the notes below.
| Organization | Model Name | API Model String | Default steps |
|---|---|---|---|
| Black Forest Labs | Flux.1 \[schnell] (free)* | black-forest-labs/FLUX.1-schnell-Free | N/A |
| Black Forest Labs | Flux.1 \[schnell] (Turbo) | black-forest-labs/FLUX.1-schnell | 4 |
| Black Forest Labs | Flux.1 Dev | black-forest-labs/FLUX.1-dev | 28 |
| Black Forest Labs | Flux.1 Canny | black-forest-labs/FLUX.1-canny* | 28 |
| Black Forest Labs | Flux.1 Depth | black-forest-labs/FLUX.1-depth* | 28 |
| Black Forest Labs | Flux.1 Redux | black-forest-labs/FLUX.1-redux* | 28 |
| Black Forest Labs | Flux 1.1 \[pro] | black-forest-labs/FLUX.1.1-pro | - |
| Black Forest Labs | Flux.1 \[pro] | black-forest-labs/FLUX.1-pro | 28 |
| Black Forest Labs | Flux.1 Kontext \[pro] | black-forest-labs/FLUX.1-kontext-pro | 28 |
| Black Forest Labs | Flux.1 Kontext \[max] | black-forest-labs/FLUX.1-kontext-max | 28 |
| Black Forest Labs | Flux.1 Kontext \[dev] | black-forest-labs/FLUX.1-kontext-dev | 28 |
| Black Forest Labs | Flux.1 Krea \[dev] | black-forest-labs/FLUX.1-krea-dev | 28 |
Note: Due to high demand, FLUX.1 \[schnell] Free has a model-specific rate limit of 10 img/min. Flux.1 \[pro], Flux 1.1 \[pro], Flux.1 Kontext \[pro], and Flux.1 Kontext \[max] are limited to users on Build Tier 2 and above. Flux models can only be used with credits; users cannot call Flux models with a zero or negative balance.
*The free model has reduced rate limits and performance compared to our paid Turbo endpoint for Flux Schnell, black-forest-labs/FLUX.1-schnell.
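Here's a companion sketch of an image generation request against the Images endpoint. The prompt is illustrative, and the `steps` value is the \[schnell] default from the table above; the response handling assumes the API returns a hosted URL for each generated image.

```python
import os
import requests

# Minimal image generation sketch against the Images endpoint.
# Assumes TOGETHER_API_KEY is set in your environment.
resp = requests.post(
    "https://api.together.xyz/v1/images/generations",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "black-forest-labs/FLUX.1-schnell",
        "prompt": "A watercolor painting of a lighthouse at dusk",
        "steps": 4,  # default step count for FLUX.1 [schnell] per the table
        "n": 1,
    },
)
resp.raise_for_status()
# The response typically contains a hosted URL per generated image.
print(resp.json()["data"][0]["url"])
```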
