Supported Models

The following models are available for use with our fine-tuning API. Get started by fine-tuning a model!

  • Training Precision Type indicates the precision used to train each model.

    • AMP (Automatic Mixed Precision): AMP speeds up training and reduces memory usage while preserving the convergence behavior of full float32 training. Learn more about AMP in this PyTorch blog.
    • bf16 (bfloat16): All weights are kept in bf16. Some large models on our platform use full bf16 training for better memory usage and training speed.
  • Long-context fine-tuning of Llama 3.1 (8B) Reference, Llama 3.1 Instruct (8B) Reference, Llama 3.1 (70B) Reference, and Llama 3.1 Instruct (70B) Reference for context sizes of 32K-131K is supported only with the LoRA method.

  • For Llama 3.1 (405B) fine-tuning, please contact us.
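Each model in the tables below has its own context length and batch-size limits, and the DPO limits differ from the standard ones. As a minimal sketch (the `LORA_LIMITS` dict and `validate_job` helper are hypothetical, holding just two example rows excerpted from the LoRA table), you could check a job configuration against these limits before submitting:

```python
# Hypothetical helper: validate a fine-tuning request against the published
# limits. LORA_LIMITS holds a small excerpt of the LoRA fine-tuning table;
# extend it with whichever rows you need.
LORA_LIMITS = {
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Reference": {
        "context_length": 8192, "max_batch": 32, "max_batch_dpo": 16, "min_batch": 8,
    },
    "Qwen/Qwen3-32B": {
        "context_length": 8192, "max_batch": 16, "max_batch_dpo": 8, "min_batch": 8,
    },
}

def validate_job(model: str, batch_size: int, dpo: bool = False) -> None:
    """Raise ValueError if batch_size is outside the model's allowed range."""
    limits = LORA_LIMITS[model]
    # DPO jobs have a lower maximum batch size than standard fine-tuning.
    max_batch = limits["max_batch_dpo"] if dpo else limits["max_batch"]
    if not limits["min_batch"] <= batch_size <= max_batch:
        raise ValueError(
            f"batch_size {batch_size} outside "
            f"[{limits['min_batch']}, {max_batch}] for {model}"
        )

# Within [8, 32] for standard fine-tuning, so this passes silently.
validate_job("meta-llama/Meta-Llama-3.1-8B-Instruct-Reference", 32)
```

Note that the same batch size can be valid for standard fine-tuning but invalid for DPO; for example, batch size 16 passes for Qwen3-32B normally but exceeds its DPO maximum of 8.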

LoRA Fine-tuning

| Organization | Model Name | Model String for API | Context Length | Max Batch Size | Max Batch Size (DPO) | Min Batch Size | Training Precision Type* |
|---|---|---|---|---|---|---|---|
| Google | google/gemma-3-27b-it | google/gemma-3-27b-it | 12288 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-27b-pt | google/gemma-3-27b-pt | 12288 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-12b-it | google/gemma-3-12b-it | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-12b-pt | google/gemma-3-12b-pt | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-4b-it | google/gemma-3-4b-it | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-4b-pt | google/gemma-3-4b-pt | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-1b-it | google/gemma-3-1b-it | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-1b-pt | google/gemma-3-1b-pt | 16384 | 8 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-32B | Qwen/Qwen3-32B | 8192 | 16 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-14B | Qwen/Qwen3-14B | 8192 | 24 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-14B-Base | Qwen/Qwen3-14B-Base | 8192 | 24 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-8B | Qwen/Qwen3-8B | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-8B-Base | Qwen/Qwen3-8B-Base | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-4B | Qwen/Qwen3-4B | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-4B-Base | Qwen/Qwen3-4B-Base | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-1.7B | Qwen/Qwen3-1.7B | 8192 | 40 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-1.7B-Base | Qwen/Qwen3-1.7B-Base | 8192 | 40 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-0.6B | Qwen/Qwen3-0.6B | 8192 | 40 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-0.6B-Base | Qwen/Qwen3-0.6B-Base | 8192 | 40 | 16 | 8 | AMP |
| Deepseek | DeepSeek-R1-Distill-Llama-70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 8192 | 8 | 8 | 8 | AMP |
| Deepseek | DeepSeek-R1-Distill-Qwen-14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 8192 | 40 | 16 | 8 | AMP |
| Deepseek | DeepSeek-R1-Distill-Qwen-1.5B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 8192 | 48 | 24 | 8 | AMP |
| Meta | Llama 3.3 Instruct (70B) Reference | meta-llama/Llama-3.3-70B-Instruct-Reference | 8192 | 8 | 8 | 8 | AMP |
| Meta | Llama 3.2 Instruct (3B) | meta-llama/Llama-3.2-3B-Instruct | 8192 | 32 | 16 | 8 | AMP |
| Meta | Llama 3.2 Instruct (1B) | meta-llama/Llama-3.2-1B-Instruct | 8192 | 32 | 16 | 8 | AMP |
| Meta | Llama 3.1 (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Reference | 8192 | 32 | 16 | 8 | AMP |
| Meta | Llama 3.1 Instruct (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference | 8192 | 32 | 16 | 8 | AMP |
| Meta | Llama 3.1 (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Reference | 8192 | 8 | 8 | 8 | AMP |
| Meta | Llama 3.1 Instruct (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference | 8192 | 8 | 8 | 8 | AMP |
| Meta | Llama 3 (8B) | meta-llama/Meta-Llama-3-8B | 8192 | 32 | 16 | 8 | AMP |
| Meta | Llama 3 Instruct (8B) | meta-llama/Meta-Llama-3-8B-Instruct | 8192 | 32 | 16 | 8 | AMP |
| Meta | Llama 3 Instruct (70B) | meta-llama/Meta-Llama-3-70B-Instruct | 8192 | 8 | 8 | 8 | AMP |
| Meta | Llama-2 Chat (7B) | bytecomputecomputer/llama-2-7b-chat | 4096 | 128 | 64 | 8 | AMP |
| Meta | CodeLlama (7B) | codellama/CodeLlama-7b-hf | 16384 | 32 | 16 | 8 | AMP |
| Mistral AI | Mixtral-8x7B (46.7B) | mistralai/Mixtral-8x7B-v0.1 | 32768 | 16 | 8 | 8 | AMP |
| Mistral AI | Mixtral-8x7B Instruct (46.7B) | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | 16 | 8 | 8 | AMP |
| Mistral AI | Mistral 7B Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | 16 | 8 | 8 | AMP |
| Mistral AI | Mistral 7B v0.1 | mistralai/Mistral-7B-v0.1 | 8192 | 64 | 32 | 8 | AMP |
| Qwen | Qwen2.5-72B | Qwen/Qwen2.5-72B-Instruct | 8192 | 16 | 8 | 8 | AMP |
| Qwen | Qwen2.5-14B | Qwen/Qwen2.5-14B-Instruct | 8192 | 40 | 16 | 8 | AMP |
| Qwen | Qwen2-1.5B | Qwen/Qwen2-1.5B | 8192 | 48 | 24 | 8 | AMP |
| Qwen | Qwen2-1.5B-Instruct | Qwen/Qwen2-1.5B-Instruct | 8192 | 48 | 24 | 8 | AMP |
| Qwen | Qwen2-7B | Qwen/Qwen2-7B | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen2-7B-Instruct | Qwen/Qwen2-7B-Instruct | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen2-72B | Qwen/Qwen2-72B | 8192 | 8 | 8 | 8 | AMP |
| Qwen | Qwen2-72B-Instruct | Qwen/Qwen2-72B-Instruct | 8192 | 8 | 8 | 8 | AMP |
| Teknium | OpenHermes 2.5 Mistral 7B | teknium/OpenHermes-2p5-Mistral-7B | 8192 | 64 | 32 | 8 | AMP |

LoRA Long-context Fine-tuning

| Organization | Model Name | Model String for API | Context Length | Max Batch Size | Max Batch Size (DPO) | Min Batch Size | Training Precision Type* |
|---|---|---|---|---|---|---|---|
| Deepseek | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B-32k | 32768 | 1* | 1* | 1* | AMP |
| Meta | Llama 3.3 Instruct (70B) Reference | meta-llama/Llama-3.3-70B-32k-Instruct-Reference | 32768 | 1* | 1* | 1* | AMP |
| Meta | Llama 3.1 (8B) Reference | meta-llama/Meta-Llama-3.1-8B-32k-Reference | 32768 | 8 | 8 | 8 | AMP |
| Meta | Llama 3.1 Instruct (8B) Reference | meta-llama/Meta-Llama-3.1-8B-32k-Instruct-Reference | 32768 | 8 | 8 | 8 | AMP |
| Meta | Llama 3.1 (70B) Reference | meta-llama/Meta-Llama-3.1-70B-32k-Reference | 32768 | 1* | 1* | 1* | AMP |
| Meta | Llama 3.1 Instruct (70B) Reference | meta-llama/Meta-Llama-3.1-70B-32k-Instruct-Reference | 32768 | 1* | 1* | 1* | AMP |

1*: Gradient accumulation over 8 steps is used, so you effectively get batch size 8 (at the cost of slower iteration time).
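The idea behind this footnote is that summing (and averaging) the gradients from 8 micro-batches before a single parameter update yields the same update as one step at batch size 8. A framework-free sketch on a toy one-parameter least-squares model (all names here are illustrative, not part of the API):

```python
# Illustrative only: gradient accumulation on a toy model w -> 0.5*(w*x - y)^2.
# Gradients from ACCUM_STEPS micro-batches of size MICRO_BATCH are averaged
# before one parameter update, matching a single step at batch size 8.
ACCUM_STEPS = 8   # accumulation steps, as in the 1* rows above
MICRO_BATCH = 1   # per-iteration batch size

def grad(w, x, y):
    """Gradient of 0.5*(w*x - y)**2 with respect to w, for one example."""
    return (w * x - y) * x

def accumulated_step(w, examples, lr=0.1):
    """Average the per-example gradients, then apply one SGD update."""
    g = sum(grad(w, x, y) for x, y in examples) / len(examples)
    return w - lr * g

# 8 micro-batches of size 1 == an effective batch of 8 examples.
examples = [(1.0, 2.0)] * (ACCUM_STEPS * MICRO_BATCH)
w_accum = accumulated_step(0.0, examples)   # one update over all 8 examples
```

Since the averaged gradient over the 8 accumulated examples equals the batch-size-8 gradient, the resulting weight is identical; only wall-clock time per effective step grows, which is why the 1* rows note slower iteration.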

Full Fine-tuning

| Organization | Model Name | Model String for API | Context Length | Max Batch Size | Max Batch Size (DPO) | Min Batch Size | Training Precision Type* |
|---|---|---|---|---|---|---|---|
| Google | google/gemma-3-27b-it | google/gemma-3-27b-it | 12288 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-27b-pt | google/gemma-3-27b-pt | 12288 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-12b-it | google/gemma-3-12b-it | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-12b-pt | google/gemma-3-12b-pt | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-4b-it | google/gemma-3-4b-it | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-4b-pt | google/gemma-3-4b-pt | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-1b-it | google/gemma-3-1b-it | 16384 | 24 | 8 | 8 | AMP |
| Google | google/gemma-3-1b-pt | google/gemma-3-1b-pt | 16384 | 24 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-32B | Qwen/Qwen3-32B | 8192 | 8 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-14B | Qwen/Qwen3-14B | 8192 | 16 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-14B-Base | Qwen/Qwen3-14B-Base | 8192 | 16 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-8B | Qwen/Qwen3-8B | 8192 | 24 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-8B-Base | Qwen/Qwen3-8B-Base | 8192 | 24 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-4B | Qwen/Qwen3-4B | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-4B-Base | Qwen/Qwen3-4B-Base | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-1.7B | Qwen/Qwen3-1.7B | 8192 | 40 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-1.7B-Base | Qwen/Qwen3-1.7B-Base | 8192 | 40 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-0.6B | Qwen/Qwen3-0.6B | 8192 | 40 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-0.6B-Base | Qwen/Qwen3-0.6B-Base | 8192 | 40 | 16 | 8 | AMP |
| Deepseek | DeepSeek-R1-Distill-Llama-70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 8192 | 16 | 8 | 16 | bf16 |
| Deepseek | DeepSeek-R1-Distill-Qwen-14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 8192 | 32 | 16 | 8 | AMP |
| Deepseek | DeepSeek-R1-Distill-Qwen-1.5B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 8192 | 48 | 24 | 8 | AMP |
| Meta | Llama 3.3 Instruct (70B) Reference | meta-llama/Llama-3.3-70B-Instruct-Reference | 8192 | 16 | 8 | 16 | bf16 |
| Meta | Llama 3.1 (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Reference | 8192 | 24 | 8 | 8 | AMP |
| Meta | Llama 3.1 Instruct (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference | 8192 | 24 | 8 | 8 | AMP |
| Meta | Llama 3.1 (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Reference | 8192 | 16 | 8 | 16 | bf16 |
| Meta | Llama 3.1 Instruct (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference | 8192 | 16 | 8 | 16 | bf16 |
| Meta | Llama 3 (8B) | meta-llama/Meta-Llama-3-8B | 8192 | 24 | 8 | 8 | AMP |
| Meta | Llama 3 Instruct (8B) | meta-llama/Meta-Llama-3-8B-Instruct | 8192 | 24 | 8 | 8 | AMP |
| Meta | Llama 3 Instruct (70B) | meta-llama/Meta-Llama-3-70B-Instruct | 8192 | 16 | 8 | 16 | bf16 |
| Meta | Llama-2 Chat (7B) | bytecomputecomputer/llama-2-7b-chat | 4096 | 96 | 48 | 8 | AMP |
| Meta | CodeLlama (7B) | codellama/CodeLlama-7b-hf | 16384 | 32 | 16 | 8 | AMP |
| Mistral AI | Mixtral-8x7B (46.7B) | mistralai/Mixtral-8x7B-v0.1 | 32768 | 16 | 8 | 16 | bf16 |
| Mistral AI | Mixtral-8x7B Instruct (46.7B) | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | 16 | 8 | 16 | bf16 |
| Mistral AI | Mistral 7B Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | 16 | 8 | 8 | AMP |
| Mistral AI | Mistral 7B v0.1 | mistralai/Mistral-7B-v0.1 | 8192 | 64 | 32 | 8 | AMP |
| Qwen | Qwen2-1.5B | Qwen/Qwen2-1.5B | 8192 | 48 | 24 | 8 | AMP |
| Qwen | Qwen2-1.5B-Instruct | Qwen/Qwen2-1.5B-Instruct | 8192 | 48 | 24 | 8 | AMP |
| Qwen | Qwen2-7B | Qwen/Qwen2-7B | 8192 | 24 | 8 | 8 | AMP |
| Qwen | Qwen2-7B-Instruct | Qwen/Qwen2-7B-Instruct | 8192 | 24 | 8 | 8 | AMP |
| Teknium | OpenHermes 2.5 Mistral 7B | teknium/OpenHermes-2p5-Mistral-7B | 8192 | 64 | 32 | 8 | AMP |

Request a model