Dedicated Models

Chat models

| Organization | Model Name | API Model String | Context length | Quantization |
| --- | --- | --- | --- | --- |
| DeepSeek | DeepSeek R1 Distill Llama 70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 4096 | FP16 |
| DeepSeek | Deepseek Coder Instruct (33B) | deepseek-ai/deepseek-coder-33b-instruct | 16384 | FP16 |
| Google | Gemma-2 Instruct (9B) | google/gemma-2-9b-it | 8192 | FP16 |
| Google | Gemma Instruct (2B) | google/gemma-2b-it | 8192 | FP16 |
| Google | Gemma-2 Instruct (27B) | google/gemma-2-27b-it | 8192 | FP16 |
| Google | Gemma Instruct (7B) | google/gemma-7b-it | 8192 | FP16 |
| HuggingFace | Zephyr-7B-β | HuggingFaceH4/zephyr-7b-beta | 32768 | FP16 |
| Meta | Meta Llama 3.1 70B Instruct Turbo | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 32768 | FP8 |
| Meta | LLaMA-2 Chat (13B) | meta-llama/Llama-2-13b-chat-hf | 4096 | FP16 |
| Meta | Meta Llama 3 8B Instruct Reference | meta-llama/Llama-3-8b-chat-hf | 8192 | FP16 |
| Meta | Meta Llama 3 70B Instruct Reference | meta-llama/Llama-3-70b-chat-hf | 8192 | FP16 |
| Meta | Meta Llama 3 8B Instruct Lite | meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8192 | INT4 |
| Meta | Meta Llama 3.1 405B Instruct Turbo | meta-llama/Meta-Llama-3.1-405B-Instruct-Lite-Pro | 4096 | FP16 |
| Meta | LLaMA-2 Chat (7B) | meta-llama/Llama-2-7b-chat-hf | 4096 | FP16 |
| Meta | Meta Llama 3 70B Instruct Turbo | meta-llama/Meta-Llama-3-70B-Instruct-Turbo | 8192 | FP8 |
| Meta | Meta Llama 3.1 8B Instruct Turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 32768 | FP8 |
| Meta | LLaMA-2 Chat (13B) | bytecomputecomputer/llama-2-13b-chat | 4096 | FP16 |
| Meta | LLaMA-2 Chat (7B) | bytecomputecomputer/llama-2-7b-chat | 4096 | FP16 |
| Meta | LLaMA-2 Chat (70B) | bytecomputecomputer/llama-2-70b-chat | 4096 | FP16 |
| Meta | Meta Llama 3 8B Instruct | meta-llama/Meta-Llama-3-8B-Instruct | 8192 | FP16 |
| Meta | Meta Llama 3 70B Instruct | meta-llama/Meta-Llama-3-70B-Instruct | 8192 | FP16 |
| Meta | Code Llama Instruct (70B) | codellama/CodeLlama-70b-Instruct-hf | 4096 | FP16 |
| Meta | Code Llama Instruct (7B) | codellama/CodeLlama-7b-Instruct-hf | 16384 | FP16 |
| Meta | LLaMA-2 Chat (70B) | meta-llama/Llama-2-70b-chat-hf | 4096 | FP16 |
| Meta | Meta Llama 3.1 8B Instruct | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference | 16384 | FP16 |
| Meta | Meta Llama 3.1 70B Instruct | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference | 8192 | FP16 |
| microsoft | WizardLM-2 (8x22B) | microsoft/WizardLM-2-8x22B | 65536 | FP16 |
| mistralai | Mistral (7B) Instruct | mistralai/Mistral-7B-Instruct-v0.1 | 4096 | FP16 |
| mistralai | Mistral (7B) Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | FP16 |
| mistralai | Mistral (7B) Instruct v0.3 | mistralai/Mistral-7B-Instruct-v0.3 | 32768 | FP16 |
| mistralai | Mixtral-8x7B Instruct v0.1 | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | FP16 |
| NousResearch | Nous Hermes 2 - Mixtral 8x7B-DPO | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | 32768 | FP16 |
| NousResearch | Nous Hermes 2 - Mixtral 8x7B-SFT | NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT | 32768 | FP16 |
| Qwen | Qwen 2 Instruct (72B) | Qwen/Qwen2-72B-Instruct | 32768 | FP16 |
| Qwen | Qwen2.5 72B Instruct Turbo | Qwen/Qwen2.5-72B-Instruct-Turbo | 32768 | FP8 |
| Qwen | Qwen2.5 7B Instruct Turbo | Qwen/Qwen2.5-7B-Instruct-Turbo | 32768 | FP8 |
| Qwen | Qwen 2 Instruct (1.5B) | Qwen/Qwen2-1.5B-Instruct | 32768 | FP16 |
| Qwen | Qwen 2 Instruct (7B) | Qwen/Qwen2-7B-Instruct | 32768 | FP16 |
| teknium | OpenHermes-2-Mistral (7B) | teknium/OpenHermes-2-Mistral-7B | 8192 | FP16 |
| teknium | OpenHermes-2.5-Mistral (7B) | teknium/OpenHermes-2p5-Mistral-7B | 8192 | FP16 |
| upstage | Upstage SOLAR Instruct v1 (11B) | upstage/SOLAR-10.7B-Instruct-v1.0 | 4096 | FP16 |
| upstage | Upstage SOLAR Instruct v1 (11B)-Int4 | bytecomputecomputer/SOLAR-10.7B-Instruct-v1.0-int4 | 4096 | FP16 |
| WizardLM | WizardLM v1.2 (13B) | WizardLM/WizardLM-13B-V1.2 | 4096 | FP16 |
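
As an illustration of how the API Model String and Context length columns might be used together, the following minimal sketch copies a few rows from the chat models table into a dictionary and checks whether a request fits in a model's context window. The helper name `fits_context` is hypothetical and not part of any documented client API. The sketch also assumes that the prompt and the generated completion share one context window, and that token counts come from whatever tokenizer the chosen model uses.

```python
# Context lengths (in tokens) copied from a few rows of the chat models table.
CHAT_CONTEXT_LENGTH = {
    "google/gemma-2-9b-it": 8192,
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo": 32768,
    "mistralai/Mixtral-8x7B-Instruct-v0.1": 32768,
    "Qwen/Qwen2.5-7B-Instruct-Turbo": 32768,
}


def fits_context(model: str, prompt_tokens: int, max_new_tokens: int) -> bool:
    """Hypothetical helper: True if prompt plus completion fits the window.

    Assumes prompt tokens and generated tokens count against the same limit.
    """
    if model not in CHAT_CONTEXT_LENGTH:
        raise KeyError(f"unknown model string: {model}")
    return prompt_tokens + max_new_tokens <= CHAT_CONTEXT_LENGTH[model]
```

Under these assumptions, a caller could run a check such as `fits_context("google/gemma-2-9b-it", 8000, 512)` before sending a request, and either shorten the prompt or pick a model with a longer context length when it returns `False`.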

Language models

| Organization | Model Name | API Model String | Context length |
| --- | --- | --- | --- |
| Google | Gemma 2 (9B) | google/gemma-2-9b | 8192 |
| Google | Gemma (7B) | google/gemma-7b | 8192 |
| Google | Gemma (2B) | google/gemma-2b | 8192 |
| Meta | Meta Llama 3 8B | meta-llama/Meta-Llama-3-8B | 8192 |
| Meta | LLaMA-2 (70B) | meta-llama/Llama-2-70b-hf | 4096 |
| Meta | LLaMA-2 (7B) | bytecomputecomputer/llama-2-7b | 4096 |
| Meta | LLaMA (7B) | huggyllama/llama-7b | 2048 |
| Meta | LLaMA (65B) | huggyllama/llama-65b | 2048 |
| Meta | LLaMA-2 (70B) | bytecomputecomputer/llama-2-70b | 4096 |
| Meta | LLaMA (13B) | huggyllama/llama-13b | 2048 |
| Meta | LLaMA (30B) | huggyllama/llama-30b | 2048 |
| Meta | Meta Llama 3 70B | meta-llama/Meta-Llama-3-70B | 8192 |
| Meta | LLaMA-2 (7B) | meta-llama/Llama-2-7b-hf | 4096 |
| Meta | Meta Llama 3 70B HF | meta-llama/Llama-3-70b-hf | 8192 |
| Meta | Meta Llama 3.1 8B | meta-llama/Meta-Llama-3.1-8B-Reference | 8192 |
| Meta | Meta Llama 3.1 70B | meta-llama/Meta-Llama-3.1-70B-Reference | 8192 |
| mistralai | Mixtral-8x7B v0.1 | mistralai/Mixtral-8x7B-v0.1 | 32768 |
| mistralai | Mistral (7B) | mistralai/Mistral-7B-v0.1 | 4096 |
| Qwen | Qwen 2 (72B) | Qwen/Qwen2-72B | 32768 |
| Qwen | Qwen 2 VL (72B) | Qwen/Qwen2-VL-72B-Instruct | |
| Qwen | Qwen 2 (7B) | Qwen/Qwen2-7B | 32768 |
| Qwen | Qwen 2 (1.5B) | Qwen/Qwen2-1.5B | 32768 |
| Qwen | Qwen 1.5 (32B) | Qwen/Qwen1.5-32B | 32768 |
| Qwen | Qwen 1.5 (14B) | Qwen/Qwen1.5-14B | 32768 |
| bytecompute | LLaMA-2-32K (7B) | bytecomputecomputer/LLaMA-2-7B-32K | 32768 |

Code models

| Organization | Model Name | API Model String | Context length |
| --- | --- | --- | --- |
| Meta | Code Llama Python (34B) | codellama/CodeLlama-34b-Python-hf | 16384 |
| Meta | Code Llama Python (70B) | codellama/CodeLlama-70b-Python-hf | 4096 |
| Meta | Code Llama Python (34B) | bytecomputecomputer/CodeLlama-34b-Python | 16384 |
| Meta | Code Llama (34B) | bytecomputecomputer/CodeLlama-34b | 16384 |
| Meta | Code Llama (13B) | codellama/CodeLlama-13b-hf | 16384 |
| Meta | Code Llama (34B) | codellama/CodeLlama-34b-hf | 16384 |
| Meta | Code Llama Python (7B) | bytecomputecomputer/CodeLlama-7b-Python | 16384 |
| Meta | Code Llama (70B) | codellama/CodeLlama-70b-hf | 16384 |
| Meta | Code Llama Python (13B) | bytecomputecomputer/CodeLlama-13b-Python | 16384 |
| Meta | Code Llama (7B) | codellama/CodeLlama-7b-hf | 16384 |
| Meta | Code Llama Python (13B) | codellama/CodeLlama-13b-Python-hf | 16384 |
| Meta | Code Llama Python (7B) | codellama/CodeLlama-7b-Python-hf | 16384 |
| Numbers Station | NSQL LLaMA-2 (7B) | NumbersStation/nsql-llama-2-7B | 4096 |
| Phind | Phind Code LLaMA v2 (34B) | Phind/Phind-CodeLlama-34B-v2 | 16384 |
| Phind | Phind Code LLaMA Python v1 (34B) | Phind/Phind-CodeLlama-34B-Python-v1 | 16384 |
| WizardLM | WizardCoder Python v1.0 (34B) | WizardLM/WizardCoder-Python-34B-V1.0 | 8192 |

Moderation models

| Organization | Model Name | API Model String | Context length |
| --- | --- | --- | --- |
| Meta | Meta Llama Guard 3 8B | meta-llama/Meta-Llama-Guard-3-8B | 8192 |
| Meta | Meta Llama Guard 2 8B | meta-llama/LlamaGuard-2-8b | 8192 |
| Meta | Meta Llama Guard 3 11B Vision Turbo | meta-llama/Llama-Guard-3-11B-Vision-Turbo | 131072 |
| Meta | Llama Guard (7B) | Meta-Llama/Llama-Guard-7b | 4096 |

Embedding models

| Organization | Model Name | API Model String | Context length |
| --- | --- | --- | --- |
| BAAI | BAAI-Bge-Base-1p5 | BAAI/bge-base-en-v1.5 | not listed |
| BAAI | BAAI-Bge-Large-1p5 | BAAI/bge-large-en-v1.5 | not listed |
| Google | Bert Base Uncased | bert-base-uncased | not listed |
| HazyResearch | M2-BERT 2K Retrieval Encoder V1 | hazyresearch/M2-BERT-2k-Retrieval-Encoder-V1 | 2048 |
| bytecompute | M2-BERT-Retrieval-32k | bytecomputecomputer/m2-bert-80M-32k-retrieval | 32768 |
| bytecompute | M2-BERT-Retrieval-2K | bytecomputecomputer/m2-bert-80M-2k-retrieval | not listed |
| bytecompute | M2-BERT-Retrieval-8k | bytecomputecomputer/m2-bert-80M-8k-retrieval | 8192 |
| bytecompute | Sentence-BERT | sentence-transformers/msmarco-bert-base-dot-v5 | 512 |
| WhereIsAI | UAE-Large-V1 | WhereIsAI/UAE-Large-V1 | not listed |

Rerank models

| Organization | Model Name | API Model String | Max Doc Size (tokens) | Max Docs |
| --- | --- | --- | --- | --- |
| salesforce | Salesforce Llama Rank V1 (8B) | Salesforce/Llama-Rank-V1 | 8192 | 1024 |
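
The two limits listed for Salesforce/Llama-Rank-V1 can be checked on the client side before a rerank request is built. The sketch below is illustrative only: the function name `split_into_batches` is hypothetical, token counts are assumed to be computed elsewhere, and it reads "Max Docs" as a limit on the number of documents per request, which the table itself does not state explicitly.

```python
MAX_DOC_TOKENS = 8192  # "Max Doc Size (tokens)" from the rerank models table
MAX_DOCS = 1024        # "Max Docs" from the table (assumed to be per request)


def split_into_batches(doc_token_counts: list[int]) -> list[list[int]]:
    """Group document indices into batches of at most MAX_DOCS documents.

    Raises ValueError if any single document exceeds MAX_DOC_TOKENS.
    """
    for i, n in enumerate(doc_token_counts):
        if n > MAX_DOC_TOKENS:
            raise ValueError(
                f"document {i} has {n} tokens, limit is {MAX_DOC_TOKENS}"
            )
    indices = list(range(len(doc_token_counts)))
    return [indices[i:i + MAX_DOCS] for i in range(0, len(indices), MAX_DOCS)]
```

With these assumptions, 2,500 documents of 100 tokens each would be split into three batches of 1024, 1024, and 452 documents, while any document longer than 8192 tokens would be rejected and would need to be truncated or chunked first.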