Dedicated Models

Chat models

Organization	Model Name	API Model String	Context length	Quantization
DeepSeek	DeepSeek R1 Distill Llama 70B	deepseek-ai/DeepSeek-R1-Distill-Llama-70B	4096	FP16
DeepSeek	Deepseek Coder Instruct (33B)	deepseek-ai/deepseek-coder-33b-instruct	16384	FP16
Google	Gemma-2 Instruct (9B)	google/gemma-2-9b-it	8192	FP16
Google	Gemma Instruct (2B)	google/gemma-2b-it	8192	FP16
Google	Gemma-2 Instruct (27B)	google/gemma-2-27b-it	8192	FP16
Google	Gemma Instruct (7B)	google/gemma-7b-it	8192	FP16
HuggingFace	Zephyr-7B-β	HuggingFaceH4/zephyr-7b-beta	32768	FP16
Meta	Meta Llama 3.1 70B Instruct Turbo	meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo	32768	FP8
Meta	LLaMA-2 Chat (13B)	meta-llama/Llama-2-13b-chat-hf	4096	FP16
Meta	Meta Llama 3 8B Instruct Reference	meta-llama/Llama-3-8b-chat-hf	8192	FP16
Meta	Meta Llama 3 70B Instruct Reference	meta-llama/Llama-3-70b-chat-hf	8192	FP16
Meta	Meta Llama 3 8B Instruct Lite	meta-llama/Meta-Llama-3-8B-Instruct-Lite	8192	INT4
Meta	Meta Llama 3.1 405B Instruct Turbo	meta-llama/Meta-Llama-3.1-405B-Instruct-Lite-Pro	4096	FP16
Meta	LLaMA-2 Chat (7B)	meta-llama/Llama-2-7b-chat-hf	4096	FP16
Meta	Meta Llama 3 70B Instruct Turbo	meta-llama/Meta-Llama-3-70B-Instruct-Turbo	8192	FP8
Meta	Meta Llama 3.1 8B Instruct Turbo	meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo	32768	FP8
Meta	LLaMA-2 Chat (13B)	bytecomputecomputer/llama-2-13b-chat	4096	FP16
Meta	LLaMA-2 Chat (7B)	bytecomputecomputer/llama-2-7b-chat	4096	FP16
Meta	LLaMA-2 Chat (70B)	bytecomputecomputer/llama-2-70b-chat	4096	FP16
Meta	Meta Llama 3 8B Instruct	meta-llama/Meta-Llama-3-8B-Instruct	8192	FP16
Meta	Meta Llama 3 70B Instruct	meta-llama/Meta-Llama-3-70B-Instruct	8192	FP16
Meta	Code Llama Instruct (70B)	codellama/CodeLlama-70b-Instruct-hf	4096	FP16
Meta	Code Llama Instruct (7B)	codellama/CodeLlama-7b-Instruct-hf	16384	FP16
Meta	LLaMA-2 Chat (70B)	meta-llama/Llama-2-70b-chat-hf	4096	FP16
Meta	Meta Llama 3.1 8B Instruct	meta-llama/Meta-Llama-3.1-8B-Instruct-Reference	16384	FP16
Meta	Meta Llama 3.1 70B Instruct	meta-llama/Meta-Llama-3.1-70B-Instruct-Reference	8192	FP16
Microsoft	WizardLM-2 (8x22B)	microsoft/WizardLM-2-8x22B	65536	FP16
mistralai	Mistral (7B) Instruct	mistralai/Mistral-7B-Instruct-v0.1	4096	FP16
mistralai	Mistral (7B) Instruct v0.2	mistralai/Mistral-7B-Instruct-v0.2	32768	FP16
mistralai	Mistral (7B) Instruct v0.3	mistralai/Mistral-7B-Instruct-v0.3	32768	FP16
mistralai	Mixtral-8x7B Instruct v0.1	mistralai/Mixtral-8x7B-Instruct-v0.1	32768	FP16
NousResearch	Nous Hermes 2 - Mixtral 8x7B-DPO	NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO	32768	FP16
NousResearch	Nous Hermes 2 - Mixtral 8x7B-SFT	NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT	32768	FP16
Qwen	Qwen 2 Instruct (72B)	Qwen/Qwen2-72B-Instruct	32768	FP16
Qwen	Qwen2.5 72B Instruct Turbo	Qwen/Qwen2.5-72B-Instruct-Turbo	32768	FP8
Qwen	Qwen2.5 7B Instruct Turbo	Qwen/Qwen2.5-7B-Instruct-Turbo	32768	FP8
Qwen	Qwen 2 Instruct (1.5B)	Qwen/Qwen2-1.5B-Instruct	32768	FP16
Qwen	Qwen 2 Instruct (7B)	Qwen/Qwen2-7B-Instruct	32768	FP16
teknium	OpenHermes-2-Mistral (7B)	teknium/OpenHermes-2-Mistral-7B	8192	FP16
teknium	OpenHermes-2.5-Mistral (7B)	teknium/OpenHermes-2p5-Mistral-7B	8192	FP16
upstage	Upstage SOLAR Instruct v1 (11B)	upstage/SOLAR-10.7B-Instruct-v1.0	4096	FP16
upstage	Upstage SOLAR Instruct v1 (11B)-Int4	bytecomputecomputer/SOLAR-10.7B-Instruct-v1.0-int4	4096	FP16
WizardLM	WizardLM v1.2 (13B)	WizardLM/WizardLM-13B-V1.2	4096	FP16
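
The context lengths in the table above can be used to check a request before sending it. The following is a minimal sketch, not part of any official client: the `CHAT_MODELS` dictionary and both helper functions are hypothetical names, the three rows are copied from the table, and it assumes the context length covers the prompt and the generated tokens together.

```python
# A few rows copied from the chat models table:
# API model string -> (context length, quantization).
CHAT_MODELS = {
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo": (32768, "FP8"),
    "meta-llama/Llama-2-7b-chat-hf": (4096, "FP16"),
    "microsoft/WizardLM-2-8x22B": (65536, "FP16"),
}


def fits_context(model: str, prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if prompt plus completion fits in the model's context window.

    Assumption: the listed context length is shared by prompt and output tokens.
    """
    context_length, _quantization = CHAT_MODELS[model]
    return prompt_tokens + max_new_tokens <= context_length


def models_with_context(min_tokens: int) -> list[str]:
    """List API model strings whose context length is at least min_tokens."""
    return sorted(m for m, (ctx, _q) in CHAT_MODELS.items() if ctx >= min_tokens)
```

For example, `fits_context("meta-llama/Llama-2-7b-chat-hf", 4000, 200)` is false because 4200 tokens exceed the 4096 listed for that model.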

Language models

Organization	Model Name	API Model String	Context length
Google	Gemma 2 (9B)	google/gemma-2-9b	8192
Google	Gemma (7B)	google/gemma-7b	8192
Google	Gemma (2B)	google/gemma-2b	8192
Meta	Meta Llama 3 8B	meta-llama/Meta-Llama-3-8B	8192
Meta	LLaMA-2 (70B)	meta-llama/Llama-2-70b-hf	4096
Meta	LLaMA-2 (7B)	bytecomputecomputer/llama-2-7b	4096
Meta	LLaMA (7B)	huggyllama/llama-7b	2048
Meta	LLaMA (65B)	huggyllama/llama-65b	2048
Meta	LLaMA-2 (70B)	bytecomputecomputer/llama-2-70b	4096
Meta	LLaMA (13B)	huggyllama/llama-13b	2048
Meta	LLaMA (30B)	huggyllama/llama-30b	2048
Meta	Meta Llama 3 70B	meta-llama/Meta-Llama-3-70B	8192
Meta	LLaMA-2 (7B)	meta-llama/Llama-2-7b-hf	4096
Meta	Meta Llama 3 70B HF	meta-llama/Llama-3-70b-hf	8192
Meta	Meta Llama 3.1 8B	meta-llama/Meta-Llama-3.1-8B-Reference	8192
Meta	Meta Llama 3.1 70B	meta-llama/Meta-Llama-3.1-70B-Reference	8192
mistralai	Mixtral-8x7B v0.1	mistralai/Mixtral-8x7B-v0.1	32768
mistralai	Mistral (7B)	mistralai/Mistral-7B-v0.1	4096
Qwen	Qwen 2 (72B)	Qwen/Qwen2-72B	32768
Qwen	Qwen 2 VL (72B)	Qwen/Qwen2-VL-72B-Instruct
Qwen	Qwen 2 (7B)	Qwen/Qwen2-7B	32768
Qwen	Qwen 2 (1.5B)	Qwen/Qwen2-1.5B	32768
Qwen	Qwen 1.5 (32B)	Qwen/Qwen1.5-32B	32768
Qwen	Qwen 1.5 (14B)	Qwen/Qwen1.5-14B	32768
bytecompute	LLaMA-2-32K (7B)	bytecomputecomputer/LLaMA-2-7B-32K	32768

Code models

Organization	Model Name	API Model String	Context length
Meta	Code Llama Python (34B)	codellama/CodeLlama-34b-Python-hf	16384
Meta	Code Llama Python (70B)	codellama/CodeLlama-70b-Python-hf	4096
Meta	Code Llama Python (34B)	bytecomputecomputer/CodeLlama-34b-Python	16384
Meta	Code Llama (34B)	bytecomputecomputer/CodeLlama-34b	16384
Meta	Code Llama (13B)	codellama/CodeLlama-13b-hf	16384
Meta	Code Llama (34B)	codellama/CodeLlama-34b-hf	16384
Meta	Code Llama Python (7B)	bytecomputecomputer/CodeLlama-7b-Python	16384
Meta	Code Llama (70B)	codellama/CodeLlama-70b-hf	16384
Meta	Code Llama Python (13B)	bytecomputecomputer/CodeLlama-13b-Python	16384
Meta	Code Llama (7B)	codellama/CodeLlama-7b-hf	16384
Meta	Code Llama Python (13B)	codellama/CodeLlama-13b-Python-hf	16384
Meta	Code Llama Python (7B)	codellama/CodeLlama-7b-Python-hf	16384
Numbers Station	NSQL LLaMA-2 (7B)	NumbersStation/nsql-llama-2-7B	4096
Phind	Phind Code LLaMA v2 (34B)	Phind/Phind-CodeLlama-34B-v2	16384
Phind	Phind Code LLaMA Python v1 (34B)	Phind/Phind-CodeLlama-34B-Python-v1	16384
WizardLM	WizardCoder Python v1.0 (34B)	WizardLM/WizardCoder-Python-34B-V1.0	8192

Moderation models

Organization	Model Name	API Model String	Context length
Meta	Meta Llama Guard 3 8B	meta-llama/Meta-Llama-Guard-3-8B	8192
Meta	Meta Llama Guard 2 8B	meta-llama/LlamaGuard-2-8b	8192
Meta	Meta Llama Guard 3 11B Vision Turbo	meta-llama/Llama-Guard-3-11B-Vision-Turbo	131072
Meta	Llama Guard (7B)	Meta-Llama/Llama-Guard-7b	4096

Embedding models

Organization	Model Name	API Model String	Context length
BAAI	BAAI-Bge-Base-1p5	BAAI/bge-base-en-v1.5	not specified
BAAI	BAAI-Bge-Large-1p5	BAAI/bge-large-en-v1.5	not specified
Google	Bert Base Uncased	bert-base-uncased	not specified
HazyResearch	M2-BERT 2K Retrieval Encoder V1	hazyresearch/M2-BERT-2k-Retrieval-Encoder-V1	2048
bytecompute	M2-BERT-Retrieval-32k	bytecomputecomputer/m2-bert-80M-32k-retrieval	32768
bytecompute	M2-BERT-Retrieval-2K	bytecomputecomputer/m2-bert-80M-2k-retrieval	not specified
bytecompute	M2-BERT-Retrieval-8k	bytecomputecomputer/m2-bert-80M-8k-retrieval	8192
bytecompute	Sentence-BERT	sentence-transformers/msmarco-bert-base-dot-v5	512
WhereIsAI	UAE-Large-V1	WhereIsAI/UAE-Large-V1	not specified

Rerank models

Organization	Model Name	API Model String	Max Doc Size (tokens)	Max Docs
Salesforce	Salesforce Llama Rank V1 (8B)	Salesforce/Llama-Rank-V1	8192	1024
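
The two rerank limits above (8192 tokens per document, 1024 documents per request) can be checked on the client side before a request is sent. This is a rough sketch under stated assumptions: `check_rerank_batch` is a hypothetical helper, and the per-document token counts are assumed to come from whatever tokenizer the caller already uses.

```python
MAX_DOC_TOKENS = 8192  # "Max Doc Size (tokens)" listed for Salesforce/Llama-Rank-V1
MAX_DOCS = 1024        # "Max Docs" listed for Salesforce/Llama-Rank-V1


def check_rerank_batch(doc_token_counts: list[int]) -> list[str]:
    """Return human-readable problems; an empty list means the batch fits.

    Assumption: doc_token_counts holds one precomputed token count per document.
    """
    problems = []
    if len(doc_token_counts) > MAX_DOCS:
        problems.append(
            f"{len(doc_token_counts)} documents exceeds the limit of {MAX_DOCS}"
        )
    for index, count in enumerate(doc_token_counts):
        if count > MAX_DOC_TOKENS:
            problems.append(
                f"document {index} has {count} tokens, limit is {MAX_DOC_TOKENS}"
            )
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every oversized document in a single pass.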
