End-to-end platform for developing your AI applications

No matter where you start, build and scale your AI with ByteCompute.

Explore AI Models Directory

All the categories and models you can try out and seamlessly integrate into your projects

openai/whisper-large-v3-turbo (Featured)

Whisper Large V3 Turbo

automatic-speech-recognition

A weakly supervised pre-trained version of the Whisper model, optimized for high-speed Automatic Speech Recognition (ASR) and speech translation. By significantly reducing the number of decoder layers to 4 while maintaining the robust large-v3 encoder architecture, this 'Turbo' variant offers an 8.8x speedup compared to large-v3 with minimal degradation in Word Error Rate (WER). It is specifically designed as a high-efficiency alternative for low-latency production environments.

Memory: 2 GB · Params: 809M · $0.00066/minute
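With per-minute billing, transcription costs can be estimated up front. The sketch below estimates batch cost at the listed $0.00066/minute rate and shapes a request body for an ASR endpoint; the field names and the endpoint shape are illustrative assumptions, not a documented ByteCompute API.

```python
# Sketch: per-minute cost estimate and a request payload for ASR.
# The payload field names are assumptions for illustration only;
# consult the actual API reference before use.

RATE_PER_MINUTE = 0.00066  # listed price for whisper-large-v3-turbo

def transcription_cost(durations_minutes):
    """Total cost for a batch of recordings billed per minute."""
    return round(sum(durations_minutes) * RATE_PER_MINUTE, 6)

def build_request(audio_path, language=None):
    """Shape a form payload for a hypothetical transcription endpoint."""
    payload = {
        "model": "openai/whisper-large-v3-turbo",
        "file": audio_path,
    }
    if language:
        payload["language"] = language  # optional language hint
    return payload

print(transcription_cost([10, 25, 90]))  # 125 minutes of audio
print(build_request("meeting.wav", language="en"))
```

At this rate, roughly two hours of audio costs well under a cent, which is the point of the Turbo variant's low-latency, high-throughput positioning.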
Qwen/Qwen3-32B-FP8 (Featured)

Qwen3 32B

TEXT

A high-performance, dense Transformer model from the Qwen3 series featuring 32.8 billion parameters. Optimized with fine-grained FP8 quantization, it reduces VRAM requirements for weights to ~32.8GB while supporting a native 128K context window. Featuring an integrated 'Thinking Mode' for deep reasoning, it serves as an ideal balance between complex logic and fast inference for high-concurrency enterprise applications.

Memory: 10 GB · Params: 32B · Input: $0.240/1M tokens · Output: $1.800/1M tokens
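Token-based pricing splits input and output rates, so a single request's cost depends on both prompt and completion length. A minimal estimator using the listed Qwen3 32B rates (the helper and its names are illustrative, not part of any SDK):

```python
# Cost estimator for split input/output per-token pricing.
# Rates are in dollars per 1M tokens, as listed above.
QWEN3_32B_RATES = {"input": 0.240, "output": 1.800}

def request_cost(rates, input_tokens, output_tokens):
    """Dollar cost of one request under split input/output pricing."""
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

# e.g. an 8K-token prompt with a 1K-token completion:
cost = request_cost(QWEN3_32B_RATES, 8_000, 1_000)
print(f"${cost:.6f}")
```

Note that output tokens cost 7.5x more than input tokens here, so long completions (including any 'Thinking Mode' reasoning traces, if they are billed as output) dominate the bill.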
Qwen3.5-122B-A10B-FP8

Qwen3.5 122B

TEXT

The flagship Mixture-of-Experts (MoE) model from the Qwen3.5 series, featuring 122B total and 10B active parameters. This unified vision-language foundation excels in multimodal reasoning, complex coding, and native 'thinking mode' tasks. Utilizing fine-grained FP8 quantization, it offers exceptional throughput and reduced VRAM footprint on H100/L40S GPUs, while supporting a massive 262K context window for long-horizon agentic applications.

Memory: 122 GB · Params: 122B · Input: $0.400/1M tokens · Output: $3.200/1M tokens
Lightricks/LTX-2.3 (Featured)

LTX-2.3

VIDEO

A state-of-the-art Diffusion Transformer (DiT) foundation model with 22 billion parameters. Unlike traditional video models, LTX-2.3 is natively designed for synchronized audio-video generation within a single unified latent space. It excels at maintaining temporal consistency and high-fidelity motion, making it a powerful backend for creative AI pipelines that require seamless audiovisual coherence.

Memory: 22 GB · Params: 22B · $0.110/second
Qwen3.5-27B-FP8 (Featured)

Qwen3.5 27B

TEXT

A high-intelligence sparse Mixture-of-Experts (MoE) model optimized for advanced reasoning, complex instruction following, and precise tool use. With 27B parameters and fine-grained FP8 quantization, it features a 262K native context window and native 'thinking mode' support, delivering elite-level logic and linguistic performance with exceptional inference efficiency.

Memory: 16 GB · Params: 27B · Input: $0.300/1M tokens · Output: $2.400/1M tokens
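Both FP8 Qwen models above advertise a native 'thinking mode'. On OpenAI-compatible serving stacks this is commonly toggled through a chat-template flag (for example, vLLM accepts `chat_template_kwargs` with `enable_thinking` for Qwen models); whether ByteCompute exposes the same knob is an assumption here, so treat these field names as illustrative:

```python
import json

def chat_request(model, prompt, thinking=True):
    """Build an OpenAI-style chat completions body as a JSON string.

    The `chat_template_kwargs` / `enable_thinking` fields follow the
    convention used by some Qwen serving stacks (e.g. vLLM); they are
    an assumption here, not a documented ByteCompute parameter.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if not thinking:
        # Suppress the reasoning trace for latency-sensitive calls.
        body["chat_template_kwargs"] = {"enable_thinking": False}
    return json.dumps(body)

print(chat_request("Qwen3.5-27B-FP8", "Plan a database migration.", thinking=False))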
boson-audio-multimodal-checkpoint-1200

Higgs Audio V2.5

AUDIO

Higgs Audio V2.5 is a 1B-parameter autoregressive audio transformer distilled from the 3B V2 model, featuring the DualFFN architecture for efficient acoustic token modeling. It uses a unified audio tokenizer running at 25 FPS with 12 codebooks at 2000 bps, outputting 24 kHz audio. It was trained on more than 1M hours of audio (the AudioWeb dataset) with GRPO alignment for human-like naturalness.

Memory: n/a · Params: 1B · $0.045/minute
openai/whisper-large-v3

Whisper Large V3

automatic-speech-recognition

Whisper large-v3 is a pre-trained model for Automatic Speech Recognition (ASR) and speech translation. It features a robust Transformer encoder-decoder architecture designed for state-of-the-art accuracy across a wide range of languages and audio conditions.

Memory: 2 GB · Params: 809M · $0.0015/minute
flux1-schnell

FLUX Schnell

IMAGE

A fast text-to-image model optimized for rapid image generation. FLUX.1 [schnell] delivers high-quality visual results with low latency, making it ideal for real-time creative workflows, quick prototyping, and interactive image generation.

Memory: 25 GB · Params: 19B · $0.003/image
flux2-klein-4bStar Featured

FLUX Klein

IMAGE

The FLUX.2 [klein] family comprises the fastest FLUX image models to date. FLUX.2 [klein] unifies generation and editing in a single compact architecture, delivering state-of-the-art quality with end-to-end inference in under a second.

Memory 13GBSetting 4B$0.003/IMAGE

Ready to Accelerate AI in Your Organization?

Contact our sales team to discuss your enterprise needs and deployment options.

Get Started