End-to-end platform for developing your AI applications

No matter where you start, build and scale your AI with ByteCompute.

Explore AI Models Directory

All categories and models you can try out and seamlessly integrate into your projects

openai/whisper-large-v3-turbo (Featured)

A fine-tuned version of the Whisper large-v3 model designed for near real-time Automatic Speech Recognition (ASR) and speech translation. By reducing the number of decoder layers while maintaining the robust encoder architecture, this "Turbo" variant offers a significant speedup (up to 8x faster) with minimal degradation in Word Error Rate (WER). Ideal for low-latency production workloads.

Memory: 2 GB | Parameters: 809M
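For a sense of how this model drops into a project, here is a minimal transcription sketch using the Hugging Face transformers pipeline; the audio path, precision, and device placement are assumptions rather than part of this listing.

```python
# Minimal transcription sketch with the Hugging Face transformers pipeline.
# The audio path, dtype, and device settings are placeholders, not ByteCompute specifics.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,  # half precision keeps the memory footprint small
    device_map="auto",          # use a GPU if one is available
)

# "sample.wav" stands in for your own audio file
result = asr("sample.wav", return_timestamps=True)
print(result["text"])
```
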
Qwen3.5-122B-A10B-FP8

The latest flagship MoE (Mixture-of-Experts) model from the Qwen team. With a total of 122B parameters and 10B active parameters per token, it strikes an elite balance between reasoning throughput and model capacity. This build utilizes FP8 quantization, significantly reducing VRAM requirements and leveraging hardware acceleration on modern GPUs (H100/L40S) for high-performance inference in complex logic and coding tasks.

Memory: 122 GB | Parameters: 122B
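As a rough back-of-the-envelope check (not from the listing itself): FP8 stores each weight in one byte, so 122B parameters come to roughly 122 GB of weights, in line with the memory figure above; KV cache and activations add further overhead at serving time.
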
MiniMax-M2.5-NVFP4

A state-of-the-art Large Language Model optimized for high-concurrency deployment. This version features the cutting-edge NVFP4 (NVIDIA FP4) quantization, specifically engineered for Blackwell and late-generation Hopper architectures. It delivers maximum token-per-second throughput while maintaining 230B-class intelligence, excelling in multi-turn dialogue consistency and complex instruction following.

Memory: 134 GB | Parameters: 230B
Qwen/Qwen3-32B-FP8 (Featured)

A high-performance, dense Transformer model from the Qwen3 series featuring 32 billion parameters. This version is optimized with FP8 quantization, allowing it to fit within a ~32GB VRAM footprint while maintaining near-lossless perplexity. It serves as an ideal "workhorse" model for developers needing a balance between high-level reasoning and fast inference speeds for enterprise-grade chat and logic applications.

Memory: 10 GB | Parameters: 32B
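A minimal chat sketch follows, assuming the model is served behind an OpenAI-compatible endpoint (as serving stacks like vLLM expose); the base URL, API key, and prompt are placeholders, not ByteCompute specifics.

```python
# Chat-completion sketch against an assumed OpenAI-compatible endpoint.
# base_url and api_key are placeholders; substitute your own deployment details.
from openai import OpenAI

client = OpenAI(base_url="https://your-endpoint.example/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B-FP8",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain FP8 quantization in two sentences."},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```
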
Lightricks/LTX-2 (Featured)

A state-of-the-art Diffusion Transformer (DiT) foundation model with 22 billion parameters. Unlike traditional video models, LTX-2 is natively designed for synchronized audio-video generation within a single unified latent space. It excels at maintaining temporal consistency and high-fidelity motion, making it a powerful backend for creative AI pipelines that require seamless audiovisual coherence.

Memory: 22 GB | Parameters: 22B
Qwen3.5-27B-FP8 (Featured)

A lean, high-intelligence dense model optimized for complex instruction following and precise API calling. With 27B parameters compressed via FP8 quantization, it offers a superior logic-to-memory ratio, enabling elite-level performance on consumer-grade or mid-tier enterprise GPUs.

Memory: 16 GB | Parameters: 27B
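Since this card highlights precise API calling, here is a hedged function-calling sketch in the OpenAI-compatible style; the endpoint, key, and the get_weather tool are hypothetical and exist only to illustrate the request shape.

```python
# Function-calling sketch against an assumed OpenAI-compatible endpoint.
# The get_weather tool is hypothetical; it only illustrates the tool schema.
from openai import OpenAI

client = OpenAI(base_url="https://your-endpoint.example/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen3.5-27B-FP8",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```
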
Qwen3-VL-235B-A22B-Instruct-AWQ

A massive-scale Vision-Language (VL) MoE model designed for complex multimodal instruction following. Featuring a total of 235B parameters with 22B active parameters per token, it delivers top-tier performance in image understanding, document parsing, and visual reasoning. Quantized via AWQ (Activation-aware Weight Quantization), it is optimized for 4-bit weight compression to enable large-scale multimodal deployment with high throughput.

Memory: 125 GB | Parameters: 235B
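A minimal multimodal sketch, again assuming an OpenAI-compatible endpoint that accepts image content parts; the base URL, API key, image URL, and prompt are placeholders.

```python
# Vision-language sketch against an assumed OpenAI-compatible endpoint.
# The image URL and prompt are placeholders for your own data.
from openai import OpenAI

client = OpenAI(base_url="https://your-endpoint.example/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="Qwen3-VL-235B-A22B-Instruct-AWQ",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}},
            {"type": "text", "text": "Extract the invoice total and the due date."},
        ],
    }],
)
print(response.choices[0].message.content)
```
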

Ready to Accelerate AI in Your Organization?

Contact our sales team to discuss your enterprise needs and deployment options.

Get Started