Models (41)

Featured Models

Smart Router: automatically selects the best model for each request using a 15-dimension classifier.

Smart Router · context varies

Auto (dynamic pricing)

DOS · New

LLM

DOS AI (Qwen 3.5 35B-A3B)

Self-hosted MoE model with 35B total / 3B active params. Our default model: best balance of quality, speed, and cost.

35B MoE (3B active) · 128K context

$0.15 per 1M tokens

Meta · Coming Soon

LLM

Llama 4 Maverick

Meta's most capable open model: a 400B MoE with 17B active parameters and multimodal support.

17B-128E MoE · 1M context

$0.17 / $0.66 per 1M tokens

Meta · New

LLM

Llama 4 Scout

Efficient Llama 4 variant: a 109B MoE with 17B active parameters. Industry-leading context window of up to 10M tokens.

17B-16E MoE · 640K context

$0.11 / $0.38 per 1M tokens

Qwen · Coming Soon

LLM

QwQ 32B

Reasoning model rivaling DeepSeek R1 that excels at math, logic, and complex problem-solving.

32B · 128K context

$0.40 per 1M tokens

DeepSeek · Coming Soon

LLM

DeepSeek V3

State-of-the-art MoE model with exceptional reasoning capabilities.

671B MoE · 128K context

$0.25 per 1M tokens
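
To make the per-1M-token prices above concrete, here is a minimal sketch of the arithmetic. It assumes that a two-part price such as "$0.17 / $0.66" means input price / output price (the page does not say so explicitly) and that a single price applies to both directions. The function name `estimate_cost` is hypothetical and is not part of any DOS API.

```python
from decimal import Decimal

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens


def estimate_cost(input_tokens, output_tokens, input_price, output_price=None):
    """Estimate the USD cost of one request.

    Prices are passed as strings (e.g. "0.17") so Decimal keeps them exact.
    ASSUMPTION: when only one price is listed, it applies to both input
    and output tokens.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    in_price = Decimal(input_price)
    out_price = Decimal(output_price) if output_price is not None else in_price
    total = input_tokens * in_price + output_tokens * out_price
    return total / TOKENS_PER_PRICE_UNIT


# Llama 4 Maverick at the listed $0.17 / $0.66, read as input / output
maverick = estimate_cost(200_000, 50_000, "0.17", "0.66")

# DOS AI at the listed flat $0.15
dos_ai = estimate_cost(200_000, 50_000, "0.15")
```

Under these assumptions, 200,000 input tokens and 50,000 output tokens come to $0.067 on Llama 4 Maverick and $0.0375 on DOS AI.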

All Models

DOS · New

LLM

dos-auto

Smart Router: automatically selects the best model for each request using a 15-dimension classifier.

Smart Router · context varies

Auto (dynamic pricing)

DOS · New

LLM

DOS AI (Qwen 3.5 35B-A3B)

Self-hosted MoE model with 35B total / 3B active params. Our default model: best balance of quality, speed, and cost.

35B MoE (3B active) · 128K context

$0.15 per 1M tokens

Meta · Coming Soon

LLM

Llama 4 Maverick

Meta's most capable open model: a 400B MoE with 17B active parameters and multimodal support.

17B-128E MoE · 1M context

$0.17 / $0.66 per 1M tokens

Meta · New

LLM

Llama 4 Scout

Efficient Llama 4 variant: a 109B MoE with 17B active parameters. Industry-leading context window of up to 10M tokens.

17B-16E MoE · 640K context

$0.11 / $0.38 per 1M tokens

Google · Coming Soon

LLM

Gemma 3 27B

Google's latest, with strong multilingual, multimodal, and reasoning capabilities and a 128K context.

27B · 128K context

$0.30 per 1M tokens

Google · Coming Soon

LLM

Gemma 3 12B

Efficient mid-size model with vision capabilities and strong benchmark scores.

12B · 128K context

$0.12 per 1M tokens

Qwen · Coming Soon

LLM

QwQ 32B

Reasoning model rivaling DeepSeek R1 that excels at math, logic, and complex problem-solving.

32B · 128K context

$0.40 per 1M tokens

Mistral AI · Coming Soon

LLM

Mistral Small 3.1

Latest efficient Mistral with vision support and 128K context. Great speed/quality tradeoff.

24B · 128K context

$0.20 / $0.60 per 1M tokens

DeepSeek · Coming Soon

LLM

DeepSeek V3

State-of-the-art MoE model with exceptional reasoning capabilities.

671B MoE · 128K context

$0.25 per 1M tokens

DeepSeek · Coming Soon

LLM

DeepSeek R1

Reasoning-focused model trained with reinforcement learning for complex tasks.

671B MoE · 64K context

$3.00 / $7.00 per 1M tokens

DeepSeek · Coming Soon

LLM

DeepSeek R1 Distill 70B

Llama-based distillation of R1 reasoning: 90% of R1 quality at a fraction of the cost.

70B · 128K context

$0.88 per 1M tokens

DeepSeek · Coming Soon

LLM

DeepSeek R1 Distill 32B

Qwen-based R1 distillation: strong reasoning in a compact, efficient package.

32B · 128K context

$0.40 per 1M tokens

Meta · Coming Soon

LLM

Llama 3.3 70B

High-performance multilingual LLM optimized for dialogue and instruction following.

70B · 128K context

$0.20 per 1M tokens

Meta · Coming Soon

LLM

Llama 3.1 405B

The largest Llama 3 model for complex reasoning and generation tasks.

405B · 128K context

$3.50 per 1M tokens

Meta · Coming Soon

LLM

Llama 3.1 70B

Balanced performance and efficiency for production workloads.

70B · 128K context

$0.88 per 1M tokens

Meta · Coming Soon

LLM

Llama 3.1 8B

Fast and cost-effective model for simpler tasks and high-volume applications.

8B · 128K context

$0.05 per 1M tokens

Mistral AI · Coming Soon

LLM

Mistral Large 2

Flagship model with strong multilingual and coding capabilities.

123B · 128K context

$2.00 / $6.00 per 1M tokens

Mistral AI · Coming Soon

LLM

Mixtral 8x22B

Sparse mixture-of-experts model balancing capability and efficiency.

141B MoE · 64K context

$0.90 per 1M tokens

Qwen · Coming Soon

LLM

Qwen 2.5 72B

Strong multilingual model with excellent Chinese and English performance.

72B · 128K context

$0.90 per 1M tokens

Qwen · Coming Soon

LLM

Qwen 2.5 32B

Mid-size model with a good balance of speed and capability.

32B · 128K context

$0.40 per 1M tokens

Google · Coming Soon

LLM

Gemma 2 27B

Efficient model from Google with strong performance on diverse tasks.

27B · 8K context

$0.30 per 1M tokens

Google · Coming Soon

LLM

Gemma 2 9B

Lightweight model ideal for on-device and edge deployments.

9B · 8K context

$0.10 per 1M tokens

Qwen · Coming Soon

Code

Qwen 2.5 Coder 32B

Top-performing open code model that rivals GPT-4 on coding benchmarks.

32B · 128K context

$0.40 per 1M tokens

Qwen · Coming Soon

Code

Qwen 2.5 Coder 7B

Fast and efficient code model for completions and generation.

7B · 128K context

$0.10 per 1M tokens

Meta · Coming Soon

Code

Code Llama 70B

Specialized for code generation, completion, and understanding.

70B · 16K context

$0.88 per 1M tokens

Meta · Coming Soon

Code

Code Llama 34B

Fast code generation with support for many programming languages.

34B · 16K context

$0.40 per 1M tokens

DeepSeek · Coming Soon

Code

DeepSeek Coder 33B

Top-performing code model trained on 2T tokens, roughly 87% of them code.

33B · 16K context

$0.40 per 1M tokens

Qwen · Coming Soon

Vision

Qwen 2.5-VL 72B

Latest vision-language model with video understanding, OCR, and agentic capabilities.

72B · 128K context

$1.00 per 1M tokens

Meta · Coming Soon

Vision

Llama 4 Scout Vision

Multimodal Llama 4 variant with native image understanding and 10M context.

109B MoE · 10M context

$0.50 / $1.50 per 1M tokens

Meta · Coming Soon

Vision

Llama 3.2 90B Vision

Multimodal model for image understanding and visual reasoning.

90B · 128K context

$1.20 per 1M tokens

Meta · Coming Soon

Vision

Llama 3.2 11B Vision

Efficient vision-language model for image analysis tasks.

11B · 128K context

$0.18 per 1M tokens

Google · Coming Soon

Vision

Gemma 3 27B Vision

Google's multimodal model with image, video, and document understanding.

27B · 128K context

$0.30 per 1M tokens

Black Forest Labs · Coming Soon

Image

FLUX.1 Pro

State-of-the-art image generation with exceptional quality and prompt adherence.

12B

$0.05 per image

Black Forest Labs · Coming Soon

Image

FLUX.1 Dev

High-quality image generation for development and testing.

12B

$0.03 per image

Black Forest Labs · Coming Soon

Image

FLUX.1 Schnell

Fast image generation optimized for speed.

12B

$0.003 per image

Stability AI · Coming Soon

Image

Stable Diffusion XL

Versatile image generation with fine-tuning support.

6.6B

$0.002 per image

WhereIsAI · Coming Soon

Embedding

UAE Large V1

Universal embedding model with strong performance across benchmarks.

335M · 512-token context

$0.02 per 1M tokens

BAAI · Coming Soon

Embedding

BGE Large EN

High-quality English embeddings for RAG and semantic search.

335M · 512-token context

$0.02 per 1M tokens

BAAI · Coming Soon

Embedding

BGE Base EN

Fast and efficient embeddings for production use.

109M · 512-token context

$0.008 per 1M tokens

Mistral AI · Coming Soon

Embedding

E5 Mistral 7B

Large embedding model with 4096 dimensions for high-fidelity retrieval.

7B · 4K context

$0.02 per 1M tokens

OpenAI · Coming Soon

Audio

Whisper Large V3

Industry-leading speech-to-text with multilingual support.

1.5B · 30s context

$0.006 per minute
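
The image and audio models are priced per image and per minute rather than per token. The sketch below shows that arithmetic under one loud assumption: audio is billed pro rata by the second. The page does not state whether partial minutes are rounded up, so treat the audio figure as a lower bound. The function names are hypothetical.

```python
from decimal import Decimal


def image_cost(num_images, price_per_image):
    """USD cost of generating num_images at a flat per-image price."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return num_images * Decimal(price_per_image)


def audio_cost(duration_seconds, price_per_minute):
    """USD cost of transcribing audio of the given whole-second duration.

    ASSUMPTION: billing is pro-rated per second; the listing only
    states a per-minute price.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return Decimal(duration_seconds) * Decimal(price_per_minute) / 60


# 20 images on FLUX.1 Schnell at the listed $0.003 per image
schnell_batch = image_cost(20, "0.003")

# A 90-second clip on Whisper Large V3 at the listed $0.006 per minute
whisper_clip = audio_cost(90, "0.006")
```

With the listed prices, 20 FLUX.1 Schnell images cost $0.06, and a 90-second Whisper Large V3 transcription costs $0.009 if billing is pro-rated.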

Ready to get started?

Start building with $10 in free credits. No credit card required.

