Models (41)

Featured Models

dos-auto, DOS AI (Qwen 3.5 35B-A3B), Llama 4 Maverick, Llama 4 Scout, QwQ 32B, and DeepSeek V3. Full entries appear under All Models.

All Models

DOS · New
LLM

dos-auto

Smart Router - automatically selects the best model for each request using a 15-dimension classifier.

Smart Router · context varies
auto (dynamic pricing)
DOS · New
LLM

DOS AI (Qwen 3.5 35B-A3B)

Self-hosted MoE model with 35B total / 3B active params. Our default model - best balance of quality, speed, and cost.

35B MoE (3B active) · 128K context
$0.15 per 1M tokens
Meta · Coming Soon
LLM

Llama 4 Maverick

Meta's most capable open model — 400B MoE with 17B active parameters and multimodal support.

17B-128E MoE · 1M context
$0.17 / $0.66 per 1M tokens
Meta · New
LLM

Llama 4 Scout

Efficient Llama 4 variant — 109B MoE with 17B active. Industry-leading 10M context window.

17B-16E MoE · 640K context
$0.11 / $0.38 per 1M tokens
Google · Coming Soon
LLM

Gemma 3 27B

Google's latest — strong multilingual, multimodal, and reasoning with 128K context.

27B · 128K context
$0.30 per 1M tokens
Google · Coming Soon
LLM

Gemma 3 12B

Efficient mid-size model with vision capabilities and strong benchmark scores.

12B · 128K context
$0.12 per 1M tokens
Qwen · Coming Soon
LLM

QwQ 32B

Reasoning model rivaling DeepSeek R1 — excels at math, logic, and complex problem-solving.

32B · 128K context
$0.40 per 1M tokens
Mistral AI · Coming Soon
LLM

Mistral Small 3.1

Latest efficient Mistral with vision support and 128K context. Great speed/quality tradeoff.

24B · 128K context
$0.20 / $0.60 per 1M tokens
DeepSeek · Coming Soon
LLM

DeepSeek V3

State-of-the-art MoE model with exceptional reasoning capabilities.

671B MoE · 128K context
$0.25 per 1M tokens
DeepSeek · Coming Soon
LLM

DeepSeek R1

Reasoning-focused model trained with reinforcement learning for complex tasks.

671B MoE · 64K context
$3.00 / $7.00 per 1M tokens
DeepSeek · Coming Soon
LLM

DeepSeek R1 Distill 70B

Llama-based distillation of R1 reasoning — 90% of R1 quality at a fraction of the cost.

70B · 128K context
$0.88 per 1M tokens
DeepSeek · Coming Soon
LLM

DeepSeek R1 Distill 32B

Qwen-based R1 distillation — strong reasoning in a compact, efficient package.

32B · 128K context
$0.40 per 1M tokens
Meta · Coming Soon
LLM

Llama 3.3 70B

High-performance multilingual LLM optimized for dialogue and instruction following.

70B · 128K context
$0.20 per 1M tokens
Meta · Coming Soon
LLM

Llama 3.1 405B

The largest Llama 3 model for complex reasoning and generation tasks.

405B · 128K context
$3.50 per 1M tokens
Meta · Coming Soon
LLM

Llama 3.1 70B

Balanced performance and efficiency for production workloads.

70B · 128K context
$0.88 per 1M tokens
Meta · Coming Soon
LLM

Llama 3.1 8B

Fast and cost-effective model for simpler tasks and high-volume applications.

8B · 128K context
$0.05 per 1M tokens
Mistral AI · Coming Soon
LLM

Mistral Large 2

Flagship model with strong multilingual and coding capabilities.

123B · 128K context
$2.00 / $6.00 per 1M tokens
Mistral AI · Coming Soon
LLM

Mixtral 8x22B

Sparse mixture-of-experts model balancing capability and efficiency.

141B MoE · 64K context
$0.90 per 1M tokens
Qwen · Coming Soon
LLM

Qwen 2.5 72B

Strong multilingual model with excellent Chinese and English performance.

72B · 128K context
$0.90 per 1M tokens
Qwen · Coming Soon
LLM

Qwen 2.5 32B

Mid-size model with great balance of speed and capability.

32B · 128K context
$0.40 per 1M tokens
Google · Coming Soon
LLM

Gemma 2 27B

Efficient model from Google with strong performance on diverse tasks.

27B · 8K context
$0.30 per 1M tokens
Google · Coming Soon
LLM

Gemma 2 9B

Lightweight model ideal for on-device and edge deployments.

9B · 8K context
$0.10 per 1M tokens
Qwen · Coming Soon
Code

Qwen 2.5 Coder 32B

Top-performing open code model — rivals GPT-4 on coding benchmarks.

32B · 128K context
$0.40 per 1M tokens
Qwen · Coming Soon
Code

Qwen 2.5 Coder 7B

Fast and efficient code model for completions and generation.

7B · 128K context
$0.10 per 1M tokens
Meta · Coming Soon
Code

Code Llama 70B

Specialized for code generation, completion, and understanding.

70B · 16K context
$0.88 per 1M tokens
Meta · Coming Soon
Code

Code Llama 34B

Fast code generation with support for many programming languages.

34B · 16K context
$0.40 per 1M tokens
DeepSeek · Coming Soon
Code

DeepSeek Coder 33B

Top-performing code model trained on 2T tokens of code.

33B · 16K context
$0.40 per 1M tokens
Qwen · Coming Soon
Vision

Qwen 2.5-VL 72B

Latest vision-language model with video understanding, OCR, and agentic capabilities.

72B · 128K context
$1.00 per 1M tokens
Meta · Coming Soon
Vision

Llama 4 Scout Vision

Multimodal Llama 4 variant with native image understanding and 10M context.

109B MoE · 10M context
$0.50 / $1.50 per 1M tokens
Meta · Coming Soon
Vision

Llama 3.2 90B Vision

Multimodal model for image understanding and visual reasoning.

90B · 128K context
$1.20 per 1M tokens
Meta · Coming Soon
Vision

Llama 3.2 11B Vision

Efficient vision-language model for image analysis tasks.

11B · 128K context
$0.18 per 1M tokens
Google · Coming Soon
Vision

Gemma 3 27B Vision

Google's multimodal model with image, video, and document understanding.

27B · 128K context
$0.30 per 1M tokens
Black Forest Labs · Coming Soon
Image

FLUX.1 Pro

State-of-the-art image generation with exceptional quality and prompt adherence.

12B
$0.05 per image
Black Forest Labs · Coming Soon
Image

FLUX.1 Dev

High-quality image generation for development and testing.

12B
$0.03 per image
Black Forest Labs · Coming Soon
Image

FLUX.1 Schnell

Fast image generation optimized for speed.

12B
$0.003 per image
Stability AI · Coming Soon
Image

Stable Diffusion XL

Versatile image generation with fine-tuning support.

6.6B
$0.002 per image
WhereIsAI · Coming Soon
Embedding

UAE Large V1

Universal embedding model with strong performance across benchmarks.

335M · 512 context
$0.02 per 1M tokens
BAAI · Coming Soon
Embedding

BGE Large EN

High-quality English embeddings for RAG and semantic search.

335M · 512 context
$0.02 per 1M tokens
BAAI · Coming Soon
Embedding

BGE Base EN

Fast and efficient embeddings for production use.

109M · 512 context
$0.008 per 1M tokens
Mistral AI · Coming Soon
Embedding

E5 Mistral 7B

Large embedding model with 4096 dimensions for high-fidelity retrieval.

7B · 4K context
$0.02 per 1M tokens
OpenAI · Coming Soon
Audio

Whisper Large V3

Industry-leading speech-to-text with multilingual support.

1.5B · 30s context
$0.006 per minute
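
The per-1M-token prices in the listing can be turned into a rough per-request estimate. The sketch below is illustrative only: the function and dictionary names are hypothetical, and it assumes that a two-figure price such as "$0.20 / $0.60" means input price / output price, which the listing does not state. A single listed price is applied to input and output tokens alike, which is also an assumption.

```python
# Prices in USD per 1M tokens, copied from the listing above.
# ASSUMPTION: a pair "$a / $b" is read as (input, output); the page does not label it.
# ASSUMPTION: a single price applies equally to input and output tokens.
PRICES_PER_1M = {
    "DOS AI (Qwen 3.5 35B-A3B)": (0.15, 0.15),
    "Mistral Small 3.1": (0.20, 0.60),
    "DeepSeek R1": (3.00, 7.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    in_price, out_price = PRICES_PER_1M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Compare the same workload (2,000 prompt tokens, 500 completion tokens) across models.
    for name in PRICES_PER_1M:
        cost = estimate_cost(name, input_tokens=2_000, output_tokens=500)
        print(f"{name}: ${cost:.6f}")
```

Actual billing may differ, for example for dos-auto, whose pricing is listed as dynamic.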

Ready to get started?

Start building with $10 in free credits. No credit card required.
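
As a rough guide to how far the free credits go, the sketch below converts a USD budget into a token count at a given per-1M-token price. The helper name is hypothetical, and it assumes credits are consumed purely at the listed token price with no other fees or minimums.

```python
FREE_CREDITS_USD = 10.00  # from the offer above


def tokens_for_budget(budget_usd: float, price_per_1m_usd: float) -> int:
    """Whole tokens purchasable for a budget at a per-1M-token price (hypothetical helper)."""
    if price_per_1m_usd <= 0:
        raise ValueError("price must be positive")
    # Round down: a fraction of a token cannot be bought.
    return int(budget_usd / price_per_1m_usd * 1_000_000)


if __name__ == "__main__":
    # Listed prices: DOS AI at $0.15 and Llama 3.1 405B at $3.50 per 1M tokens.
    for name, price in [("DOS AI (Qwen 3.5 35B-A3B)", 0.15), ("Llama 3.1 405B", 3.50)]:
        print(f"{name}: about {tokens_for_budget(FREE_CREDITS_USD, price):,} tokens")
```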


AI infrastructure for everyone. Inference, agents, and safety — all in one platform.

© 2026 All rights reserved.