Inference API for Open-Source Models

Access the latest Llama, Qwen, DeepSeek, and Mistral models through a simple, OpenAI-compatible API. No infrastructure to manage.


Simple Integration

Works with any OpenAI SDK. Just change the base URL and API key.

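// Uses the official OpenAI Node SDK; top-level await requires an ES module.
import OpenAI from 'openai';

// Point the SDK at this API instead of OpenAI by overriding the base URL.
const client = new OpenAI({
  apiKey: 'dos_sk_your_api_key', // replace with your own key
  baseURL: 'https://api.dos.ai/v1',
});

// Send a single-turn chat completion request.
const response = await client.chat.completions.create({
  model: 'Qwen3.5-35B-A3B',
  messages: [
    { role: 'user', content: 'Explain quantum computing' }
  ],
  max_tokens: 1000, // upper bound on the length of the reply
});

// Print the text of the first (and only) choice.
console.log(response.choices[0].message.content);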

Built for Production

Enterprise-grade infrastructure with all the features you need.

OpenAI-Compatible API

Drop-in replacement for the OpenAI API. Keep your existing code and change only the base URL and API key.

Low Latency

Optimized inference infrastructure with global edge caching for sub-100ms response times.

Auto-Scaling

Automatically scales to handle traffic spikes. Pay only for what you use.

Usage Analytics

Real-time dashboards to monitor token usage, costs, and API performance.

Streaming Responses

Server-sent events for real-time streaming. Build responsive chat interfaces.
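
As a rough sketch of how a client might consume a streamed reply: assuming the endpoint emits chunks in the OpenAI streaming shape (each chunk carrying `choices[0].delta.content`), the OpenAI SDK returns an async iterable when `stream: true` is set on the request. The helper name `collectStream` and its `onDelta` callback are hypothetical and not part of any SDK.

```javascript
// Hypothetical helper: consume a streamed chat completion chunk by chunk.
// Assumes each chunk follows the OpenAI streaming shape:
//   { choices: [{ delta: { content: '...' } }] }
async function collectStream(stream, onDelta = () => {}) {
  let text = '';
  for await (const chunk of stream) {
    // Some chunks (role-only or final chunks) carry no content.
    const delta = chunk.choices?.[0]?.delta?.content ?? '';
    if (delta) {
      text += delta;
      onDelta(delta); // e.g. append to the chat UI as tokens arrive
    }
  }
  return text; // the full reply once the stream ends
}
```

With the client from the integration example, this could be called roughly as `collectStream(await client.chat.completions.create({ model, messages, stream: true }), (d) => process.stdout.write(d))`.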

Function Calling

Built-in tool use support for building AI agents and automated workflows.
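
A minimal sketch of the client side of tool use, assuming the API follows the OpenAI function-calling format: tools are declared in a `tools` array on the request, and the assistant message comes back with `tool_calls` whose arguments are JSON strings. The `get_weather` tool, the `handlers` map, and the `runToolCalls` helper are all made up for illustration.

```javascript
// Tool definition in the OpenAI `tools` format; `get_weather` is a made-up tool.
const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a city',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  },
];

// Local implementations keyed by tool name (stubbed here).
const handlers = {
  get_weather: ({ city }) => ({ city, forecast: 'sunny' }),
};

// Turn the tool calls on an assistant message into `tool` role messages
// that can be appended to the conversation and sent back to the model.
function runToolCalls(message, toolHandlers) {
  return (message.tool_calls ?? []).map((call) => {
    const name = call.function.name;
    let result;
    try {
      if (!Object.prototype.hasOwnProperty.call(toolHandlers, name)) {
        throw new Error(`Unknown tool: ${name}`);
      }
      // Arguments arrive as a JSON string, which may be malformed.
      result = toolHandlers[name](JSON.parse(call.function.arguments));
    } catch (err) {
      // Report the failure to the model instead of crashing the loop.
      result = { error: String(err.message) };
    }
    return {
      role: 'tool',
      tool_call_id: call.id,
      content: JSON.stringify(result),
    };
  });
}
```

In an agent loop, the request would pass `tools` alongside `messages`, then append the assistant message and the output of `runToolCalls` to the history before calling the model again.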

Available Models

Access the latest open-source models from leading AI labs.

Model	Provider	Type	Context	Pricing
Qwen3-VL-30B-A3B-Instruct	Alibaba	Vision-Language	128K	$0.15 / 1M tokens
Llama 3.3 70B Instruct	Meta	Text	128K	$0.20 / 1M tokens
Llama 3.1 8B Instruct	Meta	Text	128K	$0.05 / 1M tokens
DeepSeek V3	DeepSeek	Text	128K	$0.25 / 1M tokens
Qwen 2.5 72B Instruct	Alibaba	Text	128K	$0.18 / 1M tokens
Mixtral 8x7B Instruct	Mistral AI	Text	32K	$0.10 / 1M tokens

More models coming soon.

What Can You Build?

From chatbots to AI agents, power any application with our API.

Chatbots & Assistants

Build conversational AI with context awareness and multi-turn dialogue capabilities.
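
Multi-turn dialogue with an OpenAI-style chat completions endpoint generally means the client resends the conversation so far on every request, since each call is stateless. The sketch below shows one way to keep that history; `createConversation` and the `maxMessages` cap are hypothetical names, and the simple message-count cap stands in for real token-based trimming.

```javascript
// Hypothetical helper: keep a running conversation for a stateless chat API.
function createConversation(systemPrompt, maxMessages = 20) {
  const system = { role: 'system', content: systemPrompt };
  let history = []; // user and assistant messages, oldest first

  return {
    // Record a user or assistant message.
    add(role, content) {
      history.push({ role, content });
      // Drop the oldest messages once the history grows past the cap,
      // so the request stays within the model's context window.
      if (history.length > maxMessages) history = history.slice(-maxMessages);
    },
    // Full message list to send as `messages` on the next request.
    messages() {
      return [system, ...history];
    },
  };
}
```

Each turn would then add the user's input, send `messages()` to the API, and add the assistant's reply before the next turn.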

Content Generation

Generate articles, marketing copy, product descriptions, and creative content at scale.

Code Assistance

Power code completion, review, and generation features in your developer tools.

Data Extraction

Extract structured data from documents, emails, and unstructured text sources.
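
One possible shape for an extraction workflow is sketched below: a prompt that asks for a single JSON object, plus a defensive parser for the reply. Both helper names (`buildExtractionMessages`, `parseExtraction`) and the prompt wording are assumptions for illustration, not a documented feature of this API.

```javascript
// Build chat messages asking the model to return only JSON with the given fields.
function buildExtractionMessages(text, fields) {
  return [
    {
      role: 'system',
      content:
        'Extract the following fields from the user\'s text and reply with ' +
        'a single JSON object and nothing else. Use null for missing values. ' +
        `Fields: ${fields.join(', ')}`,
    },
    { role: 'user', content: text },
  ];
}

// Parse the reply defensively: models sometimes wrap the JSON in extra prose.
function parseExtraction(reply, fields) {
  const start = reply.indexOf('{');
  const end = reply.lastIndexOf('}');
  if (start === -1 || end <= start) return null; // no JSON object found
  let data;
  try {
    data = JSON.parse(reply.slice(start, end + 1));
  } catch {
    return null; // malformed JSON
  }
  // Keep only the requested fields and fill gaps with null.
  return Object.fromEntries(fields.map((f) => [f, data[f] ?? null]));
}
```

The messages from `buildExtractionMessages` would be sent as an ordinary chat completion, and the text of the reply passed to `parseExtraction`.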

Pay-As-You-Go Pricing

No monthly fees. Pay only for the tokens you use.

Simple Pricing

$0.05 - $0.25

per 1M tokens (varies by model)

No minimum commitment
Pay only for tokens used
Usage dashboard included
Rate limits scale with usage
$5 free credits to start
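
To make the per-token pricing concrete, here is a small cost calculator using prices from the Available Models table. It assumes prompt and completion tokens are billed at the same listed rate, which the page does not state explicitly; the keys are display names from the table, not necessarily the model IDs the API expects, and the function names are hypothetical.

```javascript
// Per-1M-token prices in USD, copied from the Available Models table.
const PRICE_PER_MILLION = {
  'Llama 3.3 70B Instruct': 0.20,
  'Llama 3.1 8B Instruct': 0.05,
  'DeepSeek V3': 0.25,
};

function priceFor(model) {
  const price = PRICE_PER_MILLION[model];
  if (price === undefined) throw new Error(`No price listed for ${model}`);
  return price;
}

// Estimated cost in USD for one request or a batch of requests.
// Assumption: prompt and completion tokens are billed at the same rate.
function estimateCostUsd(model, promptTokens, completionTokens) {
  return ((promptTokens + completionTokens) / 1_000_000) * priceFor(model);
}

// Approximate number of tokens a budget (e.g. the $5 free credit) covers.
function tokensForBudget(model, budgetUsd) {
  return (budgetUsd / priceFor(model)) * 1_000_000;
}
```

Under these assumptions, the $5 free credit covers about 100 million tokens on Llama 3.1 8B Instruct ($0.05 per 1M) or about 20 million on DeepSeek V3 ($0.25 per 1M). In OpenAI-compatible responses, token counts are typically reported in `response.usage`.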


Start Building Today

Get your API key and make your first request in under a minute.

