Supported Models

Access the latest open-source models through a single API. Text generation, code, multimodal, speech, and embedding models are all served on vLLM v0.17.1.

Showing 33 models

DeepSeek V4 Pro
DeepSeek, Featured
Advanced reasoning, coding, and long-horizon agent workflows
text, code
Parameters: MoE
Context: not listed
Input: $1.75/1M
Output: $3.50/1M

GLM-5.2
ZAI, Featured
ZAI's latest flagship with strong bilingual reasoning, long-context understanding, and tool use
text, code
Parameters: MoE
Context: not listed
Input: $1.50/1M
Cached: $0.38/1M
Output: $4.50/1M

Kimi K2.6
Moonshot, Featured
Native multimodal agentic model with long-context capabilities
text, code
Parameters: 1T MoE
Context: 256K
Input: $0.95/1M
Output: $4.00/1M

Kimi K2.7 Code
Moonshot, Featured
Coding-focused agentic Kimi model with strong tool use and long-context reasoning
code, text
Parameters: 1T MoE
Context: 256K
Input: $1.25/1M
Cached: $0.31/1M
Output: $4.50/1M

MiniMax M3
MiniMax, Featured
1M-token context with reasoning and tool use for large-document and agentic workloads
text, code
Parameters: MoE
Context: not listed
Input: $0.40/1M
Cached: $0.10/1M
Output: $2.00/1M

Qwen3.5 9B
Qwen
Compact model for fast, low-cost reasoning and instruction following
text
Parameters: not listed
Context: 256K
Input: $0.15/1M
Cached: $0.04/1M
Output: $0.20/1M

Qwen3 Coder 30B A3B
Qwen
Efficient coding MoE with tool use, optimized for fast, low-cost code generation
code
Parameters: 30B A3B MoE
Context: 256K
Input: $0.06/1M
Cached: $0.01/1M
Output: $0.25/1M

Qwen3.5 397B A17B
Qwen, Featured
Alibaba's largest Qwen3.5 MoE model for complex reasoning
text, code
Parameters: 397B MoE
Context: 256K
Input: $0.60/1M
Output: $3.60/1M

gpt-oss 120B
OpenAI, Featured
OpenAI open-weight 120B model with transparent weights
text, code
Parameters: 120B MoE
Context: 128K
Input: $0.15/1M
Output: $0.60/1M

Llama 3.3 70B

DeepSeek V3.2
DeepSeek
Strong coding and reasoning performance at low cost
text, code
Parameters: 671B MoE
Context: 128K
Input: $0.30/1M
Output: $0.45/1M

GLM-5.1
ZAI
Multimodal flagship with advanced reasoning and tool use
text, code
Parameters: MoE
Context: 200K
Input: $1.40/1M
Output: $4.40/1M

GLM-5
ZAI
text, code
Parameters: MoE
Context: 128K
Input: $1.00/1M
Output: $3.20/1M

Kimi K2.5
Moonshot
Strong long-context and reasoning capabilities
text, code
Parameters: 1T MoE
Context: 256K
Input: $0.50/1M
Output: $2.50/1M

Qwen3 235B A22B Instruct
Qwen
High-quality reasoning and instruction following
text, code
Parameters: 235B MoE
Context: 256K
Input: $0.20/1M
Output: $0.60/1M

Qwen3 235B Thinking
Qwen
Thinking/reasoning variant of Qwen3 235B
text
Parameters: 235B MoE
Context: 256K
Input: $0.50/1M
Output: $2.00/1M

Qwen3 Next 80B A3B Thinking
Qwen
text
Parameters: 80B MoE
Context: 256K
Input: $0.15/1M
Output: $1.20/1M

Qwen3 32B
Qwen
Compact model balancing quality and speed
text, code
Parameters: 32B
Context: 128K
Input: $0.10/1M
Output: $0.30/1M

Qwen3 30B A3B
Qwen
Efficient MoE model for instruction following
text
Parameters: 30B MoE
Context: 256K
Input: $0.10/1M
Output: $0.30/1M

MiniMax M2.5
MiniMax
text, code
Parameters: MoE
Context: 256K
Input: $0.30/1M
Output: $1.20/1M

Gemma 3 27B
Google
Google's Gemma 3 instruction-tuned model
text
Parameters: 27B
Context: 128K
Input: $0.10/1M
Output: $0.30/1M

Hermes 4 405B
NousResearch
Powerful instruction-following model with long-context capabilities
text
Parameters: 405B
Context: 128K
Input: $1.00/1M
Output: $3.00/1M

Hermes 4 70B
NousResearch
Highly capable model fine-tuned for multi-turn conversations
text
Parameters: 70B
Context: 128K
Input: $0.13/1M
Output: $0.40/1M

INTELLECT-3
PrimeIntellect
Third-generation model trained via decentralized compute
text
Parameters: MoE
Context: 128K
Input: $0.20/1M
Output: $1.10/1M

Nemotron 3 Ultra 550B
NVIDIA
Massive MoE model for demanding reasoning and agentic workloads
text
Parameters: 550B MoE
Context: 128K
Input: $1.00/1M
Output: $3.00/1M

Llama 3.1 Nemotron Ultra 253B
NVIDIA
text
Parameters: 253B
Context: 128K
Input: $0.60/1M
Output: $1.80/1M

Nemotron 3 Super 120B A12B
NVIDIA
Hybrid MoE model optimized for efficient multi-agent AI
text
Parameters: 120B MoE
Context: 128K
Input: $0.30/1M
Output: $0.90/1M

Nemotron 3 Nano 30B A3B
NVIDIA
text
Parameters: 30B MoE
Context: 128K
Input: $0.06/1M
Output: $0.24/1M

Nemotron 3 Nano Omni
NVIDIA
Open, efficient omni-modal reasoning model for agentic AI
text, multimodal
Parameters: Nano
Context: 128K
Input: $0.06/1M
Output: $0.24/1M

Cosmos 3 Super Reasoner
NVIDIA
Super reasoning model for complex multi-step tasks
text
Parameters: Reasoning
Context: 128K
Input: $0.10/1M
Output: $0.30/1M

Qwen2.5 VL 72B
Qwen
Vision-language model supporting text and images
multimodal, text
Parameters: 72B
Context: 128K
Input: $0.25/1M
Output: $0.75/1M

MiniCPM-V 4.5
OpenBMB
Efficient vision-language model with strong multimodal capabilities
multimodal
Parameters: not listed
Context: 128K
Input: $0.66/1M
Output: $1.11/1M

Qwen3 Embedding 8B
Qwen
High-precision dense retrieval with multilingual coverage (4,096 dims)
embedding
Parameters: not listed
Context: 32K
Input: $0.01/1M
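
Prices above are quoted per 1M tokens, so a per-request estimate is simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost` is hypothetical, and it assumes that cached tokens are a subset of the input tokens and are billed at the Cached rate instead of the Input rate. The page does not state the billing rules, so treat that as an assumption to confirm.

```python
from decimal import Decimal

TOKENS_PER_UNIT = Decimal(1_000_000)  # prices are listed per 1M tokens


def estimate_cost(input_tokens, output_tokens, input_price, output_price,
                  cached_tokens=0, cached_price=None):
    """Estimate the USD cost of one request from per-1M-token prices.

    Assumption (not stated on the page): cached tokens are part of
    input_tokens and are billed at cached_price instead of input_price.
    Prices are passed as strings such as "1.50" to avoid float rounding.
    """
    if cached_tokens < 0 or cached_tokens > input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if cached_price is None:
        # Models without a Cached price: bill every input token at the Input rate.
        cached_price = input_price

    uncached_tokens = input_tokens - cached_tokens
    total = (
        Decimal(uncached_tokens) * Decimal(input_price)
        + Decimal(cached_tokens) * Decimal(cached_price)
        + Decimal(output_tokens) * Decimal(output_price)
    )
    return total / TOKENS_PER_UNIT


# GLM-5.2 prices from the list: Input $1.50, Cached $0.38, Output $4.50.
# 200,000 input tokens (150,000 of them cached) and 20,000 output tokens.
glm_cost = estimate_cost(200_000, 20_000, "1.50", "4.50",
                         cached_tokens=150_000, cached_price="0.38")
print(f"GLM-5.2 example request: ${glm_cost}")
```

With these inputs the estimate is 50,000 × $1.50 + 150,000 × $0.38 + 20,000 × $4.50, divided by 1,000,000, which comes to $0.222.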

Need a different model?

We're constantly adding new models based on customer demand. Let us know which models you'd like to see, and we'll prioritize adding them to the platform.
