Pricing

Simple, transparent pricing.

Pay for what you use. Per-second billing on compute, per-token on inference. No hidden fees.

Example: a fine-tuning job (train-llama-ft) on one H100.

Rate       $2.49/hr ($0.00069/s)
Runtime    00:04:23 (263 seconds)
Cost       $0.18

Billed exactly for the time used.
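The arithmetic behind the example above can be sketched in a few lines. This is an illustrative calculator, not the platform's billing code; the function name is ours, and the rate and runtime are the figures shown.

```python
def billed_cost(rate_per_hour: float, seconds_used: int) -> float:
    """Per-second billing: hourly rate divided by 3600, rounded at the end."""
    return round(rate_per_hour / 3600 * seconds_used, 2)

# 00:04:23 = 4 * 60 + 23 = 263 seconds on an H100 at $2.49/hr
print(billed_cost(2.49, 263))  # 0.18
```

A full hour at the same rate comes out to exactly the hourly price, so there is no rounding penalty for short runs.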
// Inference

Run models via API.

Pay per token. No GPU management.

Model            Input ($/1M tokens)   Output ($/1M tokens)
Llama 3.1 8B     $0.10                 $0.10
Llama 3.1 70B    $0.35                 $0.40
Llama 3.1 405B   $1.00                 $1.00
Mixtral 8x22B    $0.50                 $0.50
Qwen 2.5 72B     $0.40                 $0.45

More models in the dashboard.
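To estimate what a single request costs under the per-token prices above, you can multiply token counts by the table rates. A minimal sketch; the price dict mirrors the table, and the model choice and token counts are made-up examples.

```python
PRICES = {  # (input, output) in $ per 1M tokens, from the table above
    "Llama 3.1 8B":   (0.10, 0.10),
    "Llama 3.1 70B":  (0.35, 0.40),
    "Llama 3.1 405B": (1.00, 1.00),
    "Mixtral 8x22B":  (0.50, 0.50),
    "Qwen 2.5 72B":   (0.40, 0.45),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in dollars."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# 2,000 input tokens + 500 output tokens on Llama 3.1 70B
print(request_cost("Llama 3.1 70B", 2000, 500))  # 0.0009
```

At these rates, a typical chat-sized request costs a fraction of a cent; the per-million pricing only becomes material at sustained volume.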

Reserved GPU capacity. Consistent latency.

GPUs:
GPU    VRAM     Price
T4     16 GB    $0.39/hr
L40S   48 GB    $1.49/hr
A100   80 GB    $1.99/hr
H100   80 GB    $3.29/hr
H200   141 GB   $3.69/hr
B200   192 GB   $5.89/hr

Deploy any Hugging Face model.

// Training & Compute

Run code on GPUs.

Run Python or Docker. Per-second billing.

GPU    VRAM     Price
T4     16 GB    $0.39/hr
L40S   48 GB    $1.49/hr
A100   80 GB    $1.99/hr
H100   80 GB    $3.29/hr
H200   141 GB   $3.69/hr
B200   192 GB   $5.89/hr

Billed per second. No minimum.

Full root access. SSH in seconds.

GPUs:
GPU    VRAM     Price
T4     16 GB    $0.39/hr
L40S   48 GB    $1.05/hr
A100   80 GB    $1.39/hr
H100   80 GB    $2.49/hr
H200   141 GB   $3.19/hr
B200   192 GB   $4.29/hr

Storage: $0.10/GB/month.
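A rough monthly-bill estimate combines the instance rates above with the $0.10/GB/month storage price. This is an illustrative sketch; the usage figures (hours, GB) are invented, and the rate dict copies the instance table.

```python
INSTANCE_RATE = {"A100": 1.39, "H100": 2.49, "H200": 3.19}  # $/hr, from the table above
STORAGE_RATE = 0.10  # $ per GB per month

def monthly_cost(gpu: str, hours: float, storage_gb: float) -> float:
    """Estimated monthly bill: GPU-hours plus storage (hypothetical usage)."""
    return round(INSTANCE_RATE[gpu] * hours + STORAGE_RATE * storage_gb, 2)

# 40 hours on an H100 instance plus 200 GB of storage
print(monthly_cost("H100", 40, 200))  # 119.6
```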

8 to 8,000 GPUs with InfiniBand. Custom pricing.

GPU            VRAM     Price
NVIDIA GB300   288 GB   Custom
NVIDIA B300    288 GB   Custom
NVIDIA GB200   192 GB   Custom
NVIDIA B200    192 GB   Custom
NVIDIA H200    141 GB   Custom
NVIDIA H100    80 GB    Custom

Minimum 8 GPUs. InfiniBand included.

Start building today.

$20 free credits on approval. No credit card required.