Pricing
Simple, transparent pricing.
Pay for what you use. Per-second billing on compute, per-token on inference. No hidden fees.
Example: `train-llama-ft` on an H100

- Runtime: 00:04:23 (263 seconds)
- Rate: $2.49/hr ($0.00069/second)
- Current cost: $0.18

Billed exactly for time used.
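The per-second math above can be checked in a few lines of Python. The rate and duration come from the example; the helper function itself is just an illustration of the billing formula, not a platform API:

```python
def cost_for_seconds(hourly_rate: float, seconds: int) -> float:
    """Per-second billing: the hourly rate divided by 3600, times seconds used."""
    return hourly_rate / 3600 * seconds

# The H100 run above: $2.49/hr for 263 seconds (00:04:23).
cost = cost_for_seconds(2.49, 263)
print(f"${cost:.2f}")  # → $0.18
```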
// Inference
Run models via API.
Pay per token. No GPU management.
| Model | Input / 1M | Output / 1M |
|---|---|---|
| Llama 3.1 8B | $0.10 | $0.10 |
| Llama 3.1 70B | $0.35 | $0.40 |
| Llama 3.1 405B | $1.00 | $1.00 |
| Mixtral 8x22B | $0.50 | $0.50 |
| Qwen 2.5 72B | $0.40 | $0.45 |
More models in the dashboard.
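A request's cost follows directly from the table: multiply each token count by the matching per-1M rate. A small sketch, using the Llama 3.1 70B rates above (the function is illustrative, not a platform API):

```python
def inference_cost(input_tokens: int, output_tokens: int,
                   input_per_m: float, output_per_m: float) -> float:
    """Per-token billing: rates are quoted per 1M tokens."""
    return input_tokens / 1e6 * input_per_m + output_tokens / 1e6 * output_per_m

# Llama 3.1 70B: $0.35 per 1M input tokens, $0.40 per 1M output tokens.
cost = inference_cost(2_000_000, 500_000, 0.35, 0.40)
print(f"${cost:.2f}")  # → $0.90
```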
Reserved GPU capacity. Consistent latency.
GPUs:
| GPU | VRAM | Price |
|---|---|---|
| T4 | 16 GB | $0.39/hr |
| L40S | 48 GB | $1.49/hr |
| A100 80GB | 80 GB | $1.99/hr |
| H100 | 80 GB | $3.29/hr |
| H200 | 141 GB | $3.69/hr |
| B200 | 192 GB | $5.89/hr |
Deploy any Hugging Face model.
// Training & Compute
Run code on GPUs.
Run Python or Docker. Per-second billing.
| GPU | VRAM | Price |
|---|---|---|
| T4 | 16 GB | $0.39/hr |
| L40S | 48 GB | $1.49/hr |
| A100 80GB | 80 GB | $1.99/hr |
| H100 | 80 GB | $3.29/hr |
| H200 | 141 GB | $3.69/hr |
| B200 | 192 GB | $5.89/hr |
Billed per second. No minimum.
GPU VMs. Full root access, SSH in seconds.
GPUs:
| GPU | VRAM | Price |
|---|---|---|
| T4 | 16 GB | $0.39/hr |
| L40S | 48 GB | $1.05/hr |
| A100 | 80 GB | $1.39/hr |
| H100 | 80 GB | $2.49/hr |
| H200 | 141 GB | $3.19/hr |
| B200 | 192 GB | $4.29/hr |
Storage: $0.10/GB/month.
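A monthly VM bill combines GPU time (billed per second) with storage at $0.10/GB/month. A minimal sketch using the A100 rate from the table above (the helper is illustrative, not a platform API):

```python
def monthly_vm_cost(gpu_hourly: float, hours_used: float,
                    storage_gb: float, storage_rate: float = 0.10) -> float:
    """GPU time is billed per second at the hourly rate; storage is $/GB/month."""
    return gpu_hourly * hours_used + storage_gb * storage_rate

# An A100 VM ($1.39/hr) used 100 hours this month, with 200 GB of storage.
cost = monthly_vm_cost(1.39, 100, 200)
print(f"${cost:.2f}")  # → $159.00
```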
8 to 8,000 GPUs with InfiniBand. Custom pricing.
| GPU | VRAM | Price |
|---|---|---|
| NVIDIA GB300 | 288 GB | Custom |
| NVIDIA B300 | 288 GB | Custom |
| NVIDIA GB200 | 192 GB | Custom |
| NVIDIA B200 | 192 GB | Custom |
| NVIDIA H200 | 141 GB | Custom |
| NVIDIA H100 | 80 GB | Custom |
Minimum 8 GPUs. InfiniBand included.
Start building today.
$20 free credits on approval. No credit card required.