Everything you need to run AI workloads.
Inference APIs, serverless execution, GPU VMs, and managed clusters - designed to work individually or together.
Serverless Inference
Coming Soon · Pay-per-token API access to open-source models
from lyceum import Client

client = Client("your-api-key")

response = client.chat.completions.create(
    model="meta-llama/Llama-3-70B",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in response:
    print(chunk.choices[0].delta.content)

Dedicated Endpoints
Reserved GPU capacity for production models
Serverless Training
Run code on GPUs without managing infrastructure
GPU Virtual Machines
Full root access, ready in seconds
Large-Scale Clusters
8 to 8,000 GPUs with InfiniBand
Built for teams that ship.
We handle the infrastructure complexity so you can focus on building. No capacity planning, no vendor lock-in, no surprises on your bill.
Per-second billing
Pay only for compute you actually use. Jobs that finish early don't cost you for time you didn't need.
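The difference is easy to put in numbers. A quick sketch of per-second versus round-up-to-the-hour billing for a job that finishes early (the hourly rate is a made-up figure for illustration, not an actual Lyceum price):

```python
import math

# Hypothetical GPU rate, for illustration only.
HOURLY_RATE = 2.00  # $/GPU-hour

def per_second_cost(seconds: float) -> float:
    """Bill exactly the seconds of compute used."""
    return seconds * HOURLY_RATE / 3600

def per_hour_cost(seconds: float) -> float:
    """Round up to whole hours, as hourly billing does."""
    return math.ceil(seconds / 3600) * HOURLY_RATE

job = 37 * 60  # a job that finishes in 37 minutes
print(f"per-second: ${per_second_cost(job):.2f}")  # charges ~$1.23
print(f"per-hour:   ${per_hour_cost(job):.2f}")    # charges $2.00 for a full hour
```

The gap compounds across many short jobs: with hourly rounding, every run that finishes mid-hour pays for time it never used.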
Docker-native
Any Docker container runs on Lyceum with no modifications. No proprietary SDKs, no vendor lock-in.
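Docker-native means a standard Dockerfile is all you need; a minimal sketch (the base image and script name here are illustrative, nothing Lyceum-specific is required):

```dockerfile
# A plain PyTorch container - no proprietary SDK, no modifications.
FROM pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime
WORKDIR /app
COPY train.py .
CMD ["python", "train.py"]
```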
EU data centres
GPUs hosted in European data centres. Full GDPR compliance, data residency in the EU.
Instant availability
GPUs provision in seconds, not hours. No capacity planning, no procurement queues.
GDPR Compliant · EU Data Residency
Transparent pricing.
Pay for what you use. No hidden fees.
Per-token pricing. No minimum spend.
View all models
Per-second billing. No minimum commitment.
Long-term contracts from 3 months.
We'll get back to you within 24 hours.
From Our Magazine
Insights on GPU infrastructure, cost optimization, and AI deployment.
Sovereign AI: Navigating EU Data Residency
How to build AI infrastructure that meets European data sovereignty requirements.
Migration · Migrating from AWS to Dedicated GPUs
A practical guide to moving your ML workloads from hyperscalers to dedicated GPU infrastructure.
Cost Optimization · Stopping the Bleed: The $15B GPU Overprovisioning Crisis
Why most teams are paying for GPUs they don't need, and how to fix it.
Cost Optimization · How to Right-Size GPU Instances for ML Workloads
Match your GPU resources to actual workload requirements and stop overspending.
GPU Memory · Eliminating CUDA OOM: Expert Memory Management for LLMs
Practical techniques to prevent out-of-memory errors when training large language models.
GPU Memory · GPU Utilization Too Low: How to Fix Compute Bottlenecks
Diagnose and fix the common causes of underutilized GPU resources.
Ready to ship faster?
Request access to run your first GPU job in under a minute. No credit card required.