Everything you need to run AI workloads.
Inference APIs, serverless execution, GPU VMs, and managed clusters - designed to work individually or together.
Serverless Inference
Coming Soon
Pay-per-token API access to open-source models
from lyceum import Client

client = Client("your-api-key")

response = client.chat.completions.create(
    model="meta-llama/Llama-3-70B",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in response:
    print(chunk.choices[0].delta.content)

Dedicated Endpoints
Reserved GPU capacity for production models
Serverless Training
Run code on GPUs without managing infrastructure
GPU Virtual Machines
Full root access, ready in seconds
Large-Scale Clusters
8 to 8,000 GPUs with InfiniBand
Built for teams that ship.
We handle the infrastructure complexity so you can focus on building. No capacity planning, no vendor lock-in, no surprises on your bill.
Per-second billing
Pay only for the compute you actually use. Jobs that finish early aren't billed for time you didn't need.
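To make the difference concrete, here is a small sketch comparing per-second billing against the full-hour rounding many providers use. The $2.50/hour rate is illustrative only, not a published Lyceum price:

```python
# Hypothetical GPU rate for illustration -- not an actual Lyceum price.
HOURLY_RATE = 2.50
RATE_PER_SECOND = HOURLY_RATE / 3600


def cost_per_second(seconds: int) -> float:
    """Bill exactly the seconds used."""
    return round(seconds * RATE_PER_SECOND, 4)


def cost_hourly_rounded(seconds: int) -> float:
    """Bill in full-hour increments (ceiling division on hours)."""
    hours = -(-seconds // 3600)
    return round(hours * HOURLY_RATE, 4)


# A job that finishes in 20 minutes:
job_seconds = 20 * 60
print(cost_per_second(job_seconds))      # charged for 1,200 seconds
print(cost_hourly_rounded(job_seconds))  # charged for a full hour
```

Under per-second billing the 20-minute job costs about a third of the hourly-rounded price; the gap grows for short, bursty workloads.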
Docker-native
Any Docker container runs on Lyceum with no modifications. No proprietary SDKs, no vendor lock-in.
EU data centres
GPUs hosted in European data centres. Full GDPR compliance, data residency in the EU.
Instant availability
GPUs provision in seconds, not hours. No capacity planning, no procurement queues.
GDPR Compliant · EU Data Residency

Transparent pricing.
Pay for what you use. No hidden fees.
Per-token pricing. No minimum spend.
View all models

Per-second billing. No minimum commitment.
Long-term contracts from 3 months.
We'll get back to you within 24 hours.
Ready to ship faster?
Request access to run your first GPU job in under a minute. No credit card required.