Our serverless inference stack is currently in a closed beta. Join the waitlist to get early access.
Pay-per-token API access to open-source models.
OpenAI-compatible APIs. No infrastructure to manage. Scale instantly from zero to thousands of requests per second.
No infrastructure management
Send requests, get responses. We handle everything else.
Instant scaling
From zero to thousands of requests per second, automatically.
Pay per token
No idle costs. You pay only for the tokens you process.
OpenAI-compatible
Drop-in replacement. Change one line of code.
Get started in minutes
Use the OpenAI Python library with just one line changed. Your existing code works out of the box.
from openai import OpenAI

# Point the client at Lyceum instead of OpenAI; this is the one changed line
client = OpenAI(
    base_url="https://api.lyceum.technology/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
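Streaming works through the same client when the endpoint follows the OpenAI spec. Here is a minimal sketch, continuing from the client above and assuming the API honors the standard stream=True parameter (not confirmed above):

# Minimal streaming sketch. Assumes the endpoint supports OpenAI's
# standard stream=True parameter; reuses the client defined above.
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)
for chunk in stream:
    # Each chunk carries an incremental delta; print tokens as they arrive
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")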
Supported models
Access the latest open-source models through a single API.
Llama 3.3 70B (Meta)
Mistral Large (Mistral)
DeepSeek V3 (DeepSeek)
Qwen 2.5 72B (Alibaba)
Mixtral 8x22B (Mistral)
Llama 3.1 405B (Meta)
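Switching between these is just a different string in the model argument. The identifier below is a placeholder written in the usual Hugging Face style, not a confirmed Lyceum ID; check the models page for exact names:

# Placeholder model identifier (Hugging Face naming style); the exact
# IDs Lyceum accepts are not confirmed here.
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Hello!"}]
)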
Simple, transparent pricing
Pay only for the tokens you use. No minimum spend, no hidden fees. Volume discounts available.
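Because responses follow the OpenAI schema, each one reports the token counts you were billed for; a small sketch, assuming Lyceum populates the standard usage field:

# usage is a standard field on OpenAI-compatible responses (assumed
# populated here); it shows exactly what a request cost in tokens.
print(response.usage.prompt_tokens)      # input tokens billed
print(response.usage.completion_tokens)  # output tokens billed
print(response.usage.total_tokens)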
View pricing

Ready to get started?
Join the waitlist to be the first to access serverless inference.