Coming Soon

Our serverless inference stack is currently in closed beta. Join the waitlist to get early access.

Serverless Inference

Pay-per-token API access to open-source models.

OpenAI-compatible APIs. No infrastructure to manage. Scale instantly from zero to thousands of requests per second.

[Diagram: live API request flow, Your App → Lyceum → Model. Example shown: Llama 3.3 70B at 142 ms TTFT.]

No infrastructure management

Send requests, get responses. We handle everything else.

Instant scaling

From zero to thousands of requests per second, automatically.
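
As a sketch of what that looks like in practice, the snippet below fans out many concurrent requests with the openai library's async client. The endpoint and model name are the ones from the quick-start example further down; the prompts are hypothetical, and sustained throughput is Lyceum's scaling claim, not something the code itself guarantees.

Python
import asyncio
from openai import AsyncOpenAI

# Async variant of the client shown in the quick-start below.
client = AsyncOpenAI(
    base_url="https://api.lyceum.technology/v1",
    api_key="your-api-key",
)

async def ask(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def main() -> None:
    # Fire 100 concurrent requests; no warm-up or provisioning step.
    prompts = [f"Summarize topic #{i}" for i in range(100)]
    results = await asyncio.gather(*(ask(p) for p in prompts))
    print(f"{len(results)} responses received")

asyncio.run(main())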


Pay per token

No idle costs. You pay only for the tokens you process.
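
Because billing is metered per token, you can account for cost on each request from the usage object that OpenAI-compatible responses include. A minimal sketch; the per-token prices below are placeholders for illustration, not published Lyceum rates.

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lyceum.technology/v1",
    api_key="your-api-key",
)

# Hypothetical prices for illustration only; see the pricing page.
PRICE_PER_INPUT_TOKEN = 0.0000006
PRICE_PER_OUTPUT_TOKEN = 0.0000006

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

# The usage block reports the token counts you are billed for.
usage = response.usage
cost = (usage.prompt_tokens * PRICE_PER_INPUT_TOKEN
        + usage.completion_tokens * PRICE_PER_OUTPUT_TOKEN)
print(f"{usage.total_tokens} tokens, est. ${cost:.6f}")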

base_url= "
api.openai.com
api.lyceum.tech
"

OpenAI-compatible

Drop-in replacement. Change one line of code.

Get started in minutes

Use the OpenAI Python library with just one line changed. Your existing code works out of the box.

Python
from openai import OpenAI

# Point the client at Lyceum's endpoint instead of api.openai.com;
# everything else stays the same.
client = OpenAI(
    base_url="https://api.lyceum.technology/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
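
Streaming follows the same OpenAI Chat Completions conventions. Continuing with the client above, and assuming the endpoint supports stream=True as OpenAI-compatible APIs typically do:

Python
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
)

# Tokens arrive as incremental deltas; print them as they stream in.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()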

Supported models

Access the latest open-source models through a single API. A sketch for listing them programmatically follows the list below.

Llama 3.3 70B (Meta): 70B parameters, 128K context
Mistral Large (Mistral): 123B parameters, 128K context
DeepSeek V3 (DeepSeek): 671B parameters, 128K context
Qwen 2.5 72B (Alibaba): 72B parameters, 128K context
Mixtral 8x22B (Mistral): 176B MoE, 64K context
Llama 3.1 405B (Meta): 405B parameters, 128K context
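
The OpenAI spec also defines a GET /v1/models listing endpoint; assuming Lyceum implements it, the available model IDs can be enumerated with the same client:

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lyceum.technology/v1",
    api_key="your-api-key",
)

# Assumes the /v1/models endpoint is implemented.
for model in client.models.list():
    print(model.id)  # e.g. "meta-llama/Llama-3.3-70B-Instruct"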

Simple, transparent pricing

Pay only for the tokens you use. No minimum spend, no hidden fees. Volume discounts available.

View pricing

Ready to get started?

Join the waitlist to be the first to access serverless inference.