Coming Soon

Our serverless inference stack is currently in closed beta. Join the waitlist to get early access.

Serverless Inference

Pay-per-token API access to open-source models.

OpenAI-compatible APIs. No infrastructure to manage. Scale instantly from zero to thousands of requests per second.

[Diagram: live API request flow, Your App → Lyceum → Model. Example shown: Llama 3.3 70B at 142 ms TTFT.]

No infrastructure management

Send requests, get responses. We handle everything else.

Instant scaling

From zero to thousands of requests per second, automatically.
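
As a sketch of what that looks like in practice, the snippet below fans out many concurrent requests with the openai library's async client. The endpoint and model name are the ones from the quick-start example further down; the prompts are hypothetical, and sustained throughput is Lyceum's scaling claim, not something the code itself guarantees.

Python
import asyncio
from openai import AsyncOpenAI

# Async variant of the client shown in the quick-start below.
client = AsyncOpenAI(
    base_url="https://api.lyceum.technology/v1",
    api_key="your-api-key",
)

async def ask(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def main() -> None:
    # Fire 100 concurrent requests; no warm-up or provisioning step.
    prompts = [f"Summarize topic #{i}" for i in range(100)]
    results = await asyncio.gather(*(ask(p) for p in prompts))
    print(f"{len(results)} responses received")

asyncio.run(main())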


Pay per token

No idle costs. You pay only for the tokens you process.
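
Because billing is metered per token, you can account for cost on each request from the usage object that OpenAI-compatible responses include. A minimal sketch; the per-token prices below are placeholders for illustration, not published Lyceum rates.

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lyceum.technology/v1",
    api_key="your-api-key",
)

# Hypothetical prices for illustration only; see the pricing page.
PRICE_PER_INPUT_TOKEN = 0.0000006
PRICE_PER_OUTPUT_TOKEN = 0.0000006

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

# The usage block reports the token counts you are billed for.
usage = response.usage
cost = (usage.prompt_tokens * PRICE_PER_INPUT_TOKEN
        + usage.completion_tokens * PRICE_PER_OUTPUT_TOKEN)
print(f"{usage.total_tokens} tokens, est. ${cost:.6f}")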

base_url= "
api.openai.com
api.lyceum.tech
"

OpenAI-compatible

Drop-in replacement. Change one line of code.

Get started in minutes

Use the OpenAI Python library with just one line changed. Your existing code works out of the box.

Python
from openai import OpenAI

# Point the client at Lyceum's endpoint instead of api.openai.com;
# everything else stays the same.
client = OpenAI(
    base_url="https://api.lyceum.technology/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
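
Streaming follows the same OpenAI Chat Completions conventions. Continuing with the client above, and assuming the endpoint supports stream=True as OpenAI-compatible APIs typically do:

Python
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
)

# Tokens arrive as incremental deltas; print them as they stream in.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()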

Supported models

Access the latest open-source models through a single API. A sketch for listing them programmatically follows the list below.

Llama 3.3 70B (Meta): 70B parameters, 128K context
Mistral Large (Mistral): 123B parameters, 128K context
DeepSeek V3 (DeepSeek): 671B parameters, 128K context
Qwen 2.5 72B (Alibaba): 72B parameters, 128K context
Mixtral 8x22B (Mistral): 176B MoE, 64K context
Llama 3.1 405B (Meta): 405B parameters, 128K context
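
The OpenAI spec also defines a GET /v1/models listing endpoint; assuming Lyceum implements it, the available model IDs can be enumerated with the same client:

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lyceum.technology/v1",
    api_key="your-api-key",
)

# Assumes the /v1/models endpoint is implemented.
for model in client.models.list():
    print(model.id)  # e.g. "meta-llama/Llama-3.3-70B-Instruct"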

Simple, transparent pricing

Pay only for the tokens you use. No minimum spend, no hidden fees. Volume discounts available.

View pricing

Ready to get started?

Join the waitlist to be the first to access serverless inference.