Qwen3-235B-A22B: specs, benchmarks, and how to run it on Lyceum

A 235B MoE model optimized for multilingual instruction following and high-speed inference.

Magnus Grünewald

June 25, 2026 · CEO at Lyceum Technology

Qwen3-235B-A22B is a large language model developed by Alibaba Cloud's Qwen team. This Mixture-of-Experts (MoE) model has 235 billion total parameters but activates only 22 billion per token, which keeps per-token compute far below that of a dense model of the same size. The Instruct-2507 checkpoint is tuned for general-purpose text generation, coding, and tool usage, and answers directly without the overhead of a separate "thinking" mode. Lyceum Technology serves Qwen3-235B-A22B through our OpenAI-compatible Serverless Inference API, so you can call this open-weight model on GDPR-compliant, EU-sovereign infrastructure in our eu-north1 region.

Get started: call Qwen3-235B-A22B on Lyceum

Integrate Qwen3-235B-A22B into your application using the standard OpenAI SDK. Lyceum Technology provides a drop-in replacement API, so you only need to set the base URL, your Lyceum API key, and the model name to start routing requests to our European data centers.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.lyceum.technology/api/v2/external/serverless",
    api_key="<your lyceum api key>",
)
response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)
print(response.choices[0].message.content)

Pricing and region for Qwen3-235B-A22B

When you deploy Qwen3-235B-A22B on Lyceum Technology, you benefit from transparent, per-token pricing with no hidden base fees or minimum commitments. The model is hosted in our eu-north1 region, ensuring that all data processing remains strictly within the European Union.

This model is categorized under our Standard tier, which is reserved for high-capability foundation models that handle complex reasoning and large-scale generation tasks. The Qwen/Qwen3-235B-A22B-Instruct-2507 endpoint is priced at $0.20 per million input tokens and $0.60 per million output tokens. Because Lyceum operates its own bare-metal infrastructure, we can keep these rates low while maintaining strict data sovereignty. You pay only for the tokens you process, which suits both bursty workloads and sustained production traffic.

For engineers migrating from existing cloud providers, the transition requires no refactoring beyond changing the client configuration. The endpoint supports standard parameters such as temperature, max_tokens, and stream, so your existing application logic can stay as it is while you move to a secure, GDPR-compliant environment.
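Because the endpoint follows the OpenAI request format, application code that depends only on the client interface should not need to change. The sketch below illustrates that idea with a hypothetical `summarize` helper that accepts any OpenAI-compatible client. The `FakeCompletions` class is an offline stand-in so the example runs without network access; in production you would pass the `OpenAI(...)` client from the Get started example instead.

```python
from types import SimpleNamespace

MODEL = "Qwen/Qwen3-235B-A22B-Instruct-2507"


def summarize(client, text: str, model: str = MODEL,
              temperature: float = 0.2, max_tokens: int = 512) -> str:
    """Hypothetical helper that depends only on the OpenAI client interface."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Summarize the user's text in three sentences."},
            {"role": "user", "content": text},
        ],
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content


class FakeCompletions:
    """Offline stand-in that records the request and returns a canned reply."""

    def __init__(self):
        self.last_request = None

    def create(self, **kwargs):
        self.last_request = kwargs
        message = SimpleNamespace(content="(summary)")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


# In production, pass the OpenAI(...) client configured for Lyceum instead.
fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))
print(summarize(fake_client, "Quarterly report text..."))
```

Injecting the client this way keeps the provider choice in one place, so swapping endpoints does not touch the functions that build prompts.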

What Qwen3-235B-A22B is good at

Efficient Mixture-of-Experts architecture

Qwen3-235B-A22B uses a Mixture-of-Experts (MoE) design with 235 billion total parameters. For each token, the router in every MoE layer activates only 8 of its 128 experts, so just 22 billion parameters take part in processing that token. This sparse activation gives the model the knowledge capacity and reasoning depth of a very large model while keeping per-token compute close to that of a 22B dense model, although all 235 billion parameters must still be held in GPU memory. For engineering teams, this translates to faster time-to-first-token (TTFT) and higher throughput in production than a dense model of comparable total size.
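To make sparse activation concrete, here is a generic top-k gating sketch in plain Python: a router scores all 128 experts for one token, keeps the 8 highest, and renormalizes their weights. It assumes a softmax-then-top-k router with renormalized weights, a common MoE pattern; it is an illustration, not Qwen's actual routing code, and the router scores are random placeholders.

```python
import math
import random

NUM_EXPERTS = 128  # experts per MoE layer, as stated above
TOP_K = 8          # experts activated per token, as stated above


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k=TOP_K):
    """Pick the k highest-probability experts and renormalize their weights.

    Generic top-k gating for illustration; not Qwen's actual router code.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Placeholder router scores for a single token (one score per expert).
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)

# The token's hidden state would be sent only to these experts, and their
# outputs combined using the weights below.
for expert, weight in selected:
    print(f"expert {expert:3d}  weight {weight:.3f}")
print(f"share of experts active: {TOP_K / NUM_EXPERTS:.2%}")
print(f"share of parameters active: {22 / 235:.1%}")
```

The share of active experts (8/128, 6.25%) is lower than the share of active parameters (22B/235B, about 9.4%) because components such as attention layers and embeddings count toward the active total for every token, regardless of routing.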

Massive 256K context window

One of the standout capabilities of the Instruct-2507 checkpoint is its native support for a 262,144-token context window. This massive capacity allows the model to process hundreds of pages of text, extensive codebases, or large datasets in a single prompt. It is particularly effective for document parsing, retrieval-augmented generation (RAG) pipelines, and summarizing long-form enterprise data without requiring aggressive chunking strategies.
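Whether a document fits in one request is simple arithmetic on token counts. The sketch below assumes, as is typical for vLLM-served models, that the prompt and the requested completion (`max_tokens`) share the 262,144-token window, and that prompt tokens have already been counted with the model's tokenizer. The helper names are hypothetical.

```python
CONTEXT_WINDOW = 262_144  # native context length stated for Instruct-2507


def fits_in_context(prompt_tokens: int, max_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the completion budget fits in a single request."""
    return prompt_tokens + max_tokens <= context_window


def chunks_needed(total_tokens: int, max_tokens: int,
                  context_window: int = CONTEXT_WINDOW) -> int:
    """Minimum number of requests when input must be split into chunks.

    Ignores per-chunk overhead such as a repeated system prompt.
    """
    per_chunk = context_window - max_tokens
    if per_chunk <= 0:
        raise ValueError("max_tokens leaves no room for input")
    return -(-total_tokens // per_chunk)  # ceiling division


print(fits_in_context(50_000, 1_000))  # True
print(chunks_needed(600_000, 4_096))   # 3
```

With a 4,096-token completion budget, each request can carry at most 258,048 input tokens, so a 600,000-token corpus needs three requests before accounting for any instruction text repeated per chunk.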

Multilingual proficiency and coding

Qwen3-235B-A22B excels in multilingual environments, supporting over 100 languages and dialects. Its training data heavily emphasizes cross-lingual alignment, making it an excellent choice for global applications requiring translation or localized instruction following. Furthermore, the model demonstrates exceptional proficiency in software engineering tasks. It reliably generates, debugs, and refactors code across multiple programming languages, and its robust tool-usage capabilities allow it to seamlessly integrate with external APIs and function-calling frameworks in complex agentic workflows.
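Function calling with OpenAI-compatible APIs usually follows one loop: describe tools as JSON Schema, let the model return a tool call, run it locally, and send the result back as a `tool` message. This article does not state whether the Lyceum endpoint accepts the OpenAI `tools` parameter, so treat that as an assumption to verify. `get_order_status` and its data are hypothetical, and the fake message stands in for `response.choices[0].message` so the sketch runs offline.

```python
import json
from types import SimpleNamespace


def get_order_status(order_id: str) -> dict:
    """Hypothetical local tool; a real one would query your own system."""
    orders = {"A-1001": "shipped"}  # placeholder data
    return {"order_id": order_id, "status": orders.get(order_id, "unknown")}


# OpenAI-style tool definition, sent as tools=TOOLS in chat.completions.create
# (assumes the endpoint supports the OpenAI `tools` parameter).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the shipping status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

REGISTRY = {"get_order_status": get_order_status}


def run_tool_calls(message) -> list:
    """Execute each tool call in an assistant message and build `tool` replies."""
    replies = []
    for call in message.tool_calls or []:
        fn = REGISTRY[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        replies.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(fn(**args)),
        })
    return replies


# Offline stand-in shaped like response.choices[0].message from the OpenAI SDK.
fake_message = SimpleNamespace(tool_calls=[SimpleNamespace(
    id="call_0",
    function=SimpleNamespace(name="get_order_status",
                             arguments='{"order_id": "A-1001"}'),
)])
print(run_tool_calls(fake_message))
```

In a real loop you would append the assistant message and these replies to `messages` and call the API again so the model can use the tool output.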

Benchmarks and how it compares

Qwen3-235B-A22B benchmark results

The Qwen3-235B-A22B-Instruct-2507 model has been evaluated across industry-standard benchmarks, with results that rival top-tier proprietary models and clearly exceed previous open-weight generations. According to the official Hugging Face model card, the model performs especially well on general knowledge, coding, and instruction following.

Benchmark     Category                  Qwen3-235B-A22B-Instruct-2507
MMLU-Pro      General knowledge         83.0%
MMLU-Redux    General knowledge         93.1%
GPQA          Graduate-level science    77.5%
SuperGPQA     Advanced science          62.6%
CSimpleQA     Chinese factuality        84.3%
SimpleQA      Factual accuracy          54.3%

Source: Alibaba Cloud Qwen Team official benchmark reports.

Compared to its predecessor, Qwen2.5-72B-Instruct, the new Qwen3-235B-A22B model offers a substantial leap in capability. While the 72B model is a dense architecture, the 235B MoE model provides a much wider breadth of knowledge and stronger multilingual support while activating fewer parameters per token (22B versus 72B), which keeps per-token compute lower. The trade-off is that its full 235B weights need considerably more GPU memory.

Against current proprietary models, Qwen3-235B-A22B holds its ground remarkably well. It frequently matches or exceeds the performance of models in its weight class on coding and tool-usage tasks. For European engineering teams, this means you can achieve proprietary-level performance using an open-weight model hosted securely on Lyceum Technology's infrastructure, avoiding the vendor lock-in and data privacy concerns associated with US-based providers.

Using it in production

Production configuration for Qwen3-235B-A22B

Deploying Qwen3-235B-A22B in a production environment requires understanding how to optimize its parameters and manage costs effectively. On Lyceum Technology, this model operates under our Standard tier, which designates it as a high-capability model suitable for complex, enterprise-grade workloads.

The model supports a native context window of 262,144 tokens. When processing large documents, we recommend setting stream=True in your API calls. Streaming reduces perceived latency for end-users because tokens are displayed as soon as they are generated instead of after the whole completion finishes. Time-to-first-token still grows with prompt length, since the full input must be processed before the first output token, but the small active parameter count helps keep generation fast once output begins. For tasks requiring strict formatting, such as JSON extraction or function calling, setting a lower temperature (e.g., 0.1 or 0.2) makes outputs more deterministic and reliable.
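With the OpenAI SDK, `stream=True` returns an iterator of chunks whose text lives in `choices[0].delta.content`, which is `None` on role-only and final chunks. The hypothetical `stream_to_text` helper below assembles those pieces and can forward each one to a callback as it arrives. The commented request uses the low-temperature settings recommended above, and the fake chunks exist only so the sketch runs offline.

```python
from types import SimpleNamespace


def stream_to_text(chunks, on_token=None) -> str:
    """Concatenate the content deltas of a streamed chat completion.

    `chunks` is what client.chat.completions.create(..., stream=True) returns.
    `on_token`, if given, is called with each piece as it arrives.
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:      # e.g. a trailing usage-only chunk
            continue
        piece = chunk.choices[0].delta.content
        if piece:                  # role-only and final chunks carry None
            parts.append(piece)
            if on_token:
                on_token(piece)
    return "".join(parts)


# Live usage (needs network access and a Lyceum API key):
# stream = client.chat.completions.create(
#     model="Qwen/Qwen3-235B-A22B-Instruct-2507",
#     messages=[{"role": "user", "content": "List the contract parties as JSON."}],
#     temperature=0.1,
#     max_tokens=512,
#     stream=True,
# )
# text = stream_to_text(stream, on_token=lambda t: print(t, end="", flush=True))


def _chunk(content):
    """Hypothetical offline stand-in shaped like an SDK streaming chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])


fake_stream = [_chunk(None), _chunk('{"parties": '), _chunk('["A", "B"]}'), _chunk(None)]
print(stream_to_text(fake_stream))
```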

Cost management is straightforward with Lyceum's per-token billing model. The Qwen/Qwen3-235B-A22B-Instruct-2507 endpoint is priced at $0.20 per million input tokens and $0.60 per million output tokens. For example, if your application processes a 50,000-token legal contract and generates a 1,000-token summary, the input cost would be $0.01, and the output cost would be $0.0006, resulting in a total transaction cost of just $0.0106.
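The same arithmetic as a small helper, using `Decimal` so fractions of a cent are not distorted by floating-point rounding. The prices are the per-million-token rates quoted above; the sketch does not model any invoice-level rounding, which the article does not describe.

```python
from decimal import Decimal

# Per-million-token prices for Qwen/Qwen3-235B-A22B-Instruct-2507 on Lyceum.
INPUT_PRICE_PER_M = Decimal("0.20")
OUTPUT_PRICE_PER_M = Decimal("0.60")
MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int):
    """Return (input_cost, output_cost, total_cost) in USD for one request."""
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M / MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M / MILLION
    return input_cost, output_cost, input_cost + output_cost


# The legal-contract example: 50,000 input tokens, 1,000 output tokens.
inp, out, total = request_cost(50_000, 1_000)
print(f"input ${inp}, output ${out}, total ${total}")  # input $0.01, output $0.0006, total $0.0106
print(f"10,000 such requests: ${total * 10_000}")
```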

Because Lyceum Technology operates in the eu-north1 region, all inference requests are processed on European soil. This ensures that your production workloads benefit from high-throughput, low-latency connections while maintaining strict adherence to regional data protection regulations.

Running Qwen3-235B-A22B on EU-sovereign infrastructure

Why run Qwen3-235B-A22B on Lyceum

For European AI startups and enterprise engineering teams, data sovereignty is no longer an optional feature; it is a strict regulatory requirement. Running Qwen3-235B-A22B on Lyceum Technology ensures that your sensitive workloads never leave the European Union. Our eu-north1 data centers provide a fully GDPR-compliant environment, offering a clear path to AI Act and ISO 27001 compliance. Unlike many US-based API providers that route traffic through black-box infrastructure, Lyceum guarantees that your data remains secure and sovereign.

Lyceum Technology owns and operates its bare-metal GPU infrastructure. This structural advantage allows us to bypass the markups charged by traditional cloud providers. By running our open-stack inference engine, powered by vLLM and NVIDIA GPUs, we deliver strong performance and pass the cost savings directly to you. You benefit from per-second, per-token billing with zero minimum commitments and no egress fees for data transfer.

Integrating this infrastructure into your stack is straightforward. Our Serverless Inference API is OpenAI-compatible, so it works as a drop-in replacement with the OpenAI SDK. You can migrate existing applications to Qwen3-235B-A22B in minutes by updating your base URL, API key, and model name. Whether you are scaling a high-traffic customer support agent or processing massive batches of documents, Lyceum provides the reliability, speed, and compliance necessary to run production AI workloads with confidence.

Frequently Asked Questions

What is the context window of Qwen3-235B-A22B?

Qwen3-235B-A22B features a massive native context window of 262,144 tokens (256K). This allows the model to process extensive documents, large codebases, and complex datasets in a single prompt, making it highly effective for retrieval-augmented generation (RAG) and long-form analysis.

How much does the Qwen3-235B-A22B API cost on Lyceum?

On Lyceum Technology, the Qwen3-235B-A22B-Instruct-2507 model costs $0.20 per million input tokens and $0.60 per million output tokens. We utilize a transparent, pay-per-token billing model with no base fees, minimum commitments, or hidden data egress charges.

Is the Qwen3-235B-A22B API GDPR compliant?

Yes. When you access Qwen3-235B-A22B through Lyceum Technology, your requests are processed entirely within our eu-north1 region. We own our European data centers, ensuring strict data sovereignty and full GDPR compliance for your sensitive enterprise workloads.

How do I migrate to Qwen3-235B-A22B from OpenAI?

Migrating is seamless because Lyceum Technology provides an OpenAI-compatible API. You only need to change your client's base URL to the Lyceum serverless endpoint shown in the Get started example, insert your Lyceum API key, and update the model string to Qwen/Qwen3-235B-A22B-Instruct-2507.
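Keeping the endpoint, key, and model name in configuration rather than in code turns the switch into a deployment change. A minimal sketch, assuming hypothetical environment variable names (`LYCEUM_API_KEY`, `LLM_BASE_URL`, `LLM_MODEL`) and defaulting to the endpoint and model string from the Get started example:

```python
import os

# Defaults taken from the Get started example.
LYCEUM_BASE_URL = "https://api.lyceum.technology/api/v2/external/serverless"
LYCEUM_MODEL = "Qwen/Qwen3-235B-A22B-Instruct-2507"


def client_settings(env=None) -> dict:
    """Read endpoint, key, and model from the environment (hypothetical variable names)."""
    env = os.environ if env is None else env
    api_key = env.get("LYCEUM_API_KEY")
    if not api_key:
        raise RuntimeError("LYCEUM_API_KEY is not set")
    return {
        "base_url": env.get("LLM_BASE_URL", LYCEUM_BASE_URL),
        "api_key": api_key,
        "model": env.get("LLM_MODEL", LYCEUM_MODEL),
    }


# Usage with the OpenAI SDK (requires the openai package and network access):
#   from openai import OpenAI
#   s = client_settings()
#   client = OpenAI(base_url=s["base_url"], api_key=s["api_key"])
#   client.chat.completions.create(model=s["model"], messages=[...])
```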

What is the difference between Qwen3-235B-A22B and Qwen2.5-72B?

While Qwen2.5-72B is a dense model, Qwen3-235B-A22B utilizes a Mixture-of-Experts (MoE) architecture. It has 235 billion total parameters for a broader knowledge base but only activates 22 billion parameters per token, resulting in faster inference speeds and higher efficiency.

Under what license is Qwen3-235B-A22B released?

Qwen3-235B-A22B is an open-weight model released by Alibaba Cloud under the permissive Apache 2.0 license. This allows for both research and commercial use, giving engineering teams the freedom to build and scale production applications without restrictive licensing fees.
