Qwen3-235B-A22B: specs, benchmarks, and how to run it on Lyceum
A 235B MoE model optimized for multilingual instruction following and high-speed inference.
Magnus Grünewald
June 25, 2026 · CEO at Lyceum Technology
Qwen3-235B-A22B is a large language model developed by Alibaba Cloud's Qwen team. This Mixture-of-Experts (MoE) model houses 235 billion total parameters but activates only 22 billion per forward pass, delivering high-tier performance with exceptional efficiency. The Instruct-2507 checkpoint is optimized for general-purpose text generation, coding, and tool usage without the overhead of a "thinking" mode. Lyceum Technology serves Qwen3-235B-A22B through our OpenAI-compatible Serverless Inference API, allowing you to deploy this powerful open-weight model on GDPR-compliant, EU-sovereign infrastructure in our eu-north1 region.
Get started: call Qwen3-235B-A22B on Lyceum
Integrate Qwen3-235B-A22B into your application using the standard OpenAI SDK. Lyceum Technology provides a drop-in replacement API, meaning you only need to update your base URL and API key to start routing requests to our European data centers.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lyceum.technology/api/v2/external/serverless",
    api_key="<your lyceum api key>",
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)

print(response.choices[0].message.content)
```
Pricing and region for Qwen3-235B-A22B
When you deploy Qwen3-235B-A22B on Lyceum Technology, you benefit from transparent, per-token pricing with no hidden base fees or minimum commitments. The model is hosted in our eu-north1 region, ensuring that all data processing remains strictly within the European Union.
This model is categorized under our Standard tier, which is reserved for high-capability foundation models that handle complex reasoning and large-scale generation tasks. The pricing for the Qwen/Qwen3-235B-A22B-Instruct-2507 endpoint is set at $0.20 per million input tokens and $0.60 per million output tokens. Because Lyceum operates its own bare-metal infrastructure, we can offer these highly competitive rates while maintaining strict data sovereignty. You only pay for the exact number of tokens you process, making it highly cost-effective for both bursty workloads and sustained production traffic.
For engineers migrating from existing cloud providers, the transition requires no refactoring beyond changing the base URL, API key, and model name. The endpoint supports standard parameters such as temperature, max_tokens, and stream, so you can keep your existing application logic while moving your infrastructure to a secure, GDPR-compliant environment.
What Qwen3-235B-A22B is good at
Efficient Mixture-of-Experts architecture
Qwen3-235B-A22B represents a significant architectural leap for the Alibaba Cloud Qwen team. It uses a Mixture-of-Experts (MoE) design with 235 billion total parameters. During any given forward pass, however, the model activates only 8 of its 128 experts, using just 22 billion parameters per token. This sparse activation lets the model approach the reasoning depth and knowledge retention of a massive dense model while keeping per-token compute close to that of a much smaller 22B-parameter model (all 235 billion parameters must still be held in memory). For engineering teams, this translates to faster time-to-first-token (TTFT) and higher throughput in production.
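To make the sparse activation concrete, here is a minimal sketch of generic top-k expert routing for a single token. It is a toy illustration, not the Qwen team's actual router: the function name `route_token`, the random logits standing in for a learned router's output, and the choice to normalize weights with a softmax over the selected experts are all assumptions made for this example. Only the 8-of-128 figures come from the text above.

```python
import math
import random

NUM_EXPERTS = 128  # experts per MoE layer, as stated above
TOP_K = 8          # experts activated per token, as stated above


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return (expert_index, weight) pairs.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    # Indices of the top_k largest router scores.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


rng = random.Random(0)  # fixed seed so the toy example is repeatable
# Stand-in for a learned router's output (assumption for illustration).
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routing = route_token(logits)
print(f"active experts: {len(routing)} of {NUM_EXPERTS}")
print(f"weight sum: {sum(w for _, w in routing):.6f}")
```

Only the selected experts' feed-forward blocks would run for this token, which is why per-token compute tracks the active parameter count rather than the total.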
Massive 256K context window
One of the standout capabilities of the Instruct-2507 checkpoint is its native support for a 262,144-token context window. This massive capacity allows the model to process hundreds of pages of text, extensive codebases, or large datasets in a single prompt. It is particularly effective for document parsing, retrieval-augmented generation (RAG) pipelines, and summarizing long-form enterprise data without requiring aggressive chunking strategies.
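When working near the limit, it helps to budget the window explicitly. The sketch below assumes that prompt tokens and generated tokens share the same 262,144-token window, and that you obtain token counts from the model's tokenizer (not shown here). The helper names `max_prompt_tokens` and `fits_in_context` are hypothetical and not part of any SDK.

```python
CONTEXT_WINDOW = 262_144  # native context length stated above


def max_prompt_tokens(max_output_tokens, context_window=CONTEXT_WINDOW):
    """Return how many prompt tokens fit once room for the reply is reserved."""
    if not 0 <= max_output_tokens <= context_window:
        raise ValueError("max_output_tokens must be between 0 and the context window")
    return context_window - max_output_tokens


def fits_in_context(prompt_tokens, max_output_tokens, context_window=CONTEXT_WINDOW):
    """Return True if the prompt plus the reserved output fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window


print(max_prompt_tokens(4_096))         # prompt budget when reserving 4,096 output tokens
print(fits_in_context(250_000, 4_096))  # a 250,000-token prompt with the same reservation
print(fits_in_context(260_000, 4_096))  # a 260,000-token prompt with the same reservation
```

A check like this can run before each request so that oversized documents are split or trimmed deliberately rather than rejected by the endpoint.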
Multilingual proficiency and coding
Qwen3-235B-A22B excels in multilingual environments, supporting over 100 languages and dialects. Its training data heavily emphasizes cross-lingual alignment, making it an excellent choice for global applications requiring translation or localized instruction following. Furthermore, the model demonstrates exceptional proficiency in software engineering tasks. It reliably generates, debugs, and refactors code across multiple programming languages, and its robust tool-usage capabilities allow it to seamlessly integrate with external APIs and function-calling frameworks in complex agentic workflows.
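As a rough illustration of the tool-usage pattern, the sketch below defines one tool in the OpenAI function-calling schema and a small local dispatcher that executes a tool call. The tool `get_order_status`, its stub implementation, and the helper `run_tool_call` are hypothetical names invented for this example. Whether the Lyceum endpoint accepts the `tools` parameter is an assumption here, so check it against your account before relying on it.

```python
import json

# Tool schema in the OpenAI function-calling format.
# get_order_status is a hypothetical tool invented for this example.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the shipping status of an order.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }
]


def get_order_status(order_id):
    """Hypothetical stub; a real implementation would query your own system."""
    return {"order_id": order_id, "status": "shipped"}


# Maps tool names the model may request to local Python callables.
LOCAL_TOOLS = {"get_order_status": get_order_status}


def run_tool_call(name, arguments_json):
    """Execute one tool call and return a JSON string for the follow-up 'tool' message."""
    if name not in LOCAL_TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json)
    except json.JSONDecodeError:
        return json.dumps({"error": "arguments were not valid JSON"})
    return json.dumps(LOCAL_TOOLS[name](**kwargs))


print(run_tool_call("get_order_status", '{"order_id": "A-1001"}'))
```

With the OpenAI SDK, you would pass `tools=TOOLS` to `client.chat.completions.create(...)`, read any requested calls from `response.choices[0].message.tool_calls`, and feed each call's `function.name` and `function.arguments` to the dispatcher. Validating the name and the JSON before executing anything keeps a malformed model response from crashing the agent loop.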
Benchmarks and how it compares
Qwen3-235B-A22B benchmark results
The Qwen3-235B-A22B-Instruct-2507 model has been evaluated across industry-standard benchmarks, demonstrating performance that rivals top-tier proprietary models and significantly outpaces previous open-weight generations. According to the official Hugging Face model card, the model excels in general knowledge, coding, and instruction following.
| Benchmark | Metric | Qwen3-235B-A22B-Instruct-2507 |
|---|---|---|
| MMLU-Pro | General Knowledge | 83.0% |
| MMLU-Redux | General Knowledge | 93.1% |
| GPQA | Graduate-Level Science | 77.5% |
| SuperGPQA | Advanced Science | 62.6% |
| CSimpleQA | Chinese Factuality | 84.3% |
| SimpleQA | Factual Accuracy | 54.3% |
Source: Alibaba Cloud Qwen Team official benchmark reports.
Compared to its predecessor, Qwen2.5-72B-Instruct, the new Qwen3-235B-A22B model offers a substantial leap in capability. While the 72B model is a dense architecture that uses all of its parameters for every token, the 235B MoE model provides a much wider breadth of knowledge and superior multilingual support while activating far fewer parameters per token (22B versus 72B), which keeps inference speed competitive.
Against current proprietary models, Qwen3-235B-A22B holds its ground remarkably well. It frequently matches or exceeds the performance of models in its weight class on coding and tool-usage tasks. For European engineering teams, this means you can achieve proprietary-level performance using an open-weight model hosted securely on Lyceum Technology's infrastructure, avoiding the vendor lock-in and data privacy concerns associated with US-based providers.
Using it in production
Production configuration for Qwen3-235B-A22B
Deploying Qwen3-235B-A22B in a production environment requires understanding how to optimize its parameters and manage costs effectively. On Lyceum Technology, this model operates under our Standard tier, which designates it as a high-capability model suitable for complex, enterprise-grade workloads.
The model supports a native context window of 262,144 tokens. When processing large documents, we recommend setting stream=True in your API calls. Streaming reduces perceived latency for end users because tokens are returned as they are generated rather than after the full response is complete; the long input still has to be processed before the first token appears. For tasks requiring strict formatting, such as JSON extraction or function calling, a lower temperature (e.g., 0.1 or 0.2) yields more deterministic and reliable outputs.
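The streaming and low-temperature settings described above can be combined as in the following sketch. It assumes a `client` created with the OpenAI SDK as in the getting-started snippet. The helper name `stream_reply` and the `max_tokens=512` value are illustrative choices, not recommendations from Lyceum.

```python
def stream_reply(client, prompt, model="Qwen/Qwen3-235B-A22B-Instruct-2507"):
    """Stream a completion, printing text as it arrives, and return the full reply."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,  # low temperature for more consistent, format-sensitive output
        max_tokens=512,   # illustrative cap; adjust to your workload
        stream=True,
    )
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices or an empty delta (e.g. the final chunk).
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)
```

Calling `stream_reply(client, "Summarize this contract: ...")` prints tokens as they arrive and returns the assembled text, so the same function can serve both an interactive UI and downstream processing.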
Cost management is straightforward with Lyceum's per-token billing model. The Qwen/Qwen3-235B-A22B-Instruct-2507 endpoint is priced at $0.20 per million input tokens and $0.60 per million output tokens. For example, if your application processes a 50,000-token legal contract and generates a 1,000-token summary, the input cost would be $0.01, and the output cost would be $0.0006, resulting in a total transaction cost of just $0.0106.
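The arithmetic in this example can be reproduced with a small helper. This is a minimal sketch using the per-million-token prices quoted above; the function name `request_cost` is hypothetical, and `Decimal` is used so that small amounts are not distorted by floating-point rounding.

```python
from decimal import Decimal

# Per-million-token prices quoted above, in USD.
INPUT_PRICE_PER_M = Decimal("0.20")
OUTPUT_PRICE_PER_M = Decimal("0.60")
TOKENS_PER_M = Decimal(1_000_000)


def request_cost(input_tokens, output_tokens):
    """Return (input_cost, output_cost, total) in USD as exact Decimals."""
    input_cost = INPUT_PRICE_PER_M * input_tokens / TOKENS_PER_M
    output_cost = OUTPUT_PRICE_PER_M * output_tokens / TOKENS_PER_M
    return input_cost, output_cost, input_cost + output_cost


# The 50,000-token contract and 1,000-token summary from the example above.
input_cost, output_cost, total = request_cost(50_000, 1_000)
print(f"input ${input_cost}, output ${output_cost}, total ${total}")
```

The same function can be applied to the token counts reported with each API response to track spend per request.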
Because Lyceum Technology operates in the eu-north1 region, all inference requests are processed on European soil. This ensures that your production workloads benefit from high-throughput, low-latency connections while maintaining strict adherence to regional data protection regulations.
Running Qwen3-235B-A22B on EU-sovereign infrastructure
Why run Qwen3-235B-A22B on Lyceum
For European AI startups and enterprise engineering teams, data sovereignty is no longer an optional feature - it is a strict regulatory requirement. Running Qwen3-235B-A22B on Lyceum Technology ensures that your sensitive workloads never leave the European Union. Our eu-north1 data centers provide a fully GDPR-compliant environment, offering a clear path to AI Act and ISO 27001 compliance. Unlike many US-based API providers that route traffic through black-box infrastructure, Lyceum guarantees that your data remains secure and sovereign.
Lyceum Technology owns and operates its bare-metal GPU infrastructure. This structural advantage allows us to bypass the massive markups charged by traditional cloud providers. By leveraging our open-stack inference engine - powered by vLLM and NVIDIA GPUs - we deliver exceptional performance and pass the cost savings directly to you. You benefit from per-second, per-token billing with zero minimum commitments and absolutely no egress fees for data transfer.
Integrating this infrastructure into your stack is straightforward. Our Serverless Inference API is a drop-in replacement for the OpenAI API and works with the standard OpenAI SDK. You can migrate your existing applications to Qwen3-235B-A22B in minutes simply by updating your base URL and API key. Whether you are scaling a high-traffic customer support agent or processing massive batches of documents, Lyceum provides the reliability, speed, and compliance necessary to run production AI workloads with complete confidence.