Hermes-4-70B: specs, benchmarks, and how to run it on Lyceum
Deploy the hybrid-mode reasoning model by Nous Research on EU-sovereign infrastructure.
Magnus Grünewald
June 19, 2026 · CEO at Lyceum Technology
Hermes-4-70B is a hybrid-mode reasoning model developed by Nous Research. Built on the Llama-3.1-70B architecture, it was post-trained on a corpus of approximately 5 million samples aimed at mathematics, coding, and logical deduction. Its hybrid reasoning mode lets it either write out an explicit thinking trace before answering or respond directly, depending on the prompt.

Lyceum Technology serves Hermes-4-70B through our Serverless Inference API. Because the API is OpenAI-compatible, engineering teams can call the model with the standard OpenAI SDK, which keeps migration straightforward. Lyceum operates its own hardware in European data centers, so all inference workloads run on EU-sovereign infrastructure. This supports strict GDPR compliance without the data privacy risks associated with US-based hyperscalers.
Get started: call Hermes-4-70B on Lyceum
To integrate Hermes-4-70B into your application, use the standard OpenAI Python SDK. Because Lyceum provides an OpenAI-compatible API, you only need to update the base URL and provide your Lyceum API key. The model string for this endpoint is NousResearch/Hermes-4-70B.
```python
from openai import OpenAI

# Point the standard OpenAI client at Lyceum's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.lyceum.technology/api/v2/external/serverless",
    api_key="<your lyceum api key>",
)

response = client.chat.completions.create(
    model="NousResearch/Hermes-4-70B",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,  # upper bound on the number of generated tokens
)
print(response.choices[0].message.content)
```
Pricing and region for Hermes-4-70B
Lyceum serves Hermes-4-70B on the Fast tier, which is optimized for cost-efficient, high-throughput inference. Pricing is $0.13 per million input tokens and $0.40 per million output tokens. Because billing is per token, you pay only for the tokens your application processes, and nothing when it is idle.
All API requests for this model are processed in the eu-north1 region. Because workloads run on GPU infrastructure that Lyceum owns in Europe, your data remains within the European Union. This gives enterprise applications, healthcare platforms, and financial services that cannot route sensitive user data through US-based API providers a clear path to GDPR compliance. Together, Fast-tier pricing and EU data residency make the model practical for production deployments.
What Hermes-4-70B is good at
Hybrid reasoning and structured outputs
Hermes-4-70B's hybrid reasoning mode lets the model deliberate before giving a final response. On complex logic problems it can emit explicit thinking segments that work through the problem step by step; for simpler queries it can skip this deliberation and answer faster. Nous Research also trained the model to produce valid JSON that conforms to a supplied schema, which makes it dependable for programmatic function calling.
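If your application needs to separate the thinking trace from the final answer, a small post-processing step is enough. The sketch below assumes the trace is wrapped in `<think>...</think>` tags (check the model card for the exact format your deployment emits). `split_reasoning` and `parse_json_answer` are hypothetical helpers, not part of any SDK; the second one shows how a JSON answer can be checked for required keys before it reaches downstream code.

```python
import json
import re

# Assumption: reasoning traces are wrapped in <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no trace is present."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


def parse_json_answer(text: str, required_keys: set[str]) -> dict:
    """Drop any reasoning trace, parse the answer as JSON, and check required keys."""
    _, answer = split_reasoning(text)
    data = json.loads(answer)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


raw = '<think>User wants the capital as JSON.</think>\n{"country": "France", "capital": "Paris"}'
reasoning, answer = split_reasoning(raw)
print(reasoning)  # User wants the capital as JSON.
print(parse_json_answer(raw, {"country", "capital"}))  # {'country': 'France', 'capital': 'Paris'}
```

Parsing defensively like this stops a malformed or partial response from propagating. In a schema-driven workflow you would likely validate against the full JSON Schema rather than a simple key set.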
Steerability and reduced refusals
A primary design goal of the Hermes series is alignment with the user rather than excessive censorship. Hermes-4-70B achieves state-of-the-art results on RefusalBench, showing a willingness to help in scenarios that other models often refuse. In practice, this steerability means the model follows system prompts closely and can sustain complex roleplay instructions with fewer false-positive safety refusals.
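Steerability is exercised mainly through the system prompt. The sketch below builds the keyword arguments for `client.chat.completions.create()` with a system message. The `build_request` helper and the prompt text are illustrative assumptions, and the actual call is left as a comment so the snippet runs without network access.

```python
def build_request(system_prompt: str, user_message: str, max_tokens: int = 512) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": "NousResearch/Hermes-4-70B",
        "messages": [
            # The system message carries persona, tone, and formatting rules.
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,
    }


payload = build_request(
    system_prompt=(
        "You are a terse support agent for an internal billing tool. "
        "Answer in at most two sentences."
    ),
    user_message="How do I export last month's invoices?",
)
# With the client from the quickstart:
# response = client.chat.completions.create(**payload)
```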
Math, code, and logic capabilities
The model was post-trained on a synthesized corpus of approximately 60 billion tokens, blending reasoning and non-reasoning data. This dataset yields improvements in STEM tasks: competitive programming, advanced mathematical problem solving, and scientific reasoning. The model retains the general assistant quality of its base architecture while extending what a 70-billion-parameter model can do in these specialized domains.
Benchmarks and how it compares
Hermes-4-70B benchmark results
Published benchmark data for Hermes-4-70B shows strong performance across coding, mathematics, and general knowledge tasks, and the model is competitive with other models in the 70B weight class.
| Benchmark | What it measures | Score |
|---|---|---|
| MATH | Competition mathematics | 91.0% |
| HumanEval+ | Code generation correctness | 90.0% |
| MMLU-Pro | Multitask language understanding (harder MMLU variant) | 87.0% |
| SWE-bench Verified | Real-world software engineering | 72.0% |
| GPQA Diamond | Graduate-level science Q&A | 49.1% |
| LiveCodeBench | Live competitive programming | 26.9% |
Source: AI Value Index and developer performance benchmarks.
Compared with its base model, Llama-3.1-70B, Hermes-4-70B improves on structured output generation and mathematical reasoning. The hybrid reasoning mode is intended to help on multi-step evaluations such as MATH and SWE-bench Verified. Among sibling models in the Lyceum catalogue, such as standard instruction-tuned 70B models, Hermes-4-70B is the better fit for developers who need strict JSON schema adherence and the ability to toggle explicit thinking traces. Its HumanEval+ score also makes it a strong candidate for coding assistants.
Using it in production
Production configuration for Hermes-4-70B
Before deploying Hermes-4-70B in production, check the model's limits and pricing. The model supports a context window of 131,072 tokens, enough to analyze large codebases or sustain long multi-turn conversations.
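A quick pre-flight check helps avoid requests that overflow the window. This sketch assumes the prompt and the requested completion share the 131,072-token window, and it uses a rough four-characters-per-token heuristic for English text; for exact counts you would use the model's own tokenizer. All function names are hypothetical.

```python
import math

CONTEXT_WINDOW = 131_072  # Hermes-4-70B context window, in tokens


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; ~4 characters per token is a heuristic for English."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(prompt_tokens: int, max_tokens: int, window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the requested completion fit in the window."""
    return prompt_tokens + max_tokens <= window


def completion_budget(prompt_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Largest max_tokens value that still fits alongside the prompt."""
    return max(window - prompt_tokens, 0)


codebase = "x" * 400_000  # stand-in for roughly 400 KB of source text
prompt_tokens = estimate_tokens(codebase)
print(prompt_tokens)                       # 100000
print(fits_context(prompt_tokens, 4_096))  # True
print(completion_budget(prompt_tokens))    # 31072
```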
Lyceum places this model in the Fast tier, which is designed for cost-efficient, high-throughput workloads where latency and unit economics are the primary concerns. Pricing is $0.13 per million input tokens and $0.40 per million output tokens.
To understand the production economics, consider an application that handles 10,000 requests per day. If an average request contains 1,500 input tokens and generates 500 output tokens, the daily volume is 15 million input tokens and 5 million output tokens:
- Input cost: 15 million tokens × $0.13 per million = $1.95
- Output cost: 5 million tokens × $0.40 per million = $2.00
- Total daily cost: $3.95
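The same arithmetic as a small helper, so you can plug in your own traffic profile. Prices are the Fast-tier figures above; `Decimal` avoids floating-point rounding in currency sums. The `daily_cost` name is illustrative.

```python
from decimal import Decimal

# Fast-tier prices for Hermes-4-70B, in USD per million tokens.
INPUT_PRICE_PER_M = Decimal("0.13")
OUTPUT_PRICE_PER_M = Decimal("0.40")
ONE_MILLION = Decimal(1_000_000)


def daily_cost(requests_per_day: int, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated daily spend in USD, given average tokens per request."""
    input_total = requests_per_day * input_tokens
    output_total = requests_per_day * output_tokens
    cost = (input_total * INPUT_PRICE_PER_M + output_total * OUTPUT_PRICE_PER_M) / ONE_MILLION
    return cost.quantize(Decimal("0.01"))  # round to whole cents


cost = daily_cost(10_000, 1_500, 500)
print(cost)       # 3.95
print(cost * 30)  # 118.50, a rough 30-day estimate
```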
With per-token pricing, you pay only for the tokens you process. Lyceum also does not charge egress fees, so streaming large volumes of generated text back to your application incurs no hidden network transfer costs. All API requests for Hermes-4-70B are routed through the eu-north1 region, which keeps latency low for European users while maintaining strict data residency.
Running Hermes-4-70B on EU-sovereign infrastructure
Why run Hermes-4-70B on Lyceum
For many European engineering teams, data sovereignty is a hard requirement. Running Hermes-4-70B on Lyceum keeps your inference workloads entirely within the European Union. Because the model is hosted in the eu-north1 region, your data never crosses the Atlantic, providing a clear and provable path to GDPR compliance. This is a key advantage over US-based API providers that route traffic through American data centers.
Lyceum operates its own GPU infrastructure rather than renting compute from hyperscalers, which lets us offer competitive pricing without markups. Our serving stack is built on open components, vLLM and NVIDIA Dynamo, so developers get visibility into how inference runs and are not locked into a proprietary engine.
The platform is built for engineering velocity. Because the API is OpenAI-compatible, migrating an existing application to Lyceum means changing the client configuration (the base URL and API key) and the model name; there is no need to rewrite application logic or learn a new SDK. Per-token billing and scale-to-zero ensure that you never pay for idle compute. Lyceum provides the performance, compliance, and cost-efficiency required to scale AI applications across Europe.
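One way to keep the migration a configuration-only change is to read the endpoint settings from the environment instead of hard-coding them. In this sketch the `LYCEUM_API_KEY` variable name and the `client_settings` helper are hypothetical; the returned dictionary is meant to be passed to `OpenAI(**kwargs)` from the quickstart above.

```python
import os

# Hypothetical settings table; only these values differ from a stock OpenAI setup.
LYCEUM = {
    "base_url": "https://api.lyceum.technology/api/v2/external/serverless",
    "api_key_env": "LYCEUM_API_KEY",  # hypothetical environment variable name
    "model": "NousResearch/Hermes-4-70B",
}


def client_settings(cfg: dict = LYCEUM) -> tuple[dict, str]:
    """Return (kwargs for OpenAI(...), model name), reading the key from the environment."""
    api_key = os.environ.get(cfg["api_key_env"])
    if not api_key:
        raise RuntimeError(f"Set {cfg['api_key_env']} before creating the client")
    return {"base_url": cfg["base_url"], "api_key": api_key}, cfg["model"]


# Usage with the quickstart client:
#   kwargs, model = client_settings()
#   client = OpenAI(**kwargs)
#   client.chat.completions.create(model=model, messages=[...])
```

The OpenAI Python SDK can also read `OPENAI_API_KEY` and `OPENAI_BASE_URL` from the environment on its own, which achieves the same separation without a helper.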