Cosmos3-Super-Reasoner: specs, benchmarks, and how to run it on Lyceum
NVIDIA's 32B omnimodal vision-language model for physical AI and complex reasoning.
Magnus Grünewald, CEO at Lyceum Technology
June 15, 2026
Cosmos3-Super-Reasoner is a 32B-parameter omnimodal vision-language model developed by NVIDIA as part of the Cosmos 3 family. Designed specifically for physical AI, it excels at understanding real-world environments, analyzing fixed-camera footage, and reasoning about complex multi-step tasks in robotics and autonomous systems. Lyceum Technology serves Cosmos3-Super-Reasoner through our OpenAI-compatible Serverless Inference API, so developers can integrate advanced physical reasoning into their applications by changing only the base URL and API key of an existing OpenAI client. All inference runs on our EU-hosted infrastructure, ensuring strict data privacy and GDPR compliance for sensitive enterprise workloads.
Get started: call Cosmos3-Super-Reasoner on Lyceum
Integrating NVIDIA's Cosmos3-Super-Reasoner into your application requires zero new frameworks if you already use standard API clients. Lyceum Technology provides a drop-in replacement for the OpenAI SDK, allowing you to route requests to secure European infrastructure by updating the base URL and API key. This approach ensures that your physical AI and video understanding workloads remain fully GDPR compliant without requiring architectural rewrites.
Below is the exact Python snippet to call the model using the standard OpenAI client.
```python
from openai import OpenAI

# Point the standard OpenAI client at Lyceum's serverless endpoint.
client = OpenAI(
    base_url="https://api.lyceum.technology/api/v2/external/serverless",
    api_key="<your lyceum api key>",
)

# Send a single-turn chat request to the model.
response = client.chat.completions.create(
    model="nvidia/Cosmos3-Super-Reasoner",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```
Pricing and region for Cosmos3-Super-Reasoner
When deploying physical AI models, predictable unit economics are critical. Cosmos3-Super-Reasoner is available on Lyceum's Standard tier, which is optimized for high-capability workloads requiring complex multi-step reasoning. The model is priced at $0.10 per million input tokens and $0.30 per million output tokens.
All inference for this model runs in the eu-north1 region. This guarantees that your sensitive video feeds, images, and proprietary text data never leave the European Union. For enterprises building autonomous systems or analyzing factory floor footage, this strict data residency eliminates the compliance risks associated with routing data to US-based hyperscalers. You pay strictly per token with no base fees, allowing you to scale from prototyping to production efficiently.
What Cosmos3-Super-Reasoner is good at
Physical AI and omnimodal understanding
Cosmos3-Super-Reasoner is a 32B-parameter vision-language model built specifically for physical AI. Unlike standard text-based large language models, it uses a unified mixture-of-transformers architecture to process text, images, video, and audio natively. This omnimodal design allows the model to understand the physical world, making it highly effective for robotics, autonomous vehicles, and smart space environments. It can analyze complex scenes, track object permanence, and understand spatial relationships across video frames.
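To make the image input path concrete, here is a minimal sketch of a message that pairs a question with an inline image. It assumes the Lyceum endpoint accepts OpenAI-style `image_url` content parts with base64 data URLs, which this article does not confirm; `build_image_message` is a hypothetical helper name, not part of any SDK.

```python
import base64


def build_image_message(image_bytes: bytes, question: str,
                        mime_type: str = "image/jpeg") -> dict:
    """Build one OpenAI-style user message containing text plus an inline image.

    Assumption: the endpoint accepts `image_url` content parts carrying a
    base64 data URL, as the OpenAI Chat Completions format does.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary can be passed as the single element of `messages` in a `client.chat.completions.create(...)` call with `model="nvidia/Cosmos3-Super-Reasoner"`. If the endpoint expects a different image format, only this helper needs to change.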
Complex multi-step reasoning
The model excels at breaking down real-world scenarios into structured state sequences. When analyzing fixed-camera footage from warehouses, transportation hubs, or factory assembly lines, Cosmos3-Super-Reasoner can reliably segment activity and reason about what is happening. It serves as a planning model, using prior knowledge and physics understanding to determine what steps an embodied agent should take next. This makes it ideal for generating action sequences and evaluating physical plausibility in simulated environments.
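One way to work with structured state sequences is to ask for JSON and validate the reply before using it downstream. The sketch below is illustrative only: the prompt wording, the `start_s` / `end_s` / `state` keys, and the `parse_state_sequence` helper are assumptions made for this example, not a schema defined by NVIDIA or Lyceum.

```python
import json

# Hypothetical prompt; adjust the schema to whatever your pipeline needs.
STATE_SEQUENCE_PROMPT = (
    "Describe the activity in the footage as a JSON array. "
    'Each element must be an object with the keys "start_s", "end_s", and "state". '
    "Return only the JSON array."
)

REQUIRED_KEYS = {"start_s", "end_s", "state"}


def parse_state_sequence(raw: str) -> list[dict]:
    """Extract and validate a JSON array of states from a model reply.

    Tolerates extra prose around the array by parsing only the text between
    the first '[' and the last ']'.
    """
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model reply")
    states = json.loads(raw[start : end + 1])
    for item in states:
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            raise ValueError(f"malformed state entry: {item!r}")
    # Return the states in chronological order.
    return sorted(states, key=lambda s: s["start_s"])
```

Validating the reply at this boundary keeps malformed output from reaching a planner or a monitoring dashboard, and a `ValueError` gives you a natural place to retry the request.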
Video analytics and anomaly detection
For industrial vision applications, the model provides robust performance in detecting events and anomalies. It can process long video sequences to identify deviations from standard operating procedures on a manufacturing line or track specific activities in a logistics hub. By combining visual perception with deep reasoning capabilities, Cosmos3-Super-Reasoner allows engineering teams to build automated monitoring systems that understand context, rather than relying on brittle, hard-coded computer vision rules.
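This article does not specify how long videos are submitted to the API. If you end up sending sampled frames rather than a raw video file, a simple starting point is to pick evenly spaced timestamps across the clip. The sketch below shows that idea; `sample_timestamps` is a hypothetical helper, and uniform sampling is an assumption that may miss short events.

```python
def sample_timestamps(duration_s: float, max_frames: int) -> list[float]:
    """Return `max_frames` evenly spaced timestamps in seconds.

    The clip is split into `max_frames` equal segments and the midpoint of
    each segment is returned, so no sample sits exactly on the first or last
    frame of the clip.
    """
    if duration_s <= 0 or max_frames <= 0:
        return []
    step = duration_s / max_frames
    return [round(step * (i + 0.5), 3) for i in range(max_frames)]


# A 60-second clip sampled at 4 points.
print(sample_timestamps(60, 4))  # [7.5, 22.5, 37.5, 52.5]
```

You would then extract frames at those timestamps with your video tooling of choice. For anomaly detection, denser sampling around suspected events is likely a better use of the token budget than a uniform grid.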
Benchmarks and how it compares
Cosmos3-Super-Reasoner benchmark results
NVIDIA evaluates the Cosmos 3 family across multiple benchmark suites targeting physical AI reasoning, generation quality, and domain-specific performance. Cosmos3-Super-Reasoner ranks at the top of its parameter class for understanding real-world environments.
| Benchmark | Metric / Focus | Result |
|---|---|---|
| VANTAGE-Bench | Real-world fixed-camera footage (32B tier) | #1 Open Model |
| Traffic Anomaly Reasoning (TAR) | Event detection in driving scenes | #1 Open Model |
| Heron-Bench | Free-form VLM response scoring | High-tier performance |
Source: NVIDIA Technical Blog.
Comparing to sibling models
Within the NVIDIA catalogue, Cosmos3-Super-Reasoner (32B) sits above Cosmos3-Nano-Reasoner (8B). While the Nano version is optimized for lightweight policy execution and edge deployments, the Super variant provides the high-capacity world simulation and advanced reasoning required for complex autonomous vehicle planning and datacenter-scale synthetic data generation.
When compared to generalist vision-language models, Cosmos3-Super-Reasoner demonstrates an advantage in physical plausibility. Standard VLMs often fail to maintain object permanence or understand momentum across video frames. Cosmos3-Super-Reasoner is explicitly trained to respect the laws of physics, making it far more reliable for robotics training pipelines. However, this specialization means it requires more careful prompt engineering and region framing to extract structured state sequences effectively.
Using it in production
Production configuration for Cosmos3-Super-Reasoner
Deploying Cosmos3-Super-Reasoner effectively requires understanding its context limits and pricing structure. The model supports a massive 256K context window, allowing it to ingest long video sequences, high-resolution image batches, and extensive system prompts in a single API call. This deep context is essential for analyzing multi-minute fixed-camera footage or providing a robot with extensive historical state data before asking it to reason about its next action.
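A quick budget check helps when deciding how many frames to pack into one call. The sketch below assumes "256K" means 256,000 tokens (it may be 262,144 on the actual deployment) and treats the per-frame token cost as an input you measure yourself, for example from `response.usage.prompt_tokens` on a test request. `max_frames_in_context` is a hypothetical helper.

```python
def max_frames_in_context(
    tokens_per_frame: int,
    prompt_tokens: int,
    reserved_output_tokens: int,
    context_window: int = 256_000,  # assumption: "256K" read as 256,000 tokens
) -> int:
    """Estimate how many frames fit alongside the prompt and the reply.

    `tokens_per_frame` is not documented in this article; measure it
    empirically for your frame resolution before relying on the result.
    """
    if tokens_per_frame <= 0:
        raise ValueError("tokens_per_frame must be positive")
    available = context_window - prompt_tokens - reserved_output_tokens
    return max(0, available // tokens_per_frame)


# Example: 1,000 tokens per frame, a 2,000-token prompt, 4,000 tokens of output.
print(max_frames_in_context(1_000, 2_000, 4_000))  # 250
```

Reserving output tokens up front matters because the reply shares the same context window as the input.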
Lyceum Technology serves this model on our Standard tier, which prioritizes high-capability execution for complex tasks. The pricing is highly competitive for a 32B-parameter omnimodal model: $0.10 per million input tokens and $0.30 per million output tokens.
To understand the unit economics, consider a video analytics workload. Suppose you pass a sequence of frames and text prompts totaling 50,000 input tokens, and the model generates a detailed 500-token structured JSON analysis of the physical events. The input costs 50,000 / 1,000,000 × $0.10 = $0.005, and the output costs 500 / 1,000,000 × $0.30 = $0.00015, for a total API call cost of $0.00515.
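The same arithmetic can be wrapped in a small helper for capacity planning. This sketch uses the Standard-tier prices quoted in this article; the function and constant names are illustrative, not part of any Lyceum SDK.

```python
# Prices quoted in this article for Cosmos3-Super-Reasoner (USD per million tokens).
INPUT_USD_PER_MILLION = 0.10
OUTPUT_USD_PER_MILLION = 0.30


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call from its input and output token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# The worked example above: 50,000 input tokens and 500 output tokens.
print(f"${estimate_cost_usd(50_000, 500):.5f}")  # $0.00515
```

At 10,000 such calls, the total comes to about $51.50. Feeding the function the token counts reported in each response gives you actual rather than estimated spend.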
All requests are routed to our eu-north1 region. For European manufacturing, logistics, and automotive companies, this ensures that proprietary factory footage and autonomous driving data remain strictly within the EU. You can scale your inference volume dynamically without committing to expensive reserved instances, paying only for the exact tokens processed during your physical AI evaluations.
Running Cosmos3-Super-Reasoner on EU-sovereign infrastructure
Why run Cosmos3-Super-Reasoner on Lyceum
Building physical AI systems requires processing highly sensitive data. Factory floor camera feeds, autonomous vehicle sensor logs, and proprietary robotics training data cannot be sent to US-based infrastructure without triggering severe compliance risks. Lyceum Technology provides an EU-native inference platform capable of serving heavy omnimodal models like Cosmos3-Super-Reasoner with strict data sovereignty.
By hosting the model in our eu-north1 region, we ensure that your data is processed entirely within European borders, fulfilling GDPR requirements by default. Unlike API providers that rent capacity from hyperscalers, Lyceum owns and operates its GPU infrastructure. This structural advantage allows us to offer pay-per-token billing without the massive markups typical of public clouds. You avoid the pain of managing your own hardware, dealing with cooling requirements, or fighting for GPU availability.
Furthermore, Lyceum provides open-stack transparency. We use optimized open-source inference engines like vLLM and NVIDIA Dynamo rather than locking you into a black-box proprietary stack. Our API is fully OpenAI-compatible, meaning your engineering team can switch from existing providers by changing the base URL and API key. You get the advanced physical reasoning capabilities of NVIDIA's 32B model, the scalability of serverless compute, and the legal certainty of European data residency, all without minimum commitments or egress fees.