Cosmos3-Super-Reasoner: specs, benchmarks, and how to run it on Lyceum

NVIDIA's 32B omnimodal vision-language model for physical AI and complex reasoning.

Magnus Grünewald

June 15, 2026 · CEO at Lyceum Technology

Cosmos3-Super-Reasoner is a 32B-parameter omnimodal vision-language model developed by NVIDIA as part of the Cosmos 3 family. Designed for physical AI, it is built to understand real-world environments, analyze fixed-camera footage, and reason about complex multi-step tasks in robotics and autonomous systems. Lyceum Technology serves Cosmos3-Super-Reasoner through our OpenAI-compatible Serverless Inference API, so developers who already use an OpenAI client can integrate it by changing only the base URL, API key, and model name. All inference runs on our EU-hosted infrastructure, which keeps data inside the EU and supports GDPR compliance for sensitive enterprise workloads.

Get started: call Cosmos3-Super-Reasoner on Lyceum

Integrating NVIDIA's Cosmos3-Super-Reasoner into your application requires no new frameworks if you already use standard API clients. Lyceum Technology exposes an OpenAI-compatible endpoint, so the standard OpenAI SDK works unchanged: you route requests to secure European infrastructure by updating the base URL and API key. Your physical AI and video understanding workloads then run on EU-hosted infrastructure that supports GDPR compliance, without architectural rewrites.

Below is the exact Python snippet to call the model using the standard OpenAI client.

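from openai import OpenAI

# Point the standard OpenAI client at Lyceum's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.lyceum.technology/api/v2/external/serverless",
    api_key="<your lyceum api key>",
)

# Send a single-turn chat request to the model.
response = client.chat.completions.create(
    model="nvidia/Cosmos3-Super-Reasoner",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)

# Print the text of the first returned completion.
print(response.choices[0].message.content)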
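
Since the model is vision-language, most real requests will carry an image or video frame rather than plain text. The sketch below builds such a message in the OpenAI chat-completions content-part format (a `text` part plus an `image_url` part holding a base64 data URL). Whether Lyceum's endpoint accepts this format for this model is an assumption, not something this article confirms, and `build_image_message` is a hypothetical helper name.

```python
import base64


def build_image_message(image_bytes: bytes, prompt: str, mime: str = "image/jpeg") -> dict:
    """Build one user message holding a text prompt plus an inline image.

    Assumption: the endpoint accepts OpenAI-style `image_url` content parts.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }


# Placeholder bytes stand in for a real camera frame read from disk.
message = build_image_message(b"fake-jpeg-bytes", "Describe what is happening in this frame.")
```

If the endpoint supports image input, the resulting dictionary can be passed as `messages=[message]` to `client.chat.completions.create(...)` in place of the text-only message.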

Pricing and region for Cosmos3-Super-Reasoner

When deploying physical AI models, predictable unit economics are critical. Cosmos3-Super-Reasoner is available on Lyceum's Standard tier, which is optimized for high-capability workloads requiring complex multi-step reasoning. The model is priced at $0.10 per million input tokens and $0.30 per million output tokens.

All inference for this model runs in the eu-north1 region. This guarantees that your sensitive video feeds, images, and proprietary text data never leave the European Union. For enterprises building autonomous systems or analyzing factory floor footage, this strict data residency eliminates the compliance risks associated with routing data to US-based hyperscalers. You pay strictly per token with no base fees, allowing you to scale from prototyping to production efficiently.

What Cosmos3-Super-Reasoner is good at

Physical AI and omnimodal understanding

Cosmos3-Super-Reasoner is a 32B-parameter vision-language model built specifically for physical AI. Unlike standard text-based large language models, it uses a unified mixture-of-transformers architecture to process text, images, video, and audio natively. This omnimodal design allows the model to understand the physical world, making it highly effective for robotics, autonomous vehicles, and smart space environments. It can analyze complex scenes, track object permanence, and understand spatial relationships across video frames.

Complex multi-step reasoning

The model excels at breaking down real-world scenarios into structured state sequences. When analyzing fixed-camera footage from warehouses, transportation hubs, or factory assembly lines, Cosmos3-Super-Reasoner can reliably segment activity and reason about what is happening. It serves as a planning model, using prior knowledge and physics understanding to determine what steps an embodied agent should take next. This makes it ideal for generating action sequences and evaluating physical plausibility in simulated environments.

Video analytics and anomaly detection

For industrial vision applications, the model provides robust performance in detecting events and anomalies. It can process long video sequences to identify deviations from standard operating procedures on a manufacturing line or track specific activities in a logistics hub. By combining visual perception with deep reasoning capabilities, Cosmos3-Super-Reasoner allows engineering teams to build automated monitoring systems that understand context, rather than relying on brittle, hard-coded computer vision rules.
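
A monitoring pipeline built on this model usually needs the reply in a machine-readable form. The following is a minimal sketch that assumes you have prompted the model to answer with a JSON array of events. The field names (`start_s`, `end_s`, `label`, `is_anomaly`) are hypothetical and chosen for illustration only, and the model is not guaranteed to return valid JSON, so the parser rejects malformed replies explicitly.

```python
import json
from dataclasses import dataclass


@dataclass
class Event:
    start_s: float    # segment start, seconds from the beginning of the clip
    end_s: float      # segment end, seconds
    label: str        # short description of the activity
    is_anomaly: bool  # True if the segment deviates from normal procedure


def parse_events(raw: str) -> list[Event]:
    """Parse a model reply expected to be a JSON array of event objects.

    Raises ValueError if the reply is not valid JSON or an event is malformed.
    """
    try:
        items = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of events")
    events = []
    for item in items:
        try:
            events.append(Event(float(item["start_s"]), float(item["end_s"]),
                                str(item["label"]), bool(item["is_anomaly"])))
        except (KeyError, TypeError) as exc:
            raise ValueError(f"malformed event: {item!r}") from exc
    return events


# Hand-written sample reply, not real model output.
sample_reply = (
    '[{"start_s": 0, "end_s": 12.5, "label": "pallet loaded", "is_anomaly": false},'
    ' {"start_s": 12.5, "end_s": 14, "label": "worker enters exclusion zone", "is_anomaly": true}]'
)
anomalies = [e for e in parse_events(sample_reply) if e.is_anomaly]
```

Validating at this boundary keeps downstream alerting code from acting on partial or free-form replies.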

Benchmarks and how it compares

Cosmos3-Super-Reasoner benchmark results

NVIDIA evaluates the Cosmos 3 family across multiple benchmark suites targeting physical AI reasoning, generation quality, and domain-specific performance. Cosmos3-Super-Reasoner ranks at the top of its parameter class for understanding real-world environments.

Benchmark | Metric / Focus | Result
--- | --- | ---
VANTAGE-Bench | Real-world fixed-camera footage (32B tier) | #1 Open Model
Traffic Anomaly Reasoning (TAR) | Event detection in driving scenes | #1 Open Model
Heron-Bench | Free-form VLM response scoring | High-tier performance

Source: NVIDIA Technical Blog.

Comparing to sibling models

Within the NVIDIA catalogue, Cosmos3-Super-Reasoner (32B) sits above Cosmos3-Nano-Reasoner (8B). While the Nano version is optimized for lightweight policy execution and edge deployments, the Super variant provides the high-capacity world simulation and advanced reasoning required for complex autonomous vehicle planning and datacenter-scale synthetic data generation.

When compared to generalist vision-language models, Cosmos3-Super-Reasoner demonstrates an advantage in physical plausibility. Standard VLMs often fail to maintain object permanence or understand momentum across video frames. Cosmos3-Super-Reasoner is explicitly trained to respect the laws of physics, making it far more reliable for robotics training pipelines. However, this specialization means it requires more careful prompt engineering and region framing to extract structured state sequences effectively.

Using it in production

Production configuration for Cosmos3-Super-Reasoner

Deploying Cosmos3-Super-Reasoner effectively requires understanding its context limits and pricing structure. The model supports a massive 256K context window, allowing it to ingest long video sequences, high-resolution image batches, and extensive system prompts in a single API call. This deep context is essential for analyzing multi-minute fixed-camera footage or providing a robot with extensive historical state data before asking it to reason about its next action.
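
To plan how much footage fits in one call, it helps to budget the context window up front. The sketch below is a rough estimate under stated assumptions: "256K" is read as 256,000 tokens, and the tokens-per-frame figure is a pure placeholder, because this article does not say how many tokens an image or video frame consumes. Measure that value on your own data before relying on the result.

```python
CONTEXT_WINDOW = 256_000  # assumption: "256K" interpreted as 256,000 tokens


def max_frames(tokens_per_frame: int, prompt_tokens: int, reserved_output_tokens: int) -> int:
    """Return how many frames fit after setting aside prompt and output budget."""
    if tokens_per_frame <= 0:
        raise ValueError("tokens_per_frame must be positive")
    available = CONTEXT_WINDOW - prompt_tokens - reserved_output_tokens
    return max(available // tokens_per_frame, 0)


# Placeholder numbers: 1,000 tokens per frame is illustrative, not a measured value.
frames = max_frames(tokens_per_frame=1_000, prompt_tokens=2_000, reserved_output_tokens=4_000)
```

With these placeholder numbers, 250,000 tokens remain for frames, so 250 frames fit in one request.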

Lyceum Technology serves this model on our Standard tier, which prioritizes high-capability execution for complex tasks. The pricing is highly competitive for a 32B-parameter omnimodal model: $0.10 per million input tokens and $0.30 per million output tokens.

To understand the unit economics, consider a video analytics workload. If you pass a sequence of frames and text prompts totaling 50,000 input tokens, and the model generates a detailed 500-token structured JSON analysis of the physical events, the cost is minimal. The input costs $0.005, and the output costs $0.00015, resulting in a total API call cost of $0.00515.
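
The same arithmetic can be written as a small helper for estimating other workloads. The prices are the ones quoted above; the function name `request_cost` is just an illustrative choice.

```python
INPUT_PRICE_PER_M = 0.10   # USD per million input tokens
OUTPUT_PRICE_PER_M = 0.30  # USD per million output tokens


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the per-token prices above."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# The worked example: 50,000 input tokens and 500 output tokens.
cost = request_cost(50_000, 500)
print(f"${cost:.5f}")  # prints $0.00515
```

Multiplying the per-request figure by expected daily call volume gives a quick budget estimate, since there are no base fees on top of token usage.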

All requests are routed to our eu-north1 region. For European manufacturing, logistics, and automotive companies, this ensures that proprietary factory footage and autonomous driving data remain strictly within the EU. You can scale your inference volume dynamically without committing to expensive reserved instances, paying only for the exact tokens processed during your physical AI evaluations.

Running Cosmos3-Super-Reasoner on EU-sovereign infrastructure

Why run Cosmos3-Super-Reasoner on Lyceum

Building physical AI systems requires processing highly sensitive data. Factory floor camera feeds, autonomous vehicle sensor logs, and proprietary robotics training data cannot be sent to US-based infrastructure without triggering severe compliance risks. Lyceum Technology provides an EU-native inference platform capable of serving heavy omnimodal models like Cosmos3-Super-Reasoner with strict data sovereignty. Learn more about GDPR-compliant LLM inference in Europe.

By hosting the model in our eu-north1 region, we ensure that your data is processed entirely within European borders, which supports GDPR data residency requirements by default. Unlike API providers that rent capacity from hyperscalers, Lyceum owns and operates its GPU infrastructure. This structural advantage allows us to offer pay-per-token billing without the markups typical of public clouds. You avoid the pain of managing your own hardware, dealing with cooling requirements, or fighting for GPU availability.

Furthermore, Lyceum provides open-stack transparency. We utilize optimized open-source inference engines like vLLM and NVIDIA Dynamo rather than locking you into a black-box proprietary stack. Our API is fully OpenAI-compatible, meaning your engineering team can switch from existing providers by changing a single URL string. You get the advanced physical reasoning capabilities of NVIDIA's 32B model, the scalability of serverless compute, and the legal certainty of European data residency, all without minimum commitments or egress fees.
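
One way to make the provider switch a configuration change rather than a code change is to read the endpoint settings from the environment. This is a sketch of that pattern, not a Lyceum-specified convention: the variable names `LYCEUM_API_KEY` and `OPENAI_COMPAT_BASE_URL` and the helper `client_settings` are hypothetical.

```python
import os

LYCEUM_BASE_URL = "https://api.lyceum.technology/api/v2/external/serverless"


def client_settings(env=os.environ) -> dict:
    """Return keyword arguments for an OpenAI-compatible client.

    LYCEUM_API_KEY and OPENAI_COMPAT_BASE_URL are hypothetical variable names.
    """
    api_key = env.get("LYCEUM_API_KEY")
    if not api_key:
        raise RuntimeError("LYCEUM_API_KEY is not set")
    return {
        # Fall back to the Lyceum endpoint unless an override is provided.
        "base_url": env.get("OPENAI_COMPAT_BASE_URL", LYCEUM_BASE_URL),
        "api_key": api_key,
    }


# A plain dict stands in for the process environment in this example.
settings = client_settings({"LYCEUM_API_KEY": "<your lyceum api key>"})
```

The returned dictionary can be unpacked into the client constructor, for example `OpenAI(**settings)`, so the endpoint is chosen per deployment without touching application code.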

Frequently Asked Questions

What is the price of Cosmos3-Super-Reasoner?

Cosmos3-Super-Reasoner costs $0.10 per million input tokens and $0.30 per million output tokens on Lyceum Technology's Standard tier. This pay-per-token pricing model allows you to scale physical AI workloads efficiently without paying for idle GPU time or minimum monthly commitments.

What is the context window for Cosmos3-Super-Reasoner?

The model features a massive 256K token context window. This extensive capacity allows engineering teams to input long video sequences, high-resolution image batches, and detailed system prompts in a single request, which is critical for complex video analytics and robotics planning.

Where is Cosmos3-Super-Reasoner hosted?

Lyceum Technology hosts Cosmos3-Super-Reasoner exclusively in the eu-north1 region. This guarantees that all inference data, including sensitive video feeds and proprietary text, remains within the European Union, ensuring full GDPR compliance and data sovereignty for enterprise workloads.

How do I call Cosmos3-Super-Reasoner using the OpenAI SDK?

You can call the model using the standard OpenAI Python or Node.js SDK. Update the base URL to https://api.lyceum.technology/api/v2/external/serverless, provide your Lyceum API key, and set the model parameter to nvidia/Cosmos3-Super-Reasoner. No other code changes are required.

How does Cosmos3-Super-Reasoner compare to Cosmos3-Nano?

Cosmos3-Super-Reasoner is a 32B-parameter model designed for high-capacity world simulation and complex reasoning, making it ideal for datacenter workloads. The 8B-parameter Nano variant is lighter and faster, optimized for edge robotics, embedded hardware, and tasks requiring ultra-low latency policy execution.

What are the primary use cases for Cosmos3-Super-Reasoner?

The model is built for physical AI applications. Primary use cases include analyzing fixed-camera footage in warehouses, detecting anomalies in driving scenes, generating synthetic training data for robotics, and serving as a reasoning engine to determine the next physical actions for embodied agents.

Related Resources

/magazine/glm-5-2; /magazine/llama-3-3-70b; /magazine/gpt-oss-120b