Efficiency, Accuracy, and Openness Reimagined. Nemotron is built to power the next generation of multi-agent workflows.
Open Weights, Open Data, Open Recipes. We empower you to own your AI stack completely.
Designed specifically for tool use, multi-step reasoning, and long-context agentic workflows.
Hybrid architectures delivering small model footprint with large model intelligence.
“The Brain of Your Next Agent”
World's first Hybrid Mamba-Transformer MoE. Combines Mamba's near-linear long-context speed with Transformer attention precision.
31.6B Parameters, yet only 3.6B Active per token. Delivers 4x Faster Throughput than previous generations.
Perfect memory for RAG and long-document analysis, keeping context intact across long horizons.
Toggle "Thinking Mode" ON/OFF. Trade token budget between fast responses and deep chain-of-thought reasoning.
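The 31.6B-total / 3.6B-active figures above are what make the Mixture-of-Experts design pay off: per-token compute scales with the active parameters, not the total. A back-of-envelope sketch (illustrative arithmetic only, not a benchmark):

```python
# Sparse-activation arithmetic using the stated model-card numbers:
# 31.6B total parameters, 3.6B active per token.
TOTAL_PARAMS = 31.6e9
ACTIVE_PARAMS = 3.6e9

# Fraction of the network's weights that actually run per token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Weights used per token: {active_fraction:.1%}")

# Ideal per-token compute reduction versus a dense model of the same
# total size (real-world throughput gains are smaller due to routing
# and memory-bandwidth overheads).
ideal_reduction = TOTAL_PARAMS / ACTIVE_PARAMS
print(f"Ideal compute reduction vs. dense: {ideal_reduction:.1f}x")
```

This is why an MoE can carry large-model knowledge in its full weight set while serving tokens at small-model cost.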
From edge devices to data center scale, there is a model for every task.
The Judge & Teacher
Beats GPT-4o on release. Best for hard instruction following and synthetic data generation.
The Deep Researcher
High accuracy for deep research agents. Fits on a single Data Center GPU.
The Heavyweight
Maximum accuracy for rigorous enterprise workflows like Supply Chain and Security.
The Infrastructure
Industry-leading embedding/reranking models and safety guardrails.
We didn't just open the model; we opened the lab.
What hardware do I need to run it? For full precision (BF16), you need ~60GB+ of VRAM (e.g., an 80GB A100). Quantized (FP8/INT4), it fits in ~20-32GB of VRAM (e.g., RTX 3090/4090).
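Those VRAM figures follow directly from parameter count times bytes per parameter. A quick estimate for the weights alone (KV cache, activations, and framework overhead add more on top):

```python
# Back-of-envelope VRAM estimate for model weights at each precision.
# 31.6e9 parameters; bytes per parameter depend on the data type.
PARAMS = 31.6e9
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gb = PARAMS * nbytes / 1e9
    print(f"{dtype}: ~{gb:.1f} GB for weights")
```

BF16 lands at ~63 GB, which is why an 80GB-class GPU is needed at full precision, while FP8 and INT4 bring the weights down into consumer-GPU range.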
Can I use it commercially? Yes. It is released under the Open Model License, which permits commercial use and derivative works.
Does it support reasoning? Yes. Nemotron features a toggleable Reasoning Mode that improves accuracy on complex tasks at the cost of additional tokens.
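A minimal sketch of toggling that mode, assuming it is exposed as a system-prompt switch as in earlier Nemotron releases ("detailed thinking on/off"); check your model's card for the exact toggle it expects:

```python
# Hypothetical sketch: switching Reasoning Mode via the system prompt.
# The "detailed thinking on/off" phrasing follows earlier Nemotron
# releases; verify the exact mechanism in the model card.
def build_messages(user_prompt: str, thinking: bool) -> list[dict]:
    mode = "on" if thinking else "off"
    return [
        {"role": "system", "content": f"detailed thinking {mode}"},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("Plan a three-step data migration.", thinking=True)
print(messages[0]["content"])  # detailed thinking on
```

The same message list can then be passed to any chat-completion API; flipping `thinking` is all that changes between the fast and deep-reasoning paths.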
What's the best way to run inference? We recommend vLLM for the fastest Nemotron inference on high-performance GPUs; it is also fully supported by standard Hugging Face Transformers.
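A minimal launch sketch using vLLM's OpenAI-compatible server; the model ID below is a placeholder, so substitute the Hugging Face ID of the checkpoint you are deploying, and tune the context length to your hardware:

```shell
# Deployment sketch: serve the model with vLLM's OpenAI-compatible server.
# <nemotron-model-id> is a placeholder for the checkpoint you deploy.
pip install vllm
vllm serve nvidia/<nemotron-model-id> --max-model-len 32768
```

Once running, the server accepts standard OpenAI-style chat-completion requests on port 8000 by default.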