NVIDIA
Models published by NVIDIA, available through the AnyRouter API. Each can route across multiple upstream providers for availability and price.
NVIDIA's Mixture-of-Experts model with 120B total parameters and 12B active, optimized for efficient inference with strong reasoning capabilities.
A 120 billion parameter model from NVIDIA's Nemotron-3 family, optimized for efficient text generation and understanding tasks.
NVIDIA Nemotron-3-Ultra-550B-A55B is a 550B parameter (55B active) frontier model built on a LatentMoE hybrid architecture combining Mamba-2, MoE, and Attention with Multi-Token Prediction. Features a 1M token context window, configurable reasoning mode (enable_thinking), and strong multilingual support across English, French, Spanish, Italian, German, Japanese, Korean, Hindi, Brazilian Portuguese, and Chinese. Best suited for complex agentic workflows, long-context analysis, tool use, and high-stakes RAG.
A compact 4B-parameter multimodal guardrail model fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, returning a safe/unsafe classification, safety category labels, and optional reasoning. Served via NVIDIA NIM.
NVIDIA Nemotron 3 Ultra is built for frontier reasoning, orchestration, coding agents, deep research, and complex enterprise workflows. It delivers up to 5x faster inference and up to 30% lower cost for agentic workloads while supporting up to 1M token context. Designed for advanced function calling, structured output, and complex reasoning tasks.