NVIDIA

5 modelsModel creator

Models published by NVIDIA, available through the AnyRouter API. Each can route across multiple upstream providers for availability and price.

Nemotron 3 120B A12B

nvidia/nemotron-3-120b-a12b

$0.10 in

$0.30 out

Text GenerationText GenerationFunction callingFunction calling131,072FreeMiễn phí

NVIDIA's Mixture-of-Experts model with 120B total parameters and 12B active, optimized for efficient inference with strong reasoning capabilities.

TermsĐiều khoản2

TTFT 400msTPS 60 tok/s

Nemotron-3 Super 120B

nvidia/nemotron-3-super-120b-a12b

$0 in

$0 out

Text GenerationText Generation32,768FreeMiễn phí

A 120 billion parameter model from NVIDIA's Nemotron-3 family, optimized for efficient text generation and understanding tasks.

TermsĐiều khoản1

Nemotron 3 Ultra 550B

nvidia/nemotron-3-ultra-550b-a55b

$0.50 in

$2.50 out

Text GenerationText GenerationFunction callingFunction calling1,000,000FreeMiễn phí

NVIDIA Nemotron-3-Ultra-550B-A55B is a 550B parameter (55B active) frontier model built on a LatentMoE hybrid architecture combining Mamba-2, MoE, and Attention with Multi-Token Prediction. Features a 1M token context window, configurable reasoning mode (enable_thinking), and strong multilingual support across English, French, Spanish, Italian, German, Japanese, Korean, Hindi, Brazilian Portuguese, and Chinese. Best suited for complex agentic workflows, long-context analysis, tool use, and high-stakes RAG.

TermsĐiều khoản9

Nemotron 3.5 Content Safety

nvidia/nemotron-3.5-content-safety

$0 in

$0 out

MultimodalMultimodal128,000FreeMiễn phí

A compact 4B-parameter multimodal guardrail model fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, returning a safe/unsafe classification, safety category labels, and optional reasoning. Served via NVIDIA NIM.

TermsĐiều khoản3

NVIDIA Nemotron 3 Ultra

venice/nvidia-nemotron-3-ultra-550b-a55b

$0.63 in

$3.13 out

Text GenerationText GenerationFunction callingFunction calling1,000,000FreeMiễn phí

NVIDIA Nemotron 3 Ultra is built for frontier reasoning, orchestration, coding agents, deep research, and complex enterprise workflows. It delivers up to 5x faster inference and up to 30% lower cost for agentic workloads while supporting up to 1M token context. Designed for advanced function calling, structured output, and complex reasoning tasks.

TermsĐiều khoản2