NVIDIA

Model creator · 5 models

Models published by NVIDIA and available through the AnyRouter API. Requests to each model can be routed across multiple upstream providers to improve availability and price.

Text Generation · Function calling · 131,072-token context · Free

NVIDIA's Mixture-of-Experts (MoE) model with 120B total parameters, of which 12B are active per token. It is optimized for efficient inference and strong reasoning.
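To see why the 12B active parameters matter for inference cost, the sketch below applies the common rule of thumb that a transformer forward pass costs about 2 FLOPs per active parameter per token. The rule of thumb and the function name `forward_flops_per_token` are illustrative assumptions, not figures published for this model. Note that all 120B parameters must still be held in memory, so the saving is in compute per token, not in memory footprint.

```python
def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter per token.

    This is an approximation (assumption), not a measured figure for any model.
    """
    return 2.0 * active_params


TOTAL_PARAMS = 120e9   # every expert combined; all of these must fit in memory
ACTIVE_PARAMS = 12e9   # parameters actually used to process a single token

moe = forward_flops_per_token(ACTIVE_PARAMS)     # cost of the MoE model per token
dense = forward_flops_per_token(TOTAL_PARAMS)    # cost of a hypothetical dense 120B model

print(f"MoE:   {moe / 1e9:.0f} GFLOPs/token")
print(f"Dense: {dense / 1e9:.0f} GFLOPs/token")
print(f"Ratio: {moe / dense:.2f}")
```

Under this approximation, the MoE model needs roughly a tenth of the per-token compute of a dense model with the same total parameter count.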

Time to first token (TTFT): 400 ms · Throughput (TPS): 60 tokens/s
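As a rough guide to what the TTFT and TPS figures listed above mean for a single streamed response, total latency can be approximated as TTFT plus output length divided by throughput. This is a simplified model: it assumes a constant decoding rate and ignores network overhead, queueing, and the effect of prompt length on TTFT. The helper `estimate_latency_s` is hypothetical.

```python
def estimate_latency_s(output_tokens: int, ttft_ms: float = 400.0, tps: float = 60.0) -> float:
    """Approximate end-to-end latency (seconds) for one streamed response.

    Assumes a constant decoding rate after the first token (assumption);
    ignores network overhead, queueing, and prompt-length effects on TTFT.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tps <= 0:
        raise ValueError("tps must be positive")
    # Time until the first token arrives, plus time to decode the output.
    return ttft_ms / 1000.0 + output_tokens / tps


for n in (100, 600, 2000):
    print(f"{n:>5} tokens -> ~{estimate_latency_s(n):.1f} s")
```

With the listed figures, a 600-token answer takes roughly 10.4 seconds, so for long outputs throughput dominates and TTFT mainly affects perceived responsiveness.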
Text Generation · 32,768-token context · Free

A 120 billion parameter model from NVIDIA's Nemotron-3 family, optimized for efficient text generation and understanding tasks.

Text Generation · Function calling · 1,000,000-token context · Free

NVIDIA Nemotron-3-Ultra-550B-A55B is a 550B-parameter (55B active) frontier model built on a LatentMoE hybrid architecture that combines Mamba-2, MoE, and Attention layers with Multi-Token Prediction. It features a 1M-token context window, a configurable reasoning mode (enable_thinking), and strong multilingual support across English, French, Spanish, Italian, German, Japanese, Korean, Hindi, Brazilian Portuguese, and Chinese. It is best suited for complex agentic workflows, long-context analysis, tool use, and high-stakes RAG.

Multimodal · 128,000-token context · Free

A compact 4B-parameter multimodal guardrail model fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, returning a safe/unsafe classification, safety category labels, and optional reasoning. Served via NVIDIA NIM.

Text Generation · Function calling · 1,000,000-token context · Free

NVIDIA Nemotron 3 Ultra is built for frontier reasoning, orchestration, coding agents, deep research, and complex enterprise workflows. It delivers up to 5x faster inference and up to 30% lower cost for agentic workloads, with a context window of up to 1M tokens. It is designed for advanced function calling, structured output, and complex reasoning tasks.