Meta

17 models | Model creator

Models published by Meta, available through the AnyRouter API. Each can route across multiple upstream providers for availability and price.
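As a rough illustration of routing "for availability and price", the sketch below picks the cheapest provider that is currently up. It does not describe how AnyRouter actually routes: the `Provider` type, the provider names, the prices, and the `pick_provider` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Provider:
    name: str              # hypothetical provider label
    available: bool        # whether the provider is currently reachable
    price_per_mtok: float  # hypothetical price per million tokens


def pick_provider(providers: Sequence[Provider]) -> Optional[Provider]:
    """Return the cheapest available provider, or None if none is available."""
    candidates = [p for p in providers if p.available]
    return min(candidates, key=lambda p: p.price_per_mtok, default=None)


# Made-up data, for illustration only.
providers = [
    Provider("provider-a", available=False, price_per_mtok=0.00),
    Provider("provider-b", available=True, price_per_mtok=0.20),
    Provider("provider-c", available=True, price_per_mtok=0.10),
]

chosen = pick_provider(providers)
print(chosen.name if chosen else "no provider available")
```

A real router would likely weigh more signals (latency, rate limits, data-retention policy); this sketch only covers the two named in the description above.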

Text Generation | 4,096 | ZDR | Free

Full-precision (fp16) generative text model with 7 billion parameters from Meta.

Text Generation | 8,192 | ZDR | Free

Quantized (int8) generative text model with 7 billion parameters from Meta.

Text Generation | 8,192 | ZDR | Free

Quantized (int4) generative text model with 8 billion parameters from Meta.

Text Generation | 7,968 | ZDR | Free

Generation over generation, Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning.

Text Generation | Function calling | 128,000 | Free

Meta's Llama 3.1 405B Instruct is the flagship open-weight model of the Llama 3.1 family, with strong reasoning, coding, and multilingual performance and a 128k context window. Served free through the GitHub Models tier.

Text Generation | Function calling | 131,072 | ZDR | Free

Meta's Llama 3.1 70B instruction-tuned model with strong reasoning and multilingual capabilities.

Terms | 4
TTFT 600 ms | TPS 50 tok/s
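The TTFT (time to first token) and TPS (tokens per second) figures listed for a model can be combined into a rough end-to-end latency estimate. The sketch below assumes a simple model, total time = TTFT + output tokens / TPS, and a hypothetical 500-token reply. The function name is illustrative, and real latency will vary with load and prompt length.

```python
def estimate_response_seconds(ttft_ms: float, tps: float, output_tokens: int) -> float:
    """Estimate seconds to receive a full reply: TTFT plus streaming time."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_ms / 1000.0 + output_tokens / tps


# Listed figures: TTFT 600 ms, TPS 50 tok/s. The 500-token reply is an assumption.
seconds = estimate_response_seconds(ttft_ms=600, tps=50, output_tokens=500)
print(f"{seconds:.1f} s")  # 0.6 s to first token + 10.0 s of streaming
```

For long replies the TPS term dominates, so TTFT matters most for short, interactive responses.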
Text Generation | 8,192 | ZDR | Free

Quantized (int4) generative text model with 8 billion parameters from Meta.

Text Generation | 128,000 | ZDR | Free

[Fast version] The Meta Llama 3.1 collection comprises pretrained and instruction-tuned multilingual generative large language models (LLMs). The Llama 3.1 instruction-tuned text-only models are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

Text Generation | 32,000 | ZDR | Free

Llama 3.1 8B quantized to FP8 precision.

Text Generation | Function calling | 131,072 | Free

Meta's compact Llama 3.1 8B instruction-tuned model optimized for fast inference and edge deployments.

Terms | 5
TTFT 200 ms | TPS 150 tok/s
Text Generation | Function calling | 131,072 | ZDR | Free

The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.

Text Generation | Function calling | 131,072 | ZDR | Free

The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.

Text Generation | Function calling | 60,000 | ZDR | Free

Meta's compact Llama 3.2 3B instruction-tuned model optimized for edge devices and low-latency applications.

Terms | 4
TTFT 120 ms | TPS 200 tok/s
Text Generation | Function calling | 131,072 | ZDR | Free

Llama 3.3 70B quantized to FP8 precision and optimized for faster inference.

Text Generation | Function calling | 131,072 | ZDR | Free

Meta's Llama 4 Scout has 17B active parameters and 16 experts, with native multimodal support and a 10M context window via interleaved attention.

Terms | 3
TTFT 400 ms | TPS 80 tok/s
Text Generation | 131,072 | ZDR | Free

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.
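Because the classifier answers in plain text, the caller has to parse the verdict. The sketch below assumes the output's first non-empty line is `safe` or `unsafe` and that, when unsafe, the next line holds comma-separated category codes such as `S1`. The `GuardVerdict` type and `parse_guard_output` helper are hypothetical names, and the exact output format should be checked against the model card.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GuardVerdict:
    safe: bool
    categories: List[str] = field(default_factory=list)


def parse_guard_output(text: str) -> GuardVerdict:
    """Parse a safe/unsafe verdict and any violated category codes."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty classifier output")

    label = lines[0].lower()
    if label == "safe":
        return GuardVerdict(safe=True)
    if label == "unsafe":
        # Assumed format: category codes on the line after the verdict.
        codes = lines[1].split(",") if len(lines) > 1 else []
        return GuardVerdict(safe=False, categories=[c.strip() for c in codes if c.strip()])
    raise ValueError(f"unrecognized verdict: {lines[0]!r}")


print(parse_guard_output("safe"))
print(parse_guard_output("unsafe\nS1,S10"))
```

Raising on an unrecognized verdict, rather than defaulting to safe, keeps a malformed response from silently passing a content check.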

Text Generation | 8,192 | ZDR | Free

This is a Llama 2 base model that Cloudflare has dedicated to inference with LoRA adapters. Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted to the Hugging Face Transformers format.