Meta
Models published by Meta, available through the AnyRouter API. Each can route across multiple upstream providers for availability and price.
Full-precision (fp16) generative text model with 7 billion parameters from Meta.
Quantized (int8) generative text model with 7 billion parameters from Meta.
Quantized (int4) generative text model with 8 billion parameters from Meta.
Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities over the previous generation, including improved reasoning.
Meta's Llama 3.1 405B Instruct is the flagship open-weight model of the Llama 3.1 family, with strong reasoning, coding, and multilingual performance and a 128k context window. Served free through the GitHub Models tier.
Meta's Llama 3.1 70B instruction-tuned model with strong reasoning and multilingual capabilities.
Quantized (int4) generative text model with 8 billion parameters from Meta.
[Fast version] The Meta Llama 3.1 collection comprises pretrained and instruction-tuned multilingual large language models (LLMs). The instruction-tuned text-only models are optimized for multilingual dialogue use cases and outperform many available open-source and closed chat models on common industry benchmarks.
Llama 3.1 8B quantized to fp8 precision.
Meta's compact Llama 3.1 8B instruction-tuned model optimized for fast inference and edge deployments.
The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.
The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.
Meta's compact Llama 3.2 3B instruction-tuned model optimized for edge devices and low-latency applications.
Llama 3.3 70B quantized to fp8 precision and optimized for faster inference.
Meta's Llama 4 Scout is a mixture-of-experts model with 17B active parameters and 16 experts, featuring native multimodal support and a 10M-token context window via interleaved attention.
Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Like previous versions, it can classify content in both LLM inputs (prompt classification) and LLM responses (response classification). It acts as an LLM: it generates text indicating whether a given prompt or response is safe or unsafe and, if unsafe, lists the content categories violated.
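Because the classifier's verdict arrives as generated text, a caller has to parse it before acting on it. The following is a minimal sketch, assuming the output is `safe` or `unsafe` on the first line and, when unsafe, a comma-separated list of category codes (such as `S1`) on the second. That layout and the names `GuardVerdict` and `parse_guard_output` are illustrative assumptions, so check the model card for the exact format before relying on it.

```python
from dataclasses import dataclass, field


@dataclass
class GuardVerdict:
    """Parsed safety verdict (hypothetical helper type, not part of any API)."""
    safe: bool
    categories: list[str] = field(default_factory=list)


def parse_guard_output(text: str) -> GuardVerdict:
    """Parse classifier text under the ASSUMED layout:
    line 1 is 'safe' or 'unsafe'; line 2 (unsafe only) is 'S1,S3,...'.
    """
    # Drop blank lines and surrounding whitespace the model may emit.
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty classifier output")

    label = lines[0].lower()
    if label == "safe":
        return GuardVerdict(safe=True)
    if label == "unsafe":
        categories: list[str] = []
        if len(lines) > 1:
            categories = [c.strip() for c in lines[1].split(",") if c.strip()]
        return GuardVerdict(safe=False, categories=categories)

    # Anything else is unexpected; fail loudly rather than guess a verdict.
    raise ValueError(f"unrecognized verdict line: {lines[0]!r}")
```

Raising on an unrecognized first line, rather than defaulting to "safe", keeps a malformed response from silently passing moderation.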
This is a Llama 2 base model that Cloudflare has dedicated to inference with LoRA adapters. Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.
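Several entries above differ mainly in weight precision (fp16, int8, int4, fp8). As a rough illustration of what that means in practice, the sketch below estimates the memory needed for the weights alone from parameter count and bits per weight. It is back-of-the-envelope arithmetic under stated assumptions: it ignores activations, the KV cache, and any per-group quantization scales, and the function name `weight_memory_gb` is hypothetical.

```python
# Bits used to store one weight at each precision named in the listings.
BITS_PER_WEIGHT = {"fp16": 16, "fp8": 8, "int8": 8, "int4": 4}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate weight storage in GB (1 GB = 10**9 bytes).

    Assumption: every parameter is stored at the stated precision,
    with no extra overhead for quantization metadata.
    """
    bits = BITS_PER_WEIGHT[precision.lower()]
    # billions of params * (bits / 8) bytes each = billions of bytes = GB
    return params_billions * bits / 8


if __name__ == "__main__":
    for size, precision in [(7, "fp16"), (7, "int8"), (8, "int4"), (70, "fp8")]:
        print(f"{size}B @ {precision}: ~{weight_memory_gb(size, precision):.1f} GB")
```

Under these assumptions, a 7-billion-parameter model needs about 14 GB for weights at fp16 and about 7 GB at int8, which is the main reason quantized variants are cheaper to serve.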