Meta

Generation over generation, Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning.

Terms (1)

Llama 3.1 405B Instruct

meta/llama-3.1-405b-instruct

$0 in

$0 out

Text Generation | Function calling | 128,000 context | Free

Meta's Llama 3.1 405B Instruct is the flagship open-weight model of the Llama 3.1 family, with strong reasoning, coding, and multilingual performance and a 128k context window. Served free through the GitHub Models tier.

Terms (2)

Llama 3.1 70B Instruct

meta/llama-3.1-70b-instruct

$0 in

$0 out

Text Generation | Function calling | 131,072 context | ZDR | Free

Meta's Llama 3.1 70B instruction-tuned model with strong reasoning and multilingual capabilities.

Terms (4)

TTFT 600 ms | TPS 50 tok/s
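The two latency figures listed for some entries can be combined into a rough end-to-end estimate. The sketch below assumes TTFT means time to first token and TPS means output tokens per second after the first token; the function name `estimate_latency_seconds` is hypothetical, and real latency will vary with load and prompt length.

```python
def estimate_latency_seconds(ttft_ms: float, tps: float, output_tokens: int) -> float:
    """Rough response time: wait for the first token, then stream the rest.

    Assumption: TTFT is time to first token (milliseconds) and TPS is
    steady-state output tokens per second, as the listing suggests.
    """
    if tps <= 0:
        raise ValueError("tps must be positive")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_ms / 1000.0 + output_tokens / tps


# Example with the figures listed above (TTFT 600 ms, TPS 50 tok/s)
# and a hypothetical 100-token reply.
print(round(estimate_latency_seconds(600, 50, 100), 2))
```

Under these assumptions, a 100-token reply at 600 ms TTFT and 50 tok/s takes about 0.6 s + 2.0 s = 2.6 s.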

Llama 3.1 8B Instruct AWQ

meta/llama-3.1-8b-instruct-awq

$0.12 in

$0.27 out

Text Generation | 8,192 context | ZDR | Free

Quantized (int4) generative text model with 8 billion parameters from Meta.

Terms (1)

Llama 3.1 8B Instruct Fast

meta/llama-3.1-8b-instruct-fast

$0.04 in

$0.38 out

Text Generation | 128,000 context | ZDR | Free

[Fast version] The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models. The instruction-tuned, text-only Llama 3.1 models are optimized for multilingual dialogue use cases and outperform many available open-source and closed chat models on common industry benchmarks.

Terms (1)

Llama 3.1 8B Instruct FP8

meta/llama-3.1-8b-instruct-fp8

$0.15 in

$0.29 out

Text Generation | 32,000 context | ZDR | Free

Llama 3.1 8B quantized to FP8 precision.

Terms (1)

Llama 3.1 8B Instruct

meta/llama-3.1-8b-instruct

$0 in

$0 out

Text Generation | Function calling | 131,072 context | Free

Meta's compact Llama 3.1 8B instruction-tuned model optimized for fast inference and edge deployments.

Terms (5)

TTFT 200 ms | TPS 150 tok/s

Llama 3.2 11B Vision Instruct

meta/llama-3.2-11b-vision-instruct

$0 in

$0 out

Text Generation | Function calling | 131,072 context | ZDR | Free

The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.

Terms (3)

Llama 3.2 1B Instruct

meta/llama-3.2-1b-instruct

$0 in

$0 out

Text Generation | Function calling | 131,072 context | ZDR | Free

The Llama 3.2 instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.

Terms (4)

Llama 3.2 3B Instruct

meta/llama-3.2-3b-instruct

$0.05 in

$0.34 out

Text Generation | Function calling | 60,000 context | ZDR | Free

Meta's compact Llama 3.2 3B instruction-tuned model optimized for edge devices and low-latency applications.

Terms (4)

TTFT 120 ms | TPS 200 tok/s

Llama 3.3 70B Instruct FP8 Fast

meta/llama-3.3-70b-instruct-fp8-fast

$0 in

$0 out

Text Generation | Function calling | 131,072 context | ZDR | Free

Llama 3.3 70B quantized to FP8 precision and optimized for faster inference.

Terms (4)

Llama 4 Scout 17B 16E

meta/llama-4-scout-17b-16e-instruct

$0 in

$0 out

Text Generation | Function calling | 131,072 context | ZDR | Free

Meta's Llama 4 Scout with 17B parameters and 16 experts, featuring native multimodal support with 10M context window via interleaved attention.

Terms (3)

TTFT 400 ms | TPS 80 tok/s

Llama Guard 3 8B

meta/llama-guard-3-8b

$0.48 in

$0.03 out

Text Generation | 131,072 context | ZDR | Free

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.
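Because the classifier returns its verdict as generated text, callers usually need to parse it. The sketch below assumes the output has `safe` or `unsafe` on the first line and, when unsafe, comma-separated category codes (such as `S1`) on the second line; that layout follows Meta's published model card, but the exact text returned by a given endpoint may differ. The names `GuardVerdict` and `parse_guard_output` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GuardVerdict:
    safe: bool
    categories: list[str] = field(default_factory=list)


def parse_guard_output(text: str) -> GuardVerdict:
    """Parse a Llama Guard style verdict.

    Assumed format (not confirmed by this listing):
        safe
    or
        unsafe
        S1,S10
    """
    # Drop blank lines and surrounding whitespace before inspecting the verdict.
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty classifier output")

    label = lines[0].lower()
    if label == "safe":
        return GuardVerdict(safe=True)
    if label == "unsafe":
        cats: list[str] = []
        if len(lines) > 1:
            cats = [c.strip() for c in lines[1].split(",") if c.strip()]
        return GuardVerdict(safe=False, categories=cats)
    raise ValueError(f"unrecognized verdict: {lines[0]!r}")


print(parse_guard_output("unsafe\nS1,S10"))
```

Raising on an unrecognized verdict, instead of defaulting to safe, keeps a malformed response from silently passing a moderation check.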

Terms (1)

Llama 2 7B Chat HF LoRA

meta-llama/llama-2-7b-chat-hf-lora

$0.11 in

$0.11 out

Text Generation | 8,192 context | ZDR | Free

This is a Llama 2 base model that Cloudflare has dedicated to inference with LoRA adapters. Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted to the Hugging Face Transformers format.

Terms (1)