Showing 21 of 21 models

Claude Haiku 4.5

anthropic/claude-haiku-4.5

200K

Anthropic's fastest and most affordable Claude model — ideal for low-latency, high-volume tasks

$1.00/1M in · $5.00/1M out · 400ms · 180 tok/s
Low Latency · Chat · Summarization · Classification · +1 more

Claude Opus 4.6

anthropic/claude-opus-4.6

1000K

Anthropic's most intelligent model — state-of-the-art reasoning, extended thinking, 1M-token context, and top-tier coding capabilities

$15.00/1M in · $75.00/1M out · 2100ms · 60 tok/s
Advanced Reasoning · Extended Thinking · Long Context · Code · +2 more

Claude Sonnet 4.6

anthropic/claude-sonnet-4.6

1000K

Anthropic's flagship production model — 1M context, best balance of intelligence and speed

$3.00/1M in · $15.00/1M out · 1100ms · 90 tok/s
Reasoning · Code · Analysis · Long Context · +2 more

DeepSeek R1 Distill Qwen 32B

deepseek/deepseek-r1-distill-qwen-32b

131K

DeepSeek R1 distilled into Qwen 32B — strong reasoning at zero cost on Cloudflare edge

$0/1M in · $0/1M out · 700ms · 90 tok/s
Chat · Reasoning · Code Generation · Math · +1 more

DeepSeek V3.2

deepseek/deepseek-v3.2

164K

DeepSeek's flagship open-source MoE model — strong reasoning, coding, and math

$0.26/1M in · $0.38/1M out · 900ms · 60 tok/s
Chat · Reasoning · Code Generation · Math · +1 more

Gemini 2.5 Flash

google/gemini-2.5-flash

1049K

Google's fast multimodal model with 1M context, vision, and audio I/O

$0.30/1M in · $2.50/1M out · 400ms · 250 tok/s
Vision · Audio · Long Context · Tool Use · +1 more

Gemini 3 Flash (Preview)

google/gemini-3-flash-preview

1049K

Google's next-generation Gemini 3 Flash — 1M context, enhanced reasoning and multimodal

$0.50/1M in · $3.00/1M out · 500ms · 220 tok/s
Vision · Audio · Video · Long Context · +2 more

Gemini 3.1 Flash Lite (Preview)

google/gemini-3.1-flash-lite-preview

1049K

Google's Gemini 3.1 Flash Lite — ultra-affordable multimodal model with 1M context

$0.25/1M in · $1.50/1M out · 350ms · 280 tok/s
Vision · Long Context · Low Latency · Tool Use

Gemma 3 12B

google/gemma-3-12b-it

33K

Google's Gemma 3 12B — compact and efficient, free on Cloudflare Workers AI edge

$0/1M in · $0/1M out · 400ms · 150 tok/s
Chat · Summarization · Classification · Low Latency · +1 more

Gemma 4 26B

google/gemma-4-26b-a4b-it

131K

Google's Gemma 4 26B — next-generation open model with improved reasoning and efficiency

$0.12/1M in · $0.40/1M out · 500ms · 120 tok/s
Chat · Code Generation · Reasoning · Multilingual

Gemma 4 31B

google/gemma-4-31b-it

131K

Google's Gemma 4 31B — larger open model with stronger reasoning, instruction following, and multilingual performance

$0.18/1M in · $0.54/1M out · 620ms · 95 tok/s
Chat · Code Generation · Advanced Reasoning · Multilingual · +1 more

Llama 3.1 405B

meta-llama/llama-3.1-405b-instruct

131K

Meta's flagship open-source model — matches frontier performance with 128k context

$5.00/1M in · $16.00/1M out · 2000ms · 50 tok/s
Reasoning · Code · Long Context · Multilingual

Llama 3.1 70B

meta-llama/llama-3.1-70b-instruct

131K

Meta's Llama 3.1 70B — strong open-source choice for most production use cases, 128k context

$0.40/1M in · $0.40/1M out · 900ms · 115 tok/s
Text Generation · Code · Long Context · Instruction Following

Llama 3.1 8B Instruct

meta-llama/llama-3.1-8b-instruct

131K

Meta's Llama 3.1 8B — lightweight, fast, and free via Cloudflare Workers AI edge

$0/1M in · $0/1M out · 300ms · 200 tok/s
Chat · Code Generation · Low Latency · Free Tier

Llama 3.3 70B Instruct

meta-llama/llama-3.3-70b-instruct

131K

Meta's Llama 3.3 70B — high-quality open model, available free on Cloudflare Workers AI

$0/1M in · $0/1M out · 800ms · 80 tok/s
Chat · Code Generation · Summarization · Classification · +1 more

Mistral Small 3.1 24B

mistralai/mistral-small-3.1-24b-instruct

131K

Mistral Small 3.1 — efficient 24B model with vision, free on Cloudflare Workers AI

$0/1M in · $0/1M out · 500ms · 120 tok/s
Chat · Vision · Code Generation · Low Latency · +1 more

GPT-4o Mini

openai/gpt-4o-mini

128K

Cost-efficient small model that outperforms GPT-3.5 Turbo on most benchmarks

$0.15/1M in · $0.60/1M out · 500ms · 200 tok/s
Vision · Text · Function Calling · Low Cost

GPT-4o

openai/gpt-4o

128K

Omni model with native vision, audio, and text at GPT-4 intelligence level

$2.50/1M in · $10.00/1M out · 800ms · 110 tok/s
Vision · Audio · Text · Reasoning · +1 more

Qwen 2.5 72B Instruct

qwen/qwen2.5-72b-instruct

131K

Alibaba's Qwen 2.5 72B — strong multilingual reasoning, free on Cloudflare Workers AI

$0/1M in · $0/1M out · 900ms · 70 tok/s
Chat · Code Generation · Multilingual · Reasoning · +1 more

Qwen 3.5 9B

qwen/qwen3.5-9b

131K

Alibaba's Qwen 3.5 9B — compact, efficient, and very affordable reasoning model

$0.05/1M in · $0.15/1M out · 250ms · 180 tok/s
Chat · Code Generation · Multilingual · Reasoning · +1 more

Step 3.5 Flash

stepfun/step-3.5-flash

66K

StepFun Step 3.5 Flash — fast, affordable model with strong multilingual support

$0.10/1M in · $0.30/1M out · 400ms · 150 tok/s
Chat · Code Generation · Multilingual · Low Latency
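The per-1M-token prices above translate directly into a per-request cost: tokens ÷ 1,000,000 × price. A minimal sketch, assuming the prices quoted in this listing (the `request_cost` helper and the selection of models in `PRICES` are illustrative, not part of any gateway API):

```python
# Estimate the dollar cost of one request from the per-1M-token prices above.
# Each entry is (input $/1M tokens, output $/1M tokens), copied from the catalog.
PRICES = {
    "anthropic/claude-haiku-4.5": (1.00, 5.00),
    "anthropic/claude-sonnet-4.6": (3.00, 15.00),
    "openai/gpt-4o-mini": (0.15, 0.60),
    "meta-llama/llama-3.1-8b-instruct": (0.00, 0.00),  # free on Workers AI
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars: (tokens * price per 1M tokens) / 1,000,000."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 10K-token prompt with a 1K-token reply on Claude Haiku 4.5.
cost = request_cost("anthropic/claude-haiku-4.5", 10_000, 1_000)
print(f"${cost:.4f}")  # 10,000*1.00/1M + 1,000*5.00/1M = $0.0150
```

Note that output tokens are typically several times more expensive than input tokens, so long completions dominate the bill even when prompts are large.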