GLM-4.7-Flash is a fast and efficient multilingual text generation model with a 131,072 token context window. Optimized for dialogue, instruction-following, and multi-turn tool calling across 100+ languages.
Upstream Providers
ZDR only
Z-AI
z-ai-byok
In$0Out$0TTFT200msTPS120.0 tps
No ZDRTools
Ollama Cloud
ollama-byok
In$0Out$0TTFT—TPS—
BYOKBYOKNo ZDRTools
Z-AI
z-ai
In$0.06Out$0.40TTFT200msTPS120.0 tps
CachedNo ZDRTools
DeepInfra
deepinfra
In$0.06Out$0.40TTFT200msTPS120.0 tps
CachedNo ZDRTools
Z-AI
z-ai-coding-byok
In$0.06Out$0.40TTFT200msTPS120.0 tps
BYOKBYOKNo ZDRTools
Cloudflare AI Gateway
cloudflare
In$0.06Out$0.40TTFT200msTPS120.0 tps
ZDRTools
API Details
Chat Completions API
/api/v1/chat/completions
OpenAI-compatible endpoint with a messages array. Standard interface for chat-based interactions.
Uptime & Health
Uptime %Finish ReasonsLatency
No uptime data yet
These providers haven't been health-probed for this model yet. The router still routes around upstreams that fail live requests — uptime fills in once probe history accrues.