OpenAI GPT OSS model with ~3000 tokens/sec inference speed via Cerebras, optimized for real-time coding assistance and efficient reasoning across science, math, and coding applications
Upstream Providers
ZDR only
Cerebras
cerebras-byok
In$0Out$0TTFT10msTPS3000.0 tps
No ZDRToolsReasoning
Cerebras
cerebras
In$0.35Out$0.75TTFT10msTPS3000.0 tps
No ZDRToolsReasoning
API Details
Chat Completions API
/api/v1/chat/completions
OpenAI-compatible endpoint with a messages array. Standard interface for chat-based interactions.
Responses API
/api/v1/responses
Native OpenAI Responses format with a simplified input parameter for single-turn requests.
Uptime & Health
Uptime %Finish ReasonsLatency
No uptime data yet
These providers haven't been health-probed for this model yet. The router still routes around upstreams that fail live requests — uptime fills in once probe history accrues.