Streaming

Receive completions incrementally via server-sent events. Reduce perceived latency and render tokens as they arrive.

Every Chat Completions and Messages API request supports streaming. Set stream: true in the request body, and the response arrives as a stream of server-sent events (SSE).

Why stream?

  • Lower perceived latency — render the first token as soon as the model emits it, instead of waiting for the full response.
  • Long outputs feel responsive — users see progress on multi-second generations.
  • Early cancellation — abort a request mid-flight if the user changes their mind.

cURL

BASH
curl -N https://anyrouter.dev/api/v1/chat/completions \
  -H "Authorization: Bearer ar-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4-turbo",
    "stream": true,
    "messages": [{"role": "user", "content": "Count to 10"}]
  }'

The -N flag disables cURL's output buffering so chunks appear as they arrive.

Event shape

Each line is an SSE data: event carrying a JSON payload:

TEXT
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hel"}}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"lo"}}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"!"},"finish_reason":"stop"}]}
data: [DONE]

  • delta.content — the token(s) added in this event.
  • delta.role — set only on the first event.
  • finish_reason — set on the final event.
  • [DONE] — sentinel. No more events after this.
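
If you are not using an SDK, these lines can be parsed by hand. Below is a minimal sketch, assuming each event's JSON payload fits on a single data: line (as in the events above); a production parser would also buffer partial lines across network chunks:

```typescript
// Minimal hand-rolled parser for the SSE lines shown above.
// Assumes each event's JSON payload fits on a single "data:" line.
type ParsedEvent = { token: string; done: boolean };

function parseSSELine(line: string): ParsedEvent | null {
  if (!line.startsWith("data:")) return null; // skip blank lines and comments
  const payload = line.slice(5).trim();
  if (payload === "[DONE]") return { token: "", done: true }; // end sentinel
  const json = JSON.parse(payload);
  return { token: json.choices?.[0]?.delta?.content ?? "", done: false };
}

// Feed the parser the events from the example above:
const lines = [
  'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hel"}}]}',
  'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"lo"}}]}',
  "data: [DONE]",
];
let text = "";
for (const line of lines) {
  const event = parseSSELine(line);
  if (!event || event.done) continue;
  text += event.token;
}
// text is now "Hello"
```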

TypeScript

The official OpenAI SDK handles the SSE parsing for you:

TYPESCRIPT
const stream = await client.chat.completions.create({
  model: "openai/gpt-4-turbo",
  messages: [{ role: "user", content: "Count to 10" }],
  stream: true,
})

for await (const chunk of stream) {
  const token = chunk.choices[0]?.delta?.content ?? ""
  process.stdout.write(token)
}
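
If you also need the complete message once the stream ends, accumulate the deltas as you render them. A sketch of that pattern — the mock generator below stands in for the SDK's stream object:

```typescript
// Accumulate deltas into the full message while still rendering each
// token incrementally. `Chunk` mirrors the event shape shown earlier.
type Chunk = {
  choices: { delta: { content?: string }; finish_reason?: string }[];
};

async function collect(stream: AsyncIterable<Chunk>): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? "";
    full += token;               // keep the whole message...
    process.stdout.write(token); // ...while rendering live
  }
  return full;
}

// Demo with a mock stream in place of a real API response:
async function* mockStream(): AsyncGenerator<Chunk> {
  yield { choices: [{ delta: { content: "Hel" } }] };
  yield { choices: [{ delta: { content: "lo" } }] };
  yield { choices: [{ delta: {}, finish_reason: "stop" }] };
}
```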

Python

PYTHON
stream = client.chat.completions.create(
    model="openai/gpt-4-turbo",
    messages=[{"role": "user", "content": "Count to 10"}],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Cancellation

Close the HTTP connection to cancel a stream. AnyRouter propagates the abort to the upstream provider so you only pay for tokens actually emitted up to the cancel point.

Tip

Wire cancellation into your UI: if the user navigates away or hits a stop button, call controller.abort() on your AbortController. AnyRouter bills only for tokens streamed before the abort.
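
A sketch of that wiring, with the token source and abort trigger mocked for illustration; the signal check works for any async-iterable stream, and the SDK can also typically accept an AbortSignal as a request option so the underlying HTTP request aborts as well:

```typescript
// Stop consuming a stream as soon as the user cancels.
async function renderUntilAborted(
  stream: AsyncIterable<string>,
  signal: AbortSignal,
): Promise<string> {
  let rendered = "";
  for await (const token of stream) {
    if (signal.aborted) break; // user hit stop: stop reading, drop connection
    rendered += token;
  }
  return rendered;
}

const controller = new AbortController();
let count = 0;

// Mock token source that simulates the user pressing stop after two tokens.
async function* tokens(): AsyncGenerator<string> {
  for (const t of ["1 ", "2 ", "3 "]) {
    yield t;
    if (++count === 2) controller.abort(); // stands in for a UI stop button
  }
}
```

In a real UI you would call controller.abort() from a stop-button handler or on route change, and the loop above exits on the next event.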