
How to Stream Responses With Server-Sent Events

Priya Nair · Apr 7, 2026 · 4 min read

Waiting for a long model response to finish before showing anything feels slow. Streaming fixes that: the model sends its answer piece by piece as it is generated, so your users see text appear in real time, like a typing effect. Model Database supports streaming through its OpenAI-compatible interface using Server-Sent Events (SSE).

This tutorial shows how streaming works on the wire and how to consume it from curl, Python, and Node.

How streaming works

To enable streaming, add "stream": true to your chat completion request. Instead of returning one JSON body, the server keeps the connection open and pushes a sequence of SSE events. Each event is a line beginning with data: followed by a JSON chunk, and events are separated by a blank line. Each chunk carries only the newly generated text in choices[0].delta.content rather than the full message, so the client concatenates the deltas to rebuild the reply. The stream ends with a final line:

data: [DONE]
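To make the wire format concrete, here is a minimal Python sketch of a parser for such a stream. The helper name iter_deltas and the sample lines are hypothetical, made up for illustration. It assumes each event arrives as a single data: line, as described above, and it ignores other SSE fields such as event: or id:.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker, not JSON
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a chunk that carries no content
        delta = choices[0].get("delta", {}).get("content")
        if delta:
            yield delta


# Hypothetical sample stream, shaped like the chunks described above.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))
```

In practice the SDKs shown later do this parsing for you; the sketch is only meant to show what they handle under the hood.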

Streaming with curl

You can watch the raw event stream directly. The -N flag disables buffering so chunks print as they arrive:

curl -N https://modeldatabase.com/v1/chat/completions \
  -H "Authorization: Bearer $MDB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about the ocean."}]
  }'

You will see a series of data: {...} lines, each carrying a small piece of the haiku, then data: [DONE].

Streaming in Python

The OpenAI SDK turns the SSE stream into a simple iterator. Set stream=True and loop over the chunks, printing each delta as it arrives:

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://modeldatabase.com/v1",
    # Read the key from the environment instead of hardcoding it.
    api_key=os.environ["MDB_API_KEY"],
)

stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Explain SSE in three sentences."}],
    stream=True,
)

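for chunk in stream:
    # Some chunks (for example a usage-only chunk) may have no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()  # finish the line once the stream ends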

Passing flush=True makes each piece of text appear immediately instead of sitting in Python's output buffer. To capture the full text, accumulate the deltas into a string as you go.
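One way to accumulate the deltas is sketched below. The helper collect_text and the fake_chunk stand-in are hypothetical names used only for illustration. The stand-in mimics the chunk.choices[0].delta.content shape so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_text(chunks: Iterable, echo: bool = True) -> str:
    """Print deltas as they arrive (optional) and return the full text."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta.content
        if delta:
            if echo:
                print(delta, end="", flush=True)
            parts.append(delta)
    if echo:
        print()
    # Joining a list once is cheaper than repeated string concatenation.
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in with the same attribute shape as an SDK chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Server-"), fake_chunk(None), fake_chunk("Sent Events")]
full_text = collect_text(demo)
```

With a real request, you would pass the stream object returned by client.chat.completions.create(...) in place of demo.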

Streaming in Node.js

In Node, the returned stream is an async iterable. Use for await to read chunks and write them to stdout without newlines:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://modeldatabase.com/v1",
  apiKey: process.env.MDB_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "google/gemini-2.0-flash",
  messages: [{ role: "user", content: "Count from 1 to 5 slowly." }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(delta);
}
process.stdout.write("\n");

Billing and usage with streaming

Streaming responses are billed the same way as non-streaming ones. The X-MDB-Charged-USD and X-MDB-Balance-USD headers are sent with the response, so you can read them from the raw HTTP response object before you begin iterating the body. If you want token usage in the stream itself, request it with the OpenAI stream_options parameter:

stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hi"}],
    stream=True,
    stream_options={"include_usage": True},
)

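With this enabled, one extra chunk arrives after the content finishes and carries a usage object with the prompt and completion token counts. In the OpenAI API that chunk has an empty choices array, so check that choices is non-empty before reading choices[0].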
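Reading that usage chunk could look roughly like the sketch below. It assumes Model Database mirrors the OpenAI behavior, where the usage chunk has an empty choices list and earlier chunks have usage set to None. The helper read_stream_with_usage, the fake chunks, and the token counts are all hypothetical and exist only so the example runs offline.

```python
from types import SimpleNamespace
from typing import Iterable, Optional, Tuple


def read_stream_with_usage(chunks: Iterable) -> Tuple[str, Optional[object]]:
    """Return (full_text, usage); usage stays None if no chunk carries it."""
    parts = []
    usage = None
    for chunk in chunks:
        # Assumption: only the trailing chunk has a non-None usage object.
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        if chunk.choices:
            delta = chunk.choices[0].delta.content
            if delta:
                parts.append(delta)
    return "".join(parts), usage


def content_chunk(text):
    """Hypothetical content chunk: one choice, no usage."""
    choice = SimpleNamespace(delta=SimpleNamespace(content=text))
    return SimpleNamespace(choices=[choice], usage=None)


# Hypothetical usage chunk: empty choices plus made-up token counts.
usage_chunk = SimpleNamespace(
    choices=[],
    usage=SimpleNamespace(prompt_tokens=8, completion_tokens=2, total_tokens=10),
)

text, usage = read_stream_with_usage(
    [content_chunk("Hi"), content_chunk(" there"), usage_chunk]
)
print(text)
if usage is not None:
    print(usage.prompt_tokens, usage.completion_tokens)
```

Handling usage and content in the same loop keeps the code tolerant of either ordering, and of providers that omit the usage chunk entirely.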

Tips for production

Streaming makes your app feel much more responsive at the same cost, because users see the first words as soon as they are generated. Get your API key from your dashboard and see the full streaming reference in the docs.
