Cost & Scaling

Monitoring Usage and Spend in Real Time

Elena Fischer · Mar 10, 2026 · 4 min read

You cannot control what you cannot see. The teams that keep LLM costs under control are the ones that treat spend as a first-class metric, monitored as closely as latency or error rate. The good news is that Model Database gives you everything you need to do this on every single response.

This article shows how to build real-time visibility into usage and spend using the X-MDB headers and your dashboard.

Two headers, total transparency

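Every response from Model Database carries two headers:

X-MDB-Charged-USD: the amount charged for that request, in US dollars.
X-MDB-Balance-USD: your prepaid account balance, in US dollars, as reported with that response.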

Because the charge is reported per request, you do not have to estimate costs or wait for a monthly statement to learn what a call cost. The number is authoritative and immediate.

Capture the headers in code

With the OpenAI SDK, grab the raw response so you can read the headers alongside the parsed body.

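import openai

client = openai.OpenAI(base_url="https://modeldatabase.com/v1",
                       api_key="mdb_live_...")

# with_raw_response exposes the HTTP headers alongside the parsed body.
resp = client.chat.completions.with_raw_response.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)

# Header lookup is case-insensitive; values arrive as strings.
charged = float(resp.headers["X-MDB-Charged-USD"])
balance = float(resp.headers["X-MDB-Balance-USD"])
completion = resp.parse()  # the usual parsed completion object

# log_spend is a placeholder for your own logging or metrics function.
log_spend(endpoint="chat", model="openai/gpt-4o-mini",
          charged=charged, balance=balance)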

Or read them straight off a raw HTTP response:

curl -sD - https://modeldatabase.com/v1/chat/completions \
  -H "Authorization: Bearer mdb_live_..." \
  -H "Content-Type: application/json" \
  -d @req.json | grep -i x-mdb
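In production a header can be missing or malformed (for example, when a proxy strips it), and a bare float() call would then raise in the request path. The following is a minimal sketch of a more defensive parser. It assumes the header values are plain decimal strings, as the float() calls above imply; SpendSample and parse_spend_headers are hypothetical names, not part of any SDK.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Mapping, Optional


@dataclass(frozen=True)
class SpendSample:
    """One observation of the two X-MDB headers (hypothetical helper type)."""
    charged_usd: Optional[Decimal]  # None if the header was missing or unparsable
    balance_usd: Optional[Decimal]


def _to_decimal(raw: Optional[str]) -> Optional[Decimal]:
    """Parse a header value as a decimal amount, returning None on failure."""
    if raw is None:
        return None
    try:
        value = Decimal(raw.strip())
    except InvalidOperation:
        return None
    # Reject NaN and Infinity, which Decimal would otherwise accept.
    return value if value.is_finite() else None


def parse_spend_headers(headers: Mapping[str, str]) -> SpendSample:
    """Extract the two X-MDB headers from any header mapping.

    Plain dicts are case-sensitive, so header names are lower-cased first.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return SpendSample(
        charged_usd=_to_decimal(lowered.get("x-mdb-charged-usd")),
        balance_usd=_to_decimal(lowered.get("x-mdb-balance-usd")),
    )
```

Decimal is used instead of float so that many small charges can be summed without accumulating binary rounding error. A None field tells the caller to skip the sample rather than record a zero.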

Turn per-request data into metrics

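Emit X-MDB-Charged-USD to your metrics system tagged by endpoint, model, customer, and feature. With those labels in place you can answer the questions that actually matter: which feature costs the most, which customers drive spend, and whether a model change moved your cost per request.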
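To make the labeling idea concrete, here is a small in-memory sketch that totals charges per label combination and rolls them up along one dimension. It stands in for a real metrics backend; SpendAggregator, its methods, and the label values in the usage example are all hypothetical.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Tuple

LabelKey = Tuple[str, str, str, str]  # (endpoint, model, customer, feature)
DIMENSIONS = ("endpoint", "model", "customer", "feature")


class SpendAggregator:
    """Accumulates per-request charges keyed by their labels (illustrative only)."""

    def __init__(self) -> None:
        self._totals: Dict[LabelKey, Decimal] = defaultdict(Decimal)
        self._counts: Dict[LabelKey, int] = defaultdict(int)

    def record(self, charged_usd, *, endpoint: str, model: str,
               customer: str, feature: str) -> None:
        """Add one request's charge to the bucket for its label combination."""
        key = (endpoint, model, customer, feature)
        # Going through str() keeps float inputs such as 0.002 exact as written.
        self._totals[key] += Decimal(str(charged_usd))
        self._counts[key] += 1

    def total_by(self, dimension: str) -> Dict[str, Decimal]:
        """Sum spend along one label, e.g. total_by("customer")."""
        index = DIMENSIONS.index(dimension)
        rolled: Dict[str, Decimal] = defaultdict(Decimal)
        for key, total in self._totals.items():
            rolled[key[index]] += total
        return dict(rolled)


agg = SpendAggregator()
agg.record(0.002, endpoint="chat", model="openai/gpt-4o-mini",
           customer="acme", feature="search")
agg.record(0.003, endpoint="chat", model="openai/gpt-4o-mini",
           customer="globex", feature="search")
spend_per_customer = agg.total_by("customer")
```

In a real deployment the same labels would go to your metrics system as tags on a counter. Per-customer tags can create many distinct series, so it is worth checking what label cardinality your backend handles comfortably.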

Alert on the balance

The balance header is a built-in low-fuel gauge. Because billing is prepaid and a zero balance returns HTTP 402, you want to act before you hit empty. Watch the trend and alert early.

if balance < 25.0:
    notify_ops(f"MDB balance low: ${balance:.2f}")
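A fixed threshold ignores how fast you are spending: $25 may last a week at low traffic or an hour during a spike. One way to watch the trend is to project the remaining runway from recent balance readings. The sketch below uses a simple linear estimate between the oldest and newest samples. The function names and the 24-hour default are assumptions chosen for illustration.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (unix_timestamp_seconds, balance_usd)


def hours_until_empty(samples: Sequence[Sample]) -> Optional[float]:
    """Estimate hours of runway from the oldest and newest balance samples.

    Returns None when there is too little data or the balance is not falling
    (for example, right after a top-up).
    """
    if len(samples) < 2:
        return None
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    elapsed_hours = (t1 - t0) / 3600.0
    spent = b0 - b1
    if elapsed_hours <= 0 or spent <= 0:
        return None
    burn_per_hour = spent / elapsed_hours
    return b1 / burn_per_hour


def should_alert(samples: Sequence[Sample], min_runway_hours: float = 24.0) -> bool:
    """True when the projected runway is shorter than the allowed minimum."""
    runway = hours_until_empty(samples)
    return runway is not None and runway < min_runway_hours
```

Feed it a rolling window of (timestamp, balance) pairs taken from the balance header. A window that spans a top-up yields None, so it makes sense to keep the fixed-threshold check as a backstop.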

Also handle the 402 explicitly so a depleted balance degrades gracefully instead of throwing raw errors at users.

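try:
    resp = client.chat.completions.create(...)
except openai.APIStatusError as e:
    if e.status_code == 402:
        # Balance exhausted: park the request until credit is topped up.
        queue_for_retry_after_topup(request)
    else:
        raise  # let every other API error surface as usual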

Use the dashboard for the big picture

Your code gives you fine-grained, labeled telemetry; the dashboard gives you the aggregate view: current balance, spend over time, and usage trends across models. Use the dashboard to top up credit, watch daily totals, and verify that the numbers your logs report match the account of record. Together they form a complete loop: real-time signals in your app and a reliable summary in the console.

Watch the cost cap in action

The per-request cost cap is part of your monitoring story too. If you see requests getting blocked by the cap, that is a signal: either a prompt has grown too large or something is generating far more output than intended. Treat cap hits as an alertable event rather than silent noise.
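One way to make cap hits alertable is to count them in a sliding time window and fire when the count crosses a threshold. This article does not specify how a blocked request is signaled, so the sketch below leaves detection to the caller: call record_hit() wherever your code recognizes a cap rejection. CapHitMonitor is a hypothetical helper.

```python
import time
from collections import deque
from typing import Callable, Deque


class CapHitMonitor:
    """Counts cap hits within a sliding window (illustrative only)."""

    def __init__(self, threshold: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the window can be tested
        self._hits: Deque[float] = deque()

    def record_hit(self) -> bool:
        """Record one cap hit; return True if the window now holds
        at least `threshold` hits."""
        now = self._clock()
        self._hits.append(now)
        cutoff = now - self.window_seconds
        # Drop hits that have aged out of the window.
        while self._hits and self._hits[0] < cutoff:
            self._hits.popleft()
        return len(self._hits) >= self.threshold
```

For instance, CapHitMonitor(threshold=3, window_seconds=300) would return True on the third hit within five minutes. At that point you would page whoever owns the prompt or feature involved.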

Build the habit early

Add header logging on day one, before you have a cost problem. A few lines of instrumentation now mean that when traffic grows you already have the dashboards to understand it, and a sudden jump in spend is far less likely to catch you by surprise.

Top up credit and review your spend trends on your dashboard, or check per-model rates on the pricing page.
