
Understanding Tokens, Context, and Billing

Priya Nair · Mar 18, 2026 · 4 min read

To use an LLM API well, it helps to understand three connected ideas: tokens, context windows, and how those translate into cost. This tutorial explains each in plain terms and shows how Model Database makes billing transparent on every request.

None of this is provider-specific magic. The same concepts apply whether you call anthropic/claude-opus-4-8, openai/gpt-4o, or deepseek/deepseek-chat through Model Database.

What is a token?

Models do not read characters or words directly. They read tokens, which are chunks of text. A token is often a word, part of a word, or a piece of punctuation. As a rough rule of thumb in English, one token is around four characters, and 100 tokens is roughly 75 words, but this varies by language and content. Code, numbers, and unusual symbols tend to use more tokens.
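The rule of thumb above can be turned into a quick estimator. This is a minimal sketch that assumes roughly four characters per token for English text. `estimate_tokens` is a hypothetical helper, not part of any SDK, and a real tokenizer will give different counts, especially for code or non-English text.

```python
import math

CHARS_PER_TOKEN = 4  # assumption: rough English average, not an exact figure


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for English text (not an exact count)."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)


print(estimate_tokens("Summarize the water cycle."))
```

Treat the result as a ballpark for budgeting. The usage object returned by the API is the authoritative count.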

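Every request has two token counts that matter: prompt tokens, which are the input you send, and completion tokens, which are the output the model generates.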

What is the context window?

The context window is the maximum number of tokens a model can consider at once, counting prompt and completion together. Different models have different limits. If a conversation grows beyond the window, you must drop or summarize older messages; otherwise the request will be rejected for being too long.

Because the chat endpoint is stateless, every turn resends the full history. That means a long conversation sends more prompt tokens each turn, which both moves you closer to the context limit and increases cost. Trimming or summarizing old turns keeps both in check.
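One way to trim a conversation is to drop the oldest turns until the history fits a token budget. The sketch below is illustrative only: `trim_history` and `estimate_tokens` are hypothetical helpers, the four-characters-per-token estimate is an assumption, and per-message overhead is ignored.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (assumption)."""
    return (len(text) + 3) // 4


def trim_history(messages: list[dict], max_prompt_tokens: int) -> list[dict]:
    """Drop the oldest non-system messages until the estimate fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs: list[dict]) -> int:
        return sum(estimate_tokens(m["content"]) for m in msgs)

    # Always keep the most recent message, even if it alone exceeds the budget.
    while len(rest) > 1 and total(system + rest) > max_prompt_tokens:
        rest.pop(0)  # remove the oldest turn first
    return system + rest


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me about rivers." * 10},
    {"role": "assistant", "content": "Rivers carry water to the sea." * 10},
    {"role": "user", "content": "And the water cycle?"},
]
trimmed = trim_history(history, max_prompt_tokens=30)
print([m["role"] for m in trimmed])
```

System messages are kept and placed first, on the assumption that they carry instructions the model still needs. Summarizing the dropped turns instead of discarding them is a reasonable alternative when earlier context matters.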

How billing works on Model Database

Model Database is prepaid and pay-as-you-go. You top up a credit balance, and each request deducts its cost. Pricing depends on the model and on the number of prompt and completion tokens: more capable models generally cost more per token, and longer prompts and replies cost more. The key advantage is that you do not need a separate billing relationship with each provider. One balance covers them all.
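The arithmetic behind a per-request charge can be sketched as follows. The model name and prices here are made up for illustration, and quoting prices per one million tokens is an assumption; check the actual pricing for the model you use.

```python
from decimal import Decimal

# Hypothetical prices in USD per one million tokens; real prices vary by model.
EXAMPLE_PRICES = {
    "example/small-model": {
        "prompt": Decimal("0.50"),
        "completion": Decimal("1.50"),
    },
}


def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate cost as tokens times per-token price, summed over both directions."""
    price = EXAMPLE_PRICES[model]
    per_million = Decimal(1_000_000)
    return (
        price["prompt"] * prompt_tokens + price["completion"] * completion_tokens
    ) / per_million


print(estimate_cost_usd("example/small-model", prompt_tokens=1200, completion_tokens=300))
```

`Decimal` is used instead of `float` so that very small amounts add up without rounding drift.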

See the cost of every request

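You never have to guess what a call cost. Every billable response includes two headers: X-MDB-Charged-USD, the amount charged for the request, and X-MDB-Balance-USD, your remaining balance.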

Inspect them with curl using the -i flag:

curl -i https://modeldatabase.com/v1/chat/completions \
  -H "Authorization: Bearer $MDB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"openai/gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'

You will see lines like X-MDB-Charged-USD and X-MDB-Balance-USD in the response headers.
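If you make requests from code rather than curl, the same headers can be read from the response. The helper below is a sketch: `read_billing_headers` is a hypothetical name, and it assumes the header values are plain decimal strings. It accepts any mapping of header names to values, such as the headers attribute most HTTP clients expose.

```python
from decimal import Decimal
from typing import Mapping


def read_billing_headers(headers: Mapping[str, str]) -> dict:
    """Extract charge and balance from response headers (case-insensitive)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    charged = lowered.get("x-mdb-charged-usd")
    balance = lowered.get("x-mdb-balance-usd")
    return {
        # Assumption: values are plain decimal strings such as "0.000021".
        "charged_usd": Decimal(charged) if charged is not None else None,
        "balance_usd": Decimal(balance) if balance is not None else None,
    }


# Illustrative values only; not real output from the service.
sample = {"X-MDB-Charged-USD": "0.000021", "X-MDB-Balance-USD": "9.87"}
print(read_billing_headers(sample))
```

Missing headers come back as `None`, which makes it easy to tell a non-billable response apart from a zero charge.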

Read token usage in code

Standard chat completions return a usage object with token counts, which is handy for logging and for tracking what each request costs:

from openai import OpenAI

client = OpenAI(base_url="https://modeldatabase.com/v1", api_key="mdb_live_...")

resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Summarize the water cycle."}],
)

print(resp.usage.prompt_tokens)      # tokens you sent
print(resp.usage.completion_tokens)  # tokens generated
print(resp.usage.total_tokens)       # combined

Practical ways to control cost

Once tokens, context, and billing click, you can build cost-aware apps with confidence. Top up credit and watch your usage at your dashboard, and find model details in the docs.
