To use an LLM API well, it helps to understand three connected ideas: tokens, context windows, and how those translate into cost. This tutorial explains each in plain terms and shows how Model Database makes billing transparent on every request.
None of this is provider-specific magic. The same concepts apply whether you call anthropic/claude-opus-4-8, openai/gpt-4o, or deepseek/deepseek-chat through Model Database.
What is a token?
Models do not read characters or words directly. They read tokens, which are chunks of text. A token is often a word, part of a word, or a piece of punctuation. As a rough rule of thumb in English, one token is around four characters, and 100 tokens is roughly 75 words, but this varies by language and content. Code, numbers, and unusual symbols tend to use more tokens for the same amount of text.
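The rule of thumb above can be turned into a rough estimator. This is only a sketch: estimate_tokens is a hypothetical helper name, and it applies the four-characters-per-token approximation for English rather than a real tokenizer, so treat its output as a ballpark figure.

```python
import math

CHARS_PER_TOKEN = 4  # rough English-only approximation, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Return a ballpark token count; actual tokenizers will differ."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)


sample = "Models read tokens, not characters."
print(estimate_tokens(sample))
```

For exact numbers, rely on the token counts the API reports for each request rather than on an estimate like this.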
Every request has two token counts that matter:
- Prompt tokens — everything you send: system prompt, conversation history, and the new user message.
- Completion tokens — everything the model generates in its reply.
What is the context window?
The context window is the maximum number of tokens a model can consider at once, counting prompt and completion together. Different models have different limits. If a conversation grows beyond the window, you must drop or summarize older messages; otherwise the request will be rejected for being too long.
Because the chat endpoint is stateless, every turn resends the full history. That means a long conversation sends more prompt tokens each turn, which both approaches the context limit and increases cost. Trimming or summarizing old turns keeps both in check.
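One way to trim history is to keep the system prompt and as many of the most recent turns as fit a token budget. The sketch below assumes messages are dictionaries with string "content" values, uses the rough four-characters-per-token estimate instead of a real tokenizer, and ignores any per-message overhead. The names trim_history and max_prompt_tokens are hypothetical.

```python
import math


def estimate_tokens(text: str) -> int:
    """Ballpark count using ~4 characters per token (not a real tokenizer)."""
    return math.ceil(len(text) / 4)


def trim_history(messages: list[dict], max_prompt_tokens: int) -> list[dict]:
    """Keep system messages plus the most recent turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    budget = max_prompt_tokens - sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    # Walk backwards so the newest messages are considered first.
    for message in reversed(turns):
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    # Restore chronological order before returning.
    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "x" * 400},
    {"role": "assistant", "content": "y" * 400},
    {"role": "user", "content": "What is a token?"},
]
trimmed = trim_history(history, max_prompt_tokens=50)
print([m["role"] for m in trimmed])
```

Dropping turns is the simplest policy. Summarizing the dropped turns into one short message preserves more context at the price of an extra request.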
How billing works on Model Database
Model Database is prepaid and pay-as-you-go. You top up a credit balance, and each request deducts its cost. Pricing depends on the model and on the number of prompt and completion tokens: more capable models generally cost more per token, and longer prompts and replies cost more. The key advantage is that you do not need a separate billing relationship with each provider, because one balance covers them all.
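The arithmetic behind a per-request charge can be sketched as follows. Everything specific here is an assumption for illustration: the model names and prices are made up, and quoting separate prompt and completion rates per one million tokens is a common convention rather than something confirmed for Model Database.

```python
from decimal import Decimal

# Hypothetical prices in USD per one million tokens; not real Model Database rates.
EXAMPLE_PRICES = {
    "example/small-model": {"prompt": Decimal("0.50"), "completion": Decimal("1.50")},
    "example/large-model": {"prompt": Decimal("5.00"), "completion": Decimal("15.00")},
}

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    price = EXAMPLE_PRICES[model]
    total = prompt_tokens * price["prompt"] + completion_tokens * price["completion"]
    return total / TOKENS_PER_PRICE_UNIT


# Same request size, two different models.
print(estimate_cost("example/small-model", 1200, 300))
print(estimate_cost("example/large-model", 1200, 300))
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift.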
See the cost of every request
You never have to guess what a call cost. Every billable response includes two headers:
- X-MDB-Charged-USD — the exact charge for this request.
- X-MDB-Balance-USD — your remaining balance after the charge.
Inspect them with curl using the -i flag:
curl -i https://modeldatabase.com/v1/chat/completions \
  -H "Authorization: Bearer $MDB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"openai/gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
You will see lines like X-MDB-Charged-USD and X-MDB-Balance-USD in the response headers.
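The same two headers can be read programmatically. The sketch below only parses a header mapping. It assumes both values arrive as plain decimal strings, which the text above does not specify, and the helper name parse_billing_headers and the sample values are made up.

```python
from decimal import Decimal
from typing import Mapping


def parse_billing_headers(headers: Mapping[str, str]) -> dict:
    """Extract the charge and remaining balance from response headers.

    Assumes both header values are plain decimal strings such as "0.000150".
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "charged_usd": Decimal(lowered["x-mdb-charged-usd"]),
        "balance_usd": Decimal(lowered["x-mdb-balance-usd"]),
    }


# Made-up header values for illustration only.
example_headers = {
    "Content-Type": "application/json",
    "X-MDB-Charged-USD": "0.000150",
    "X-MDB-Balance-USD": "9.999850",
}
billing = parse_billing_headers(example_headers)
print(billing["charged_usd"], billing["balance_usd"])
```

With the openai Python package, the raw headers are available by calling client.chat.completions.with_raw_response.create(...): the returned object exposes .headers, and .parse() yields the usual completion.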
Read token usage in code
Standard chat completions return a usage object with token counts, which is handy for logging and for estimating what similar requests will cost:
from openai import OpenAI

client = OpenAI(base_url="https://modeldatabase.com/v1", api_key="mdb_live_...")
resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Summarize the water cycle."}],
)
print(resp.usage.prompt_tokens)      # tokens you sent
print(resp.usage.completion_tokens)  # tokens generated
print(resp.usage.total_tokens)       # combined
Practical ways to control cost
- Pick the right model. Use a smaller model like openai/gpt-4o-mini or google/gemini-2.0-flash for simple tasks, and reserve larger models for hard ones.
- Cap output with max_tokens. If you only need a short answer, limit completion length.
- Trim history. Drop or summarize old turns so you are not paying to resend a huge transcript every request.
- Be concise in prompts. Shorter system prompts and inputs mean fewer prompt tokens.
- Monitor the headers. Log X-MDB-Charged-USD to spot expensive endpoints in your app.
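The last tip, monitoring the headers, can be sketched as a small in-memory tally of charges per application endpoint. The ChargeLog class, the endpoint names, and the sample amounts are all hypothetical, and the header value is assumed to be a decimal string. A real app would more likely send these values to its existing logging or metrics system.

```python
from collections import defaultdict
from decimal import Decimal


class ChargeLog:
    """Accumulate per-endpoint spend from X-MDB-Charged-USD values."""

    def __init__(self) -> None:
        self.totals = defaultdict(Decimal)  # endpoint name -> total USD
        self.calls = defaultdict(int)       # endpoint name -> request count

    def record(self, endpoint: str, charged_usd: str) -> None:
        # charged_usd is the raw header value, assumed to be a decimal string.
        self.totals[endpoint] += Decimal(charged_usd)
        self.calls[endpoint] += 1

    def most_expensive(self, n: int = 3) -> list:
        """Return the n endpoints with the highest total spend."""
        ranked = sorted(self.totals.items(), key=lambda item: item[1], reverse=True)
        return ranked[:n]


log = ChargeLog()
log.record("/summarize", "0.0042")
log.record("/summarize", "0.0038")
log.record("/autocomplete", "0.0003")
for endpoint, total in log.most_expensive():
    print(endpoint, total, "USD over", log.calls[endpoint], "calls")
```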
Once tokens, context, and billing click, you can build cost-aware apps with confidence. Top up credit and watch your usage in your dashboard, and find model details in the docs.