The worst time to discover what an LLM feature costs is after it ships and the bill arrives. The best time is before you write the production code, when a back-of-the-envelope estimate can still change your design. Estimating cost per feature up front turns LLM spend from a surprise into a planned line item.
Here is a repeatable way to estimate, validate, and track feature cost on Model Database.
Start with a usage model
Every feature has a few drivers that determine its cost. Write them down:
- Calls per user action: does one click trigger one model call or five?
- Frequency: how many times per day does the average user do this?
- Tokens in: system prompt + context + user input.
- Tokens out: bounded by your max_tokens.
- Model: which one, since rates vary widely.
With those five numbers you can estimate cost before a single request is sent.
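One way to keep the five drivers together is a small record type. The sketch below is only illustrative: the class name UsageModel, its field names, and the example values are assumptions made for this example, not part of any Model Database API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageModel:
    """The five cost drivers for one feature (all names are illustrative)."""
    calls_per_action: int            # model calls triggered by one user action
    actions_per_user_per_day: float  # how often the average user does this
    tokens_in: int                   # system prompt + context + user input
    tokens_out: int                  # upper bound, set by max_tokens
    model: str                       # determines the per-token rates

    def calls_per_day(self, users: int) -> float:
        """Total model calls per day for a given number of users."""
        return users * self.actions_per_user_per_day * self.calls_per_action

    def tokens_per_day(self, users: int) -> tuple[float, float]:
        """Return (input_tokens, output_tokens) per day."""
        calls = self.calls_per_day(users)
        return calls * self.tokens_in, calls * self.tokens_out


# Example values only; substitute your own feature's numbers.
feature = UsageModel(
    calls_per_action=1,
    actions_per_user_per_day=2,
    tokens_in=1_200,
    tokens_out=200,
    model="openai/gpt-4o-mini",
)
print(feature.tokens_per_day(users=1_000))
```

Writing the drivers down as fields makes it obvious when one of them is still a guess.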
Do the napkin math
Use the rough rule that one token is about four characters of English. Suppose a "summarize this thread" feature has a 500-token system prompt, pulls in 1,500 tokens of thread context, and produces a 250-token summary. That is 2,000 input tokens and 250 output tokens per call.
Now layer in volume. If 10,000 users each use it three times a day, that is 30,000 calls daily: 60 million input tokens and 7.5 million output tokens per day. Multiply each by the per-token rate for your chosen model and you have a daily cost estimate you can defend in a planning meeting.
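The arithmetic for the thread-summary example can be written out as a short script. The per-million-token rates below are placeholders chosen for illustration, not real prices for any model; take actual rates from the pricing page.

```python
# Napkin math for the "summarize this thread" example.
SYSTEM_PROMPT_TOKENS = 500
CONTEXT_TOKENS = 1_500
OUTPUT_TOKENS = 250

USERS = 10_000
USES_PER_USER_PER_DAY = 3

# Placeholder rates in USD per million tokens (assumed, not real prices).
INPUT_RATE_PER_M = 1.00
OUTPUT_RATE_PER_M = 4.00

input_tokens_per_call = SYSTEM_PROMPT_TOKENS + CONTEXT_TOKENS
calls_per_day = USERS * USES_PER_USER_PER_DAY
daily_input_tokens = calls_per_day * input_tokens_per_call
daily_output_tokens = calls_per_day * OUTPUT_TOKENS

# Input and output tokens are usually priced differently, so cost them separately.
daily_cost = (daily_input_tokens / 1_000_000 * INPUT_RATE_PER_M
              + daily_output_tokens / 1_000_000 * OUTPUT_RATE_PER_M)

print(f"calls/day:         {calls_per_day:,}")
print(f"input tokens/day:  {daily_input_tokens:,}")
print(f"output tokens/day: {daily_output_tokens:,}")
print(f"daily cost (USD):  {daily_cost:.2f}")
```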
Validate with a real probe
An estimate built on assumptions can be wrong. Before trusting it, send a handful of realistic requests and read the actual charge for each one from the X-MDB-Charged-USD response header. Ground truth beats a spreadsheet.
import openai

client = openai.OpenAI(base_url="https://modeldatabase.com/v1",
                       api_key="mdb_live_...")

# representative_inputs: 20-50 real examples that you supply.
# build_messages: your own helper that turns one example into a chat messages list.
total = 0.0
for sample in representative_inputs:
    # with_raw_response exposes the HTTP headers alongside the parsed body.
    r = client.chat.completions.with_raw_response.create(
        model="anthropic/claude-sonnet-4-6",
        max_tokens=300,
        messages=build_messages(sample))
    total += float(r.headers["X-MDB-Charged-USD"])

print("avg cost/call:", total / len(representative_inputs))
Multiply that measured average by your projected call volume and you have an estimate grounded in real X-MDB-Charged-USD figures rather than guesses.
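A minimal sketch of that projection follows. The function name project_daily_cost and the list of charges are hypothetical; in practice the list would hold the header values collected during your probe run.

```python
from statistics import mean


def project_daily_cost(charges_usd, calls_per_day):
    """Scale the measured mean charge per call to a daily figure."""
    if not charges_usd:
        raise ValueError("need at least one measured charge")
    return mean(charges_usd) * calls_per_day


# Hypothetical per-call charges collected from a probe run (USD).
probe_charges = [0.0029, 0.0031, 0.0035, 0.0030, 0.0033]

daily = project_daily_cost(probe_charges, calls_per_day=30_000)
print(f"projected daily cost: ${daily:.2f}")
# The largest observed call is worth noting too; an average hides outliers.
print(f"largest single call:  ${max(probe_charges):.4f}")
```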
Compare designs before committing
Estimation pays off most when you use it to choose. Run the same probe across a few designs and pick the cheapest one that meets quality:
- Frontier model versus a smaller one like openai/gpt-4o-mini.
- Full context versus retrieved snippets.
- One big call versus two cheaper specialized calls.
Often a design change cuts cost far more than any per-request tweak, and you can only see that by comparing estimates side by side.
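The selection rule, the cheapest design that still meets quality, can be sketched as below. Everything here is an assumption for illustration: the DesignResult type, the cost and quality numbers, and the idea of a single quality score from 0.0 to 1.0 taken from your own evaluation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DesignResult:
    name: str
    avg_cost_usd: float  # measured mean charge per user action
    quality: float       # score from your own evaluation, 0.0 to 1.0


def cheapest_acceptable(results, min_quality) -> Optional[DesignResult]:
    """Return the lowest-cost design whose quality meets the bar, or None."""
    acceptable = [r for r in results if r.quality >= min_quality]
    return min(acceptable, key=lambda r: r.avg_cost_usd, default=None)


# Made-up probe results for three candidate designs.
results = [
    DesignResult("frontier, full context", 0.0120, 0.93),
    DesignResult("frontier, retrieved snippets", 0.0045, 0.91),
    DesignResult("small model, retrieved snippets", 0.0006, 0.84),
]

best = cheapest_acceptable(results, min_quality=0.90)
print(best.name if best else "no design meets the quality bar")
```

For a design that makes two calls per action, record the sum of both charges so that the comparison stays per user action.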
Convert cost to a unit that matters
Raw daily dollars are hard to reason about. Translate the estimate into a business unit: cost per active user, per document processed, or per conversation. A feature that costs a fraction of a cent per user per day is easy to approve; one that costs more than a user is worth needs a redesign. This framing also tells you whether the feature can pay for itself.
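As a rough illustration of the conversion, the snippet below divides a daily estimate by a business unit and compares it with what that unit is worth. The daily cost, user count, and revenue figure are all hypothetical.

```python
def cost_per_unit(daily_cost_usd, units_per_day):
    """Convert a daily dollar figure into cost per business unit."""
    if units_per_day <= 0:
        raise ValueError("units_per_day must be positive")
    return daily_cost_usd / units_per_day


# Hypothetical inputs; replace with your own estimate and business numbers.
daily_cost_usd = 90.0
daily_active_users = 10_000
revenue_per_user_per_day = 0.05

per_user = cost_per_unit(daily_cost_usd, daily_active_users)
share_of_revenue = per_user / revenue_per_user_per_day

print(f"cost per active user per day: ${per_user:.4f}")
print(f"share of per-user revenue:    {share_of_revenue:.0%}")
```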
Set guardrails from the estimate
Your estimate directly informs your safety limits. Set max_tokens from the output size you measured, and set the per-request cost cap just above your largest legitimate call so that a malfunction is blocked rather than billed. The estimate is more than a forecast; it is the source of your limits.
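One possible way to turn probe measurements into limits is sketched below. The headroom percentages and the observed values are assumptions, and the snippet only computes the numbers; how the per-request cost cap is actually configured is not shown here.

```python
import math


def suggest_limits(output_tokens, charges_usd,
                   token_headroom_pct=20, cost_headroom_pct=25):
    """Derive a max_tokens value and a per-request cost cap from measurements.

    Both limits sit slightly above the largest legitimate observation.
    """
    max_tokens = math.ceil(max(output_tokens) * (100 + token_headroom_pct) / 100)
    cost_cap_usd = max(charges_usd) * (100 + cost_headroom_pct) / 100
    return max_tokens, cost_cap_usd


# Hypothetical measurements from a probe run.
observed_output_tokens = [180, 210, 240, 195, 250]
observed_charges_usd = [0.0029, 0.0031, 0.0035, 0.0030, 0.0033]

max_tokens, cost_cap_usd = suggest_limits(observed_output_tokens,
                                          observed_charges_usd)
print("suggested max_tokens:", max_tokens)
print(f"suggested cost cap:   ${cost_cap_usd:.6f}")
```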
Keep the estimate alive
After launch, compare logged X-MDB-Charged-USD values against your forecast. If actual spend drifts from the estimate, your prompts or usage patterns have probably changed. Update the estimate so that your forecasts stay trustworthy for the next feature.
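A simple drift check might look like the following. The forecast, the logged charges, and the 15% review threshold are made-up values; pick a threshold that matches how much variance you can tolerate.

```python
def drift_ratio(forecast_per_call, logged_charges_usd):
    """Relative difference between the actual mean charge and the forecast.

    Positive means calls cost more than forecast; negative means less.
    """
    if not logged_charges_usd:
        raise ValueError("need at least one logged charge")
    actual = sum(logged_charges_usd) / len(logged_charges_usd)
    return (actual - forecast_per_call) / forecast_per_call


# Hypothetical forecast and logged per-call charges (USD).
forecast_per_call = 0.0030
logged = [0.0036, 0.0039, 0.0041, 0.0038]

REVIEW_THRESHOLD = 0.15  # assumed tolerance: flag drift beyond +/-15%

drift = drift_ratio(forecast_per_call, logged)
needs_review = abs(drift) > REVIEW_THRESHOLD
print(f"drift vs forecast: {drift:+.1%}")
print("revisit the estimate" if needs_review else "forecast still holds")
```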
Probe a few real requests and watch the charges add up on your dashboard, then compare model rates on the pricing page before you ship.