
Working With Long-Context Models

Marcus Bell · Jan 13, 2026 · 4 min read

Long-context models let you feed an entire book, codebase, or document set into a single prompt. That capability unlocks powerful workflows, but it also introduces new failure modes around cost, latency, and how reliably a model actually uses everything you give it. Working effectively with long context is as much about discipline as it is about window size.

This guide covers when long context helps, its pitfalls, and how to use it well on Model Database.

What long context buys you

A large context window means you can include more source material directly in the prompt instead of building complex retrieval pipelines. Common uses include analyzing a long contract, answering questions over a full set of meeting transcripts, reviewing a large code module, or maintaining a long conversation history. When all the relevant information fits in the window, the model can reason over it holistically.

The pitfalls of stuffing the window

Bigger is not automatically better. Be aware of three issues: cost, latency, and how reliably the model uses everything you give it.

The lesson: use long context because the task needs it, not because the window is available.

Choosing a long-context model

Different models offer different maximum context lengths, so check each model's limit before committing. Strong general models like anthropic/claude-sonnet-4-6, anthropic/claude-opus-4-8, and google/gemini-2.0-flash are common choices for long-input work, but always confirm the current context limit for the specific model. You can list available models programmatically:

curl https://modeldatabase.com/v1/models \
  -H "Authorization: Bearer mdb_live_..."
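The same lookup can be sketched in Python so that a script can shortlist models by window size. This is a minimal sketch under two assumptions: that the endpoint returns an OpenAI-style `{"data": [...]}` body, and that each entry exposes its limit in a field called `context_length`. That field name is hypothetical, so check the actual response before relying on it.

```python
import json
import urllib.request

MODELS_URL = "https://modeldatabase.com/v1/models"  # endpoint from the curl example


def fetch_models(api_key: str) -> list[dict]:
    """Fetch the model list. Assumes an OpenAI-style {"data": [...]} response body."""
    request = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload.get("data", [])


def models_with_context(models: list[dict], min_tokens: int) -> list[str]:
    """Return ids of models whose advertised window is at least min_tokens.

    "context_length" is a hypothetical field name; entries without an
    integer value in that field are skipped rather than guessed at.
    """
    return [
        model["id"]
        for model in models
        if isinstance(model.get("context_length"), int)
        and model["context_length"] >= min_tokens
    ]
```

Calling `models_with_context(fetch_models("mdb_live_..."), 200_000)` would then return only the ids that advertise at least that many tokens.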

A long-context request

Sending a large document is the same chat completion call, just with a big user message:

from openai import OpenAI

client = OpenAI(base_url="https://modeldatabase.com/v1", api_key="mdb_live_...")

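# Read the whole document; the context manager closes the file promptly.
with open("contract.txt", encoding="utf-8") as f:
    document = f.read()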

resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "Answer only from the provided document. If unknown, say so."},
        {"role": "user", "content": f"Document:\n{document}\n\nQuestion: What is the termination notice period?"},
    ],
)
print(resp.choices[0].message.content)

Grounding the model with a system instruction to answer only from the document reduces hallucination on long inputs.
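Before sending a large document, it can help to check roughly whether it fits the model's window. The sketch below uses the common heuristic of about four characters per token for English text. That ratio is an approximation, not the model's real tokenizer, and the helper names and default budgets are illustrative assumptions.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; ~4 characters per token is a heuristic, not a tokenizer."""
    return int(len(text) / chars_per_token) + 1


def fits_in_window(
    document: str,
    context_limit: int,
    reserved_for_output: int = 2000,  # assumed room left for the model's answer
    overhead: int = 200,  # assumed allowance for system prompt and question
) -> bool:
    """Return True if the estimated prompt size leaves the reserved headroom."""
    budget = context_limit - reserved_for_output - overhead
    return estimate_tokens(document) <= budget
```

If the check fails, that is a signal to trim the input or retrieve only the relevant parts instead of sending everything.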

Long context vs retrieval

Long context and retrieval-augmented generation (RAG) solve overlapping problems. As a rule of thumb:

Many production systems combine both: retrieve the most relevant chunks, then pass a generous amount of that curated context to a capable model.
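The combined approach can be sketched in a few lines: split the source into chunks, rank them against the question, and keep the best few in their original order. The word-overlap score here is a deliberately simple stand-in for a real retriever such as embedding search, and all function names are hypothetical.

```python
import re


def split_into_chunks(text: str, max_chars: int = 1000) -> list[str]:
    """Split on blank lines, then pack paragraphs into chunks of up to max_chars."""
    chunks: list[str] = []
    current = ""
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if current and len(current) + len(paragraph) + 2 > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(chunk: str, question: str) -> int:
    """Count distinct question words that also appear in the chunk."""
    return len(_words(chunk) & _words(question))


def build_context(text: str, question: str, top_k: int = 3, max_chars: int = 1000) -> str:
    """Keep the top_k highest-scoring chunks, joined in document order."""
    chunks = split_into_chunks(text, max_chars)
    ranked = sorted(
        range(len(chunks)), key=lambda i: score(chunks[i], question), reverse=True
    )[:top_k]
    return "\n\n".join(chunks[i] for i in sorted(ranked))
```

The string returned by `build_context` would take the place of the full document in the user message, with `top_k` and `max_chars` tuned to how generous the context should be.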

Keep cost under control

Long-context calls are where costs can surprise you, so measure deliberately. Every billable response returns X-MDB-Charged-USD and X-MDB-Balance-USD, letting you see exactly what a large prompt costs. A few habits help:
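One way to measure deliberately is to record those two headers on every call. The sketch below parses them from any header mapping. It assumes the values are plain decimal strings, which is not specified here, and `read_cost_headers` is a hypothetical helper name.

```python
from typing import Mapping, Optional


def read_cost_headers(headers: Mapping[str, str]) -> dict[str, Optional[float]]:
    """Extract the charge and remaining balance from response headers.

    Header names are matched case-insensitively. Values are assumed to be
    plain decimal strings; anything missing or unparsable becomes None.
    """
    lowered = {key.lower(): value for key, value in headers.items()}

    def as_float(name: str) -> Optional[float]:
        raw = lowered.get(name.lower())
        if raw is None:
            return None
        try:
            return float(raw)
        except ValueError:
            return None

    return {
        "charged_usd": as_float("X-MDB-Charged-USD"),
        "balance_usd": as_float("X-MDB-Balance-USD"),
    }
```

With the OpenAI Python SDK, `client.chat.completions.with_raw_response.create(...)` returns an object that exposes `.headers` alongside `.parse()`, so its headers can be passed straight to this helper.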

For user-facing long-context work, enable streaming with "stream": true so the first tokens appear quickly even when the model has a lot to read.
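A minimal streaming sketch with the OpenAI Python client looks like the following. It assumes `client` is configured as in the earlier request example and that the endpoint streams OpenAI-style chunks; `stream_answer` is a hypothetical helper name.

```python
def stream_answer(client, model: str, messages: list[dict]):
    """Yield text fragments from a streamed chat completion as they arrive."""
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,  # ask for incremental chunks instead of one final response
    )
    for chunk in stream:
        # Some chunks carry no choices or no text (for example role or finish markers).
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

A caller can print each fragment immediately, for example `for piece in stream_answer(client, "anthropic/claude-sonnet-4-6", messages): print(piece, end="", flush=True)`.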

Working with big documents or codebases? Get a key and add credit at your dashboard, check each model's context limits, and read the docs for streaming and the cost headers that keep long-context spend predictable.
