A hallucination is a confident, fluent, wrong answer. LLMs generate plausible text, not verified facts, so some rate of fabrication is inherent. You can't eliminate it, but you can reduce it substantially by layering a few engineering controls. Here are tactics that work in production on the Model Database API.
Ground the model in real context
The biggest lever is retrieval. A model answering from supplied documents fabricates far less than one answering from memory. Pull the relevant facts, put them in the prompt, and instruct the model to use only that context.
from openai import OpenAI

client = OpenAI(base_url="https://modeldatabase.com/v1", api_key="mdb_live_...")

# The system prompt restricts the model to the supplied context
# and names the exact fallback reply.
system = ("Answer ONLY from the context below. "
          "If the answer isn't there, reply exactly: 'Not found in sources.'")

# `context` and `question` are strings you supply:
# the retrieved passages and the user's question.
resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    temperature=0,
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": f"Context:\n{context}\n\nQ: {question}"},
    ],
)
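One way to assemble the `context` string is to label each retrieved passage with a short ID, so the model and any later checks can refer to passages by name. This is a minimal sketch, assuming your retriever hands back plain strings; `build_context`, the `S1`-style labels, and the character budget are illustrative choices, not part of any API.

```python
def build_context(passages, max_chars=6000):
    """Label passages as [S1], [S2], ... and join them into one prompt block.

    Returns (context, context_ids). Stops adding passages once the next one
    would push the block past max_chars; the budget is a rough, hypothetical
    stand-in for a real token limit and ignores separator length.
    """
    blocks, ids, used = [], [], 0
    for i, text in enumerate(passages, start=1):
        block = f"[S{i}] {text.strip()}"
        if used + len(block) > max_chars:
            break
        blocks.append(block)
        ids.append(f"S{i}")
        used += len(block)
    return "\n\n".join(blocks), set(ids)
```

The returned ID set is what a citation check can later compare against.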
Give the model an exit
Models hallucinate partly because they're nudged to always answer. Explicitly permit "I don't know" and make it the required response when context is missing. A model that can decline is a model that lies less. Reinforce this with one or two examples where the correct answer is a refusal.
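A sketch of what those refusal examples might look like as chat messages. The example questions, the `build_messages` and `is_refusal` helpers, and the exact refusal string are assumptions for illustration; swap in examples from your own domain.

```python
REFUSAL = "Not found in sources."

SYSTEM = (
    "Answer ONLY from the context below. "
    f"If the answer isn't there, reply exactly: '{REFUSAL}'"
)

# Hypothetical worked examples: one answerable, one where refusing is correct.
FEW_SHOT = [
    {"role": "user",
     "content": "Context:\nThe Eiffel Tower is in Paris.\n\nQ: Where is the Eiffel Tower?"},
    {"role": "assistant", "content": "Paris."},
    {"role": "user",
     "content": "Context:\nThe Eiffel Tower is in Paris.\n\nQ: Who designed the Louvre pyramid?"},
    {"role": "assistant", "content": REFUSAL},
]

def build_messages(context, question):
    """Return chat messages: system rule, refusal examples, then the real question."""
    user = {"role": "user", "content": f"Context:\n{context}\n\nQ: {question}"}
    return [{"role": "system", "content": SYSTEM}, *FEW_SHOT, user]

def is_refusal(answer):
    """True when the model took the exit, so callers can branch on it."""
    return answer.strip().strip("'\"") == REFUSAL
```

Matching on a fixed refusal string lets your application show a "no answer" state instead of passing the sentence through as if it were content.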
Lower the temperature
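For factual tasks, set temperature to 0 or near it. Sampling variety works against accuracy here; you want the most probable, grounded continuation, not a surprising one. A low temperature reduces run-to-run variation but does not correct a model whose most probable answer is wrong, so treat it as a complement to grounding, not a substitute.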
Ask for citations and verify them
Require the model to attach a source identifier to every claim, then programmatically check that each cited ID actually exists in your context. Drop or flag any claim whose citation can't be matched. This converts a vague trust problem into a mechanical check.
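# extract_citation_ids: your helper that parses source IDs out of the answer.
# context_ids: the set of IDs for the passages you actually supplied.
cited = extract_citation_ids(answer)
unknown = [c for c in cited if c not in context_ids]
if unknown:
    # At least one citation points at a source that was never in the prompt.
    answer = "[unverified citation] " + answer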
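To drop or flag individual claims instead of tagging the whole answer, you can run the same check per sentence. The sketch below assumes citations appear as bracketed markers like `[S2]` placed before the sentence's closing punctuation; the marker format, the naive sentence splitter, and `check_sentences` are all illustrative assumptions.

```python
import re

CITATION = re.compile(r"\[(S\d+)\]")  # assumed marker format, e.g. [S2]

def extract_citation_ids(text):
    """Return the set of source IDs cited in text."""
    return set(CITATION.findall(text))

def check_sentences(answer, context_ids):
    """Split the answer into sentences and tag each one.

    Returns a list of (sentence, status) where status is "ok",
    "uncited" (no marker at all), or "unknown" (cites a missing ID).
    """
    results = []
    # Naive splitter: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        cited = extract_citation_ids(sentence)
        if not cited:
            status = "uncited"
        elif cited - context_ids:
            status = "unknown"
        else:
            status = "ok"
        results.append((sentence, status))
    return results
```

A status of "ok" only means the cited ID exists; it says nothing about whether that passage supports the sentence.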
Decompose and verify hard claims
For high-stakes answers, run a second pass: ask a model to list the factual claims in the answer and check each against the source. Disagreement between the writer and the checker is a strong hallucination signal you can route to a human or to a regeneration.
verify = client.chat.completions.create(
    model="anthropic/claude-opus-4-8",
    temperature=0,
    messages=[{"role": "user", "content":
        "For each claim, say SUPPORTED or UNSUPPORTED by the context.\n"
        f"Context:\n{context}\n\nAnswer:\n{answer}"}],
)
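The checker's reply is free text, so you still need to turn it into a decision. A rough sketch, assuming the verifier puts one verdict keyword on each claim's line; `parse_verdicts`, `route`, and the three outcomes are hypothetical names, and real verifier output may need a stricter format such as JSON.

```python
import re

# UNSUPPORTED is listed first; the word boundary also prevents matching
# the "SUPPORTED" that sits inside "UNSUPPORTED".
VERDICT = re.compile(r"\b(UNSUPPORTED|SUPPORTED)\b")

def parse_verdicts(verifier_text):
    """Return one (line, verdict) pair per line that contains a verdict.

    Lines without either keyword are ignored.
    """
    pairs = []
    for line in verifier_text.splitlines():
        match = VERDICT.search(line)
        if match:
            pairs.append((line.strip(), match.group(1)))
    return pairs

def route(verifier_text):
    """Decide what to do with an answer based on the checker's verdicts."""
    verdicts = [v for _, v in parse_verdicts(verifier_text)]
    if not verdicts:
        return "review"       # checker output was unparseable; don't trust it
    if "UNSUPPORTED" in verdicts:
        return "regenerate"   # or send to a human on critical paths
    return "accept"
```

With the OpenAI Python SDK you would call it as `route(verify.choices[0].message.content)`.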
Constrain the surface area
- Narrow the task: a model asked one focused question fabricates less than one asked to write an essay.
- Prefer extraction over generation when you only need facts that exist in the source.
- Avoid leading prompts that presuppose a false premise; the model will often play along.
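Extraction has a practical bonus: it is easy to check. If you ask the model to return verbatim quotes from the source, you can confirm each one really appears there. A minimal sketch, assuming the quotes arrive as a list of strings; `normalize` and `verify_quotes` are illustrative helpers, and the whitespace and case folding is a deliberate loosening you may not want.

```python
def normalize(text):
    """Collapse whitespace and lowercase so trivial differences don't matter."""
    return " ".join(text.split()).lower()

def verify_quotes(quotes, source):
    """Split quotes into those found verbatim in source and those that are not."""
    haystack = normalize(source)
    found, missing = [], []
    for quote in quotes:
        needle = normalize(quote)
        # Empty quotes count as missing: they would match any source.
        (found if needle and needle in haystack else missing).append(quote)
    return found, missing
```

Anything in `missing` was paraphrased or invented, and can be dropped before it reaches the user.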
Honest limitations
None of this is a guarantee. Grounding fails when retrieval returns the wrong passage. Citations can be attached to claims they don't actually support. Verifier models hallucinate too, just less often on a narrower task. The realistic goal is layered defense: grounding plus an explicit out plus citation checks plus, for critical paths, human review. Measure your hallucination rate on a labeled set so you can prove each layer helps rather than assuming it does.
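Measuring can be as simple as comparing two runs over the same labeled questions. In this sketch each run is a list of booleans where `True` means a human labeled that answer as hallucinated; `hallucination_rate` and `compare` are hypothetical helpers, and a real evaluation would also want enough examples to trust the difference.

```python
def hallucination_rate(labels):
    """Fraction of answers a human labeled as hallucinated (True)."""
    if not labels:
        raise ValueError("need at least one labeled example")
    return sum(labels) / len(labels)

def compare(baseline, candidate):
    """Summarize the change between two runs over the same labeled questions."""
    if len(baseline) != len(candidate):
        raise ValueError("runs must cover the same questions")
    # fixed: hallucinated before, clean now. broken: the reverse.
    fixed = sum(b and not c for b, c in zip(baseline, candidate))
    broken = sum(c and not b for b, c in zip(baseline, candidate))
    return {
        "baseline_rate": hallucination_rate(baseline),
        "candidate_rate": hallucination_rate(candidate),
        "fixed": fixed,
        "broken": broken,
    }
```

Tracking `broken` alongside the headline rate shows when a new layer trades one set of errors for another.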
Start grounding your features with a key from your dashboard, and compare model behavior using the reference in the docs.