A customer support assistant is one of the highest-leverage features you can ship: it deflects repetitive tickets, answers customers instantly, and hands off cleanly to humans when it hits a wall. With Model Database you can build one against a single OpenAI-compatible API, swapping models freely as your needs change.
This walkthrough covers the architecture, a working backend endpoint, and the production concerns that separate a demo from something you can put in front of real users.
Architecture at a glance
A practical support assistant has four moving parts:
- Retrieval: pull relevant help-center articles and account context for the user's question.
- The model call: send a system prompt, the retrieved context, and the conversation history to a chat model.
- Escalation: detect low confidence or explicit requests for a human and route to your ticketing system.
- Logging: store transcripts for quality review and to improve your knowledge base.
Model Database sits in the middle: your code talks to https://modeldatabase.com/v1 and you pick a model per request. Use a fast, inexpensive model like openai/gpt-4o-mini for routine questions, and reserve anthropic/claude-sonnet-4-6 for complex or sensitive threads.
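Per-request model selection can be sketched as a small lookup. This is a minimal sketch, not a prescribed routing policy: `pick_model`, the category labels, and the turn-count threshold are hypothetical, and only the two model names come from the text above.

```python
# Model IDs taken from the article; everything else here is an illustrative assumption.
FAST_MODEL = "openai/gpt-4o-mini"
STRONG_MODEL = "anthropic/claude-sonnet-4-6"

# Assumed labels for threads that deserve the stronger model.
SENSITIVE_CATEGORIES = {"billing", "security", "legal"}


def pick_model(category: str, turn_count: int, max_simple_turns: int = 6) -> str:
    """Return the model ID to use for the next request."""
    if category.lower() in SENSITIVE_CATEGORIES:
        return STRONG_MODEL
    # Long threads are treated as a rough proxy for complexity (an assumption).
    if turn_count > max_simple_turns:
        return STRONG_MODEL
    return FAST_MODEL
```

Because the model is just a string on each request, a helper like this can change the routing policy without touching the rest of the call path.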
The core endpoint
Point the OpenAI SDK at Model Database by changing two lines: the base URL and the API key. Everything else is standard.
from openai import OpenAI

client = OpenAI(
    base_url="https://modeldatabase.com/v1",
    api_key="mdb_live_...",  # in real code, load this from an environment variable
)

SYSTEM_PROMPT = """You are a support agent for Acme.
Answer only from the provided context. If the context does
not contain the answer, say you are not sure and offer to
connect the user with a human. Be concise and friendly."""


def answer(question, context_docs, history):
    # Join the retrieved articles into a single context block.
    context = "\n\n".join(context_docs)
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "system", "content": f"Knowledge base:\n{context}"},
        *history,  # prior turns as {"role": ..., "content": ...} dicts
        {"role": "user", "content": question},
    ]
    resp = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=messages,
        temperature=0.2,
    )
    return resp.choices[0].message.content
Keeping temperature low makes answers more consistent from one run to the next. Grounding the model in retrieved documents, together with the instruction to answer only from the provided context, is what chiefly keeps it from inventing policy.
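The `context_docs` argument has to come from a retrieval step. The following is a minimal sketch of one, assuming a small in-memory list of articles and naive keyword overlap as the ranking signal. `ARTICLES`, `tokenize`, and `retrieve` are hypothetical names, and a real deployment would more likely query a search index or an embedding store.

```python
import re

# Hypothetical help-center snippets; real content would come from your knowledge base.
ARTICLES = [
    "To reset your password, open Settings and choose Reset password.",
    "Orders ship within two business days of purchase.",
    "You can change your plan at any time from the Billing page.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, articles: list[str], k: int = 2) -> list[str]:
    """Return up to k articles ranked by word overlap with the question."""
    q_tokens = tokenize(question)
    scored = [(len(q_tokens & tokenize(a)), a) for a in articles]
    # Drop articles that share no words with the question.
    matches = [pair for pair in scored if pair[0] > 0]
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(matches, key=lambda pair: pair[0], reverse=True)
    return [article for _, article in ranked[:k]]
```

Returning an empty list when nothing matches is deliberate: with no context, the system prompt's instruction to admit uncertainty takes over.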
Streaming for a responsive UI
Support chat feels much faster when tokens appear as they are generated. Set stream=True and forward chunks to your frontend over Server-Sent Events or a WebSocket.
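def stream_answer(messages):
    # Generator: yields text deltas as the model produces them.
    stream = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or no text; skip them.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta  # push to the browser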
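Forwarding over Server-Sent Events mostly comes down to framing each piece of text. Here is a minimal sketch, assuming a hypothetical `sse_events` helper that wraps any iterable of text deltas. The JSON payload shape and the `[DONE]` end marker are conventions chosen for this example, not something the API requires.

```python
import json
from typing import Iterable, Iterator


def sse_events(deltas: Iterable[str]) -> Iterator[str]:
    """Wrap text deltas as Server-Sent Events frames."""
    for delta in deltas:
        # JSON-encode so a newline inside a delta cannot split the frame.
        payload = json.dumps({"delta": delta})
        yield f"data: {payload}\n\n"
    # Assumed end-of-stream marker so the frontend knows when to stop.
    yield "data: [DONE]\n\n"
```

Most web frameworks can return a generator like this as a streaming response with the `text/event-stream` content type.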
Knowing when to escalate
The assistant should never guess about billing, refunds, or account security. A simple, reliable pattern is to ask the model to return structured output that includes a category, an escalation flag, and a short reason for the decision.
import json

ROUTER_PROMPT = """Classify the user's message. Return JSON:
{"category": "...", "needs_human": true|false, "reason": "..."}
Set needs_human to true for refunds, legal, or account access."""


def triage(question):
    resp = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[
            {"role": "system", "content": ROUTER_PROMPT},
            {"role": "user", "content": question},
        ],
        # JSON mode: the model is asked to emit a single JSON object.
        response_format={"type": "json_object"},
        temperature=0,
    )
    # Parse the JSON string into a dict with category, needs_human, and reason.
    return json.loads(resp.choices[0].message.content)
If needs_human is true, create a ticket and tell the customer a person will follow up. This keeps the bot inside its competence and builds trust.
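The routing step can be wired up roughly as follows. This is a sketch under assumptions: `handle_message`, `create_ticket`, and the `triage_fn` and `answer_fn` parameters are hypothetical stand-ins for your ticketing client and your own triage and answer functions. Treating an unparseable triage result as an escalation is a design choice made here, not one stated in the text.

```python
from typing import Callable

HANDOFF_MESSAGE = "I've passed this to our support team. A person will follow up shortly."


def handle_message(
    question: str,
    triage_fn: Callable[[str], dict],
    answer_fn: Callable[[str], str],
    create_ticket: Callable[[str, str], str],
) -> dict:
    """Triage a message, then either escalate it or answer it."""
    try:
        verdict = triage_fn(question)
        # A missing flag defaults to escalation, the safer outcome.
        needs_human = bool(verdict.get("needs_human", True))
        reason = str(verdict.get("reason", "unspecified"))
    except (ValueError, AttributeError):
        # Malformed JSON or an unexpected shape: fail safe and escalate.
        needs_human, reason = True, "triage output could not be parsed"

    if needs_human:
        ticket_id = create_ticket(question, reason)
        return {"escalated": True, "ticket_id": ticket_id, "reply": HANDOFF_MESSAGE}
    return {"escalated": False, "ticket_id": None, "reply": answer_fn(question)}
```

Passing the collaborators in as arguments keeps the routing logic testable without network calls.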
Production concerns
- Latency budget: cap context size and history length so responses stay under a second or two. Trim old turns rather than sending the whole transcript.
- Cost control: route the bulk of traffic to a small model and only upgrade when triage flags complexity. Because Model Database is prepaid pay-as-you-go, you only pay for tokens actually used.
- Graceful failure: wrap calls in a timeout and retry once; if the API is unreachable, fall back to a canned message plus a ticket so the customer is never left hanging.
- Privacy: redact obvious PII before logging transcripts, and never put secrets in the system prompt.
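Three of the concerns above fit in a few lines each. The following is a rough sketch, assuming hypothetical helpers named `trim_history`, `redact`, and `call_with_fallback`. The regular expressions only catch obvious email addresses and card-like digit runs, so treat them as a starting point rather than complete PII detection.

```python
import re
from typing import Callable

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

FALLBACK_REPLY = "Sorry, I can't answer right now. I've opened a ticket for you."


def trim_history(history: list[dict], max_turns: int = 6) -> list[dict]:
    """Keep only the most recent messages to bound latency and cost."""
    return history[-max_turns:] if max_turns > 0 else []


def redact(text: str) -> str:
    """Mask obvious emails and card-like numbers before logging."""
    text = EMAIL_RE.sub("[email]", text)
    return CARD_RE.sub("[card]", text)


def call_with_fallback(call: Callable[[], str], retries: int = 1) -> str:
    """Run call(); retry on failure, then fall back to a canned reply."""
    for _ in range(retries + 1):
        try:
            return call()
        except Exception:
            # Broad catch for the sketch; real code should catch the
            # client's timeout and connection errors specifically.
            continue
    return FALLBACK_REPLY
```

The timeout itself is best set on the client; the OpenAI Python SDK accepts a `timeout` argument when constructing `OpenAI(...)`. Ticket creation on fallback is left to the caller here.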
Where to go next
Once the basics work, add per-customer context (plan tier, recent orders) to the retrieval step, and test alternative models by changing a single string. A nightly job can review escalated threads to find gaps in your help center.
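Per-customer context can be passed to the model as one more context document. This is a small sketch, assuming a hypothetical `account_context` helper and hypothetical `id` and `status` fields on each order; your own account schema will differ.

```python
def account_context(plan_tier: str, recent_orders: list[dict], max_orders: int = 3) -> str:
    """Render account details as a plain-text context document."""
    lines = [f"Customer plan: {plan_tier}"]
    # Cap the number of orders so the prompt stays small.
    for order in recent_orders[:max_orders]:
        lines.append(f"Order {order['id']}: {order['status']}")
    return "\n".join(lines)
```

The returned string can be appended to the list of retrieved articles before the model call.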
Grab an API key and free credit at your dashboard, and see the full request and streaming reference in the docs.