Nothing cements how an API works like building a small real tool with it. In this tutorial you will write a complete terminal chat application in about 50 lines of Python. It streams responses as they are generated, remembers the conversation, lets you switch models on the fly, and, with one optional addition, prints how much each reply cost, all through Model Database.
It is a genuinely useful little program, and a great template to extend.
Setup
Install the SDK and set your key:
pip install openai
export MDB_API_KEY="mdb_live_xxxxxxxxxxxxxxxxxxxxxxxx"
The full program
Create chat.py. The whole tool fits comfortably under 50 lines:
import os
from openai import OpenAI

# The OpenAI SDK, pointed at the Model Database endpoint.
client = OpenAI(
    base_url="https://modeldatabase.com/v1",
    api_key=os.environ["MDB_API_KEY"],
)
model = "anthropic/claude-sonnet-4-6"
messages = [{"role": "system", "content": "You are a helpful terminal assistant. Be concise."}]
print("MDB chat. Commands: /model <id>, /reset, /quit")

while True:
    try:
        user = input("\nyou > ").strip()
    except (EOFError, KeyboardInterrupt):
        break  # Ctrl-D or Ctrl-C exits cleanly
    if not user:
        continue
    if user == "/quit":
        break
    if user == "/reset":
        messages = messages[:1]  # keep only the system prompt
        print("(history cleared)")
        continue
    if user.startswith("/model "):
        model = user.split(" ", 1)[1].strip()
        print(f"(switched to {model})")
        continue

    messages.append({"role": "user", "content": user})
    print("bot > ", end="", flush=True)
    reply = ""
    stream = client.chat.completions.create(model=model, messages=messages, stream=True)
    for chunk in stream:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            reply += delta
    print()
    messages.append({"role": "assistant", "content": reply})
How it works
A few small pieces do all the work:
- The client is the OpenAI SDK pointed at https://modeldatabase.com/v1 with your mdb_live_ key.
- The messages list is the memory. We append each user turn and each assistant reply, and resend the whole list with every request so the model has full context.
- Streaming (stream=True) prints tokens as they arrive, giving a live typing effect instead of a frozen wait.
- Commands let you reset history, quit, or switch models without restarting.
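To make the memory and streaming pieces concrete, here is a minimal offline sketch of the accumulation step. The names collect_reply and fake_chunk are hypothetical helpers introduced for illustration; fake_chunk only mimics the attribute path the loop reads (chunk.choices[0].delta.content), so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_reply(stream, on_token=None):
    """Join the text deltas of a streamed completion into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # some chunks may carry no choices at all
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # role-only and final chunks have no text content
            if on_token is not None:
                on_token(delta)  # e.g. print the token as it arrives
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in with the same attribute path as an SDK chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Offline demo: three chunks, the last one empty, as at the end of a stream.
messages = [{"role": "user", "content": "hi"}]
reply = collect_reply([fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)])
messages.append({"role": "assistant", "content": reply})
```

After this runs, messages holds both the user turn and the assembled assistant turn, which is exactly what gets resent on the next request.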
Try switching models live
Run python chat.py and chat. Then type a command to switch providers mid-session:
/model openai/gpt-4o
/model google/gemini-2.0-flash
/model deepseek/deepseek-chat
Because every model is behind the same API, the tool does not change at all; only the model string does. This makes it a handy way to compare how different models answer the same prompt.
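If you would rather compare answers side by side than switch by hand, the same idea can be sketched as a small helper. ask_each is a hypothetical name, not part of the tutorial's program; it assumes it is handed a client configured like the one in chat.py and uses a plain non-streaming call.

```python
def ask_each(client, models, prompt):
    """Send one prompt to several models and return {model_id: answer}."""
    answers = {}
    for model_id in models:
        completion = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": prompt}],
        )
        answers[model_id] = completion.choices[0].message.content
    return answers


# Example use with the client from chat.py (makes real, billed requests):
# for model_id, answer in ask_each(
#     client,
#     ["openai/gpt-4o", "deepseek/deepseek-chat"],
#     "Explain a mutex in one sentence.",
# ).items():
#     print(f"{model_id}: {answer}")
```

Each model gets a fresh single-turn conversation here, so earlier chat history does not influence the comparison.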
Add a cost readout (optional)
Want to see what each reply cost? The billing information arrives in response headers, so replace the streaming call with a non-streaming one that exposes the raw response, then print the headers after the answer:
raw = client.chat.completions.with_raw_response.create(model=model, messages=messages)
completion = raw.parse()
reply = completion.choices[0].message.content
print(reply)
print(f"[charged ${raw.headers.get('X-MDB-Charged-USD')}, "
      f"balance ${raw.headers.get('X-MDB-Balance-USD')}]")
messages.append({"role": "assistant", "content": reply})  # keep the reply in history
You can keep both modes and toggle between streaming and a cost readout with another slash command.
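One way to wire up that toggle is sketched below. The /cost command name, the state dictionary, and both helper functions are assumptions made for illustration; only the two header names come from the snippet above.

```python
def toggle_cost(state):
    """Flip between streaming mode and cost-readout mode; return a status line."""
    state["show_cost"] = not state["show_cost"]
    if state["show_cost"]:
        return "(cost readout on, streaming off)"
    return "(streaming on, cost readout off)"


def format_cost(headers):
    """Build the readout from a headers mapping such as raw.headers."""
    charged = headers.get("X-MDB-Charged-USD")
    balance = headers.get("X-MDB-Balance-USD")
    if charged is None or balance is None:
        return "[cost unavailable]"  # headers missing from this response
    return f"[charged ${charged}, balance ${balance}]"


# Inside the chat loop, this might look like:
#     if user == "/cost":
#         print(toggle_cost(state))
#         continue
#     ...
#     if state["show_cost"]:
#         raw = client.chat.completions.with_raw_response.create(
#             model=model, messages=messages)
#         reply = raw.parse().choices[0].message.content
#         print(reply)
#         print(format_cost(raw.headers))
#     else:
#         ...  # the streaming branch from chat.py
```

Keeping the formatting in its own function means a response without billing headers degrades to a short notice instead of printing "None".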
Ideas to extend it
- Save transcripts to a file when you /quit.
- Trim history once it grows large to control token cost.
- Add a /system command to change the system prompt on the fly.
- Wrap calls in retry logic so transient errors do not crash the session.
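As a starting point, three of these ideas can be sketched with the standard library alone. The function names, the 20-message default, and the backoff schedule are all illustrative assumptions; the retry helper catches every exception for brevity, and a real tool would likely narrow that to the errors it considers transient.

```python
import json
import time


def trim_history(messages, max_messages=20):
    """Keep the system prompt plus at most the last max_messages turns."""
    system, rest = messages[:1], messages[1:]
    kept = rest[-max_messages:] if max_messages > 0 else []
    # Avoid starting the kept window on an assistant reply whose
    # matching user turn was cut off.
    if kept and kept[0]["role"] == "assistant":
        kept = kept[1:]
    return system + kept


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on failure wait base_delay * 2**n seconds, then retry."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)


def save_transcript(messages, path):
    """Write the conversation to a JSON file, e.g. when handling /quit."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(messages, fh, ensure_ascii=False, indent=2)
```

In the chat loop, the request could then be wrapped as with_retries(lambda: client.chat.completions.create(model=model, messages=trim_history(messages), stream=True)). Note that this retries only the initial request, not a stream that fails partway through.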
In 50 lines you have a streaming, multi-model, memory-aware chat tool. Get your key and credit at your dashboard, and browse the full API in the docs.