Build a Terminal Chat Tool in 50 Lines

Nothing cements how an API works like building a small real tool with it. In this tutorial you will write a complete terminal chat application in under 50 lines of Python. It streams responses as they are generated, remembers the conversation, and lets you switch models on the fly, all through Model Database. An optional add-on prints how much each reply cost.

It is a genuinely useful little program, and a great template to extend.

Setup

Install the SDK and set your key:

pip install openai
export MDB_API_KEY="mdb_live_xxxxxxxxxxxxxxxxxxxxxxxx"

The full program

Create chat.py. The whole tool fits comfortably under 50 lines:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://modeldatabase.com/v1",
    api_key=os.environ["MDB_API_KEY"],
)

model = "anthropic/claude-sonnet-4-6"
messages = [{"role": "system", "content": "You are a helpful terminal assistant. Be concise."}]

print("MDB chat. Commands: /model <id>, /reset, /quit")

while True:
    try:
        user = input("\nyou > ").strip()
    except (EOFError, KeyboardInterrupt):
        break
    if not user:
        continue
    if user == "/quit":
        break
    if user == "/reset":
        messages = messages[:1]
        print("(history cleared)")
        continue
    if user.startswith("/model "):
        model = user.split(" ", 1)[1].strip()
        print(f"(switched to {model})")
        continue

    messages.append({"role": "user", "content": user})

    print("bot > ", end="", flush=True)
    reply = ""
    stream = client.chat.completions.create(model=model, messages=messages, stream=True)
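    for chunk in stream:
        # Some chunks (for example a trailing usage chunk) may carry no choices; skip them.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            reply += delta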
    print()
    messages.append({"role": "assistant", "content": reply})

How it works

A few small pieces do all the work:

The client is the OpenAI SDK pointed at https://modeldatabase.com/v1 with your mdb_live_ key.
The messages list is the memory. We append each user turn and each assistant reply, and resend the whole list every request so the model has full context.
Streaming (stream=True) prints tokens as they arrive, giving a live typing effect instead of a frozen wait.
Commands let you reset history, quit, or switch models without restarting.
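
To make the memory model concrete, here is a small sketch of how the messages list grows over two exchanges and what /reset leaves behind. It makes no network call. The add_turn helper and the sample replies are hypothetical and exist only for illustration; chat.py appends to the list directly.

```python
# Illustrative only: no request is sent here.
SYSTEM = {"role": "system", "content": "You are a helpful terminal assistant. Be concise."}


def add_turn(messages, user_text, assistant_text):
    """Return a new history with one user/assistant exchange appended (hypothetical helper)."""
    return messages + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]


history = [SYSTEM]
history = add_turn(history, "What does ls -la do?", "It lists all files with details.")
history = add_turn(history, "And without -a?", "Hidden files are left out.")

# The next request would resend all five entries, oldest first.
roles = [m["role"] for m in history]
print(roles)

# /reset keeps only the system prompt, like messages[:1] in chat.py.
after_reset = history[:1]
print(len(after_reset))
```

Because the whole list is resent on every request, the prompt grows with each turn, which is why trimming history appears among the extension ideas.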

Try switching models live

Run python chat.py and chat. Then type a command to switch providers mid-session:

/model openai/gpt-4o
/model google/gemini-2.0-flash
/model deepseek/deepseek-chat

Because every model sits behind the same API, the tool itself does not change; only the model string does. This makes it a handy way to compare how different models answer the same prompt.
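
If you would rather compare models side by side than switch by hand, the same idea can be scripted. The sketch below assumes a client configured as in chat.py. The ask and compare helpers are hypothetical names introduced here, not part of the SDK.

```python
def ask(client, model, prompt):
    """Send one prompt to one model and return the reply text."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content


def compare(client, models, prompt):
    """Ask every model the same prompt and return {model_id: reply}."""
    return {model: ask(client, model, prompt) for model in models}


# Example usage, with `client` built as in chat.py:
# replies = compare(client, ["openai/gpt-4o", "deepseek/deepseek-chat"], "Explain chmod 755.")
# for model_id, reply in replies.items():
#     print(f"--- {model_id}\n{reply}\n")
```

Each call is a separate single-turn request, so the models answer without any shared conversation history.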

Add a cost readout (optional)

Want to see what each reply cost? Use a non-streaming call with the raw response to read the billing headers, then print them after the answer:

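# Non-streaming request; with_raw_response exposes the HTTP headers.
raw = client.chat.completions.with_raw_response.create(model=model, messages=messages)
completion = raw.parse()
reply = completion.choices[0].message.content
print(reply)
print(f"[charged ${raw.headers.get('X-MDB-Charged-USD')}, "
      f"balance ${raw.headers.get('X-MDB-Balance-USD')}]")
# Keep the reply in history, as the streaming path does.
messages.append({"role": "assistant", "content": reply})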

You can keep both modes and toggle between streaming and a cost readout with another slash command.
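
One way to wire up that toggle is sketched below. The /cost command name, the state dictionary, and both helper functions are assumptions made for this example. The header names are the ones used in the cost readout, and a missing header is shown as "?" rather than "None".

```python
def handle_command(line, state):
    """Apply a slash command to state; return True if the line was a command."""
    if line == "/cost":
        # Flip between streaming mode and the non-streaming cost readout.
        state["show_cost"] = not state["show_cost"]
        return True
    if line.startswith("/model "):
        state["model"] = line.split(" ", 1)[1].strip()
        return True
    return False


def format_cost(headers):
    """Build the cost line from response headers, tolerating missing values."""
    charged = headers.get("X-MDB-Charged-USD", "?")
    balance = headers.get("X-MDB-Balance-USD", "?")
    return f"[charged ${charged}, balance ${balance}]"


state = {"model": "anthropic/claude-sonnet-4-6", "show_cost": False}
handle_command("/cost", state)
handle_command("/model openai/gpt-4o", state)
print(state)
```

In the main loop you would then branch on state["show_cost"]: when it is true, make the raw-response call and print format_cost(raw.headers); otherwise stream as before.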

Ideas to extend it

Save transcripts to a file when you /quit.
Trim history once it grows large to control token cost.
Add a /system command to change the system prompt on the fly.
Wrap calls in retry logic so transient errors do not crash the session.
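
Two of these ideas, trimming history and retrying, fit in a few lines each. The following is a minimal sketch, not a definitive implementation. The names trim_history and with_retry, the keep_last cutoff, and the backoff values are all assumptions chosen for illustration.

```python
import time


def trim_history(messages, keep_last=20):
    """Keep the system prompt plus the most recent keep_last messages."""
    if len(messages) <= keep_last + 1:
        return messages
    return messages[:1] + messages[len(messages) - keep_last:]


def with_retry(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on an exception, wait and retry, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:  # assumption: narrow this to transient errors in real use
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Example usage inside the chat loop:
# messages = trim_history(messages)
# stream = with_retry(lambda: client.chat.completions.create(
#     model=model, messages=messages, stream=True))
```

Trimming by message count is crude, since it ignores message length and can leave the window starting on an assistant reply, but it is enough to cap cost in a long session. Catching every exception also retries errors that will never succeed, such as an invalid model ID, so catching only connection and rate-limit errors is the safer choice.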

In 50 lines you have a streaming, multi-model, memory-aware chat tool. Get your key and credit at your dashboard, and browse the full API in the docs.
