
Your First Chat Completion in Python

Priya Nair · May 17, 2026 · 4 min read

This tutorial gets you from an empty Python file to a working chat completion against Model Database. We will install the OpenAI SDK, point it at Model Database, send a message, read the reply, and inspect the billing headers that tell you what each call cost.

Because Model Database is OpenAI-SDK compatible, you use the familiar openai Python package, just with a different base URL and your mdb_live_ key.

Step 1: Set up your environment

Create a project folder and a virtual environment, then install the SDK:

python -m venv .venv
source .venv/bin/activate
pip install openai

Store your key in an environment variable so it never ends up in source control:

export MDB_API_KEY="mdb_live_xxxxxxxxxxxxxxxxxxxxxxxx"
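If the variable is missing, os.environ["MDB_API_KEY"] fails with a bare KeyError. A small guard gives a clearer message. This is only a sketch: load_api_key is a hypothetical helper name, not part of the OpenAI SDK or of Model Database.

```python
import os


def load_api_key(var_name: str = "MDB_API_KEY") -> str:
    """Return the API key from the environment, or raise a readable error.

    load_api_key is a hypothetical helper for this tutorial, not an SDK function.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell before running the script."
        )
    return key
```

You could then pass api_key=load_api_key() when constructing the client instead of indexing os.environ directly.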

Step 2: Create the client

The only Model Database-specific configuration is the base_url and the key. Create a file called chat.py:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://modeldatabase.com/v1",
    api_key=os.environ["MDB_API_KEY"],
)

Step 3: Send your first message

A chat completion takes a model and a list of messages. Each message has a role (system, user, or assistant) and content:

resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain what an API is in two sentences."},
    ],
)

print(resp.choices[0].message.content)

Run it with python chat.py and you will see the model's reply printed to your terminal. The system message sets behavior, and the user message is the actual question.
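Once the basic call works, it can be handy to wrap it in a function. The sketch below assumes a client created as in Step 2. The names ask and DEFAULT_MODEL are illustrative, not part of any library.

```python
from typing import Any

# Illustrative default; any model ID your account can access should work here.
DEFAULT_MODEL = "anthropic/claude-sonnet-4-6"


def ask(
    client: Any,
    question: str,
    system: str = "You are a concise assistant.",
    model: str = DEFAULT_MODEL,
) -> str:
    """Send one system message plus one user message and return the reply text."""
    messages = [
        {"role": "system", "content": system},   # sets behavior
        {"role": "user", "content": question},   # the actual question
    ]
    resp = client.chat.completions.create(model=model, messages=messages)
    # content can be None in some responses, so fall back to an empty string
    return resp.choices[0].message.content or ""
```

With the client from Step 2, print(ask(client, "Explain what an API is in two sentences.")) does the same thing as the script above.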

Step 4: Control the output

You can tune the response with standard sampling parameters. temperature controls randomness (lower values give more deterministic output), and max_tokens caps the number of tokens the model may generate in its reply:

resp = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Give me three blog title ideas about caching."}],
    temperature=0.7,
    max_tokens=200,
)
for choice in resp.choices:
    print(choice.message.content)
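When max_tokens is too low, the reply is cut off mid-thought. In the OpenAI chat completions format, each choice carries a finish_reason, and the value "length" signals that the token cap was hit. The sketch below assumes Model Database passes that field through unchanged. reply_text is a hypothetical helper name.

```python
from typing import Any, Tuple


def reply_text(resp: Any) -> Tuple[str, bool]:
    """Return (text, truncated) for the first choice of a chat completion.

    Assumes the response follows the OpenAI chat completions shape, where
    finish_reason == "length" means the max_tokens cap ended the reply.
    """
    choice = resp.choices[0]
    truncated = choice.finish_reason == "length"
    return choice.message.content or "", truncated
```

A typical use is text, truncated = reply_text(resp), followed by a retry with a larger max_tokens when truncated is True.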

Step 5: Hold a multi-turn conversation

The API is stateless, so to continue a conversation you send the full history back each time, appending the model's previous reply as an assistant message:

messages = [{"role": "user", "content": "My name is Sam."}]
resp = client.chat.completions.create(model="anthropic/claude-sonnet-4-6", messages=messages)

# append the assistant reply, then ask a follow-up
messages.append({"role": "assistant", "content": resp.choices[0].message.content})
messages.append({"role": "user", "content": "What did I say my name was?"})

resp = client.chat.completions.create(model="anthropic/claude-sonnet-4-6", messages=messages)
print(resp.choices[0].message.content)
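Appending to the history by hand gets repetitive after a few turns. One way to package the pattern is a small wrapper that owns the message list. This is a minimal sketch, and Conversation is a hypothetical class, not something the SDK provides.

```python
from typing import Any, Dict, List, Optional


class Conversation:
    """Keeps the full message history and resends it on every turn."""

    def __init__(self, client: Any, model: str, system: Optional[str] = None) -> None:
        self.client = client
        self.model = model
        self.messages: List[Dict[str, str]] = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def send(self, text: str) -> str:
        # Build the outgoing history first, so a failed request
        # leaves self.messages unchanged.
        pending = self.messages + [{"role": "user", "content": text}]
        resp = self.client.chat.completions.create(model=self.model, messages=pending)
        reply = resp.choices[0].message.content or ""
        # Commit both the user turn and the assistant reply.
        self.messages = pending + [{"role": "assistant", "content": reply}]
        return reply
```

With the client from Step 2, chat = Conversation(client, "anthropic/claude-sonnet-4-6") followed by chat.send("My name is Sam.") and chat.send("What did I say my name was?") reproduces the exchange above.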

Step 6: Check what each call cost

Every billable response carries the headers X-MDB-Charged-USD and X-MDB-Balance-USD. With the OpenAI SDK you can grab the raw response to read them:

raw = client.chat.completions.with_raw_response.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
print("charged:", raw.headers.get("X-MDB-Charged-USD"))
print("balance:", raw.headers.get("X-MDB-Balance-USD"))

completion = raw.parse()   # the normal completion object
print(completion.choices[0].message.content)

You can also read resp.usage on a normal response to see prompt and completion token counts, which is useful for tracking how many tokens each call consumed alongside the amount charged.
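Header values arrive as strings, so for any arithmetic (for example a running total) it helps to convert them first. The sketch below assumes both headers hold plain decimal numbers such as "0.00042"; the exact format is not specified in this tutorial. read_billing is a hypothetical helper name.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, Mapping, Optional


def read_billing(headers: Mapping[str, str]) -> Dict[str, Optional[Decimal]]:
    """Parse the two billing headers into Decimal values.

    Assumption: the header values are plain decimal strings. Missing or
    unparseable values come back as None instead of raising.
    """

    def parse(name: str) -> Optional[Decimal]:
        value = headers.get(name)
        if value is None:
            return None
        try:
            return Decimal(value)
        except InvalidOperation:
            return None

    return {
        "charged_usd": parse("X-MDB-Charged-USD"),
        "balance_usd": parse("X-MDB-Balance-USD"),
    }
```

Calling read_billing(raw.headers) on the raw response from the snippet above returns both amounts. Decimal is used instead of float so that small per-call charges add up without rounding drift.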

Switching models

To try a different model, change one string. The same code runs against google/gemini-2.0-flash, meta-llama/llama-3.3-70b-instruct, or deepseek/deepseek-chat without any other edits.
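Since only the model string changes, comparing models on the same prompt is a short loop. The sketch below assumes a client created as in Step 2. compare_models is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable


def compare_models(client: Any, prompt: str, models: Iterable[str]) -> Dict[str, str]:
    """Send the same user prompt to each model and collect the replies by model ID."""
    replies: Dict[str, str] = {}
    for model in models:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        replies[model] = resp.choices[0].message.content or ""
    return replies
```

For example, compare_models(client, "Define caching in one sentence.", ["google/gemini-2.0-flash", "deepseek/deepseek-chat"]) returns a dictionary with one reply per model. Each entry is a separate billable call.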

You now have a complete Python workflow: client, messages, parameters, multi-turn history, and cost tracking. Get a key and credit from your dashboard, and explore every parameter in the docs.
