A Content Generation Pipeline That Scales

Generating one blog post with an LLM is easy. Generating hundreds of consistent, on-brand pieces a week, with review gates and predictable cost, is an engineering problem. This article shows how to build a content generation pipeline on Model Database that you can actually run at scale.

The goal is a system that turns a queue of briefs into draft content, runs quality checks automatically, and flags anything that needs a human before publishing.

Pipeline stages

Think of content generation as a series of small, inspectable steps rather than one giant prompt:

Outline: expand a brief into a structured outline.
Draft: write each section from the outline.
Edit: tighten prose, enforce tone, and fix obvious issues.
Check: validate length, banned words, and required elements.

Breaking the work apart lets you use a cheaper model where quality is less critical and a stronger one where it matters. Model Database makes the swap a one-line change.
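One way to keep that swap to a single line is to hold the stage-to-model mapping in one place. The sketch below is illustrative only: STAGE_MODELS and model_for are hypothetical names, and the models assigned to the edit and review stages are assumptions rather than recommendations.

```python
# Hypothetical stage-to-model table. The stage names mirror the list above;
# the model IDs are the two used in this article's examples.
STAGE_MODELS = {
    "outline": "openai/gpt-4o-mini",
    "draft": "anthropic/claude-sonnet-4-6",
    "edit": "openai/gpt-4o-mini",  # assumption: no model is prescribed for editing
    "review": "anthropic/claude-sonnet-4-6",
}


def model_for(stage: str) -> str:
    """Return the model ID configured for a pipeline stage."""
    try:
        return STAGE_MODELS[stage]
    except KeyError:
        raise ValueError(f"no model configured for stage {stage!r}") from None
```

Each stage function can then pass model=model_for("draft") instead of a hard-coded string, so changing a model touches only the table.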

Stage one: outlines

from openai import OpenAI

client = OpenAI(
    base_url="https://modeldatabase.com/v1",
    api_key="mdb_live_...",
)

def outline(brief):
    resp = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[
            {"role": "system", "content":
             "Turn the brief into a JSON outline with 'title' and "
             "'sections' (each with a heading and 2-3 talking points)."},
            {"role": "user", "content": brief},
        ],
        response_format={"type": "json_object"},
        temperature=0.4,
    )
    # The content is a JSON string; parse it with json.loads before drafting.
    return resp.choices[0].message.content

A small model handles structure well and keeps this high-volume step inexpensive.
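JSON mode gives you syntactically valid JSON, but it does not guarantee the keys you asked for, so it is worth validating the outline before spending tokens on drafting. A minimal sketch, assuming each section uses the keys "heading" and "points" (the prompt above does not fix those names, so adjust them to whatever your prompt specifies); parse_outline is a hypothetical helper:

```python
import json


def parse_outline(raw: str) -> dict:
    """Parse the outline JSON and check the shape the drafting stage relies on."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(data, dict):
        raise ValueError("outline must be a JSON object")
    title = data.get("title")
    if not isinstance(title, str) or not title.strip():
        raise ValueError("outline is missing a title")
    sections = data.get("sections")
    if not isinstance(sections, list) or not sections:
        raise ValueError("outline has no sections")
    for i, section in enumerate(sections):
        # Assumed key names: "heading" and "points".
        if (
            not isinstance(section, dict)
            or "heading" not in section
            or "points" not in section
        ):
            raise ValueError(f"section {i} needs 'heading' and 'points'")
    return data
```

A job whose outline fails validation can be retried or marked failed without ever reaching the more expensive drafting model.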

Stage two: drafting sections

Generating section by section keeps each call focused and lets you parallelize. For the writing itself, a stronger model earns its keep on coherence and voice.

def write_section(title, heading, points):
    resp = client.chat.completions.create(
        model="anthropic/claude-sonnet-4-6",
        messages=[
            {"role": "system", "content":
             "You are a brand writer. Active voice, no cliches, "
             "150-220 words per section."},
            {"role": "user", "content":
             f"Article: {title}\nSection: {heading}\n"
             f"Cover: {points}"},
        ],
        temperature=0.6,
    )
    return resp.choices[0].message.content

Because every section is an independent call, you can run them concurrently with a thread pool or async client and assemble the full draft when they finish.
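A minimal sketch of the thread-pool approach, using the standard library's ThreadPoolExecutor. The draft_article helper is hypothetical, the section keys "heading" and "points" are assumed, and the section writer is passed in as an argument so the example runs without network access:

```python
from concurrent.futures import ThreadPoolExecutor


def draft_article(title, sections, write_section, max_workers=4):
    """Draft all sections concurrently and return them joined in outline order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(write_section, title, s["heading"], s["points"])
            for s in sections
        ]
        # Collect in submission order so the draft follows the outline;
        # result() re-raises any exception from the worker thread.
        bodies = [f.result() for f in futures]
    return "\n\n".join(
        f"{s['heading']}\n\n{body}" for s, body in zip(sections, bodies)
    )
```

Here max_workers also acts as a simple per-article cap on in-flight requests. Collecting results in submission order, rather than completion order, keeps the assembled draft aligned with the outline.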

Stage three: automated quality gates

Never publish raw model output. Add deterministic checks plus an LLM-based review pass. The deterministic checks are cheap and catch the obvious failures.

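def passes_rules(text, banned, min_words):
    """Return (ok, reason) for the cheap deterministic checks."""
    words = len(text.split())
    if words < min_words:
        return False, f"too short: {words}"
    lowered = text.lower()  # lowercase once, not once per banned term
    for term in banned:
        # Substring match: "class" also matches inside "classic".
        if term.lower() in lowered:
            return False, f"banned term: {term}"
    return True, "ok"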
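The substring test in passes_rules is deliberately simple, but it also flags a banned term that appears inside a longer word. If that produces too many false positives, a whole-word variant is one option. This is a sketch, not a drop-in replacement: find_banned is a hypothetical helper, and it assumes every banned term starts and ends with a letter or digit, since the \b anchor does not behave as a word boundary next to punctuation.

```python
import re


def find_banned(text, banned):
    """Return the banned terms that appear in text as whole words or phrases."""
    hits = []
    for term in banned:
        # \b keeps "class" from matching inside "classic";
        # re.escape makes the term match literally.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(term)
    return hits
```

Returning every hit, rather than stopping at the first, gives a reviewer the full list in one pass.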

Then use a model as an editor that returns a structured verdict, so a human only sees pieces that need attention.

import json

def review(text):
    resp = client.chat.completions.create(
        model="anthropic/claude-sonnet-4-6",
        messages=[
            {"role": "system", "content":
             "Review the draft. Return JSON: "
             "{\"score\": 1-5, \"issues\": [...], \"publish\": bool}."},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)
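The two gates can then be combined into a single routing decision. The sketch below assumes the (ok, reason) tuple from passes_rules and the verdict dict from review; the route function, the min_score threshold, and the choice to mark rule failures as failed are all illustrative assumptions. Because the verdict comes from a model, it is validated before it is trusted.

```python
def route(rule_result, verdict, min_score=4):
    """Return (status, reason), where status is 'failed', 'review', or 'ready'."""
    ok, reason = rule_result
    if not ok:
        return "failed", reason
    if not isinstance(verdict, dict):
        return "review", "missing or malformed verdict"
    score = verdict.get("score")
    publish = verdict.get("publish")
    issues = verdict.get("issues") or []
    # bool is a subclass of int in Python, so exclude it explicitly.
    score_ok = (
        isinstance(score, int) and not isinstance(score, bool) and 1 <= score <= 5
    )
    if not score_ok or not isinstance(publish, bool):
        return "review", "missing or malformed verdict"
    if publish and score >= min_score and not issues:
        return "ready", "ok"
    # Anything the model did not clearly approve goes to a human.
    return "review", "; ".join(str(i) for i in issues) or "model did not approve"
```

Anything ambiguous falls through to human review, so a malformed verdict can delay a piece but never publish one.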

Orchestration and throughput

Tie the stages together with a job queue so the pipeline is resilient and observable:

One row per brief in a database table with a status column (queued, drafting, review, ready, failed).
Workers pull jobs, call Model Database, and advance the status. Retries are safe as long as each stage is idempotent: re-running a stage should overwrite its stored output rather than append to it.
Backpressure: limit concurrent in-flight requests so you control spend. Prepaid credit means there are no surprise invoices, but you still want a cap.
Auditing: store the model ID and token usage on each job so you can compare models on real output later.
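The job table described above can be sketched with the standard library's sqlite3 module. The schema, column names, and helper functions here are assumptions made for illustration; a production queue would more likely sit on a server database with row locking, but the status-guarded update shows the idea.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open the database and create the jobs table if it does not exist."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " brief TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'queued',"
        " model TEXT,"
        " total_tokens INTEGER)"
    )
    db.commit()
    return db


def claim_next(db):
    """Move the oldest queued job to 'drafting' and return (id, brief), or None."""
    with db:  # commits on success, rolls back on error
        row = db.execute(
            "SELECT id, brief FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard stops two workers from claiming the same job.
        cur = db.execute(
            "UPDATE jobs SET status = 'drafting' WHERE id = ? AND status = 'queued'",
            (row[0],),
        )
        return row if cur.rowcount == 1 else None


def finish(db, job_id, status, model, total_tokens):
    """Record the new status plus the audit fields for one job."""
    with db:
        db.execute(
            "UPDATE jobs SET status = ?, model = ?, total_tokens = ? WHERE id = ?",
            (status, model, total_tokens, job_id),
        )
```

A worker loop would call claim_next, run the stages, and call finish with the resulting status. Because finish overwrites the row's fields, re-running it for the same job is harmless.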

Tuning cost and quality

Once the pipeline runs, treat model choice as a dial. Start with openai/gpt-4o-mini on every model-backed stage, then upgrade only the stages where reviewers reject output. You might find drafting needs anthropic/claude-sonnet-4-6 while outlining stays on the small model; the rule checks are plain code and need no model at all. Since all of these models are reachable through the same endpoint, A/B testing is just changing the model string and comparing review scores.
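If each job stores its model ID, review score, and token usage, the comparison itself is a few lines. The sketch below assumes records arrive as (model, score, total_tokens) tuples, for example from the audit columns on the job table; compare_models is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def compare_models(records):
    """Summarise review scores and token usage per model."""
    by_model = defaultdict(list)
    for model, score, tokens in records:
        by_model[model].append((score, tokens))
    return {
        model: {
            "n": len(rows),
            "mean_score": mean(s for s, _ in rows),
            "mean_tokens": mean(t for _, t in rows),
        }
        for model, rows in by_model.items()
    }
```

Keep the sample count next to each mean: a model that looks better on a handful of jobs may not hold up across a week of briefs.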

Ready to build it? Create a key and load credit at your dashboard, and review streaming and JSON-mode details in the docs.
