Translation at Scale With LLMs

Translating a product into a dozen languages is no longer a quarter-long project. LLMs produce fluent, context-aware translations that respect tone, preserve placeholders, and handle domain vocabulary. This article builds a translation pipeline on Model Database that localizes UI strings and content while keeping quality and cost under control.

The difference between a toy translator and a production one is everything around the model call: placeholder safety, glossary enforcement, batching, and review. We'll cover all of it.

Why an LLM over classic MT

Traditional machine translation is fast but context-blind. LLMs understand that "Save" in a toolbar is a verb, respect formality levels, and follow instructions like "keep the brand name untranslated." You trade a little latency for noticeably better output, and you can steer the result with a prompt.

Translating a single string safely

UI strings contain placeholders like {count} that must survive untouched. Instruct the model explicitly and keep temperature low for consistency.

from openai import OpenAI

client = OpenAI(
    base_url="https://modeldatabase.com/v1",
    api_key="mdb_live_...",
)

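def translate(text, target_lang, glossary=""):
    # Core instructions: placeholders and markup must come back untouched.
    system = (
        f"Translate to {target_lang}. Preserve placeholders like "
        # Plain (non-f) string, so "{name}" reaches the model verbatim.
        "{name} exactly. Keep HTML tags intact. Match the tone. "
        "Return only the translation."
    )
    # Only mention a glossary when one is supplied.
    if glossary:
        system += f"\nApply this glossary strictly:\n{glossary}"
    resp = client.chat.completions.create(
        model="google/gemini-2.0-flash",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": text},
        ],
        temperature=0,  # deterministic-leaning output for consistent strings
    )
    return resp.choices[0].message.content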
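The translate function takes its glossary as a plain string, while the glossary check later in this article works on a dictionary of source terms and required translations. One way to keep a single source of truth is a small helper that renders that dictionary into prompt text. `format_glossary` is a hypothetical name, and the `term => translation` line format is an assumption rather than anything the model requires.

```python
def format_glossary(glossary: dict[str, str]) -> str:
    """Render a {source_term: required_translation} mapping as prompt lines.

    Returns an empty string for an empty glossary, so callers can pass the
    result straight through as translate(..., glossary=...).
    """
    # One "term => translation" pair per line; sorting keeps the prompt text
    # stable across runs, which matters if you cache on the prompt.
    return "\n".join(f"{term} => {target}" for term, target in sorted(glossary.items()))


# Usage (network call not shown):
# terms = {"Workspace": "Arbeitsbereich", "Model Database": "Model Database"}
# german = translate(text, "German", glossary=format_glossary(terms))
```

Mapping a brand name to itself is a simple way to express "keep this untranslated" in the same structure, and the same `terms` dict can then be reused when validating the output.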

google/gemini-2.0-flash is fast and broadly multilingual, which makes it a strong default for high-volume localization. For literary or marketing copy where nuance matters, test anthropic/claude-sonnet-4-6.

Batching for throughput

Translating thousands of strings one request at a time is slow and wasteful. Batch related strings into a single call using JSON, which also gives the model surrounding context to translate consistently.

import json

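def translate_batch(strings, target_lang):
    """Translate a dict of {key: source_string}; returns {key: translation}."""
    resp = client.chat.completions.create(
        model="google/gemini-2.0-flash",
        messages=[
            {"role": "system", "content":
             f"Translate each value to {target_lang}. Keep keys and "
             "placeholders unchanged. Return the same JSON shape."},
            # ensure_ascii=False sends accented or non-Latin source text as-is
            {"role": "user", "content": json.dumps(strings, ensure_ascii=False)},
        ],
        response_format={"type": "json_object"},
        temperature=0,
    )
    result = json.loads(resp.choices[0].message.content)
    # JSON mode is meant to return valid JSON, not to preserve your keys.
    if set(result) != set(strings):
        raise ValueError("model returned a different set of keys")
    return result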

Keep batches small enough that one malformed or truncated response costs you a single batch rather than the whole run. A few dozen strings per call is a reasonable starting point.
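Splitting a large string table into fixed-size batches is a few lines of standard-library code. This is a sketch: `chunked` is a hypothetical helper, and the default of 30 strings per batch is an assumption picked to match "a few dozen", not a tuned value.

```python
from itertools import islice
from typing import Iterator


def chunked(strings: dict[str, str], size: int = 30) -> Iterator[dict[str, str]]:
    """Yield successive sub-dicts of at most `size` key/value pairs.

    Dicts preserve insertion order, so keys that sit next to each other in
    the source file (and usually share context) land in the same batch.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    items = iter(strings.items())
    # islice pulls the next `size` pairs; an empty dict means we are done.
    while batch := dict(islice(items, size)):
        yield batch


# Usage with translate_batch, one request per chunk (network call not shown):
# translated = {}
# for batch in chunked(ui_strings, size=30):
#     translated.update(translate_batch(batch, "French"))
```

Because each batch is an independent request, a failure can be retried for that batch alone instead of restarting the whole catalog.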

Enforcing a glossary

Brands need consistency: product names, legal terms, and feature labels must translate the same way every time. Pass a glossary in the prompt and validate the output against it.

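def glossary_respected(source, translated, glossary):
    # glossary maps a source term to its required translation,
    # e.g. {"Workspace": "Arbeitsbereich"}.
    for term, expected in glossary.items():
        # A term that occurs in the source must come out as the expected rendering.
        if term in source and expected not in translated:
            return False
    return True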

When a translation violates the glossary or drops a placeholder, retry once with the specific error, or route it to human review.
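The retry-then-escalate policy can be written as a small wrapper. This is a sketch under a few assumptions: `translate_fn(source, feedback)` is a hypothetical wrapper around the model call that appends the feedback text to the system prompt, and validators follow an illustrative convention of returning `None` on success or an error message on failure. `placeholder_error` is one example validator.

```python
import re
from typing import Callable, Optional

# Illustrative convention: a validator returns None when the translation
# passes, or a short human-readable error message when it fails.
Validator = Callable[[str, str], Optional[str]]

PLACEHOLDER = re.compile(r"\{[^}]+\}")


def placeholder_error(src: str, dst: str) -> Optional[str]:
    """Example validator: report placeholders that were dropped or invented."""
    src_tokens = set(PLACEHOLDER.findall(src))
    dst_tokens = set(PLACEHOLDER.findall(dst))
    missing, extra = src_tokens - dst_tokens, dst_tokens - src_tokens
    if missing or extra:
        return f"missing placeholders {sorted(missing)}, unexpected {sorted(extra)}"
    return None


def translate_with_retry(
    source: str,
    translate_fn: Callable[[str, str], str],
    validators: list[Validator],
) -> tuple[str, bool]:
    """Translate, retry once with the concrete errors, else flag for review.

    Returns (translation, needs_human_review).
    """
    def errors_for(candidate: str) -> list[str]:
        return [msg for check in validators if (msg := check(source, candidate))]

    first = translate_fn(source, "")
    problems = errors_for(first)
    if not problems:
        return first, False

    # One retry that tells the model exactly what went wrong.
    feedback = "Your previous translation had these problems: " + "; ".join(problems)
    second = translate_fn(source, feedback)
    if not errors_for(second):
        return second, False

    # Still failing: keep the latest attempt but route it to a human reviewer.
    return second, True
```

Feeding back the specific failure, rather than simply asking again, gives the model something concrete to fix, and the single retry caps the extra cost per string.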

Validation that catches real bugs

Placeholder check: confirm every {token} in the source appears in the translation. This catches the most common and most damaging error.
Tag balance: verify HTML tags open and close as in the source.
Length sanity: flag translations that are wildly longer or shorter than expected for the language.

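import re
from collections import Counter

PLACEHOLDER = re.compile(r"\{[^}]+\}")

def placeholders_intact(src, dst):
    # Compare counts, not just sets, so dropping one copy of a
    # repeated {token} is still caught.
    return Counter(PLACEHOLDER.findall(src)) == Counter(PLACEHOLDER.findall(dst))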
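The tag-balance and length checks can be sketched just as briefly. Assumptions to note: the regex below handles simple inline tags of the kind found in UI strings, not arbitrary HTML (a real parser is safer for rich content); the void-element list is partial; and the 0.5 to 2.0 ratio bounds and 10-character minimum are illustrative defaults, not recommendations. `tags_balanced` and `length_ratio_ok` are hypothetical names.

```python
import re

# Opening/closing tags such as <b>, <a href="...">, </b>; group 3 catches
# a trailing slash in self-closing tags like <br/>.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>")
VOID = {"br", "hr", "img", "input", "meta", "link", "wbr"}  # partial list


def tag_sequence(html: str) -> list[str]:
    """Return tags in order as 'b' / '/b', skipping void and self-closing tags."""
    seq = []
    for closing, name, self_closing in TAG.findall(html):
        name = name.lower()
        if self_closing or name in VOID:
            continue
        seq.append(("/" if closing else "") + name)
    return seq


def tags_balanced(src: str, dst: str) -> bool:
    """True if dst nests its tags correctly and uses the same tags as src."""
    stack = []
    for tag in tag_sequence(dst):
        if tag.startswith("/"):
            if not stack or stack.pop() != tag[1:]:
                return False
        else:
            stack.append(tag)
    if stack:
        return False
    # Word order can change in translation, so compare tag counts, not order.
    return sorted(tag_sequence(src)) == sorted(tag_sequence(dst))


def length_ratio_ok(src: str, dst: str, low: float = 0.5, high: float = 2.0,
                    min_chars: int = 10) -> bool:
    """Flag translations far shorter or longer than the source (by characters)."""
    if not dst.strip():
        return not src.strip()  # empty output is only fine for empty input
    if len(src) < min_chars:
        return True  # ratios on very short strings are too noisy to judge
    return low <= len(dst) / len(src) <= high
```

Character-count bounds that suit French or German will likely misfire for Chinese or Japanese, which typically need far fewer characters than English, so keep the bounds per target language.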

Cost, caching, and review

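Translation is naturally cacheable: a given source string in a given target language should always map to the same translation, and a cache guarantees that consistency even though model output is not strictly deterministic. Cache by a hash of the source text and target language (plus the model and prompt version, since changing either can change the output), and you'll pay to translate each unique string only once per configuration. With Model Database's prepaid pay-as-you-go billing, a one-time bulk translation of your catalog has a predictable cost you can estimate from token counts.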
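A minimal sketch of both ideas follows: a cache keyed on a hash of everything that affects the output, and a back-of-envelope cost estimate. `TranslationCache`, `cache_key`, `estimate_cost`, and the `prompt_version` label are hypothetical, the in-memory dict stands in for whatever store you actually use, and the per-million-token prices are parameters because they depend on the model; take them from the model's listing rather than from this example. Token counts for the estimate can come from a sample run, via `resp.usage.prompt_tokens` and `resp.usage.completion_tokens` in the OpenAI SDK, assuming the endpoint populates the usage field.

```python
import hashlib
import json
from typing import Callable


def cache_key(source: str, target_lang: str, model: str, prompt_version: str) -> str:
    """Stable key for one translation request.

    JSON-encoding the fields (rather than concatenating them) keeps
    ("ab", "c") and ("a", "bc") from producing the same key.
    """
    payload = json.dumps([source, target_lang, model, prompt_version], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TranslationCache:
    """In-memory cache; swap the dict for a persistent store in production."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get_or_translate(
        self,
        source: str,
        target_lang: str,
        model: str,
        prompt_version: str,
        translate_fn: Callable[[str, str], str],
    ) -> str:
        key = cache_key(source, target_lang, model, prompt_version)
        if key not in self._store:  # only pay for strings not seen before
            self._store[key] = translate_fn(source, target_lang)
        return self._store[key]


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Back-of-envelope cost in USD from token counts and per-million prices."""
    return (
        input_tokens * usd_per_million_input + output_tokens * usd_per_million_output
    ) / 1_000_000
```

Including the model and prompt version in the key means a prompt or model change naturally invalidates old entries instead of silently mixing two styles of translation.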

For regulated or high-visibility content, keep a human in the loop: machine-translate first, then have a reviewer approve. The model handles the tedious bulk of the work; people handle the nuanced last mile.

Picking the right model

Build a small evaluation set of source strings with known-good translations and run candidates against it. Use the fast model for bulk UI strings and a stronger one for prose. Switching is a single model string through the same endpoint, so you can mix models per content type without changing your architecture.
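A tiny harness is enough to compare candidates on that evaluation set. This sketch makes several assumptions: `evaluate` and `similarity` are hypothetical names, `translate_fn(model, source, target_lang)` stands in for your own wrapper around the chat completions call, and `difflib.SequenceMatcher` is a crude character-level stand-in for proper translation metrics such as chrF or COMET. Treat its scores as a rough screen, and judge prose by human review.

```python
from difflib import SequenceMatcher
from typing import Callable


def similarity(candidate: str, reference: str) -> float:
    """Crude character-level similarity in [0, 1]; 1.0 means identical."""
    return SequenceMatcher(None, candidate, reference).ratio()


def evaluate(
    models: list[str],
    eval_set: list[tuple[str, str, str]],
    translate_fn: Callable[[str, str, str], str],
    checks: list[Callable[[str, str], bool]],
) -> dict[str, dict[str, float]]:
    """Score each model on (source, target_lang, reference) examples.

    checks are (src, dst) -> bool validators, e.g. a placeholder check.
    """
    if not eval_set:
        raise ValueError("eval_set is empty")
    report = {}
    for model in models:
        scores, passed = [], 0
        for source, lang, reference in eval_set:
            output = translate_fn(model, source, lang)
            scores.append(similarity(output, reference))
            if all(check(source, output) for check in checks):
                passed += 1
        report[model] = {
            "mean_similarity": sum(scores) / len(scores),
            "pass_rate": passed / len(eval_set),
        }
    return report


# Example (network call not shown; my_translate is hypothetical):
# report = evaluate(
#     ["google/gemini-2.0-flash", "anthropic/claude-sonnet-4-6"],
#     eval_set, my_translate, [placeholders_intact],
# )
```

Reporting the validation pass rate separately from similarity keeps a model that paraphrases well but drops placeholders from looking better than it is.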

Create a key and load credit at your dashboard, and find JSON-mode and model details in the docs.
