Llama 3.2 11B Vision Instruct

meta-llama/llama-3.2-11b-vision-instruct

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Input: $0.4485 / 1M tokens

Output: $0.4485 / 1M tokens

Context: 131K tokens
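
Because input and output are billed at the same listed rate, a rough per-request cost is (input tokens + output tokens) × 0.4485 / 1,000,000. The sketch below computes that in shell; the function name `estimate_cost` is hypothetical, and how image inputs are converted to tokens is not stated here, so the token counts are assumed to be known already.

```shell
#!/bin/sh
# Listed rate: the same $0.4485 per 1M tokens applies to input and output.
PRICE_PER_MILLION=0.4485

# estimate_cost INPUT_TOKENS OUTPUT_TOKENS   (hypothetical helper)
# Prints the estimated cost in USD, rounded to six decimal places.
estimate_cost() {
  awk -v in_tok="$1" -v out_tok="$2" -v price="$PRICE_PER_MILLION" \
    'BEGIN { printf "%.6f\n", (in_tok + out_tok) * price / 1000000 }'
}

# Example: a 100,000-token prompt with a 2,000-token reply.
estimate_cost 100000 2000
```

With these numbers the script prints 0.045747, i.e. a little under five cents.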

Capabilities

✓ Streaming

○ Tool / function calling

✓ JSON / structured output

✓ Vision (image input)

Call this model

Pass meta-llama/llama-3.2-11b-vision-instruct as the model. If you already use the OpenAI SDK, the only other change is the base URL.

quickstart.sh

curl https://modeldatabase.com/v1/chat/completions \
  -H "Authorization: Bearer $MDB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.2-11b-vision-instruct",
    "messages": [{"role":"user","content":"Hello!"}]
  }'
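
Since the model accepts image input, a request that includes an image might look like the sketch below. It assumes the endpoint accepts OpenAI-style content parts (`type: "text"` and `type: "image_url"`), which this page does not confirm. The helper names `build_vision_payload` and `describe_image` are hypothetical.

```shell
#!/bin/sh
MODEL="meta-llama/llama-3.2-11b-vision-instruct"

# build_vision_payload IMAGE_URL PROMPT   (hypothetical helper)
# Prints a chat-completions body with one text part and one image part.
# Assumption: OpenAI-style content parts are accepted by this endpoint.
# This sketch does no JSON escaping, so the URL and prompt must not
# contain double quotes or backslashes.
build_vision_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":[{"type":"text","text":"%s"},{"type":"image_url","image_url":{"url":"%s"}}]}]}\n' \
    "$MODEL" "$2" "$1"
}

# describe_image IMAGE_URL [PROMPT]   (hypothetical helper)
# Sends the payload to the same endpoint as the quickstart request.
describe_image() {
  curl https://modeldatabase.com/v1/chat/completions \
    -H "Authorization: Bearer $MDB_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_vision_payload "$1" "${2:-Describe this image.}")"
}
```

After sourcing the script, `describe_image https://example.com/cat.png "What is in this picture?"` sends the request. The URL is a placeholder. Building the payload in a separate function makes it possible to inspect the JSON before anything is sent.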

Prices are final — the routing margin is already included. See the full pricing table or the API docs.