Serving Private & Gated Models

If the model you want to serve is gated, or its repository on Model Database Hub is private, you can still serve it as long as your account has access to it: provide your Model Database Hub access token. You can generate and copy a read token from the Model Database Hub tokens page.

If you’re using the CLI, set the HUGGING_FACE_HUB_TOKEN environment variable. For example:

export HUGGING_FACE_HUB_TOKEN=<YOUR READ TOKEN>
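
Before launching the server, it can help to confirm that the variable is actually set in the current shell, since an unset token typically only surfaces later as a failed download. The following is a minimal sketch; `check_token` is a hypothetical helper name chosen for illustration and is not part of any tool mentioned here.

```shell
# Hypothetical helper: report whether HUGGING_FACE_HUB_TOKEN is set and non-empty.
# It never prints the token itself, so it is safe to run in shared terminals.
check_token() {
  if [ -z "${HUGGING_FACE_HUB_TOKEN:-}" ]; then
    echo "HUGGING_FACE_HUB_TOKEN is not set" >&2
    return 1
  fi
  echo "HUGGING_FACE_HUB_TOKEN is set"
}
```

Calling `check_token` right after the `export` line returns a non-zero status if the variable is missing, so it can be chained in front of a launch command with `&&`.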

To do the same with Docker, pass the token to the container through the HUGGING_FACE_HUB_TOKEN environment variable, as shown below.

model=meta-llama/Llama-2-7b-chat-hf
volume=$PWD/data
token=<your READ token>

docker run --gpus all \
    --shm-size 1g \
    -e HUGGING_FACE_HUB_TOKEN="$token" \
    -p 8080:80 \
    -v "$volume":/data ghcr.io/huggingface/text-generation-inference:1.0.1 \
    --model-id "$model"