A copy of https://huggingface.co/tiiuae/falcon-40b with the weights quantized using GPTQ (calibrated on wikitext-2, 4 bits, group size 128).

Intended for use with https://github.com/huggingface/text-generation-inference, for example:

model=huggingface/falcon-40b-gptq
num_shard=2
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id $model --num-shard $num_shard --quantize gptq
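
Once the container is up, the server can be queried over HTTP. The following is a minimal sketch, assuming the container started by the command above is reachable on 127.0.0.1:8080 and exposes text-generation-inference's `/generate` route. The helper names `build_payload` and `generate` and the `TGI_URL` variable are hypothetical, introduced only for this illustration, and the payload builder does no JSON escaping, so it only suits prompts without quotes or backslashes.

```shell
# Base URL of the running server (assumption: port 8080 as mapped by -p 8080:80).
tgi_url=${TGI_URL:-http://127.0.0.1:8080}

# Build the JSON body for the /generate route.
# $1 = prompt (must not contain quotes or backslashes), $2 = max new tokens.
build_payload() {
  printf '{"inputs":"%s","parameters":{"max_new_tokens":%d}}' "$1" "$2"
}

# POST the payload to the server; the token limit defaults to 20.
generate() {
  curl -s "$tgi_url/generate" \
    -X POST \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$1" "${2:-20}")"
}
```

Calling `generate "What is Deep Learning?" 20` should return a JSON object whose `generated_text` field holds the completion. Nothing is sent until `generate` is called, so the definitions can be sourced safely before the server is ready.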

For the full set of configuration options, and for usage outside Docker, refer to https://github.com/huggingface/text-generation-inference
