Strict copy of https://huggingface.co/tiiuae/falcon-40b, quantized with GPTQ (on wikitext-2, 4-bit, group size 128).

Intended to be used with https://github.com/huggingface/text-generation-inference

```
model=huggingface/falcon-40b-gptq
num_shard=2
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id $model --num-shard $num_shard --quantize gptq
```

For the full set of configuration options, see the text-generation-inference repository linked above; running inside Docker, as shown here, is the recommended setup.
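Once the container is up, the server exposes text-generation-inference's HTTP API on port 8080 (mapped from port 80 by the `docker run` above). A minimal sketch of a request body for the `/generate` endpoint — the prompt text and `max_new_tokens` value are illustrative, not prescribed by this model:

```python
import json

# Illustrative prompt and generation parameters; adjust to your use case.
payload = {
    "inputs": "What is GPTQ quantization?",
    "parameters": {"max_new_tokens": 64},
}
body = json.dumps(payload)
print(body)

# With the server running, send it to the container, e.g.:
#   curl 127.0.0.1:8080/generate -X POST \
#       -d "$body" -H 'Content-Type: application/json'
```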