Update README.md
README.md
CHANGED
@@ -9,10 +9,15 @@ This repository contains CPU-optimized GGUF quantizations of the Meta-Llama-3.1-

 ## Available Quantizations

 1. Q4_0_4_8 (CPU FMA-Optimized): ~246 GB
-2.
-3.
-4.

 ## Use Aria2 for parallelized downloads, links will download 9x faster

@@ -22,8 +27,7 @@ This repository contains CPU-optimized GGUF quantizations of the Meta-Llama-3.1-
 >>
 >>Feel free to paste these all in at once or one at a time

-### Q4_0_48 (CPU Optimized
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/DD71wAB7DlQBmTG8wVaWS.png)


 ```bash
@@ -36,7 +40,7 @@ aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00006-of-00006.gg
 ```


-### IQ4_XS Version - Fastest for CPU/GPU (Size: ~212 GB)
 ```bash
 aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-i1-q4xs-00001-of-00005.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-i1-q4xs-00001-of-00005.gguf
 aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-i1-q4xs-00002-of-00005.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-i1-q4xs-00002-of-00005.gguf
@@ -52,7 +56,7 @@ aria2c -x 16 -s 16 -k 1M -o meta-405b-1bit-00002-of-00003.gguf https://huggingfa
 aria2c -x 16 -s 16 -k 1M -o meta-405b-1bit-00003-of-00003.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-1bit-00003-of-00003.gguf
 ```

-
 ### Q2K-Q8 Mixed 2bit 8bit I wrote myself. This is the smallest coherent one I could make WITHOUT imatrix

 ```verilog
@@ -70,6 +74,11 @@ aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-imatrix-2k-00003-of-00004.gguf https:/
 aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-imatrix-2k-00004-of-00004.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-imatrix-2k-00004-of-00004.gguf
 ```

 ### BF16 Version

 ```bash
@@ -9,10 +9,15 @@

 ## Available Quantizations

+Available Quantizations
+
 1. Q4_0_4_8 (CPU FMA-Optimized): ~246 GB
+2. IQ4_XS (Fastest for CPU/GPU): ~212 GB
+3. Q2K-Q8 Mixed quant with iMatrix: ~154 GB
+4. Q2K-Q8 Mixed without iMat for testing: ~165 GB
+5. 1-bit Custom per weight COHERENT quant: ~103 GB
+6. BF16: ~811 GB (original model)
+7. Q8_0: ~406 GB (original model)

 ## Use Aria2 for parallelized downloads, links will download 9x faster

@@ -22,8 +27,7 @@
 >>
 >>Feel free to paste these all in at once or one at a time

+### Q4_0_48 (CPU FMA Optimized Specifically for ARM server chips, NOT TESTED on X86)


 ```bash
@@ -36,7 +40,7 @@
 ```


+### IQ4_XS Version - Fastest for CPU/GPU should work everywhere (Size: ~212 GB)
 ```bash
 aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-i1-q4xs-00001-of-00005.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-i1-q4xs-00001-of-00005.gguf
 aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-i1-q4xs-00002-of-00005.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-i1-q4xs-00002-of-00005.gguf
@@ -52,7 +56,7 @@
 aria2c -x 16 -s 16 -k 1M -o meta-405b-1bit-00003-of-00003.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-1bit-00003-of-00003.gguf
 ```
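
The per-shard `aria2c` commands in this README differ only in the shard index, so they could also be generated in a loop. The following is a minimal sketch, assuming a POSIX shell with `aria2c` on the `PATH`. `part_name` and `download_parts` are hypothetical helper names that are not part of the repository; the base URL, file prefixes, and `aria2c` flags are copied from the commands shown in the README.

```bash
#!/bin/sh
# Base URL copied from the README's download commands.
BASE_URL="https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main"

# part_name PREFIX INDEX TOTAL
# Prints a shard file name such as meta-405b-cpu-i1-q4xs-00001-of-00005.gguf
# (hypothetical helper, for illustration only).
part_name() {
  printf '%s-%05d-of-%05d.gguf' "$1" "$2" "$3"
}

# download_parts PREFIX TOTAL
# Fetches shards 1..TOTAL one after another with the same aria2c flags
# the README uses; stops at the first failed download.
download_parts() {
  prefix="$1"
  total="$2"
  i=1
  while [ "$i" -le "$total" ]; do
    name="$(part_name "$prefix" "$i" "$total")"
    aria2c -x 16 -s 16 -k 1M -o "$name" "$BASE_URL/$name" || return 1
    i=$((i + 1))
  done
}

# Usage (uncomment to start the ~212 GB IQ4_XS download):
# download_parts meta-405b-cpu-i1-q4xs 5
```

The call is left commented out so that pasting the block only defines the helpers; the explicit per-file commands remain the simplest option when only one shard needs to be re-fetched.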

+
 ### Q2K-Q8 Mixed 2bit 8bit I wrote myself. This is the smallest coherent one I could make WITHOUT imatrix

 ```verilog
@@ -70,6 +74,11 @@
 aria2c -x 16 -s 16 -k 1M -o meta-405b-cpu-imatrix-2k-00004-of-00004.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-cpu-imatrix-2k-00004-of-00004.gguf
 ```

+<figure>
+<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/DD71wAB7DlQBmTG8wVaWS.png" alt="Q4_0_48 CPU Optimized example response">
+<figcaption><strong>Q4_0_48 (CPU Optimized) (246GB):</strong> Example response of 20000 token prompt</figcaption>
+</figure>
+
 ### BF16 Version

 ```bash