---
license: other
license_name: llama3
license_link: LICENSE
---

# Quant Infos

## Includes latest bpe tokenizer fixes 🎉

- Updated for latest bpe pre-tokenizer fixes https://github.com/ggerganov/llama.cpp/pull/6920
- quants done with an importance matrix for improved quantization loss
- K & IQ quants in basically all variants from Q6_K down to IQ1_S
- fixed end token for instruct mode (<|eot_id|>[128009])
- Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [f4ab2a41476600a98067a9474ea8f9e6db41bcfa](https://github.com/ggerganov/llama.cpp/commit/f4ab2a41476600a98067a9474ea8f9e6db41bcfa) (master from 2024-04-29)
- Imatrix generated with [this](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384) dataset.

```
./imatrix -c 512 -m $model_name-f16.gguf -f $llama_cpp_path/groups_merged.txt -o $out_path/imat-f16-gmerged.dat
```
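Once the importance matrix exists, it can be passed to llama.cpp's `quantize` tool via its `--imatrix` flag. A minimal sketch of assembling that invocation (the file names, helper, and quant type here are illustrative assumptions, not taken from this repo):

```python
import shlex

# Hypothetical helper that builds the quantize call consuming the
# imatrix .dat file produced above; adjust paths to your setup.
def quantize_cmd(f16_model, imatrix, out_file, qtype):
    parts = ["./quantize", "--imatrix", imatrix, f16_model, out_file, qtype]
    return " ".join(shlex.quote(p) for p in parts)

print(quantize_cmd("model-f16.gguf", "imat-f16-gmerged.dat",
                   "model-IQ1_S.gguf", "IQ1_S"))
# → ./quantize --imatrix imat-f16-gmerged.dat model-f16.gguf model-IQ1_S.gguf IQ1_S
```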

## Note about recent tokenizer fixes

The newest quants uploaded here need at least commit f4ab2a41476600a98067a9474ea8f9e6db41bcfa; this is not yet integrated into most upstream tools, as it was only just released (2024-04-29).

## Note about eos token

Llama 3 uses different eos tokens depending on whether it is in instruct mode.

The initial upload had issues with this: it used the "default" eos token of 128001, but in instruct mode Llama 3 only outputs 128009 as the eos token, which causes the model to ramble on without stopping.

I have uploaded fixed quants with the eos token id manually set to 128009.
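The stopping behaviour described above can be illustrated with a small sketch (plain Python, not llama.cpp code; the token ids are the ones from this README):

```python
EOT_ID = 128009       # <|eot_id|>, what Llama 3 emits in instruct mode
DEFAULT_EOS = 128001  # the "default" eos token from the initial upload

def generate(sampled_tokens, eos_id):
    """Collect tokens until the configured eos id appears."""
    out = []
    for tok in sampled_tokens:
        if tok == eos_id:
            break
        out.append(tok)
    return out

# Instruct-mode output ends with 128009; with eos set to 128001 the
# stop check never fires and every token is kept ("rambling").
stream = [1, 2, 3, EOT_ID, 4, 5]
print(len(generate(stream, DEFAULT_EOS)))  # → 6 (never stops early)
print(len(generate(stream, EOT_ID)))       # → 3 (stops at <|eot_id|>)
```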