Add new quants
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-license: llama3
+license: llama3
 language:
 - en
 - de
@@ -43,10 +43,10 @@ GGUF quantized models of [mattshumer/ref_70_e3](https://huggingface.co/mattshume
 | Q4_K_L | 45.3GB | false | false |
 | Q4_K_M | ??.?GB | false | false |
 | Q4_K_S | 40.3GB | false | false |
-| IQ4_NL |
+| IQ4_NL | 38.2GB | false | true |
 | IQ4_XS | ??.?GB | false | true |
 | Q3_K_XL | 37.2GB | false | false |
-| Q3_K_L |
+| Q3_K_L | 37.1GB | false | false |
 | Q3_K_M | 34.3GB | false | false |
 | IQ3_M | ??.?GB | false | true |
 | Q3_K_S | ??.?GB | false | false |
@@ -56,12 +56,12 @@ GGUF quantized models of [mattshumer/ref_70_e3](https://huggingface.co/mattshume
 | IQ3_XXS | ??.?GB | false | true |
 | Q2_K | ??.?GB | false | false |
 | Q2_K_S | ??.?GB | false | true |
-| IQ2_M |
-| IQ2_S |
-| IQ2_XS |
-| IQ2_XXS |
-| IQ1_M |
-| IQ1_S |
+| IQ2_M | 23.0GB | false | true |
+| IQ2_S | 21.2GB | false | true |
+| IQ2_XS | 20.2GB | false | true |
+| IQ2_XXS | 18.2GB | false | true |
+| IQ1_M | 16.0GB | false | true |
+| IQ1_S | 14.6GB | false | true |
 
 The `_L` or `_XL` suffix means that the token embeddings and output weight are at fp16 precision.
 
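The `_L`/`_XL` variants above keep the token embeddings and the output tensor at fp16 while quantizing everything else. A minimal sketch of how such a variant can be produced with llama.cpp's `llama-quantize` per-tensor overrides; the file names are placeholders, and this is not necessarily the exact command used for this repo:

```python
import subprocess

# Placeholder file names; the repo does not publish its build commands.
src = "ref_70_e3-f16.gguf"      # unquantized GGUF conversion of the model
dst = "ref_70_e3-Q3_K_XL.gguf"  # _XL variant to produce

# llama-quantize accepts per-tensor type overrides: keep the token-embedding
# and output tensors at f16 while the rest is quantized to the base type.
subprocess.run(
    ["llama-quantize",
     "--token-embedding-type", "f16",
     "--output-tensor-type", "f16",
     src, dst, "Q3_K"],
    check=True,
)
```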
@@ -69,9 +69,17 @@ The iMatrix dataset is bartowski's, which you can find here: [calibration_datav3
 
 Computation is done on static Q6_K for 125 chunks.
 
+## Model Info
+
+The model was not trained for 3 epochs: it is identical to the 2nd-epoch run [mattshumer/Reflection-Llama-3.1-70B-ep2-working](https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B-ep2-working) (and it's possible this is also fake).
+
+The fine-tuning was done using LoRA with rank 256 on the Llama-3.1-70B-Instruct model.
+
 ## Benchmarks
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/60518f3731c5be7f3dd5ebc3/zNs-ZFs0SbnomH7mikiOU.png)
 
+**Warning: These are likely false scores and cannot be replicated with this model.**
+
 All benchmarks tested have been checked for contamination by running [LMSys's LLM Decontaminator](https://github.com/lm-sys/llm-decontaminator). When benchmarking, we isolate the `<output>` section and benchmark solely on it.
 
 Trained from Llama 3.1 70B Instruct, Reflection Llama-3.1 70B can be sampled with the same code, pipelines, etc. as any other Llama model. It even uses the stock Llama 3.1 chat template format (though we've trained in a few new special tokens to aid in reasoning and reflection).
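The importance-matrix recipe in the last hunk ("static Q6_K for 125 chunks") corresponds to llama.cpp's `llama-imatrix` tool. A minimal sketch, assuming bartowski's calibration_datav3 has been saved locally; the paths are placeholders, not the repo's recorded command:

```python
import subprocess

# Compute an importance matrix on the static Q6_K quant, stopping after
# 125 chunks of the calibration text, as the card describes.
subprocess.run(
    ["llama-imatrix",
     "-m", "ref_70_e3-Q6_K.gguf",     # static Q6_K quant used as the base
     "-f", "calibration_datav3.txt",  # bartowski's calibration dataset
     "-o", "imatrix.dat",             # resulting importance matrix
     "--chunks", "125"],
    check=True,
)
```

The resulting `imatrix.dat` is then passed to `llama-quantize --imatrix imatrix.dat ...` when building the quants that use it (typically the IQ* types).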
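The card states only that fine-tuning used LoRA with rank 256 on Llama-3.1-70B-Instruct. A minimal PEFT sketch of such a configuration; apart from `r=256` and the base model, every hyperparameter below (alpha, dropout, target modules) is an illustrative guess rather than the author's setup:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct", torch_dtype="auto"
)

# Only r=256 comes from the model card; the rest are placeholder choices.
config = LoraConfig(
    r=256,
    lora_alpha=256,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # rank 256 makes for a large adapter
```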
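Because this repo ships GGUF files, the simplest way to sample "using the same code as any other Llama model" is llama-cpp-python, which reads the stock Llama 3.1 chat template from the GGUF metadata. The repo id and file name below are placeholders for whichever quant from the table you download:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Placeholder repo id and file name; substitute this repo and one of the
# quants listed in the table.
path = hf_hub_download(repo_id="<this-repo>",
                       filename="ref_70_e3-Q4_K_S.gguf")

llm = Llama(model_path=path, n_ctx=8192, n_gpu_layers=-1)

# The chat template comes from GGUF metadata, so a plain chat-completion
# call works; the model's extra reasoning/reflection tokens show up in
# the generation ahead of the <output> section the card benchmarks on.
res = llm.create_chat_completion(
    messages=[{"role": "user", "content": "How many r's are in strawberry?"}],
    max_tokens=512,
)
print(res["choices"][0]["message"]["content"])
```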