Commit 1af9c31 by ThomasBaruzier
1 parent: ef36c09

Update README.md

Files changed (1): README.md +35 -0
@@ -19,6 +19,41 @@ All quants were made using the imatrix option and Bartowski's [calibration file]
 
  <hr>
 
+ # Perplexity table (the lower the better)
+
+ | Quant | Size (MB) | PPL | Size (%) | Accuracy (%) | PPL error rate |
+ | ------- | --------- | ------- | -------- | ------------ | -------------- |
+ | IQ1_S | 1158 | 81.7502 | 13.44 | 9.26 | 0.69197 |
+ | IQ1_M | 1227 | 40.8601 | 14.24 | 18.53 | 0.31979 |
+ | IQ2_XXS | 1343 | 16.5816 | 15.59 | 45.67 | 0.11466 |
+ | IQ2_XS | 1448 | 13.024 | 16.81 | 58.14 | 0.08768 |
+ | IQ2_S | 1551 | 12.6045 | 18 | 60.07 | 0.08478 |
+ | IQ2_M | 1643 | 11.0911 | 19.07 | 68.27 | 0.07374 |
+ | Q2_K_S | 1654 | 11.0796 | 19.2 | 68.34 | 0.07646 |
+ | Q2_K | 1755 | 10.3111 | 20.37 | 73.44 | 0.07045 |
+ | IQ3_XXS | 1794 | 9.342 | 20.82 | 81.05 | 0.0612 |
+ | IQ3_XS | 1934 | 9.4403 | 22.45 | 80.21 | 0.06137 |
+ | Q3_K_S | 2005 | 8.8949 | 23.27 | 85.13 | 0.05946 |
+ | IQ3_S | 2017 | 9.0714 | 23.41 | 83.47 | 0.05851 |
+ | IQ3_M | 2083 | 8.3352 | 24.18 | 90.84 | 0.0534 |
+ | Q3_K_M | 2191 | 8.1839 | 25.43 | 92.52 | 0.05408 |
+ | Q3_K_L | 2351 | 8.093 | 27.29 | 93.56 | 0.05352 |
+ | IQ4_XS | 2419 | 7.774 | 28.08 | 97.4 | 0.05097 |
+ | Q4_0 | 2533 | 7.8479 | 29.4 | 96.49 | 0.05132 |
+ | IQ4_NL | 2538 | 7.7697 | 29.46 | 97.46 | 0.05091 |
+ | Q4_K_S | 2541 | 7.8125 | 29.49 | 96.92 | 0.05101 |
+ | Q4_K_M | 2650 | 7.7376 | 30.76 | 97.86 | 0.05038 |
+ | Q4_1 | 2772 | 7.8155 | 32.17 | 96.89 | 0.05116 |
+ | Q5_K_S | 3017 | 7.6649 | 35.02 | 98.79 | 0.05021 |
+ | Q5_0 | 3024 | 7.6407 | 35.1 | 99.1 | 0.0499 |
+ | Q5_K_M | 3081 | 7.6283 | 35.76 | 99.26 | 0.04985 |
+ | Q5_1 | 3263 | 7.6439 | 37.87 | 99.06 | 0.04996 |
+ | Q6_K | 3539 | 7.587 | 41.07 | 99.8 | 0.04945 |
+ | Q8_0 | 4581 | 7.5739 | 53.17 | 99.98 | 0.04941 |
+ | F16 | 8616 | 7.5721 | 100 | 100 | 0.04942 |
+
+ <hr>
+
  # Llama-3.1-Minitron-4B-Width-Base
 
  ## Model Overview
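
The README does not say how the `Size (%)` and `Accuracy (%)` columns are derived. The values appear consistent with both being ratios against the F16 row: size as a share of the F16 file size, and accuracy as F16 perplexity divided by the quant's perplexity. The sketch below is an assumption based on spot-checking a few rows, not the author's script; `derived_columns` is a hypothetical helper name.

```python
# Baseline values taken from the F16 row of the perplexity table.
F16_SIZE_MB = 8616
F16_PPL = 7.5721


def derived_columns(size_mb, ppl, base_size_mb=F16_SIZE_MB, base_ppl=F16_PPL):
    """Return (size %, accuracy %) relative to the baseline, rounded to 2 decimals."""
    size_pct = round(size_mb / base_size_mb * 100, 2)
    # Assumption: "accuracy" means baseline perplexity / quant perplexity,
    # so a quant with higher perplexity scores below 100.
    accuracy_pct = round(base_ppl / ppl * 100, 2)
    return size_pct, accuracy_pct


# A few (size in MB, PPL) pairs copied from the table.
rows = {
    "IQ1_S": (1158, 81.7502),
    "Q4_K_M": (2650, 7.7376),
    "Q8_0": (4581, 7.5739),
}

for name, (size_mb, ppl) in rows.items():
    print(name, derived_columns(size_mb, ppl))
```

For the three rows checked here, the computed pairs match the table: 13.44 / 9.26 for IQ1_S, 30.76 / 97.86 for Q4_K_M, and 53.17 / 99.98 for Q8_0.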