---
license: wtfpl
---

This model is a QLoRA fine-tune of the LLaMA 2-13B base model on the FIMFiction archive, merged back into the base model (as most GGML loading apps don't support LoRAs) and quantized for llama.cpp-based frontends. It was trained with a 1024-token context length.

There are two options, depending on the resources you have:
- Q5_K_M: 5-bit K-quantization with low quality loss. Max RAM consumption is 11.73 GB; recommended if you have 12 GB of VRAM to offload 40 layers.
- Q4_K_S: Compact 4-bit K-quantization. Max RAM consumption is 9.87 GB.
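
If you use llama-cpp-python as a frontend, loading might look like the sketch below. This is only an assumption-laden example: the file name is a placeholder for the actual file in this repo, and only GGML-era (0.1.x) builds of llama-cpp-python read .bin files.

```python
# Minimal sketch, assuming a GGML-era llama-cpp-python (0.1.x) build that
# still reads .bin files. The model file name is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./fimfiction-llama2-13b.ggmlv3.q5_K_M.bin",  # placeholder name
    n_ctx=1024,        # the model was trained with 1024 context length
    n_gpu_layers=40,   # offload 40 layers if you have ~12 GB of VRAM
)
```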

This is not an instruction-tuned model; it was trained on raw text, so treat it like an autocomplete.

It seems sensitive to formatting: I found it usually stays on topic better when the prompt uses double spacing.
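
As an illustration, a completion call with a double-spaced prompt might look like this. It reuses the hypothetical file name from the loading sketch above, and the opening lines are just example text:

```python
from llama_cpp import Llama

# Same hypothetical setup as the loading sketch above.
llm = Llama(
    model_path="./fimfiction-llama2-13b.ggmlv3.q5_K_M.bin",  # placeholder name
    n_ctx=1024,
    n_gpu_layers=40,
)

# Double-space the prompt (a blank line between paragraphs), which seemed
# to help the model stay on topic.
prompt = (
    "Twilight Sparkle closed the library door behind her.\n"
    "\n"
    "Outside, the first snow of the season had begun to fall.\n"
    "\n"
)
out = llm(prompt, max_tokens=256, temperature=0.8)
print(out["choices"][0]["text"])
```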

The only tags excluded from the dataset are eqg and humanized.