johannhartmann committed • Commit abf2a43 • Parent(s): e7827a4
Upload folder using huggingface_hub
Files changed:
- .gitattributes +15 -0
- README.md +103 -0
- llama3-discoleo-instruct-8b-32k-v0.1.Q2_K.gguf +3 -0
- llama3-discoleo-instruct-8b-32k-v0.1.Q3_K_L.gguf +3 -0
- llama3-discoleo-instruct-8b-32k-v0.1.Q3_K_M.gguf +3 -0
- llama3-discoleo-instruct-8b-32k-v0.1.Q3_K_S.gguf +3 -0
- llama3-discoleo-instruct-8b-32k-v0.1.Q4_0.gguf +3 -0
- llama3-discoleo-instruct-8b-32k-v0.1.Q4_1.gguf +3 -0
- llama3-discoleo-instruct-8b-32k-v0.1.Q4_K_M.gguf +3 -0
- llama3-discoleo-instruct-8b-32k-v0.1.Q4_K_S.gguf +3 -0
- llama3-discoleo-instruct-8b-32k-v0.1.Q5_0.gguf +3 -0
- llama3-discoleo-instruct-8b-32k-v0.1.Q5_1.gguf +3 -0
- llama3-discoleo-instruct-8b-32k-v0.1.Q5_K_M.gguf +3 -0
- llama3-discoleo-instruct-8b-32k-v0.1.Q5_K_S.gguf +3 -0
- llama3-discoleo-instruct-8b-32k-v0.1.Q6_K.gguf +3 -0
- llama3-discoleo-instruct-8b-32k-v0.1.Q8_0.gguf +3 -0
- llama3-discoleo-instruct-8b-32k-v0.1.gguf +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,18 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.Q4_1.gguf filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.Q5_0.gguf filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.Q5_1.gguf filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+llama3-discoleo-instruct-8b-32k-v0.1.gguf filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,103 @@
---
language:
- de
license: llama3
library_name: transformers
tags:
- gguf
---

# Llama3-DiscoLeo-Instruct 8B 32k-context (version 0.1)

## Thanks and Accreditation

[DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1](https://huggingface.co/collections/DiscoResearch/discoleo-8b-llama3-for-german-6650527496c0fafefd4c9729)
is the result of a joint effort between [DiscoResearch](https://huggingface.co/DiscoResearch) and [Occiglot](https://huggingface.co/occiglot),
with support from the [DFKI](https://www.dfki.de/web/) (German Research Center for Artificial Intelligence) and [hessian.Ai](https://hessian.ai).
Occiglot kindly handled data preprocessing, filtering, and deduplication as part of their latest [dataset release](https://huggingface.co/datasets/occiglot/occiglot-fineweb-v0.5), and shared their compute allocation on hessian.Ai's 42 supercomputer.

## Model Overview

DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1 is an instruction-tuned version of our long-context [Llama3-German-8B-32k](https://huggingface.co/DiscoResearch/Llama3_German_8B_32k).
The base model was derived from [Meta's Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) through continued pretraining on 65 billion high-quality German tokens, similar to previous [LeoLM](https://huggingface.co/LeoLM) and [Occiglot](https://huggingface.co/collections/occiglot/occiglot-eu5-7b-v01-65dbed502a6348b052695e01) models.
For the long-context version, we trained on an additional 100 million tokens at 32k context length, using a rope_theta value of 1.5e6 and a learning rate of 1.5e-5 with a batch size of 256*8192 tokens, with hyperparameters otherwise identical to the base model.
We finetuned this checkpoint on the German instruction dataset from DiscoResearch created by [Jan-Philipp Harries](https://huggingface.co/jphme) and [Daniel Auras](https://huggingface.co/rasdani) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)).
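The effect of raising rope_theta can be sketched numerically. This is a hedged illustration, not part of the model card: it assumes the standard RoPE inverse-frequency definition, a head dimension of 128 (as in Llama-3-8B), and Llama-3's default theta of 500,000.

```python
import math

def rope_wavelengths(theta: float, head_dim: int = 128):
    """Wavelengths (in tokens) of the RoPE frequency bands.

    Standard RoPE uses inv_freq_i = theta**(-2i/head_dim); the positional
    signal of band i repeats every 2*pi/inv_freq_i tokens, so a larger theta
    stretches the lowest-frequency bands over longer contexts.
    """
    inv_freqs = [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    return [2 * math.pi / f for f in inv_freqs]

base = max(rope_wavelengths(500_000.0))  # Llama-3 default theta (assumption)
long = max(rope_wavelengths(1.5e6))      # value used for the 32k model

print(f"max wavelength @ theta=5e5:   {base:,.0f} tokens")
print(f"max wavelength @ theta=1.5e6: {long:,.0f} tokens")
```

Raising theta from 5e5 to 1.5e6 stretches every band, which is one common way to make positions beyond the original training window remain distinguishable.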
## How to use

Llama3-DiscoLeo-Instruct-8B-32k-v0.1 uses the [Llama-3 chat template](https://github.com/meta-llama/llama3?tab=readme-ov-file#instruction-tuned-models), which can be easily used with [transformers' chat templating](https://huggingface.co/docs/transformers/main/en/chat_templating).
See [below](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1#usage-example) for a usage example.
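When working directly with the GGUF files in this repository, the Hugging Face tokenizer may not be at hand. As a sketch, the prompt can also be assembled by hand; the special-token layout below follows Meta's published Llama-3 chat format, and the messages are placeholders:

```python
# Build a Llama-3 chat prompt string by hand (useful with GGUF runtimes).
# The special-token layout follows Meta's published Llama-3 chat format.

def llama3_prompt(messages):
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # A trailing assistant header cues the model to generate its reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = llama3_prompt([
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Wer bist du?"},
])
print(prompt)
```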
## Model Training and Hyperparameters

The model was full-parameter finetuned with axolotl on [hessian.Ai 42](https://hessian.ai) with a 32,768-token context length, a learning rate of 2e-5, and a batch size of 16.

## Evaluation and Results

We evaluated the model using a suite of common English benchmarks and their German counterparts with [GermanBench](https://github.com/bjoernpl/GermanBenchmark).

The image and table below show the benchmark scores for the different instruct models compared to Meta's instruct version. All checkpoints are available in this [collection](https://huggingface.co/collections/DiscoResearch/discoleo-8b-llama3-for-german-6650527496c0fafefd4c9729).

![instruct scores](instruct_model_benchmarks.png)
| Model | truthful_qa_de | truthfulqa_mc | arc_challenge | arc_challenge_de | hellaswag | hellaswag_de | MMLU | MMLU-DE | mean |
|---|---|---|---|---|---|---|---|---|---|
| meta-llama/Meta-Llama-3-8B-Instruct | 0.47498 | 0.43923 | **0.59642** | 0.47952 | **0.82025** | 0.60008 | **0.66658** | 0.53541 | 0.57656 |
| DiscoResearch/Llama3-German-8B | 0.49499 | 0.44838 | 0.55802 | 0.49829 | 0.79924 | 0.65395 | 0.62240 | 0.54413 | 0.57743 |
| DiscoResearch/Llama3-German-8B-32k | 0.48920 | 0.45138 | 0.54437 | 0.49232 | 0.79078 | 0.64310 | 0.58774 | 0.47971 | 0.55982 |
| DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1 | **0.53042** | 0.52867 | 0.59556 | **0.53839** | 0.80721 | 0.66440 | 0.61898 | 0.56053 | **0.60552** |
| **DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1** | 0.52749 | **0.53245** | 0.58788 | 0.53754 | 0.80770 | **0.66709** | 0.62123 | **0.56238** | 0.60547 |
## Model Configurations

We release DiscoLeo-8B in the following configurations:
1. [Base model with continued pretraining](https://huggingface.co/DiscoResearch/Llama3-German_8B)
2. [Long-context version (32k context length)](https://huggingface.co/DiscoResearch/Llama3_German_8B_32k)
3. [Instruction-tuned version of the base model](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_v0.1)
4. [Instruction-tuned version of the long-context model](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1) (this model)
5. [Experimental `DARE-TIES` merge with Llama3-Instruct](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_8B_DARE_Experimental)
6. [Collection of quantized versions](https://huggingface.co/collections/DiscoResearch/discoleo-8b-quants-6651bcf8f72c9a37ce485d42)
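The quantization levels in this repository trade file size against fidelity. A rough bits-per-weight figure can be derived from the GGUF file sizes uploaded in this commit; the ~8.03B parameter count for Llama-3-8B is an assumption here, and the result is approximate because GGUF files also carry metadata and keep some tensors at higher precision:

```python
# Approximate bits per weight for a few of the uploaded GGUF quantizations.
# File sizes are the byte counts from this commit; the 8.03e9 parameter
# count is the commonly cited Llama-3-8B figure and is an assumption.
PARAMS = 8.03e9

sizes = {
    "Q2_K":   3_179_131_168,
    "Q4_K_M": 4_920_733_984,
    "Q8_0":   8_540_770_592,
    "f16":   16_068_890_880,  # unquantized llama3-discoleo-instruct-8b-32k-v0.1.gguf
}

for name, size in sizes.items():
    bpw = size * 8 / PARAMS
    print(f"{name:>6}: {bpw:.2f} bits/weight")
```

As a sanity check, the unquantized file comes out very close to 16 bits per weight, consistent with fp16/bf16 storage.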
## Usage Example

Here's how to use the model with transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1")

prompt = "Schreibe ein Essay über die Bedeutung der Energiewende für Deutschlands Wirtschaft"
messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated reply remains.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
## Acknowledgements

The model was trained and evaluated by [Björn Plüster](https://huggingface.co/bjoernp) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)) with data preparation and project supervision by [Manuel Brack](http://manuel-brack.eu) ([DFKI](https://www.dfki.de/web/), [TU-Darmstadt](https://www.tu-darmstadt.de/)). Initial work on dataset collection and curation was performed by [Malte Ostendorff](https://ostendorff.org) and [Pedro Ortiz Suarez](https://portizs.eu). Instruction tuning was done with the DiscoLM German dataset created by [Jan-Philipp Harries](https://huggingface.co/jphme) and [Daniel Auras](https://huggingface.co/rasdani) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)). We extend our gratitude to [LAION](https://laion.ai/) and friends, especially [Christoph Schuhmann](https://entwickler.de/experten/christoph-schuhmann) and [Jenia Jitsev](https://huggingface.co/JJitsev), for initiating this collaboration.

The model training was supported by a compute grant at the [42 supercomputer](https://hessian.ai/), a central component in the development of [hessian AI](https://hessian.ai/), the [AI Innovation Lab](https://hessian.ai/infrastructure/ai-innovationlab/) (funded by the [Hessian Ministry of Higher Education, Research and the Arts (HMWK)](https://wissenschaft.hessen.de) and the [Hessian Ministry of the Interior, for Security and Homeland Security (HMinD)](https://innen.hessen.de)) and the [AI Service Centers](https://hessian.ai/infrastructure/ai-service-centre/) (funded by the [German Federal Ministry for Economic Affairs and Climate Action (BMWK)](https://www.bmwk.de/Navigation/EN/Home/home.html)).
The curation of the training data is partially funded by the [German Federal Ministry for Economic Affairs and Climate Action (BMWK)](https://www.bmwk.de/Navigation/EN/Home/home.html)
through the project [OpenGPT-X](https://opengpt-x.de/en/) (project no. 68GX21007D).
llama3-discoleo-instruct-8b-32k-v0.1.Q2_K.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9a6da2182cbbd614d0aac339babd2435dc7a1fe10b77bcc0779c666a555f6e90
+size 3179131168
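Each `.gguf` entry in this commit is stored as a Git LFS pointer file rather than the binary itself. A minimal sketch of parsing such a pointer; the three-line version/oid/size layout follows the git-lfs pointer spec, and the sample text is copied from the Q2_K entry above:

```python
# Parse a Git LFS pointer file into its key/value fields.
# The version/oid/size layout follows the git-lfs pointer spec.

def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9a6da2182cbbd614d0aac339babd2435dc7a1fe10b77bcc0779c666a555f6e90
size 3179131168
"""

info = parse_lfs_pointer(pointer)
print(info["oid"])        # sha256 digest of the real blob
print(int(info["size"]))  # true file size in bytes (~3.2 GB for Q2_K)
```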
llama3-discoleo-instruct-8b-32k-v0.1.Q3_K_L.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:76f7e5a2df537f8dca3b10e844da708bb231af7d83e37b4dfcb4df10f1b682c4
+size 4321956128

llama3-discoleo-instruct-8b-32k-v0.1.Q3_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b693363175ef1b3351cc5af5f1a9de0286a01dd036dd02bc5ec1c0afeaed827a
+size 4018917664

llama3-discoleo-instruct-8b-32k-v0.1.Q3_K_S.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0627368fa8ba6d087f17e8f473f1ac3ee2122c58690b795840557e6a7132db73
+size 3664498976

llama3-discoleo-instruct-8b-32k-v0.1.Q4_0.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7985a292627de859c1e81be38813bc852bab117609fa49300199f51087f15fae
+size 4661211424

llama3-discoleo-instruct-8b-32k-v0.1.Q4_1.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a86a9aa8598ff5c1f8f64509b8f1c61b42079eac750e648ca6fc1d99aa4ada80
+size 5130252576

llama3-discoleo-instruct-8b-32k-v0.1.Q4_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b40b79cdee2aeaadc2c8ab5b02a4ba088de9b754aa02e7b8e6e55561ea9936e9
+size 4920733984

llama3-discoleo-instruct-8b-32k-v0.1.Q4_K_S.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c39eec96ec3bf83f32e48ee47384d1f012a8dd1b2726ef3439bbd48e7d634fac
+size 4692668704

llama3-discoleo-instruct-8b-32k-v0.1.Q5_0.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:af92ec23fa526b0cc980322aa85de236ef811c5473b9b4452ee7f43fb6995da5
+size 5599293728

llama3-discoleo-instruct-8b-32k-v0.1.Q5_1.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:064a77f9774892792f51e9350d3fffd521d90f3442ea6ff18ba435cefdc5ef74
+size 6068334880

llama3-discoleo-instruct-8b-32k-v0.1.Q5_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:24622031fa8df627669d775c0764bccf8a425d47d5fd8e406b90cf85ad823630
+size 5732987168

llama3-discoleo-instruct-8b-32k-v0.1.Q5_K_S.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8b632d1c3135123cd6b172db7354b680871d9f213b501f3057e833eaf846ce8a
+size 5599293728

llama3-discoleo-instruct-8b-32k-v0.1.Q6_K.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7c4076adf4407ba1452d8e8bce9d6cdd5e7d11f4502ed7b260b3b55a635152db
+size 6596006176

llama3-discoleo-instruct-8b-32k-v0.1.Q8_0.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d686c899cd7dcb01f90d970fff5f896d1bf502284278996958220f5b1f7f6280
+size 8540770592

llama3-discoleo-instruct-8b-32k-v0.1.gguf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d2e486bb54193b1e2ea9258cbe8b0953e46c6cb36891021192a2055804d172f1
+size 16068890880