---
language:
- de
license: apache-2.0
library_name: transformers
tags:
- pytorch
- german
- deutsch
- mistral
- leolm
model_name: EM German
inference: false
model_creator: jphme
model_type: mistral
pipeline_tag: text-generation
prompt_template: 'Du bist ein hilfreicher Assistent. USER: Was ist 1+1? ASSISTANT:'
model-index:
- name: em_german_leo_mistral
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 52.82
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jphme/em_german_leo_mistral
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 78.03
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jphme/em_german_leo_mistral
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 50.03
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jphme/em_german_leo_mistral
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 50.19
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jphme/em_german_leo_mistral
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 73.48
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jphme/em_german_leo_mistral
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 5.61
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=jphme/em_german_leo_mistral
      name: Open LLM Leaderboard
---
![EM Logo](em_model_logo_web.jpeg)

LeoLM Mistral is the showcase model of the EM German model family and, as of its release, in our opinion the best open German LLM.

**Many thanks to the [LeoLM](https://huggingface.co/LeoLM) team for the publication of a base model that has received continued pretraining with German texts, greatly improving generation capabilities.**

*Please note that the Mistral architecture is very recent and not yet supported by all libraries (e.g. AutoGPTQ). In case of any problems, please try a different format/base model.*

# Table of Contents

1. [Introduction](#introduction)
2. [Links & Demos](#links--demos)
   - [Model Links](#model-links)
   - [Demos](#demos)
3. [Prompt Format](#prompt-format)
4. [Example Output](#example-output)
5. [Acknowledgements](#acknowledgements)
6. [Contact](#contact)
7. [Disclaimer](#disclaimer)

# Introduction

**EM German** is a Llama2/Mistral/LeoLM-based model family, finetuned on a large dataset of various instructions in the German language. The models are optimized for German text, providing proficiency in understanding, generating, and interacting with German-language content.

We offer versions based on Llama-2 (7b, 13b and 70b), Mistral and LeoLM (Llama-2/Mistral with continued pretraining on German texts).

Please find all information, example outputs, the special RAG prompt format, and eval results for the EM German model family in [our GitHub repository](https://github.com/jphme/EM_German) ([German version](https://github.com/jphme/EM_German/blob/main/README_DE.md)). You will also find instructions on how to run the models with a GUI (GPT4All/LM Studio).


# Links & Demos

## Model Links

If you only try one model version, I strongly recommend the **[LeoLM Mistral](https://huggingface.co/jphme/em_german_leo_mistral)** model, which offers by far the best combination of performance and computing requirements.

| Base Model | HF    | GPTQ  | GGUF  | AWQ   |
|-------|-------|-------|-------|-------|
| Llama2 7b   | [Link](https://huggingface.co/jphme/em_german_7b_v01) | [Link](https://huggingface.co/TheBloke/em_german_7b_v01-GPTQ) | [Link](https://huggingface.co/TheBloke/em_german_7b_v01-GGUF) | [Link](https://huggingface.co/TheBloke/em_german_7b_v01-AWQ) |
| Llama2 13b  | [Link](https://huggingface.co/jphme/em_german_13b_v01) | [Link](https://huggingface.co/TheBloke/em_german_13b_v01-GPTQ) | [Link](https://huggingface.co/TheBloke/em_german_13b_v01-GGUF) | [Link](https://huggingface.co/TheBloke/em_german_13b_v01-AWQ) |
| Llama2 70b   | [Link](https://huggingface.co/jphme/em_german_70b_v01) | [Link](https://huggingface.co/TheBloke/em_german_70b_v01-GPTQ) | [Link](https://huggingface.co/TheBloke/em_german_70b_v01-GGUF) | [Link](https://huggingface.co/TheBloke/em_german_70b_v01-AWQ) |
| [Mistral 7b](https://huggingface.co/mistralai/Mistral-7B-v0.1) | [Link](https://huggingface.co/jphme/em_german_mistral_v01) | [Link](https://huggingface.co/TheBloke/em_german_mistral_v01-GPTQ) | [Link](https://huggingface.co/TheBloke/em_german_mistral_v01-GGUF) | [Link](https://huggingface.co/TheBloke/em_german_mistral_v01-AWQ) |
| [LeoLM 7b](https://huggingface.co/LeoLM/leo-hessianai-7b) | [Link](https://huggingface.co/jphme/em_german_7b_leo) | [Link](https://huggingface.co/jphme/em_german_7b_leo_gptq) | [Link](https://huggingface.co/jphme/em_german_7b_leo_gguf) | tbc |
| [LeoLM 13b](https://huggingface.co/LeoLM/leo-hessianai-13b) | soon | soon | [Link](https://huggingface.co/jphme/em_german_13b_leo_gguf) | tbc |
| [LeoLM Mistral](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b) | [Link](https://huggingface.co/jphme/em_german_leo_mistral) | [Link](https://huggingface.co/TheBloke/em_german_leo_mistral-GPTQ) | [Link](https://huggingface.co/TheBloke/em_german_leo_mistral-GGUF) | [Link](https://huggingface.co/TheBloke/em_german_leo_mistral-AWQ) |

### Notes about the different versions:
See also the [comparison of example outputs](https://github.com/jphme/EM_German/blob/main/example_output_comparison.md) for a comparison of (7b) model capabilities.

If you get unsatisfactory results with one or another EM German version, please try a different (and/or larger) model or version for your use case.


## Demos:

You can use some of the models with **free** Google Colab instances (e.g. the 7b model in 8-bit or the 13b model with GPTQ):

* [Example Colab Notebook for 13b with GPTQ](https://colab.research.google.com/drive/1IJfJdVwGkfe5MYOqHptystR3FBeEUdGn?usp=sharing)
* [Example Colab Notebook for 7b with 8bit-Loading](https://colab.research.google.com/drive/1bsv6vkLM4AlCpSyXA6ol9P32zxZmf7Zu?usp=sharing)
* [Example Colab Notebook for 7b Mistral GGUF with Grammar-based structured output](https://colab.research.google.com/drive/17O-a3D4BnVc_V9Of3dFfed18v1ULu1Sv?usp=sharing)
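For local use outside Colab, loading the model in 8-bit with 🤗 Transformers might look roughly like the sketch below. This is an illustrative assumption based on the usual Transformers/bitsandbytes workflow, not the exact code from the notebooks above; it assumes `transformers`, `accelerate` and `bitsandbytes` are installed and a CUDA GPU is available.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jphme/em_german_leo_mistral"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # 8-bit weights via bitsandbytes, fits free Colab GPUs
    device_map="auto",   # place layers on the available GPU(s) automatically
)

prompt = "Du bist ein hilfreicher Assistent. USER: Was ist 1+1? ASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```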


# Prompt Format

This model follows the Vicuna format without linebreaks (but should work with linebreaks as well). The format is as follows:

```
Du bist ein hilfreicher Assistent. USER: <instruction> ASSISTANT:
```

You can swap the standard system prompt for a better-suited one (see [our GitHub repository](https://github.com/jphme/EM_German) for the special RAG prompt format).
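As a minimal illustration, the single-turn format above can be assembled with a small helper (the function name `build_prompt` is our own, not part of the model's tooling):

```python
def build_prompt(instruction: str,
                 system_prompt: str = "Du bist ein hilfreicher Assistent.") -> str:
    """Assemble a single-turn prompt in the Vicuna-style format used by EM German."""
    return f"{system_prompt} USER: {instruction} ASSISTANT:"

print(build_prompt("Was ist 1+1?"))
# Du bist ein hilfreicher Assistent. USER: Was ist 1+1? ASSISTANT:
```

The model's answer is then generated as a continuation after the trailing `ASSISTANT:`.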

# Example Output

Please find a selection of various example outputs for all 7b versions of EM German [here](https://github.com/jphme/EM_German/blob/main/example_output_comparison.md).

**Prompt:**
> Erkläre die Bedeutung von drei wesentlichen Elementen der Bilanzierung und des Accountings nach IFRS für Unternehmen, die auf internationaler Ebene tätig sind.

**Antwort EM German Leo Mistral 7b:**
```
Die International Financial Reporting Standards (IFRS) sind eine Reihe von internationalen Rechnungslegungsstandards, die von der International Accounting Standards Board (IASB) entwickelt wurden. Sie sollen dazu beitragen, dass Finanzberichte vergleichbar, zuverlässig und relevant sind.

1. Fair Value: Dieses Konzept ist ein zentraler Bestandteil der IFRS. Es besagt, dass Vermögenswerte und Schulden zum fairen Wert bilanziert werden sollten, was dem Preis entspricht, zu dem sie zwischen Marktteilnehmern gehandelt werden könnten. Dadurch wird sichergestellt, dass das Unternehmen eine genaue Darstellung seiner finanziellen Lage liefert.

2. Konsolidierung: Die IFRS verlangen von Unternehmen, ihre Finanzergebnisse auf konsolidierter Basis darzustellen. Das bedeutet, dass alle Tochtergesellschaften und verbundenen Unternehmen in die Bilanz des Mutterunternehmens aufgenommen werden. Dadurch wird ein klareres Bild von der finanziellen Situation des Konzerns geschaffen.

3. Vollständigkeitsprinzip: Die IFRS verlangen von Unternehmen, alle relevanten Informationen offenzulegen. Dies umfasst sowohl positive als auch negative Faktoren, wie zum Beispiel Gewinne und Verluste, Einnahmen und Ausgaben sowie Risiken und Unsicherheiten. Durch dieses Prinzip wird sichergestellt, dass Investoren, Kreditgeber und andere Stakeholder eine umfassende Einschätzung der finanziellen Lage des Unternehmens vornehmen können.
```

**Prompt:**
> Ein Mensch atmet durchschnittlich 16 Mal pro Minute. Wie oft atmet er an einem Tag?

**Antwort EM German Leo Mistral 7b:**
```
 Zuerst berechnen wir, wie oft eine Person in einer Stunde atmet. Wir wissen, dass eine Stunde 60 Minuten hat, also:

Atemzüge pro Stunde = Atemfrequenz * Minuten pro Stunde
= 16 Atemzüge pro Minute * 60 Minuten
= 960 Atemzüge

Um herauszufinden, wie viele Atemzüge eine Person an einem Tag macht, müssen wir diese stündliche Rate mit der Anzahl der Stunden an einem Tag multiplizieren, also 24:

Atemzüge pro Tag = Atemzüge pro Stunde * Stunden pro Tag
= 960 Atemzüge * 24 Stunden
= 23.040 Atemzüge

Also macht ein durchschnittlicher Mensch etwa 23.040 Atemzüge an einem Tag.
```
(For more examples, please visit [our GitHub repository](https://github.com/jphme/EM_German).)

# Acknowledgements:

Many thanks to [winglian/caseus](https://huggingface.co/winglian) for his great work on Axolotl, which I used to train the EM models. I am also grateful to [Jon Durbin](https://huggingface.co/jondurbin) and his [Airoboros](https://huggingface.co/jondurbin/airoboros-l2-70b-2.2.1) models and code, from which I borrowed many ideas and code snippets.
Additionally, many thanks to [Björn Plüster](https://huggingface.co/bjoernp) and the LeoLM team for the outstanding pretraining work on LeoLM, and last but not least many thanks to [TheBloke](https://huggingface.co/TheBloke) for preparing quantized versions in all formats under the sun.
The 70b model was trained with support of the [OVH Cloud Startup Program](https://startup.ovhcloud.com/en/).

# Contact

For detailed feedback & feature requests, please open an issue or get in contact with me via [my website](https://www.jph.me).

*PS: We are also always interested in support for our startup [ellamind](https://ellamind.com), which will offer customized models for business applications in the future (we are currently still in stealth mode). If you use our models for business applications and have advanced needs for specialized capabilities, please get in touch.*

# Disclaimer:

I am not responsible for the actions of third parties who use this model or the outputs of the model. This model should only be used for research purposes. The original base model license applies and is distributed with the model files.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_jphme__em_german_leo_mistral).

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |51.69|
|AI2 Reasoning Challenge (25-Shot)|52.82|
|HellaSwag (10-Shot)              |78.03|
|MMLU (5-Shot)                    |50.03|
|TruthfulQA (0-shot)              |50.19|
|Winogrande (5-shot)              |73.48|
|GSM8k (5-shot)                   | 5.61|