Update README.md
README.md CHANGED
@@ -48,6 +48,26 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le

EQ-Bench (v2_de): 61.04 / english (v2): 78.3

+[ScandEval](https://scandeval.com/german-nlg/) 12.5.2 scores
+
+| Benchmark | Spaetzle-v8-7b Value |
+|-----------------------|----------------------------------------------------|
+| Model ID | cstr/Spaetzle-v8-7b (few-shot, val) |
+| Parameters (M) | 7242 |
+| Vocabulary Size (K) | 32 |
+| Context | 32768 |
+| Commercial | False |
+| Speed | 5,980 ± 1,031 / 1,714 ± 552 |
+| Rank | 1.85 |
+| GermEval | 58.90 ± 2.30 / 45.55 ± 3.30 |
+| SB10k | 61.34 ± 1.90 / 72.98 ± 1.30 |
+| ScaLA-De | 31.58 ± 4.39 / 65.51 ± 2.23 |
+| GermanQuAD | 24.91 ± 3.98 / 60.88 ± 3.31 |
+| MLSum | 67.25 ± 1.06 / 22.95 ± 2.64 |
+| MMLU-De | 34.62 ± 2.20 / 50.43 ± 1.52 |
+| HellaSwag-De | 48.70 ± 2.47 / 61.05 ± 1.79 |
+
+
| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[Spaetzle-v8-7b](https://huggingface.co/cstr/Spaetzle-v8-7b)| 45.31| 75.69| 63.94| 45.57| 57.63|