leaderboard-pr-bot committed on
Commit b6f69e2
1 Parent(s): d30a7a1

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +122 -14
README.md CHANGED
@@ -1,12 +1,11 @@
  ---
- License: agpl-3.0
- Language:
- - En
- Pipeline_tag: text-generation
- Base_model: nvidia/Mistral-NeMo-Minitron-8B-Base
- Tags:
- - Chat
  license: agpl-3.0
  datasets:
  - anthracite-org/kalo-opus-instruct-22k-no-refusal
  - Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
@@ -15,12 +14,108 @@ datasets:
  - Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
  - anthracite-org/kalo_opus_misc_240827
  - anthracite-org/kalo_misc_part2
- tags:
- - chat
- language:
- - en
- base_model:
- - nvidia/Mistral-NeMo-Minitron-8B-Base
  ---

  ![](https://huggingface.co/Delta-Vector/Tor-8B/resolve/main/FinalTor8B.jpg)
@@ -205,4 +300,17 @@ Thank you to [Lucy Knada](https://huggingface.co/lucyknada), [Kalomaze](https://
  ## Training
  The training was done for 4 epochs. (This model is the 2 epoch checkpoint), I used 10 x [A40s](https://www.nvidia.com/en-us/data-center/a40/) GPUs graciously provided by [Kalomaze](https://huggingface.co/kalomaze) for the full-parameter fine-tuning of the model.

- [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
 
  ---
+ language:
+ - en
  license: agpl-3.0
+ tags:
+ - chat
+ base_model:
+ - nvidia/Mistral-NeMo-Minitron-8B-Base
  datasets:
  - anthracite-org/kalo-opus-instruct-22k-no-refusal
  - Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
 
  - Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
  - anthracite-org/kalo_opus_misc_240827
  - anthracite-org/kalo_misc_part2
+ License: agpl-3.0
+ Language:
+ - En
+ Pipeline_tag: text-generation
+ Base_model: nvidia/Mistral-NeMo-Minitron-8B-Base
+ Tags:
+ - Chat
+ model-index:
+ - name: Tor-8B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: HuggingFaceH4/ifeval
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 23.82
+       name: strict accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Delta-Vector/Tor-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: BBH
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 31.74
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Delta-Vector/Tor-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: hendrycks/competition_math
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 5.44
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Delta-Vector/Tor-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 9.84
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Delta-Vector/Tor-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 8.82
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Delta-Vector/Tor-8B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 30.33
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Delta-Vector/Tor-8B
+       name: Open LLM Leaderboard
  ---

  ![](https://huggingface.co/Delta-Vector/Tor-8B/resolve/main/FinalTor8B.jpg)
 
  ## Training
  The training was done for 4 epochs. (This model is the 2 epoch checkpoint), I used 10 x [A40s](https://www.nvidia.com/en-us/data-center/a40/) GPUs graciously provided by [Kalomaze](https://huggingface.co/kalomaze) for the full-parameter fine-tuning of the model.

+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Delta-Vector__Tor-8B)
+
+ | Metric |Value|
+ |-------------------|----:|
+ |Avg. |18.33|
+ |IFEval (0-Shot) |23.82|
+ |BBH (3-Shot) |31.74|
+ |MATH Lvl 5 (4-Shot)| 5.44|
+ |GPQA (0-shot) | 9.84|
+ |MuSR (0-shot) | 8.82|
+ |MMLU-PRO (5-shot) |30.33|
+
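Reviewer note: the `Avg.` row in the table added by this PR appears to be the plain arithmetic mean of the six benchmark values, rounded to two decimals. The sketch below checks that under this assumption. The aggregation rule is inferred from the numbers in the table, not stated in the PR, and the helper name `leaderboard_average` is hypothetical.

```python
# Scores copied from the results table added by this PR.
scores = {
    "IFEval (0-Shot)": 23.82,
    "BBH (3-Shot)": 31.74,
    "MATH Lvl 5 (4-Shot)": 5.44,
    "GPQA (0-shot)": 9.84,
    "MuSR (0-shot)": 8.82,
    "MMLU-PRO (5-shot)": 30.33,
}


def leaderboard_average(values):
    """Arithmetic mean rounded to two decimals.

    Assumption: the reported average is an unweighted mean of the
    per-benchmark values; the PR itself does not say how it is computed.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. = {average:.2f}")  # prints "Avg. = 18.33"
```

The result matches the reported 18.33, so the table is internally consistent under the unweighted-mean assumption.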