leaderboard-pr-bot committed on
Commit
21c9131
1 Parent(s): 56925fa

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +112 -4
README.md CHANGED
@@ -1,13 +1,121 @@
 ---
-license: apache-2.0
-datasets:
-- PocketDoc/Dans-MemoryCore-CoreCurriculum-Small
 language:
 - en
+license: apache-2.0
 base_model: mistralai/Mistral-Nemo-Base-2407
+datasets:
+- PocketDoc/Dans-MemoryCore-CoreCurriculum-Small
+model-index:
+- name: Dans-Instruct-CoreCurriculum-12b-ChatML
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 4.78
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/Dans-Instruct-CoreCurriculum-12b-ChatML
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 32.02
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/Dans-Instruct-CoreCurriculum-12b-ChatML
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 3.78
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/Dans-Instruct-CoreCurriculum-12b-ChatML
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 7.38
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/Dans-Instruct-CoreCurriculum-12b-ChatML
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 12.08
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/Dans-Instruct-CoreCurriculum-12b-ChatML
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 28.67
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Dans-DiscountModels/Dans-Instruct-CoreCurriculum-12b-ChatML
+      name: Open LLM Leaderboard
 ---
 
 
 Test model do not use. All of this text is to pad it to the limit and submit an eval on the leaderboard hope you enjoy reading it.
 
-Trained on the PocketDoc/Dans-MemoryCore-CoreCurriculum-Small dataset using 4x H100 for 2 epochs.
+Trained on the PocketDoc/Dans-MemoryCore-CoreCurriculum-Small dataset using 4x H100 for 2 epochs.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Dans-DiscountModels__Dans-Instruct-CoreCurriculum-12b-ChatML)
+
+| Metric             |Value|
+|-------------------|----:|
+|Avg.               |14.79|
+|IFEval (0-Shot)    | 4.78|
+|BBH (3-Shot)       |32.02|
+|MATH Lvl 5 (4-Shot)| 3.78|
+|GPQA (0-shot)      | 7.38|
+|MuSR (0-shot)      |12.08|
+|MMLU-PRO (5-shot)  |28.67|
+
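
The `Avg.` row in the added table is the plain arithmetic mean of the six benchmark scores in the model-index metadata. A minimal sanity-check sketch (the score values are copied from this diff; the script itself is illustrative and not part of the PR):

```python
# Benchmark scores taken from the model-index metadata added by this PR.
scores = {
    "IFEval (0-Shot)": 4.78,
    "BBH (3-Shot)": 32.02,
    "MATH Lvl 5 (4-Shot)": 3.78,
    "GPQA (0-shot)": 7.38,
    "MuSR (0-shot)": 12.08,
    "MMLU-PRO (5-shot)": 28.67,
}

# The leaderboard's reported average is the unweighted mean of the six scores.
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")
```

The computed mean agrees with the 14.79 reported in the table to within rounding.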