spow12 leaderboard-pr-bot committed on
Commit
b28f5c6
1 Parent(s): a6e7c20

Adding Evaluation Results (#1)

- Adding Evaluation Results (f725eb62a1f528d9e59be0afd3fd1af1b5324289)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +114 -5
README.md CHANGED
@@ -11,11 +11,11 @@ tags:
  - mergekit
  - merge
  base_model:
- - anthracite-org/magnum-v2.5-12b-
- - Sao10K/MN-12B-Lyra-v4
- - Gryphe/Pantheon-RP-1.6.1-12b-Nemo
- - Epiculous/Crimson_Dawn-v0.2
- - Elizezen/Himeyuri-v0.1-12B
  datasets:
  - roleplay4fun/aesir-v1.1
  - kalomaze/Opus_Instruct_3k
@@ -35,6 +35,101 @@ datasets:
  - antiven0m_physical_reasoning_dpo
  - aixsatoshi_Swallow_MX_chatbot_DPO
  pipeline_tag: text-generation
  ---

  # Model Card for Model ID
@@ -220,3 +315,17 @@ By sharing this model, I hope to contribute to the research efforts of our commu
  publisher = { Hugging Face }
  }
  ```
  - mergekit
  - merge
  base_model:
+ - anthracite-org/magnum-v2.5-12b-
+ - Sao10K/MN-12B-Lyra-v4
+ - Gryphe/Pantheon-RP-1.6.1-12b-Nemo
+ - Epiculous/Crimson_Dawn-v0.2
+ - Elizezen/Himeyuri-v0.1-12B
  datasets:
  - roleplay4fun/aesir-v1.1
  - kalomaze/Opus_Instruct_3k
 
  - antiven0m_physical_reasoning_dpo
  - aixsatoshi_Swallow_MX_chatbot_DPO
  pipeline_tag: text-generation
+ model-index:
+ - name: ChatWaifu_v2.0_22B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: HuggingFaceH4/ifeval
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 65.11
+       name: strict accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=spow12/ChatWaifu_v2.0_22B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: BBH
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 42.29
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=spow12/ChatWaifu_v2.0_22B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: hendrycks/competition_math
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 18.58
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=spow12/ChatWaifu_v2.0_22B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 9.96
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=spow12/ChatWaifu_v2.0_22B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 5.59
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=spow12/ChatWaifu_v2.0_22B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 31.51
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=spow12/ChatWaifu_v2.0_22B
+       name: Open LLM Leaderboard
  ---

  # Model Card for Model ID
 
  publisher = { Hugging Face }
  }
  ```
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_spow12__ChatWaifu_v2.0_22B)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               |28.84|
+ |IFEval (0-Shot)    |65.11|
+ |BBH (3-Shot)       |42.29|
+ |MATH Lvl 5 (4-Shot)|18.58|
+ |GPQA (0-shot)      | 9.96|
+ |MuSR (0-shot)      | 5.59|
+ |MMLU-PRO (5-shot)  |31.51|
+
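
The `Avg.` row appears to be the unweighted mean of the six benchmark scores. The commit does not state how the average is computed, so this is an assumption, checked here only against the numbers in the table. A minimal Python sketch, with the `scores` dictionary being an illustrative name rather than anything from the leaderboard tooling:

```python
from statistics import mean

# Scores copied from the results table added in this commit.
scores = {
    "IFEval (0-Shot)": 65.11,
    "BBH (3-Shot)": 42.29,
    "MATH Lvl 5 (4-Shot)": 18.58,
    "GPQA (0-shot)": 9.96,
    "MuSR (0-shot)": 5.59,
    "MMLU-PRO (5-shot)": 31.51,
}

# Assumption: "Avg." is a plain arithmetic mean, rounded to two decimals.
average = round(mean(scores.values()), 2)
print(f"Avg. = {average:.2f}")
```

Under that assumption the computed value agrees with the 28.84 reported in the table.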