Update README.md
README.md
CHANGED
@@ -99,7 +99,7 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
 | GPT-4 | -| RLHF |8.99| 95.28|
 
 ## Other benchmark:
-1. HuggingFace OpenLLM Leaderboard
+1. **HuggingFace OpenLLM Leaderboard**
 | Metric | Value |
 |-----------------------|---------------------------|
 | ARC (25-shot) | 47.0 |
@@ -110,7 +110,7 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
 | GSM8K (5-shot) | 42.3 |
 
 
-2. BigBench
+2. **BigBench**:
 
 - Average: 35.26
 - Details:
@@ -139,7 +139,7 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
 | bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 0.1856| 0.0110 |
 | bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 0.1269| 0.0080 |
 
-3. AGI
+3. **AGI Benchmark**:
 - Average: 33.23
 - Details:
 | Task |Version| Metric |Value | |Stderr|