macadeliccc committed
Commit d1aab33 • 1 Parent(s): f6e7dac
Update README.md
README.md CHANGED
@@ -67,10 +67,89 @@ print(generate_response(prompt), "\n")

 ## Eval

-<script src="https://gist.github.com/tdolan21/57404d06a9c102904848b795fdaabef3.js"></script>
-
 evaluation [colab](https://colab.research.google.com/drive/1FpwgsGzCR4tORTxAwUxpN3PcP22En2xk?usp=sharing)

+| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
+|---------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
+|[laser-dolphin-mixtral-2x7b-dpo](https://huggingface.co/macadeliccc/laser-dolphin-mixtral-2x7b-dpo)| 41.31| 73.67| 61.69| 42.79| 54.87|
+
+### AGIEval
+| Task |Version| Metric |Value| |Stderr|
+|------------------------------|------:|--------|----:|---|-----:|
+|agieval_aqua_rat | 0|acc |22.44|± | 2.62|
+| | |acc_norm|21.26|± | 2.57|
+|agieval_logiqa_en | 0|acc |34.87|± | 1.87|
+| | |acc_norm|35.79|± | 1.88|
+|agieval_lsat_ar | 0|acc |22.17|± | 2.75|
+| | |acc_norm|23.04|± | 2.78|
+|agieval_lsat_lr | 0|acc |43.14|± | 2.20|
+| | |acc_norm|45.10|± | 2.21|
+|agieval_lsat_rc | 0|acc |57.25|± | 3.02|
+| | |acc_norm|55.76|± | 3.03|
+|agieval_sat_en | 0|acc |71.84|± | 3.14|
+| | |acc_norm|71.84|± | 3.14|
+|agieval_sat_en_without_passage| 0|acc |44.17|± | 3.47|
+| | |acc_norm|41.75|± | 3.44|
+|agieval_sat_math | 0|acc |40.91|± | 3.32|
+| | |acc_norm|35.91|± | 3.24|
+
+Average: 41.31%
+
+### GPT4All
+| Task |Version| Metric |Value| |Stderr|
+|-------------|------:|--------|----:|---|-----:|
+|arc_challenge| 0|acc |58.02|± | 1.44|
+| | |acc_norm|60.58|± | 1.43|
+|arc_easy | 0|acc |85.48|± | 0.72|
+| | |acc_norm|82.62|± | 0.78|
+|boolq | 1|acc |87.16|± | 0.59|
+|hellaswag | 0|acc |65.04|± | 0.48|
+| | |acc_norm|83.63|± | 0.37|
+|openbookqa | 0|acc |35.60|± | 2.14|
+| | |acc_norm|45.00|± | 2.23|
+|piqa | 0|acc |81.99|± | 0.90|
+| | |acc_norm|83.51|± | 0.87|
+|winogrande | 0|acc |73.16|± | 1.25|
+
+Average: 73.67%
+
+### TruthfulQA
+| Task |Version|Metric|Value| |Stderr|
+|-------------|------:|------|----:|---|-----:|
+|truthfulqa_mc| 1|mc1 |44.31|± | 1.74|
+| | |mc2 |61.69|± | 1.50|
+
+Average: 61.69%
+
+### Bigbench
+| Task |Version| Metric |Value| |Stderr|
+|------------------------------------------------|------:|---------------------|----:|---|-----:|
+|bigbench_causal_judgement | 0|multiple_choice_grade|59.47|± | 3.57|
+|bigbench_date_understanding | 0|multiple_choice_grade|66.67|± | 2.46|
+|bigbench_disambiguation_qa | 0|multiple_choice_grade|36.05|± | 3.00|
+|bigbench_geometric_shapes | 0|multiple_choice_grade|20.33|± | 2.13|
+| | |exact_str_match | 7.52|± | 1.39|
+|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|27.80|± | 2.01|
+|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|19.86|± | 1.51|
+|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|48.67|± | 2.89|
+|bigbench_movie_recommendation | 0|multiple_choice_grade|49.60|± | 2.24|
+|bigbench_navigate | 0|multiple_choice_grade|53.20|± | 1.58|
+|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|68.50|± | 1.04|
+|bigbench_ruin_names | 0|multiple_choice_grade|41.74|± | 2.33|
+|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|16.23|± | 1.17|
+|bigbench_snarks | 0|multiple_choice_grade|64.09|± | 3.58|
+|bigbench_sports_understanding | 0|multiple_choice_grade|70.69|± | 1.45|
+|bigbench_temporal_sequences | 0|multiple_choice_grade|37.70|± | 1.53|
+|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|23.44|± | 1.20|
+|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|17.60|± | 0.91|
+|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|48.67|± | 2.89|
+
+Average: 42.79%
+
+Average score: 54.87%
+
+Elapsed time: 02:53:28
+
 ## Citations

 Fernando Fernandes Neto and Eric Hartford. "Optimizing Large Language Models Using Layer-Selective Rank Reduction and Random Matrix Theory." 2024.
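
The diff does not say how the per-suite averages are aggregated. The numbers are consistent with a plain arithmetic mean that uses `acc_norm` where it is reported and `acc` otherwise. The sketch below rechecks the AGIEval and GPT4All averages and the overall score under that assumption. The aggregation rule is inferred from the reported values, and `suite_average` is a hypothetical helper, not part of the evaluation harness.

```python
from statistics import mean

# Scores copied from the AGIEval and GPT4All tables in the diff.
# Assumption (not stated in the source): acc_norm is used where reported,
# and acc is used for tasks that only report acc (boolq, winogrande).
agieval_acc_norm = [21.26, 35.79, 23.04, 45.10, 55.76, 71.84, 41.75, 35.91]
gpt4all_scores = [60.58, 82.62, 87.16, 83.63, 45.00, 83.51, 73.16]


def suite_average(scores):
    """Hypothetical helper: arithmetic mean rounded to two decimals."""
    return round(mean(scores), 2)


agieval_avg = suite_average(agieval_acc_norm)
gpt4all_avg = suite_average(gpt4all_scores)

# Overall score as the mean of the four reported suite averages.
suite_averages = {
    "AGIEval": 41.31,
    "GPT4All": 73.67,
    "TruthfulQA": 61.69,
    "Bigbench": 42.79,
}
overall = mean(list(suite_averages.values()))

print(agieval_avg, gpt4all_avg, f"{overall:.3f}")
```

The mean of the four rounded suite averages is about 54.865, which matches the reported 54.87% within rounding. The reported figure was presumably computed from unrounded suite values.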