Update README.md
README.md
So let us raise our teacups in honor of this fabulous feline, this queen of camp who reminds us that life is too short for dull clothing and boring hairstyles. May we all strive to embody her spirit, embracing the absurdity of existence with open arms and a generous helping of glitter. Long live the cute catgirl! [end of text]
```

![Miku Nakano](https://thicc-af.mywaifulist.moe/waifus/miku-nakano-the-quintessential-quintuplets/phUEiEhPOL75GTDLncGy2dUbkDVMfYExZ2A1RBeQ.png?class=thumbnail)

Some benchmarks (lm-eval):

```
|    Tasks     |Version|Filter|n-shot|  Metric  |Value |   |Stderr|
|--------------|------:|------|-----:|----------|-----:|---|-----:|
|lambada_openai|      1|none  |     0|perplexity|2.6354|±  |0.0451|
|              |       |none  |     0|acc       |0.7879|±  |0.0057|

|  Tasks  |Version|Filter|n-shot| Metric |Value |   |Stderr|
|---------|------:|------|-----:|--------|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |0.6851|±  |0.0046|
|         |       |none  |     0|acc_norm|0.8690|±  |0.0034|

|  Tasks   |Version|Filter|n-shot|Metric|Value |   |Stderr|
|----------|------:|------|-----:|------|-----:|---|-----:|
|winogrande|      1|none  |     0|acc   |0.7987|±  |0.0113|

|Tasks|Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|-----|------:|----------|-----:|-----------|-----:|---|-----:|
|gsm8k|      2|get-answer|     5|exact_match|0.7043|±  |0.0126|

|                 Tasks                 |Version|Filter|n-shot|Metric|Value |   |Stderr|
|---------------------------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu                                   |N/A    |none  |     0|acc   |0.7401|±  |0.1192|
| - humanities                          |N/A    |none  |     0|acc   |0.7018|±  |0.1281|
|  - formal_logic                       |      0|none  |     0|acc   |0.4841|±  |0.0447|
|  - high_school_european_history       |      0|none  |     0|acc   |0.8303|±  |0.0293|
|  - high_school_us_history             |      0|none  |     0|acc   |0.9020|±  |0.0209|
|  - high_school_world_history          |      0|none  |     0|acc   |0.9198|±  |0.0177|
|  - international_law                  |      0|none  |     0|acc   |0.8678|±  |0.0309|
|  - jurisprudence                      |      0|none  |     0|acc   |0.8519|±  |0.0343|
|  - logical_fallacies                  |      0|none  |     0|acc   |0.8344|±  |0.0292|
|  - moral_disputes                     |      0|none  |     0|acc   |0.8121|±  |0.0210|
|  - moral_scenarios                    |      0|none  |     0|acc   |0.5642|±  |0.0166|
|  - philosophy                         |      0|none  |     0|acc   |0.8167|±  |0.0220|
|  - prehistory                         |      0|none  |     0|acc   |0.8611|±  |0.0192|
|  - professional_law                   |      0|none  |     0|acc   |0.5854|±  |0.0126|
|  - world_religions                    |      0|none  |     0|acc   |0.8889|±  |0.0241|
| - other                               |N/A    |none  |     0|acc   |0.7889|±  |0.0922|
|  - business_ethics                    |      0|none  |     0|acc   |0.7900|±  |0.0409|
|  - clinical_knowledge                 |      0|none  |     0|acc   |0.8113|±  |0.0241|
|  - college_medicine                   |      0|none  |     0|acc   |0.7514|±  |0.0330|
|  - global_facts                       |      0|none  |     0|acc   |0.5500|±  |0.0500|
|  - human_aging                        |      0|none  |     0|acc   |0.7848|±  |0.0276|
|  - management                         |      0|none  |     0|acc   |0.8835|±  |0.0318|
|  - marketing                          |      0|none  |     0|acc   |0.9145|±  |0.0183|
|  - medical_genetics                   |      0|none  |     0|acc   |0.7500|±  |0.0435|
|  - miscellaneous                      |      0|none  |     0|acc   |0.8838|±  |0.0115|
|  - nutrition                          |      0|none  |     0|acc   |0.7974|±  |0.0230|
|  - professional_accounting            |      0|none  |     0|acc   |0.5922|±  |0.0293|
|  - professional_medicine              |      0|none  |     0|acc   |0.8272|±  |0.0230|
|  - virology                           |      0|none  |     0|acc   |0.5361|±  |0.0388|
| - social_sciences                     |N/A    |none  |     0|acc   |0.8414|±  |0.0514|
|  - econometrics                       |      0|none  |     0|acc   |0.6491|±  |0.0449|
|  - high_school_geography              |      0|none  |     0|acc   |0.8990|±  |0.0215|
|  - high_school_government_and_politics|      0|none  |     0|acc   |0.9430|±  |0.0167|
|  - high_school_macroeconomics         |      0|none  |     0|acc   |0.7795|±  |0.0210|
|  - high_school_microeconomics         |      0|none  |     0|acc   |0.8277|±  |0.0245|
|  - high_school_psychology             |      0|none  |     0|acc   |0.9064|±  |0.0125|
|  - human_sexuality                    |      0|none  |     0|acc   |0.8626|±  |0.0302|
|  - professional_psychology            |      0|none  |     0|acc   |0.8056|±  |0.0160|
|  - public_relations                   |      0|none  |     0|acc   |0.7636|±  |0.0407|
|  - security_studies                   |      0|none  |     0|acc   |0.8204|±  |0.0246|
|  - sociology                          |      0|none  |     0|acc   |0.8856|±  |0.0225|
|  - us_foreign_policy                  |      0|none  |     0|acc   |0.9100|±  |0.0288|
| - stem                                |N/A    |none  |     0|acc   |0.6505|±  |0.1266|
|  - abstract_algebra                   |      0|none  |     0|acc   |0.4100|±  |0.0494|
|  - anatomy                            |      0|none  |     0|acc   |0.6444|±  |0.0414|
|  - astronomy                          |      0|none  |     0|acc   |0.8224|±  |0.0311|
|  - college_biology                    |      0|none  |     0|acc   |0.8681|±  |0.0283|
|  - college_chemistry                  |      0|none  |     0|acc   |0.5500|±  |0.0500|
|  - college_computer_science           |      0|none  |     0|acc   |0.6200|±  |0.0488|
|  - college_mathematics                |      0|none  |     0|acc   |0.4200|±  |0.0496|
|  - college_physics                    |      0|none  |     0|acc   |0.5392|±  |0.0496|
|  - computer_security                  |      0|none  |     0|acc   |0.8300|±  |0.0378|
|  - conceptual_physics                 |      0|none  |     0|acc   |0.7362|±  |0.0288|
|  - electrical_engineering             |      0|none  |     0|acc   |0.7034|±  |0.0381|
|  - elementary_mathematics             |      0|none  |     0|acc   |0.5503|±  |0.0256|
|  - high_school_biology                |      0|none  |     0|acc   |0.8742|±  |0.0189|
|  - high_school_chemistry              |      0|none  |     0|acc   |0.6256|±  |0.0341|
|  - high_school_computer_science       |      0|none  |     0|acc   |0.8400|±  |0.0368|
|  - high_school_mathematics            |      0|none  |     0|acc   |0.4370|±  |0.0302|
|  - high_school_physics                |      0|none  |     0|acc   |0.5033|±  |0.0408|
|  - high_school_statistics             |      0|none  |     0|acc   |0.6944|±  |0.0314|
|  - machine_learning                   |      0|none  |     0|acc   |0.5982|±  |0.0465|
```
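The per-task Stderr values in these tables can be sanity-checked by hand. This is a minimal sketch under two assumptions: first, that for a 0/1 metric the Stderr column is the standard error of the mean score (sample standard deviation divided by sqrt(n)), which for binary scores reduces to sqrt(p*(1-p)/(n-1)); second, that the evaluated question counts are the standard eval-split sizes hard-coded below. Those counts are not part of the reported output.

```python
import math


def binomial_stderr(acc: float, n: int) -> float:
    """Standard error of a mean 0/1 score: sample std / sqrt(n) == sqrt(p(1-p)/(n-1))."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# name -> (reported value, reported stderr, ASSUMED number of evaluated questions).
# The question counts are assumed standard split sizes, not taken from the tables.
results = {
    "lambada_openai acc": (0.7879, 0.0057, 5153),
    "hellaswag acc": (0.6851, 0.0046, 10042),
    "hellaswag acc_norm": (0.8690, 0.0034, 10042),
    "winogrande acc": (0.7987, 0.0113, 1267),
    "gsm8k exact_match": (0.7043, 0.0126, 1319),
    "mmlu formal_logic acc": (0.4841, 0.0447, 126),
    "mmlu (all) acc": (0.7401, 0.1192, 14042),
}

# Compare each reported stderr with the binomial estimate for the same accuracy and n.
for name, (acc, reported, n) in results.items():
    expected = binomial_stderr(acc, n)
    print(f"{name:<22} reported={reported:.4f} binomial={expected:.4f} "
          f"ratio={reported / expected:.1f}")
```

Under those assumptions the per-task rows agree with the binomial estimate to within rounding, while the aggregated MMLU row comes out roughly 30 times larger than a binomial standard error over all 14,042 questions would be, so the unusually large values are confined to the group-level rows.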
No, I do not know why the stderr on the aggregated MMLU rows is so high. It is plausibly due to the vLLM backend used. This is the lm-eval command I used for most of these runs (works on H100s):

`lm_eval --model vllm --model_args pretrained=./miqu-1-70b-sf,tensor_parallel_size=4,dtype=auto,gpu_memory_utilization=0.88,data_parallel_size=2 --tasks mmlu --batch_size 20`
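The command combines `tensor_parallel_size=4` with `data_parallel_size=2`. Assuming lm-eval's vLLM backend starts `data_parallel_size` independent model replicas that are each split across `tensor_parallel_size` GPUs, the run needs 4 × 2 = 8 GPUs (for example, one 8×H100 node). Below is a small sketch of that arithmetic. `parse_model_args` and `gpus_required` are hypothetical helpers written for illustration, not lm-eval functions.

```python
from __future__ import annotations


def parse_model_args(arg_string: str) -> dict[str, str]:
    """Hypothetical helper: split a comma-separated "key=value" string into a dict."""
    parsed = {}
    for item in arg_string.split(","):
        if "=" not in item:
            continue  # skip empty or malformed entries
        key, value = item.split("=", 1)
        parsed[key.strip()] = value.strip()
    return parsed


def gpus_required(model_args: dict[str, str]) -> int:
    """Hypothetical helper. Assumes each data-parallel replica is sharded over tensor_parallel_size GPUs."""
    tp = int(model_args.get("tensor_parallel_size", "1"))
    dp = int(model_args.get("data_parallel_size", "1"))
    return tp * dp


# The --model_args string from the command above.
args = parse_model_args(
    "pretrained=./miqu-1-70b-sf,tensor_parallel_size=4,dtype=auto,"
    "gpu_memory_utilization=0.88,data_parallel_size=2"
)
print(f"{args['pretrained']}: {gpus_required(args)} GPUs")
```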