Mixsmol
Smol MoE series - in collaboration with Ontocord
This is the first checkpoint (Epoch 1) of Mixsmol-4x400M-v0.1. Note that this is an experiment in data mixing; therefore, we trained the model on only 50B tokens (95% English and 5% Vietnamese) to test the following:
After verifying our hypothesis with this run, we will schedule a second run with more data and compute so that the model can reach its full capability.
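The 95/5 language split described above translates into a per-language token budget. The short sketch below does that arithmetic; the `token_budget` helper and the `MIX` dictionary are hypothetical illustrations, not part of the actual training pipeline.

```python
# Hypothetical helper: split a total token budget across languages by fraction.
TOTAL_TOKENS = 50_000_000_000  # 50B tokens, as stated in the model card
MIX = {"English": 0.95, "Vietnamese": 0.05}  # data-mixing ratios from the card


def token_budget(total: int, mix: dict[str, float]) -> dict[str, int]:
    """Return the number of tokens allotted to each language."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mixing fractions must sum to 1")
    # round() absorbs tiny floating-point error in total * fraction
    return {lang: round(total * frac) for lang, frac in mix.items()}


if __name__ == "__main__":
    for lang, n in token_budget(TOTAL_TOKENS, MIX).items():
        print(f"{lang}: {n / 1e9:.1f}B tokens")
    # English: 47.5B tokens
    # Vietnamese: 2.5B tokens
```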
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | Yaml | none | 25 | acc | 0.1937 | ± 0.0115 |
| | | none | 25 | acc_norm | 0.2329 | ± 0.0124 |
| hellaswag | Yaml | none | 10 | acc | 0.2856 | ± 0.0045 |
| | | none | 10 | acc_norm | 0.3090 | ± 0.0046 |
| mmlu | N/A | none | 0 | acc | 0.2536 | ± 0.0483 |
| - humanities | N/A | none | 5 | acc | 0.2408 | ± 0.0341 |
| - other | N/A | none | 5 | acc | 0.2475 | ± 0.0443 |
| - social_sciences | N/A | none | 5 | acc | 0.2567 | ± 0.0456 |
| - stem | N/A | none | 5 | acc | 0.2756 | ± 0.0653 |
| truthfulqa_mc2 | Yaml | none | 0 | acc | 0.3909 | ± 0.0148 |
| winogrande | Yaml | none | 5 | acc | 0.5107 | ± 0.014 |
| gsm8k | Yaml | get-answer | 5 | exact_match | 0 | ± 0 |
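One way to read the Value and Stderr columns is to form an approximate 95% interval (value ± 1.96 × stderr) and compare it with the score a random guesser would get. The sketch below assumes a normal approximation and uses assumed chance baselines that are not part of the evaluation report: 0.25 for the four-option tasks (HellaSwag, MMLU, and most ARC questions) and 0.5 for the two-option WinoGrande. The `Score` class and helper names are hypothetical.

```python
from dataclasses import dataclass
from statistics import NormalDist

# Two-sided 95% quantile of the standard normal (about 1.96)
Z_95 = NormalDist().inv_cdf(0.975)


@dataclass
class Score:
    task: str
    metric: str
    value: float
    stderr: float
    chance: float  # ASSUMED random-guess baseline, not taken from the eval report


def ci95(value: float, stderr: float) -> tuple[float, float]:
    """Normal-approximation 95% interval: value +/- Z_95 * stderr."""
    half = Z_95 * stderr
    return value - half, value + half


def compare_to_chance(s: Score) -> str:
    """Say whether the whole interval lies above, below, or overlaps chance."""
    lo, hi = ci95(s.value, s.stderr)
    if lo > s.chance:
        return "above"
    if hi < s.chance:
        return "below"
    return "overlaps"


# Values and stderrs copied from the table; chance levels are assumptions.
scores = [
    Score("arc_challenge", "acc_norm", 0.2329, 0.0124, 0.25),
    Score("hellaswag", "acc_norm", 0.3090, 0.0046, 0.25),
    Score("mmlu", "acc", 0.2536, 0.0483, 0.25),
    Score("winogrande", "acc", 0.5107, 0.014, 0.5),
]

if __name__ == "__main__":
    for s in scores:
        lo, hi = ci95(s.value, s.stderr)
        print(f"{s.task:14s} {s.metric:9s} [{lo:.4f}, {hi:.4f}] "
              f"vs chance {s.chance}: {compare_to_chance(s)}")
```

This is only a rough check: the reported stderr for the aggregated MMLU score may not be a per-question standard error, and the ARC baseline is approximate because a few ARC questions have three or five options.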
This work is a joint contribution from Ontocord, BEE-spoke-data, and VILM.