Mixsmol
Collection
Smol MoE series - in collaboration with Ontocord
This is the third checkpoint (Epoch 3) of Mixsmol-4x400M-v0.1. Note that this is an experiment in data mixing. Therefore, we trained the model on only 50B tokens (95% English and 5% Vietnamese) to test the following:
Once this run has verified our hypothesis, we will schedule a second run with more data and compute so that the model can reach its full capability.
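
For a sense of scale, the 95%/5% language mix can be turned into per-language token counts. The snippet below is a minimal sketch, assuming that 50B is the total size of the training mix and that the percentages are measured in tokens; `split_token_budget` is a hypothetical helper written for illustration, not part of any training code for this model.

```python
def split_token_budget(total_tokens: int, shares_percent: dict[str, int]) -> dict[str, int]:
    """Split a token budget by integer percentage shares (hypothetical helper)."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    # Integer arithmetic avoids floating-point rounding on large counts.
    return {lang: total_tokens * pct // 100 for lang, pct in shares_percent.items()}


# Assumption: 50B tokens in total, split 95% English / 5% Vietnamese by token count.
budget = split_token_budget(50_000_000_000, {"English": 95, "Vietnamese": 5})
for lang, tokens in budget.items():
    print(f"{lang}: {tokens / 1e9:.1f}B tokens")
```

Under these assumptions the mix comes to roughly 47.5B English tokens and 2.5B Vietnamese tokens.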