Training
This model was trained on the two datasets listed below; a sketch of loading them follows the list.
- Skylion007/openwebtext: 1,000,000 examples at a batch size ranging from 32 to 4096 (1 epoch)
- Locutusque/TM-DATA: all examples at a batch size of 12288 (3 epochs)

Training took approximately 500 GPU hours on a single Titan V.
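If you want to inspect or reuse the training data, both datasets are on the Hub and can be loaded with the 🤗 `datasets` library. The sketch below is a minimal example, not the training pipeline itself; the split names, the streaming choice, and the `"text"` field are assumptions, so adjust them to your setup.

```python
# Minimal sketch of loading the two training datasets (assumptions noted inline).
from datasets import load_dataset

# Stream openwebtext rather than downloading the full corpus up front, and take
# the first 1,000,000 examples (assumption: this matches the subset used above).
openwebtext = load_dataset("Skylion007/openwebtext", split="train", streaming=True)
subset = openwebtext.take(1_000_000)

# TM-DATA was used in full, so a regular (non-streaming) load is fine.
tm_data = load_dataset("Locutusque/TM-DATA", split="train")

# Peek at one example; the "text" field name is an assumption typical of these corpora.
for example in subset:
    print(example["text"][:200])
    break
```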
Metrics
You can look at the training metrics here: https://wandb.ai/locutusque/TinyMistral-V2/runs/g0rvw6wc
🔥 This model performed excellently on TruthfulQA, outperforming models more than 720x its size. These models include: mistralai/Mixtral-8x7B-v0.1, tiiuae/falcon-180B, berkeley-nest/Starling-LM-7B-alpha, upstage/SOLAR-10.7B-v1.0, and more. 🔥
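One way to reproduce a TruthfulQA score is EleutherAI's lm-evaluation-harness. The sketch below is an assumption-laden example, not the evaluation setup used for this card: it assumes harness v0.4+ and uses the wandb run name `Locutusque/TinyMistral-V2` as a placeholder repo id, which you should replace with this card's actual model id.

```python
# Minimal sketch of scoring the model on TruthfulQA with lm-evaluation-harness (v0.4+).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    # Placeholder repo id taken from the wandb run name above; substitute the real one.
    model_args="pretrained=Locutusque/TinyMistral-V2",
    tasks=["truthfulqa_mc2"],  # multiple-choice TruthfulQA variant
    batch_size=8,
)
print(results["results"]["truthfulqa_mc2"])
```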