Update README.md
README.md
@@ -19,8 +19,7 @@ base_model: Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-2.0
 We then recovered the performance loss induced by the pruning process by fine-tuning (from 0.2642 MMLU-Pro 0-shot to 0.3120); this step is called healing the pruned model.

 ### Upcoming Work:
-- More healing through SFT/DPO/TPO to see if we can get closer to the meta-llama/Meta-Llama-3.1-8B performance (which has an MMLU-Pro 0-shot of 0.3659). **(In Progress)**
-- Evaluate on benchmarks other than MMLU-PRO 0-shot (unfortunately [lighteval](https://github.com/huggingface/lighteval) is broken right now: [issue #191](https://github.com/huggingface/nanotron/issues/191), [issue #213](https://github.com/huggingface/nanotron/issues/213)).
+- More healing through SFT/DPO/TPO to see if we can get closer to the meta-llama/Meta-Llama-3.1-8B performance (which has an MMLU-Pro 0-shot of 0.3659 vs 0.3120 for our model). **(In Progress)**
 - Compare the same exact process when applied to meta-llama/LLama-3.1-70B.

 ### Training Details:
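
For reference, the "healing" step mentioned in the hunk above is a parameter-efficient fine-tune of the pruned checkpoint. Below is a minimal sketch of what such a LoRA healing run could look like; the training corpus (wikitext-2), LoRA hyperparameters, target modules, and output path are illustrative assumptions, not the exact recipe used for this model.

```python
# Minimal sketch of a LoRA "healing" fine-tune on the pruned checkpoint.
# Dataset, hyperparameters, and LoRA target modules are illustrative placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-2.0"  # pruned model from this repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
))

# Hypothetical healing corpus; the model card does not state which data was used.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
raw = raw.filter(lambda ex: len(ex["text"].strip()) > 0)
train = raw.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama-3.1-8b-pruned-healed",  # hypothetical output path
        per_device_train_batch_size=1, gradient_accumulation_steps=16,
        learning_rate=1e-4, num_train_epochs=1, logging_steps=50, bf16=True,
    ),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
)
trainer.train()
```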
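
The MMLU-Pro 0-shot numbers quoted in the hunk (0.2642 pruned, 0.3120 healed, 0.3659 for Meta-Llama-3.1-8B) could be reproduced with something like the sketch below, assuming a recent lm-evaluation-harness release that ships an `mmlu_pro` task; the task name and batch size are assumptions, and this is not necessarily the harness used for the reported scores (the card refers to lighteval).

```python
# Sketch of a 0-shot MMLU-Pro evaluation of the pruned/healed checkpoint with
# lm-evaluation-harness. Assumes a release that includes an "mmlu_pro" task; this
# may differ from the evaluation setup actually used for the reported scores.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=Na0s/Llama-3.1-8B-Pruned-4-Layers_LoRA-PEFT-2.0,dtype=bfloat16",
    tasks=["mmlu_pro"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])  # per-task (and aggregate) accuracy numbers
```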