What datasets were these trained on?
Can you add the datasets this model was trained on to the model card so people know? I'd like to know what kind of model this is in general.
+1 … what is SMAUG?
SMAUG is the dragon from The Hobbit, I believe. If you've read Tolkien. ))
We've now updated the model card to include the datasets this model was trained on. It will still have many of the qualities of Meta-Llama, but in this finetune we have tried to improve its reasoning, math, and coding skills in particular.
More information on the exact technique/data will be released later on. For now, see the previous Smaug paper: https://arxiv.org/abs/2402.13228.
Hello, the DPOP method proposed in the Smaug paper is based on preference datasets. However, the datasets provided in the model card are SFT datasets. I was wondering how to convert the provided SFT datasets into preference datasets. Maybe by sampling from Llama-3-8B-Instruct and using a reward model to score the samples?
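One plausible recipe along those lines: for each SFT prompt, sample several candidate answers from Llama-3-8B-Instruct, score each candidate with an off-the-shelf reward model, and keep the best and worst as the chosen/rejected pair. Here is a minimal sketch of that idea; the specific reward model, sampling parameters, and pairing rule are illustrative assumptions, not the method the Smaug authors actually used.

```python
# Hypothetical sketch: building DPO-style preference pairs from SFT prompts
# by sampling candidate answers and ranking them with a reward model.
# The reward model and all hyperparameters below are assumptions for
# illustration, not anything confirmed by the Smaug authors.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

policy_name = "meta-llama/Meta-Llama-3-8B-Instruct"
reward_name = "OpenAssistant/reward-model-deberta-v3-large-v2"  # example RM

policy_tok = AutoTokenizer.from_pretrained(policy_name)
policy = AutoModelForCausalLM.from_pretrained(
    policy_name, torch_dtype=torch.bfloat16, device_map="auto"
)
reward_tok = AutoTokenizer.from_pretrained(reward_name)
reward = AutoModelForSequenceClassification.from_pretrained(
    reward_name, device_map="auto"
)


def sample_completions(prompt: str, n: int = 4, max_new_tokens: int = 256) -> list[str]:
    """Draw n diverse completions from the instruct model for one SFT prompt."""
    messages = [{"role": "user", "content": prompt}]
    input_ids = policy_tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(policy.device)
    outputs = policy.generate(
        input_ids,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        num_return_sequences=n,
        max_new_tokens=max_new_tokens,
    )
    # Strip the prompt tokens and decode only the generated continuations.
    prompt_len = input_ids.shape[-1]
    return [
        policy_tok.decode(o[prompt_len:], skip_special_tokens=True) for o in outputs
    ]


@torch.no_grad()
def reward_score(prompt: str, completion: str) -> float:
    """Score a (prompt, completion) pair with the reward model."""
    inputs = reward_tok(prompt, completion, return_tensors="pt", truncation=True).to(
        reward.device
    )
    return reward(**inputs).logits[0].item()


def to_preference_pair(prompt: str) -> dict:
    """Best-scoring sample becomes 'chosen', worst becomes 'rejected'."""
    candidates = sample_completions(prompt)
    ranked = sorted(candidates, key=lambda c: reward_score(prompt, c))
    return {"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]}
```

A common variant is to keep the original SFT reference answer as "chosen" and only sample the "rejected" side, which skips the reward model entirely when the SFT labels are trusted.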