Self-Exploring Language Models: Active Preference Elicitation for Online Alignment.
This model is a fine-tuned version of microsoft/Phi-3-mini-4k-instruct, trained on synthetic data based on the HuggingFaceH4/ultrafeedback_binarized dataset.
| Model | AlpacaEval 2.0 (LC WR) | MT-Bench (Average) |
|---|---|---|
| SELM-Phi-3-mini-4k-instruct-iter-3 | 27.98 | 8.32 |
| SELM-Phi-3-mini-4k-instruct-iter-2 | 26.79 | 8.44 |
| SELM-Phi-3-mini-4k-instruct-iter-1 | 27.33 | 8.37 |
| Phi-3-mini-4k-instruct | 23.05 | 8.12 |
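To make each iteration's gain over the base model explicit, the scores in the table can be differenced against the Phi-3-mini-4k-instruct row. This is a minimal Python sketch: the `SCORES` dictionary only restates the table, and the names `SCORES`, `BASE`, and `gains_over_base` are illustrative, not part of any released code.

```python
# Scores restated from the table: (AlpacaEval 2.0 LC WR, MT-Bench average).
SCORES = {
    "SELM-Phi-3-mini-4k-instruct-iter-3": (27.98, 8.32),
    "SELM-Phi-3-mini-4k-instruct-iter-2": (26.79, 8.44),
    "SELM-Phi-3-mini-4k-instruct-iter-1": (27.33, 8.37),
    "Phi-3-mini-4k-instruct": (23.05, 8.12),
}
BASE = "Phi-3-mini-4k-instruct"  # the model every SELM iteration is compared against


def gains_over_base(scores, base):
    """Return {model: (delta_lc_wr, delta_mt_bench)} relative to `base`, rounded to 2 decimals."""
    base_lc, base_mt = scores[base]
    return {
        name: (round(lc - base_lc, 2), round(mt - base_mt, 2))
        for name, (lc, mt) in scores.items()
        if name != base
    }


for name, (d_lc, d_mt) in gains_over_base(SCORES, BASE).items():
    print(f"{name}: LC WR {d_lc:+.2f}, MT-Bench {d_mt:+.2f}")
```

Every iteration improves on the base model on both benchmarks. Iter-3 gives the largest AlpacaEval gain (+4.93), while iter-2 gives the largest MT-Bench gain (+0.32), so the two benchmarks do not rank the iterations identically.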
The following hyperparameters were used during training: