Question: same model with very different scores
Hello, the leaderboard lists the same model twice (both entries link to the same model page), but the scores are very different: mlabonne/NeuralDaredevil-8B-abliterated scores 27.01 and 21.5.
Can someone explain? If I had to guess, maybe it was evaluated once with the chat template and once without?
The only difference I can see is that one is bfloat16 and the other is float16. My guess is there's a bug in the IFEval evaluation with bfloat16 (41 vs. 75), since the other evals match up.
Hi @Yuma42 ,
This means the model was evaluated twice, in bf16 and f16 precisions, so @phil111 is right. Please check out my screenshot, where I clicked to show "Precision". Given the low IFEval, it isn't a bug: this model doesn't use the chat template in bfloat16 precision, which causes the low IFEval score, whereas, as you can see in the request file, the float16 version has "use_chat_template: True".
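To make the comparison concrete, here is a minimal sketch of how you could diff two request entries to spot why the scores diverge. The field names ("precision", "use_chat_template") follow this thread, but the dictionaries below are illustrative stand-ins, not the actual request files from the leaderboard dataset.

```python
import json

# Hypothetical request entries for the two submissions of the same model.
# Values are assumptions for illustration, based on what this thread reports.
bf16_request = {
    "model": "mlabonne/NeuralDaredevil-8B-abliterated",
    "precision": "bfloat16",
    "use_chat_template": False,
}
f16_request = {
    "model": "mlabonne/NeuralDaredevil-8B-abliterated",
    "precision": "float16",
    "use_chat_template": True,
}

def diff_requests(a: dict, b: dict) -> dict:
    """Return the fields whose values differ between two request entries."""
    return {k: (a.get(k), b.get(k)) for k in a if a.get(k) != b.get(k)}

# The differing fields explain the gap: precision and chat-template usage.
print(json.dumps(diff_requests(bf16_request, f16_request), indent=2))
```

Running this would surface exactly the two fields discussed above, with the `use_chat_template` mismatch accounting for the IFEval difference.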
I'm closing this discussion; please feel free to open a new one if you have any questions!