New Leader!

#2
by DKRacingFan - opened

Congratulations on reaching an average of 78.55 on the Hugging Face leaderboard! Now the big question is: will we reach an 80% average score before February?

...I do not know...
I ran my own private tests of this LLM's understanding, reasoning, and common sense, and it felt like talking to a fine-tuned, very old LLaMA 65B... poor results.
For instance, Mistral Instruct 0.2 seems much more advanced in understanding, reasoning, and common sense, and that is not even mentioning Mixtral 8x7B, which is on a totally different level... leaps ahead.

I suspect this model is contaminated, and that is why it ranks so high on the leaderboard.

Hi, we haven't trained our model on any datasets other than the three mentioned in our model card:

  1. Open-Orca/SlimOrca
  2. jondurbin/truthy-dpo-v0.1
  3. Intel/orca_dpo_pairs

and to the best of our knowledge, these three datasets are not contaminated.

Additionally, we have tested for contamination; refer to https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/472
gsm8k: result < 0.1, %: 0.47
truthfulqa: result < 0.1, %: 0.44

contamination test results for other tasks will be updated soon
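The linked discussion describes the actual methodology used for these numbers; as a rough illustration only, contamination checks of this kind often measure how many n-grams from a benchmark test sample also appear in the training data. The sketch below is a hypothetical, simplified version of such a check, not the script used for the reported results.

```python
# Illustrative sketch of an n-gram overlap contamination check.
# This is NOT the actual test from the linked discussion; the function
# names and thresholds here are made up for demonstration.

def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word-level n-grams in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(train_doc: str, test_sample: str, n: int = 8) -> float:
    """Fraction of the test sample's n-grams that also occur in the training doc.

    A value near 1.0 suggests the benchmark sample leaked into training data;
    a value near 0.0 suggests no verbatim overlap.
    """
    test_ngrams = ngrams(test_sample, n)
    if not test_ngrams:
        return 0.0
    return len(ngrams(train_doc, n) & test_ngrams) / len(test_ngrams)
```

With short made-up strings, an identical pair scores 1.0 and unrelated text scores 0.0; real checks run this kind of comparison over every benchmark sample and flag those above a chosen threshold.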


The data contamination check result in the model card is TBU, which is different from the results mentioned above.

Moreh, Inc. org

We will update the README too! Thanks @TomGrc

