Missing "Adding Evaluation Results" PRs for models already evaluated.

#953
by Pretergeek - opened

Hello,
It seems I am missing the usual "Adding Evaluation Results" PR for the last three models I submitted to the leaderboard, which successfully finished evaluation about 29 days ago. I would love to add the results to the models' cards. Here are the links to the requests and results for those models; I hope this is enough.

Requests:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Pretergeek/OpenChat-3.5-0106_8.11B_36Layers-Interleaved_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Pretergeek/OpenChat-3.5-0106_8.99B_40Layers-Interleaved_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Pretergeek/OpenChat-3.5-0106_10.7B_48Layers-Interleaved_eval_request_False_bfloat16_Original.json

Results:
https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/Pretergeek/OpenChat-3.5-0106_8.11B_36Layers-Interleaved/results_2024-08-31T20-59-36.400699.json
https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/Pretergeek/OpenChat-3.5-0106_8.99B_40Layers-Interleaved/results_2024-08-31T20-50-23.499875.json
https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/Pretergeek/OpenChat-3.5-0106_10.7B_48Layers-Interleaved/results_2024-08-31T20-53-58.555445.json

PS: An educated guess, but I believe the missing PRs might be a result of the models' evaluations having been restarted after a previous failure.

Thank you in advance,
@Pretergeek

Open LLM Leaderboard org

Hi @Pretergeek ,

Thanks for providing all the links!

I've checked your models, and you can now find them under the Merge / Moerge flag, so it should be good. I've also checked the details for these models, and everything is correct. Can I check something else, or is it good?

Thank you, but I have found the models on the leaderboard without any problem. What I was referring to was the automated PR from leaderboard-pr-bot that adds the evaluation results to the model's README.md as metadata, which shows up as a nice widget with the results on the model's card. Here is an example of a PR like that, received in July for a previous model: https://huggingface.co/Pretergeek/OpenChat-3.5-0106_8.99B_40Layers-Appended/commit/256ee57906e1fb7768e51204c544aee1a0b31a2f

That functionality is still described in the documentation here: https://huggingface.co/docs/hub/model-cards#evaluation-results

Edit: Looking at those old PRs by leaderboard-pr-bot I realised that they were generated by a space created by user @Weyaxi that no longer exists. So I guess the functionality was probably not part of the leaderboard itself and I will have to add the metadata to the model's card myself.
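For anyone else adding the metadata by hand, here is a minimal sketch of what that could look like. It assumes the `model-index` layout described on the model cards documentation page linked above. The helper `build_model_index` is a hypothetical name, and the benchmark name, dataset id, and score are placeholders, not real leaderboard results.

```python
import json


def build_model_index(model_name, task_type, results):
    """Build a `model-index` metadata dict for a model card.

    `results` is a list of dicts with the keys dataset_name, dataset_type,
    metric_type, metric_name and metric_value, all supplied by the caller.
    This helper is hypothetical and only illustrates the layout.
    """
    entries = []
    for r in results:
        entries.append({
            "task": {"type": task_type},
            "dataset": {"name": r["dataset_name"], "type": r["dataset_type"]},
            "metrics": [{
                "type": r["metric_type"],
                "name": r["metric_name"],
                "value": r["metric_value"],
            }],
        })
    return {"model-index": [{"name": model_name, "results": entries}]}


# Placeholder values only: replace them with the numbers from your own
# results file before putting anything on a model card.
metadata = build_model_index(
    model_name="OpenChat-3.5-0106_8.11B_36Layers-Interleaved",
    task_type="text-generation",
    results=[{
        "dataset_name": "Example benchmark",   # placeholder
        "dataset_type": "example/benchmark",   # placeholder
        "metric_type": "acc",
        "metric_name": "accuracy",
        "metric_value": 0.0,                   # placeholder
    }],
)

# Print the structure; the same keys go into the YAML block at the top
# of README.md.
print(json.dumps(metadata, indent=2))
```

If the `huggingface_hub` package is installed, its `metadata_update` function should be able to push a dict like this to a model repo (optionally as a PR via `create_pr=True`), but please check its documentation before relying on that.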

Hi @Pretergeek ,

Some users have been using the space to spam the model authors, which is something I never expected when I first created the space.

There are currently some open PRs for new models because I have an automated script that opens PRs when new models are added, but the script sometimes misses certain models.

I'll try to open up the space today or tomorrow.

In the meantime, please send me the model names you want PRs opened for.

I can help you manually :)

Hello, @Weyaxi ,

I am sorry to hear that the space is being misused. It is quite handy, so much so that I mistook it for part of the leaderboard's own functionality. Nonetheless, thank you for your offer to help. Here is a list of the models:
Pretergeek/OpenChat-3.5-0106_8.11B_36Layers-Interleaved
Pretergeek/OpenChat-3.5-0106_8.99B_40Layers-Interleaved
Pretergeek/OpenChat-3.5-0106_10.7B_48Layers-Interleaved

It is probably best if I close this discussion, since it is not a problem with the leaderboard. My mistake, @alozowski .

Pretergeek changed discussion status to closed

Hi @Pretergeek ,

I have opened the PRs for the models you specified. As I mentioned earlier, I'll try to open the space today or tomorrow. I'll notify you here once it's done.

Have a nice day!

Open LLM Leaderboard org

No worries @Pretergeek ! Unfortunately I didn't understand the situation so thank you @Weyaxi for your prompt help!

Hi @Pretergeek and everyone!

The space is now functional and public thanks to the great help of @Wauplin ! Unfortunately, due to misuse by some users, a login is now required. The user leaderboard-pr-bot will only be used by my private automated script and myself!

Space link:

https://huggingface.co/spaces/Weyaxi/leaderboard-results-to-modelcard
