
Missing "Adding Evaluation Results" PRs for models already evaluated.

#953

by Pretergeek - opened Sep 29


Pretergeek

Sep 29

Hello,
It seems I am missing the usual "Adding Evaluation Results" PR for the last three models I submitted to the leaderboard, which successfully finished evaluation about 29 days ago. I would love to add the results to the models' cards. Here are the links to the requests and results for those models; I hope this is enough.

Requests:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Pretergeek/OpenChat-3.5-0106_8.11B_36Layers-Interleaved_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Pretergeek/OpenChat-3.5-0106_8.99B_40Layers-Interleaved_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Pretergeek/OpenChat-3.5-0106_10.7B_48Layers-Interleaved_eval_request_False_bfloat16_Original.json

Results:
https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/Pretergeek/OpenChat-3.5-0106_8.11B_36Layers-Interleaved/results_2024-08-31T20-59-36.400699.json
https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/Pretergeek/OpenChat-3.5-0106_8.99B_40Layers-Interleaved/results_2024-08-31T20-50-23.499875.json
https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/Pretergeek/OpenChat-3.5-0106_10.7B_48Layers-Interleaved/results_2024-08-31T20-53-58.555445.json
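
For anyone who wants to read these files from a script: the links above point at the Hub's "blob" pages, which render the file in the browser. The raw file is served from the same path with `resolve` in place of `blob`. Below is a minimal sketch of that rewrite for dataset URLs; the helper name `blob_to_resolve` is hypothetical and not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_resolve(url: str) -> str:
    """Turn a Hub dataset 'blob' page URL into the matching 'resolve' (raw file) URL.

    Hypothetical helper for illustration. URLs that do not match the
    expected dataset layout are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Expected path layout: /datasets/<org>/<repo>/blob/<revision>/<file path...>
    # Checking the position avoids touching a repo or file that is itself named "blob".
    if len(segments) > 4 and segments[1] == "datasets" and segments[4] == "blob":
        segments[4] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))
```

The returned URL can then be fetched with any HTTP client to get the JSON itself.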

PS: This is only an educated guess, but I believe the PRs might be missing because the models' evaluations were restarted after a previous failure.

Thank you in advance,
@Pretergeek

alozowski

Open LLM Leaderboard org Sep 30

Hi @Pretergeek ,

Thanks for providing all the links!

I've checked your models: you can now find them under the Merge / Moerge flag, so it should be good. I've also checked the details for these models, and everything is correct. Can I check something else, or is it good?

Pretergeek

Oct 1

•

edited Oct 1

Thank you, but I found the models on the leaderboard without any problem. What I was referring to was the automated PR from leaderboard-pr-bot that adds the evaluation results to the model's README.md as metadata, which puts a nice widget with the results on the model's card. Here is an example of such a PR, received in July for a previous model: https://huggingface.co/Pretergeek/OpenChat-3.5-0106_8.99B_40Layers-Appended/commit/256ee57906e1fb7768e51204c544aee1a0b31a2f

That functionality is still described in the documentation here: https://huggingface.co/docs/hub/model-cards#evaluation-results

Edit: Looking at those old PRs by leaderboard-pr-bot I realised that they were generated by a space created by user @Weyaxi that no longer exists. So I guess the functionality was probably not part of the leaderboard itself and I will have to add the metadata to the model's card myself.
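
For reference, adding the metadata by hand means putting a `model-index` entry in the YAML front matter of the model's README.md. Below is a minimal sketch of how that structure could be assembled in Python before pasting it into the card. The function name `build_model_index`, the benchmark name, and the metric value are all hypothetical placeholders, not real results; the actual numbers would come from the results files linked above.

```python
import json


def build_model_index(model_name: str, results: list[dict]) -> dict:
    """Assemble a `model-index` metadata structure for a model card.

    Hypothetical helper for illustration. Each item in `results` is
    expected to carry: dataset_name, dataset_type, metric_type,
    metric_name, and value.
    """
    return {
        "model-index": [
            {
                "name": model_name,
                "results": [
                    {
                        "task": {"type": "text-generation", "name": "Text Generation"},
                        "dataset": {"name": r["dataset_name"], "type": r["dataset_type"]},
                        "metrics": [
                            {
                                "type": r["metric_type"],
                                "name": r["metric_name"],
                                "value": r["value"],
                            }
                        ],
                    }
                    for r in results
                ],
            }
        ]
    }


# Placeholder entry: the benchmark name and the 0.0 value are NOT real results.
example = build_model_index(
    "OpenChat-3.5-0106_8.99B_40Layers-Interleaved",
    [
        {
            "dataset_name": "ExampleBenchmark",
            "dataset_type": "example/benchmark",
            "metric_type": "acc",
            "metric_name": "accuracy",
            "value": 0.0,
        }
    ],
)
print(json.dumps(example, indent=2))
```

The printed structure maps one-to-one onto the YAML block that goes between the `---` markers at the top of README.md.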

Weyaxi

Oct 1

Hi @Pretergeek ,

Some users have been using the space to spam the model authors, which is something I never expected when I first created the space.

There are currently some open PRs for new models because I have an automated script that opens PRs when new models are added, but sometimes the script misses certain models.

I'll try to open up the space today or tomorrow.

In the meantime, please send me the model names you want PRs opened for.

I can help you manually :)

Pretergeek

Oct 1

Hello, @Weyaxi ,

I am sorry to hear that the space is being misused. It is so handy that I mistook it for part of the leaderboard's functionality. Nonetheless, thank you for your offer to help. Here is a list of the models:
Pretergeek/OpenChat-3.5-0106_8.11B_36Layers-Interleaved
Pretergeek/OpenChat-3.5-0106_8.99B_40Layers-Interleaved
Pretergeek/OpenChat-3.5-0106_10.7B_48Layers-Interleaved

It is probably best if I close this discussion, since it is not a problem with the leaderboard. My mistake, @alozowski .

Pretergeek changed discussion status to closed Oct 1

Weyaxi

Oct 1

Hi @Pretergeek ,

I have opened the PRs for the models you specified. As I mentioned earlier, I'll try to open the space today or tomorrow. I'll notify you here once it's done.

Have a nice day!

alozowski

Open LLM Leaderboard org Oct 1

No worries, @Pretergeek ! Unfortunately, I didn't understand the situation, so thank you @Weyaxi for your prompt help!

Weyaxi

Oct 1

•

edited Oct 1

Hi @Pretergeek and everyone!

The space is now functional and public thanks to the great help of @Wauplin ! Unfortunately, due to misuse by some users, a login is now required. The user leaderboard-pr-bot will only be used by my private automated script and myself!

Space link:

https://huggingface.co/spaces/Weyaxi/leaderboard-results-to-modelcard
