Interpretation of result details?
Hello :)
I want to extract some example results of models for demonstration. I am currently struggling with the result details datasets (for example https://huggingface.co/datasets/open-llm-leaderboard-old/details_davidkim205__Rhea-72b-v0.5).
I want to extract each example, its corresponding choices, the answer predicted by the model, and whether that answer is correct.
Looking, for example, at the file https://huggingface.co/datasets/open-llm-leaderboard-old/details_davidkim205__Rhea-72b-v0.5/blob/main/2024-03-23T20-12-54.617185/details_harness%7Cwinogrande%7C5_2024-03-23T20-12-54.617185.parquet, I am wondering why the choices field in the dataset is always empty. I have also seen this behaviour on ARC. Why are the choices not in there?
Is there any way to get the requested information out of those files?
Thank you in advance!
Hi @nicobuko,
Sorry, we completely missed the issues opened on the archived version!
It could be a parsing issue: we changed the save format between v1 and v2 to fix a couple of bugs. Tagging @alozowski, who might have some time to investigate :)
Hi @nicobuko,
Thank you for your question and for providing the example! Really sorry that we missed it back in July.
Regarding why the `choices` field is empty in datasets like the one you referenced (e.g., Winogrande or ARC), here's the reason:
At the time those results were generated, the `choices` field was not populated for all evaluation types. This was a design decision in the earlier versions of the evaluation setup, so the choice details were not stored explicitly in the dataset. Instead, you can use the `input_tokens` field to reconstruct the input text, including the missing choices: decoding `input_tokens` with the same tokenizer that was used for encoding gives back the original text.
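A minimal sketch of that decoding step, in case it helps. It rests on a few assumptions that are not confirmed in this thread: that each `input_tokens` cell holds either one flat list of token ids or one list per candidate continuation, and that the evaluated model's own tokenizer is the one that produced the ids. The helper names `decode_input_tokens` and `load_and_decode` are made up for illustration.

```python
from numbers import Integral
from typing import Callable, List, Sequence


def decode_input_tokens(cell, decode: Callable[[Sequence[int]], str]) -> List[str]:
    """Decode one `input_tokens` cell into a list of strings.

    Assumption: the cell is either a flat sequence of token ids (a single
    prompt) or a sequence of such sequences (e.g. one per answer choice).
    """
    items = list(cell)
    if not items:
        return []
    # A flat cell starts with an integer; wrap it so both shapes are handled alike.
    if isinstance(items[0], Integral):
        items = [items]
    return [decode([int(token_id) for token_id in seq]) for seq in items]


def load_and_decode(parquet_path: str, tokenizer_name: str):
    """Read a details parquet file and add a `decoded_inputs` column.

    Assumption: `tokenizer_name` is the repo of the evaluated model, whose
    tokenizer is presumed to match the one used during evaluation.
    """
    # Imported lazily so the decoding helper above has no heavy dependencies.
    import pandas as pd
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    df = pd.read_parquet(parquet_path)
    df["decoded_inputs"] = df["input_tokens"].apply(
        lambda cell: decode_input_tokens(cell, tokenizer.decode)
    )
    return df
```

You would call `load_and_decode` with a local copy of the parquet file and the model repo name (for the example above, presumably `davidkim205/Rhea-72b-v0.5`), then read the choices off the `decoded_inputs` column. If the cells turn out to have a different shape, only `decode_input_tokens` needs adjusting.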
Feel free to send your models to our updated Leaderboard V2 and we will be happy to help you in the discussions there!