GPTQ and Mixtral models will need to be relaunched
I don't know why or what happened, but all of these failed.
See:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/cognitivecomputations/dolphin-2.5-mixtral-8x7b_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/cognitivecomputations/dolphin-2.6-mixtral-8x7b_eval_request_False_bfloat16_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/cognitivecomputations/dolphin-2.7-mixtral-8x7b_eval_request_False_bfloat16_Original.json
https://old.reddit.com/r/LocalLLaMA/comments/18s61fb/pressuretested_the_most_popular_opensource_llms/
They also tested dolphin-2.6-mixtral there, so I don't know what is causing it to fail here. A re-run of dolphin-2.7-mixtral-8x7b still failed.
@CombinHorizon I want to see the Dolphin Mixtrals evaluated too, but apparently they don't use safetensors, hence they can't be evaluated.
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/517
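For anyone who wants to check the safetensors point before submitting, here is a minimal sketch. It assumes that the presence of at least one `.safetensors` file in the repo listing is what matters; the leaderboard's actual check is not shown in this thread and may differ. The helper names `has_safetensors` and `repo_has_safetensors` are made up for illustration, and the second one needs network access and the `huggingface_hub` package.

```python
def has_safetensors(filenames):
    """Return True if any file name in the listing ends with .safetensors."""
    return any(name.endswith(".safetensors") for name in filenames)


def repo_has_safetensors(repo_id):
    """Fetch the file listing of a model repo from the Hub and check it.

    Assumption: a single .safetensors file is enough to count as "uses
    safetensors". Requires network access and huggingface_hub.
    """
    from huggingface_hub import list_repo_files

    return has_safetensors(list_repo_files(repo_id))
```

Calling `repo_has_safetensors("cognitivecomputations/dolphin-2.7-mixtral-8x7b")` would tell you what the repo contains at the time you run it.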
(these have also failed)
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/TheBloke/dolphin-2.5-mixtral-8x7b-GPTQ_eval_request_False_GPTQ_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/TheBloke/dolphin-2.6-mixtral-8x7b-GPTQ_eval_request_False_GPTQ_Original.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/TheBloke/dolphin-2.7-mixtral-8x7b-GPTQ_eval_request_False_GPTQ_Original.json
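If you want to check these request files from a script rather than the browser, a rough sketch follows. It relies on the Hub convention that swapping `/blob/` for `/resolve/` in a file URL gives the raw file. The `status` key is an assumption about how the request JSON is laid out, and both function names are hypothetical.

```python
import json


def blob_to_raw_url(blob_url):
    """Turn a Hub '/blob/' page URL into the matching '/resolve/' raw-file URL."""
    return blob_url.replace("/blob/", "/resolve/", 1)


def request_status(request_json_text):
    """Return the 'status' value of a request file, or 'UNKNOWN' if it is missing.

    Assumption: the request file is a JSON object with a top-level 'status' key.
    """
    record = json.loads(request_json_text)
    return record.get("status", "UNKNOWN")
```

Download the text at the raw URL with any HTTP client and pass it to `request_status`.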
@CombinHorizon Thanks for submitting them for evaluation. I just rechecked, and GGUFs and GPTQs are weight-only quantizations (WOQ), so they shouldn't have failed for security reasons.
@CombinHorizon I did those long-context pressure tests, referring to the screenshot from my Reddit post. But that isn't related to this leaderboard; it was done with different eval code.
Hi everyone!
Thank you @Phil337 for the link to the Dolphin Mixtrals evaluation discussion! I guess it's the same problem here. Besides, we're currently solving a technical problem so that we can evaluate GPTQ versions. I'll reschedule these GPTQ versions for evaluation once we fix the problem, hopefully by the end of the week.
Hi! We have 2 issues here:
- The Mixtral evaluations were sigtermed by our cluster, most likely because of a TP/DP problem. We need to change something in our backend, but it's going to take some time.
- The failing GPTQ evaluations, however, come from a mismatch between some of our requirements. We'll relaunch those as soon as it's updated.