Leaderboard evaluation failed.

#3
by adamo1139 - opened

I was curious how this one would pan out on the leaderboard, but it failed evaluation for some reason.
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/brucethemoose/Yi-34B-200K-DARE-merge-v5_eval_request_False_bfloat16_Original.json

The 200K models are unreliable on the leaderboard. I think they can make the evaluation servers OOM because they try to allocate the full 200K context at load time.

HF staff just have to fix it manually; maybe we should flag it for them in a post on the requests page? I don't really care about the leaderboard position, I just want the best 200K model possible, and a datapoint against the other merges and 200K models would help :P

I will open a discussion on the leaderboard community page about this later today (in about 12 hours) unless I see you doing it first.
I assume you've only tried the model yourself via exl2/gguf quants due to limited VRAM, yes? Can you check whether it loads in transformers if you set load_in_4bit=True and manually edit max_position_embeddings to a lower value? I have limited bandwidth, so I can't download it this month to verify that myself.
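Roughly what I have in mind (untested on my side, so treat it as a sketch; the repo name comes from the request file linked above, and I'm assuming the merge exposes max_position_embeddings like other Yi/Llama-style configs):

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

repo = "brucethemoose/Yi-34B-200K-DARE-merge-v5"

# Pull the config first and shrink the advertised context window,
# so the load doesn't account for the full 200K positions.
config = AutoConfig.from_pretrained(repo)
config.max_position_embeddings = 4096

model = AutoModelForCausalLM.from_pretrained(
    repo,
    config=config,
    load_in_4bit=True,   # bitsandbytes 4-bit quantization on the fly
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(repo)

# Quick smoke test: if this generates anything sensible, the weights load fine.
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```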

Yeah, bnb in transformers is how I always test them first, before quantizing. In fact, I go through a few merge variants with bnb 4-bit and pick the best one.

Transformers is quite a RAM hog at long context, though. I can fit 3K context with bnb, and 47K context with exllamav2 at 4bpw.
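For reference, this is roughly how I cap the context when loading the exl2 quant (the model directory is a placeholder for wherever the 4bpw quant lives, and the exllamav2 API may have shifted since I set this up):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Yi-34B-200K-DARE-merge-v5-4bpw-exl2"  # placeholder path
config.prepare()
config.max_seq_len = 47104  # cap the context so the KV cache fits in VRAM

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # cache sized from the capped max_seq_len
model.load_autosplit(cache)               # split layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
print(generator.generate_simple("Hello", ExLlamaV2Sampler.Settings(), 20))
```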

adamo1139 changed discussion status to closed
