add_followir_tab
I've added the Retrieval w/Instructions tab, which required three changes (besides adding model names):
- Rerankers were given an embedding dimension of -1 (though I could use np.inf or something similar, or simply omit the embedding dimension)?
- Since the main metric differs from the one used for Retrieval (and they are two different abstract tasks), I had to add it as a separate tab rather than a sub-tab of Retrieval. It is of course not included in the main MTEB average score, which is left unchanged. I could make a larger code change to allow each sub-tab to have a different metric if we would prefer this to go under Retrieval? Either is fine.
- I had to add some code (at the very end of the PR) to handle cases where models haven't been evaluated on all abstract tasks, by simply skipping results for those tasks - this happens frequently for instruction retrieval (see the sketch after this list).
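For reference, a minimal sketch of that skip-missing logic (hypothetical names; the actual PR code may differ):

```python
import numpy as np

def average_available_scores(model_results: dict, task_names: list[str]) -> float:
    """Average a model's scores over only the tasks it was evaluated on.

    Tasks without results (common for instruction retrieval) are skipped
    rather than counted as zero.
    """
    scores = [model_results[t] for t in task_names if t in model_results]
    return float(np.mean(scores)) if scores else np.nan
```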
New tab: [screenshot]
Main Home screen tab (unchanged): [screenshot]
@Muennighoff thoughts on these changes?
I think this looks amazing. My main high-level comment is that it may be confusing what the difference is between models under the Retrieval w/Instructions leaderboard and the various instruction-tuned models in the Retrieval leaderboards. Is there a better way to differentiate them? cc'ing some other people who were involved in merging in FollowIR for thoughts: @KennethEnevoldsen @imenelydiaker
The other comment I have is that it would help if we could visually differentiate Cross-Encoders & Bi-Encoders - I'm not sure what the best way to do that is. It may also make sense to have a filtering tab for them at some point. cc @tomaarsen
Re: the first point, a solution might be to add a description to the task type:
Re: the second point, we might differentiate them using the suggested modelmeta object, see:
https://github.com/embeddings-benchmark/mteb/issues/618
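Something along these lines, perhaps (purely illustrative - the field names here are assumptions, not the schema from that issue):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelMeta:
    # Hypothetical sketch of per-model metadata; the real design
    # is being discussed in embeddings-benchmark/mteb#618.
    name: str
    is_cross_encoder: bool = False
    embedding_dim: Optional[int] = None  # None for rerankers instead of -1
```

With metadata like this, the leaderboard could derive badges or filters per architecture instead of maintaining manual lists.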
> ...the difference is between models under Retrieval w/Instructions and the various instruction-tuned models in the Retrieval leaderboards. Is there a better way to differentiate them?
@KennethEnevoldsen are you suggesting a description of the abstract task in place of the tab that says "English"? I'm a little confused on the placement. I could also put it in a bullet point/paragraph under the title.
@Muennighoff I could change the name to "InstructionRetrieval" but I was thinking that might get confused with the prompt retrieval abstract task that is in progress. I could also place it as a sub-tab in retrieval, but I think it may cause the same confusion between instruction retrieval models and retrieval data with instructions.
> visually differentiate Cross-Encoders & Bi-Encoders
Re: differentiating cross-encoders, I could make a new manual list of cross-encoders and stick some icon/emoji in front of them in the meantime? Model metadata does seem like the best long-term solution, if we want to re-update the leaderboard when that PR is done.
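A minimal sketch of that interim approach (the set contents and emoji are placeholders, not the actual list):

```python
# Hand-maintained interim list; model metadata would replace this later.
CROSS_ENCODERS = {
    "cross-encoder/ms-marco-MiniLM-L-6-v2",  # example entry
    "jhu-clsp/FollowIR-7B",                  # example entry
}

def display_name(model_name: str) -> str:
    """Prefix cross-encoders with an emoji so they stand out in the table."""
    return f"🔀 {model_name}" if model_name in CROSS_ENCODERS else model_name
```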
> @KennethEnevoldsen are you suggesting a description of the abstract task in place of the tab that says "English"? I'm a little confused on the placement. I could also put it in a bullet point/paragraph under the title.
I would put it below the first tab but before the English tab. It doesn't need to be more than 1-2 lines - essentially a layman's version of the abstract description.
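In Gradio terms, something like this (a sketch - the actual tab structure in the leaderboard app may differ):

```python
import gradio as gr

with gr.Blocks() as demo:
    with gr.Tab("Retrieval w/Instructions"):
        # Short, layman-friendly task description shown above the language tabs
        gr.Markdown(
            "Retrieval w/Instructions evaluates how well models follow "
            "detailed, query-specific instructions when ranking documents."
        )
        with gr.Tab("English"):
            ...  # leaderboard table for this task would go here
```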
@KennethEnevoldsen's idea seems like a good solution!
> visually differentiate Cross-Encoders & Bi-Encoders
Icons/emojis sound great to me as an intermediate solution!
I merged in main (and all the great changes to the config files) and added a filter tab for Cross-Encoders. I also included a short description of each task to resolve the issues (see pictures).
I would say this is good to go, but for some reason the Hugging Face Git UI is being very weird - it says I made the config file changes and doesn't seem to register that they are already in main. Any idea what is happening @Muennighoff @KennethEnevoldsen?
Looks great to me - if it runs fine for you, I think we can just merge & manually check afterwards that everything looks fine.
I'm wondering whether it's worth also having a Bi-Encoder model type checkbox (similar to how there's both open & proprietary), but up to you -- cc @tomaarsen
I think it might indeed make sense to add more checkboxes and/or separate the checkboxes into multiple categories. After all, there are:
- Open VS Proprietary
- Bi-Encoder vs Cross-Encoder
- Sentence Transformers support
Perhaps we should have these 3 categories?
Makes sense to me, I've also added the bi-encoders checkbox.
We could split them into three categories, but currently they all share the same functionality, so it's kind of nice to have them use the same function and sit in the same box (see the sketch below). I can also open a separate issue if we want to discuss it further?
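Something like this shared filter (hypothetical names and column layout):

```python
import pandas as pd

def filter_models(df: pd.DataFrame, selected_types: list[str]) -> pd.DataFrame:
    """Keep rows whose tags match any selected checkbox (current OR semantics).

    `selected_types` might contain e.g. "Open", "Proprietary",
    "Bi-Encoder", "Cross-Encoder", or "Sentence Transformers".
    """
    if not selected_types:
        return df
    mask = df["model_tags"].apply(
        lambda tags: any(t in tags for t in selected_types)
    )
    return df[mask]
```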
If it sounds good to everyone I can merge it this morning and pay close attention afterwards in case it needs a hotfix - it works fine for me locally but just to be sure.
Amazing! Fine to merge from my side!
The box selector is currently implemented as an OR operator, so that means Bi-Encoders OR open models.
Should we change it to an AND operator?
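To illustrate the difference on a toy table (hypothetical column names):

```python
import pandas as pd

df = pd.DataFrame({
    "model": ["A", "B", "C"],
    "is_bi_encoder": [True, True, False],
    "is_open": [True, False, True],
})

# Current behavior: OR -- a model passes if it matches ANY checked box
or_filtered = df[df["is_bi_encoder"] | df["is_open"]]   # keeps A, B, C

# Proposed alternative: AND -- a model must match EVERY checked box
and_filtered = df[df["is_bi_encoder"] & df["is_open"]]  # keeps only A
```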
Will remove the description for Overall!
1. I think it's fine as is! (but we can change it if other people prefer)
2. That'd be great 🙌
(but if people prefer 2. as is, I also don't mind; no strong opinion)
Makes sense to me to remove it - I have that in #111
Re: the OR operator, I went back and forth on whether AND or OR is preferable. I think there are valid reasons for both - I personally prefer AND, but didn't want to change it without talking about it.