add_followir_tab
I've added the Retrieval w/Instructions tab, which required three changes (besides adding model names):
- Rerankers were given an embedding dimension of -1 (though I could use np.inf or something similar, or simply omit the embedding dimension)?
- Since the main metric differs from the one used for Retrieval (and they are two different abstract tasks), I had to add it as a separate tab rather than a sub-tab of Retrieval. It is of course not included in the main MTEB average score, which is left unchanged. I could make a larger code change to allow each sub-tab to have a different metric if we would prefer this to go under Retrieval? Either is fine.
- I had to add some code (at the very end of the PR) to handle cases where models haven't been evaluated on all abstract tasks, by simply skipping results for those tasks - this happens frequently for instruction retrieval (see the sketch after this list).
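For reference, a minimal sketch of that skip-missing logic (hypothetical names; the actual PR code may differ):

```python
import numpy as np

def average_available_scores(model_results: dict, task_names: list[str]) -> float:
    """Average a model's scores over only the tasks it was evaluated on.

    Tasks without results (common for instruction retrieval) are skipped
    rather than counted as zero.
    """
    scores = [model_results[t] for t in task_names if t in model_results]
    return float(np.mean(scores)) if scores else np.nan
```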
New tab: [screenshot]
Main Home screen tab (unchanged): [screenshot]
@Muennighoff thoughts on these changes?
I think this looks amazing. My main high-level comment is that it may be confusing what the difference is between models under the Retrieval w/Instructions leaderboard and the various instruction-tuned models in the Retrieval leaderboards. Is there a better way to differentiate them? cc'ing some other people who were involved in merging in FollowIR for thoughts: @KennethEnevoldsen @imenelydiaker
The other comment I have is that it would help if we could visually differentiate Cross-Encoders & Bi-Encoders - I'm not sure what the best way to do that is. It may also make sense to have a filtering tab for them at some point. cc @tomaarsen
Re: the first point, a solution might be to add a description to the task type:
Re: the second point, we might differentiate them using the suggested modelmeta object, see:
https://github.com/embeddings-benchmark/mteb/issues/618
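Something along these lines, perhaps (purely illustrative - the field names here are assumptions, not the schema from that issue):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelMeta:
    # Hypothetical sketch of per-model metadata; the real design
    # is being discussed in embeddings-benchmark/mteb#618.
    name: str
    is_cross_encoder: bool = False
    embedding_dim: Optional[int] = None  # None for rerankers instead of -1
```

With metadata like this, the leaderboard could derive badges or filters per architecture instead of maintaining manual lists.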
> ...the difference is between models under Retrieval w/Instructions and the various instruction-tuned models in the Retrieval leaderboards. Is there a better way to differentiate them?
@KennethEnevoldsen are you suggesting a description of the abstract task in place of the tab that says "English"? I'm a little confused on the placement. I could also put it in a bullet point/paragraph under the title.
@Muennighoff I could change the name to "InstructionRetrieval" but I was thinking that might get confused with the prompt retrieval abstract task that is in progress. I could also place it as a sub-tab in retrieval, but I think it may cause the same confusion between instruction retrieval models and retrieval data with instructions.
> visually differentiate Cross-Encoders & Bi-Encoders
Re: differentiating cross-encoders, I could make a new manual list of cross-encoders and stick some icon/emoji in front of them in the meantime? Model metadata does seem like the best long-term solution, if we want to re-update the leaderboard when that PR is done.
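A minimal sketch of that interim approach (the set contents and emoji are placeholders, not the actual list):

```python
# Hand-maintained interim list; model metadata would replace this later.
CROSS_ENCODERS = {
    "cross-encoder/ms-marco-MiniLM-L-6-v2",  # example entry
    "jhu-clsp/FollowIR-7B",                  # example entry
}

def display_name(model_name: str) -> str:
    """Prefix cross-encoders with an emoji so they stand out in the table."""
    return f"🔀 {model_name}" if model_name in CROSS_ENCODERS else model_name
```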
> @KennethEnevoldsen are you suggesting a description of the abstract task in place of the tab that says "English"? I'm a little confused on the placement. I could also put it in a bullet point/paragraph under the title.
I would put it below the first tab but before the English tab. It doesn't need to be more than 1-2 lines - essentially a layman's version of the abstract description.
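In Gradio terms, something like this (a sketch - the actual tab structure in the leaderboard app may differ):

```python
import gradio as gr

with gr.Blocks() as demo:
    with gr.Tab("Retrieval w/Instructions"):
        # Short, layman-friendly task description shown above the language tabs
        gr.Markdown(
            "Retrieval w/Instructions evaluates how well models follow "
            "detailed, query-specific instructions when ranking documents."
        )
        with gr.Tab("English"):
            ...  # leaderboard table for this task would go here
```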
@KennethEnevoldsen's idea seems like a good solution!
> visually differentiate Cross-Encoders & Bi-Encoders
Icons/emojis sound great to me as an intermediate solution!
I merged in main (and all the great changes to the config files) and added a filter tab for Cross-Encoders. I also included a short description of each task to resolve the issues (see pictures).
I would say this is good to go, but for some reason the Hugging Face Git UI is being very weird - it says I made the config file changes and doesn't seem to register that they are already in main. Any idea what is happening @Muennighoff @KennethEnevoldsen?
Looks great to me - if it runs fine for you, I think we can just merge & manually check afterwards that everything looks fine.
I'm wondering whether it's worth also having a Bi-Encoder model type checkbox (similar to how there's both open & proprietary), but up to you -- cc @tomaarsen
I think it might indeed make sense to add more checkboxes and/or separate the checkboxes into multiple categories. After all, there are:
- Open VS Proprietary
- Bi-Encoder vs Cross-Encoder
- Sentence Transformers support
Perhaps we should have these 3 categories?
Makes sense to me, I've also added the bi-encoders checkbox.
We could split them into three categories, but currently they all share the same functionality, so it's kind of nice to have them use the same function and sit in the same box (see the sketch below). I can also open a separate issue if we want to discuss it further?
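Something like this shared filter (hypothetical names and column layout):

```python
import pandas as pd

def filter_models(df: pd.DataFrame, selected_types: list[str]) -> pd.DataFrame:
    """Keep rows whose tags match any selected checkbox (current OR semantics).

    `selected_types` might contain e.g. "Open", "Proprietary",
    "Bi-Encoder", "Cross-Encoder", or "Sentence Transformers".
    """
    if not selected_types:
        return df
    mask = df["model_tags"].apply(
        lambda tags: any(t in tags for t in selected_types)
    )
    return df[mask]
```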
If it sounds good to everyone I can merge it this morning and pay close attention afterwards in case it needs a hotfix - it works fine for me locally but just to be sure.
Amazing! Fine to merge from my side!
The box selector is currently implemented as an OR operator, so that means Bi-Encoders OR open models.
Should we change it to an AND operator?
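To illustrate the difference on a toy table (hypothetical column names):

```python
import pandas as pd

df = pd.DataFrame({
    "model": ["A", "B", "C"],
    "is_bi_encoder": [True, True, False],
    "is_open": [True, False, True],
})

# Current behavior: OR -- a model passes if it matches ANY checked box
or_filtered = df[df["is_bi_encoder"] | df["is_open"]]   # keeps A, B, C

# Proposed alternative: AND -- a model must match EVERY checked box
and_filtered = df[df["is_bi_encoder"] & df["is_open"]]  # keeps only A
```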
Will remove the description for Overall!
1. I think it's fine as is! (but we can change it if other people prefer)
2. That'd be great 🙌
(but if people prefer 2. as is, I also don't mind; no strong opinion)
Makes sense to me to remove it - I have that in #111
Re: the OR operator, I went back and forth on whether AND or OR is preferable. I think there are valid reasons for both - I personally prefer AND, but didn't want to change it without talking about it.