New model doesn't appear after Refresh

#100
by Mihaiii - opened

Hi!
I made this model: https://huggingface.co/Mihaiii/gte-micro

I was following the guide from here: https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_model.md

But when I click on the Refresh button, the model still doesn't appear in the leaderboard.

Am I doing something wrong?

I was applying filters after refreshing (model size <100M). In that case, the model doesn't appear.

If I press refresh and don't apply filters, the model is there.

If I first filter and then press refresh, the model is there, but the filter is ignored.

Basically I can only see it in the view that has all the models, in which case I need to carefully scroll through the list and look for it.

Is this expected? The docs say the cache is refreshed once per week - on which day of the week?

Also, it appears not all benchmarks ran. It's not clear to me whether I interrupted the run by mistake, whether there is a script dependency issue, or something else.

Script: https://github.com/embeddings-benchmark/mteb/blob/main/scripts/run_mteb_english.py

Massive Text Embedding Benchmark org

Hello!

> Is this expected?

That is a bit odd indeed, I'm not quite sure what happened there.

> The docs say the cache is refreshed once per week - on which day of the week?

Whenever someone creates an issue exactly like this one. I've restarted the leaderboard.

It indeed looks like not all benchmarks ran:

[screenshot: partial results]

Reranking:

[screenshot: partial Reranking results]

and Retrieval, STS and Summarization did not run at all. Can you check locally whether you have results files for those benchmarks?
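One way to check locally which benchmarks completed is to list the per-task JSON result files the run wrote. A minimal sketch; the `results/gte-micro` folder name is an assumption, so point it at whatever output folder your run actually used:

```python
import json
from pathlib import Path


def completed_tasks(results_dir):
    """Return the task names that left a results JSON file in results_dir."""
    return sorted(p.stem for p in Path(results_dir).glob("*.json"))


# Directory name is an assumption; adjust to your run's output folder.
print(completed_tasks("results/gte-micro"))
```

Any task missing from that list either never ran or failed before its scores were written.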

  • Tom Aarsen

@tomaarsen

Thanks for your response.

The run just stopped at some point, so I assumed it was over. I redirected output with `> /dev/null 2>&1` because there are lots of print statements, so I don't have logs.
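For future runs, redirecting to a log file instead of `/dev/null` keeps the noisy progress output out of the terminal while still preserving any tracebacks. A small sketch with a stand-in command; the helper name `run_with_log` is made up for illustration:

```shell
# Send both stdout and stderr to a log file instead of discarding them.
run_with_log() {
    "$@" > mteb_run.log 2>&1
}

# Demo with a stand-in command; substitute your real invocation, e.g.
#   run_with_log python scripts/run_mteb_english.py
run_with_log sh -c 'echo progress; echo "ERROR: something failed" >&2'

# Afterwards, inspect only the failures:
grep -i error mteb_run.log
```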

But I looked into it: if I set `TASK_LIST = TASK_LIST_STS` in the script, then all is good.

Next I tried `TASK_LIST = TASK_LIST_RERANKING`, and I get the following error when running `run_mteb_english.py`.
Could you please confirm the issue?

```
********************** Evaluating MindSmallReranking **********************
INFO:mteb.evaluation.MTEB:Loading dataset for MindSmallReranking
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
Failed to read file 'gzip://7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8::/root/.cache/huggingface/datasets/downloads/7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8' with error <class 'pyarrow.lib.ArrowInvalid'>: JSON parse error: Invalid value. in row 0
ERROR:datasets.packaged_modules.json.json:Failed to read file 'gzip://7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8::/root/.cache/huggingface/datasets/downloads/7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8' with error <class 'pyarrow.lib.ArrowInvalid'>: JSON parse error: Invalid value. in row 0
ERROR:mteb.evaluation.MTEB:Error while evaluating MindSmallReranking: An error occurred while generating the dataset
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json/json.py", line 145, in _generate_tables
    dataset = json.load(f)
  File "/usr/lib/python3.10/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1995, in _prepare_split_single
    for _, table in generator:
  File "/usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json/json.py", line 148, in _generate_tables
    raise e
  File "/usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json/json.py", line 122, in _generate_tables
    pa_table = paj.read_json(
  File "pyarrow/_json.pyx", line 308, in pyarrow._json.read_json
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: JSON parse error: Invalid value. in row 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/mteb/scripts/run_mteb_english.py", line 112, in <module>
    evaluation.run(
  File "/usr/local/lib/python3.10/dist-packages/mteb/evaluation/MTEB.py", line 324, in run
    raise e
  File "/usr/local/lib/python3.10/dist-packages/mteb/evaluation/MTEB.py", line 288, in run
    task.load_data(eval_splits=task_eval_splits, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mteb/abstasks/AbsTask.py", line 39, in load_data
    self.dataset = datasets.load_dataset(**self.metadata_dict["dataset"])
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2609, in load_dataset
    builder_instance.download_and_prepare(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1027, in download_and_prepare
    self._download_and_prepare(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1122, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1882, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 2038, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
```
Massive Text Embedding Benchmark org

Hello!

There was a brief Hugging Face outage; perhaps the dataset failed to load as a result: https://status.huggingface.co/
Could you retry?
Also, if you're saving the results into the same output folder as before, it will recognize the existing result files & skip those tasks. In other words, you should be able to just run the full task set again, and it'll speedily skip over everything that you've done already.

  • Tom Aarsen

I retried right before writing that comment. I ran the script for about 4 different models today, and it never produced all the results.

@tomaarsen Besides, the initial run (for gte-micro) was before the outage. I installed mteb with `pip install mteb`. Let me know if I need to use a specific version.

Massive Text Embedding Benchmark org

I'll try and run MindSmallReranking myself

Massive Text Embedding Benchmark org

I have no issues myself.

```
********************** Evaluating MindSmallReranking **********************
INFO:mteb.evaluation.MTEB:Loading dataset for MindSmallReranking
C:\Users\tom\.conda\envs\mteb\lib\site-packages\huggingface_hub\repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
Calling download_and_prepare
Called download_and_prepare
INFO:mteb.evaluation.evaluators.RerankingEvaluator:Encoding queries...
Batches: 100%|██████████| 2308/2308 [01:48<00:00, 21.25it/s]
INFO:mteb.evaluation.evaluators.RerankingEvaluator:Encoding candidates...
Batches: 100%|██████████| 2596/2596 [01:52<00:00, 22.99it/s]
INFO:mteb.evaluation.evaluators.RerankingEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for MindSmallReranking on test took 299.14 seconds
INFO:mteb.evaluation.MTEB:Scores: {'map': 0.30999281853124455, 'mrr': 0.31961836112643854, 'evaluation_time': 299.14}
```

I think that your downloaded file at `/root/.cache/huggingface/datasets/downloads/7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8` may be corrupted, i.e. the JSON can't be loaded. You can have a look at this file and see if you can spot a problem. You might be best off deleting it and retrying.
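A minimal sketch for checking that cached download before deleting it blindly. It is a gzipped JSON-lines file, so if it doesn't gunzip and parse cleanly, removing it forces `datasets` to re-download on the next load; the helper name `is_valid_gzip_jsonl` is made up for illustration, and the path is the one from the traceback above:

```python
import gzip
import json
from pathlib import Path


def is_valid_gzip_jsonl(path):
    """True if the file gunzips and every non-empty line parses as JSON."""
    try:
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    json.loads(line)
        return True
    except (OSError, EOFError, UnicodeDecodeError, json.JSONDecodeError):
        return False


# Path taken from the traceback above.
cached = Path("/root/.cache/huggingface/datasets/downloads/"
              "7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8")
if cached.exists() and not is_valid_gzip_jsonl(cached):
    cached.unlink()  # force a re-download on the next load_dataset call
```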

  • Tom Aarsen

@tomaarsen I killed that pod, but I'll retry. Thank you for running it on your side.

@tomaarsen

I rented a pod on RunPod just for this.
In a clean environment, do the following:

```python
!pip install mteb
from datasets import load_dataset

dataset = load_dataset("mteb/mind_small")
```

And you should get: https://gist.github.com/Mihaiii/6bf3cfb441a01daadf0cba47d7dab6dc

@tomaarsen Could you please confirm? :)

Closing.

For anyone else experiencing this:
This looks like a regression in the `datasets` library when reading tar files.

Just uninstall it and pin an older version:

```shell
!pip uninstall -y datasets
!pip install datasets==2.16.0
```
Mihaiii changed discussion status to closed
