New model doesn't appear after Refresh

#100
by Mihaiii - opened

Hi!
I made this model: https://huggingface.co/Mihaiii/gte-micro

I was following the guide from here: https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_model.md

But when I click on the Refresh button, the model still doesn't appear in the leaderboard.

Am I doing something wrong?

I was applying filters after refreshing (model size <100M). In that case, the model doesn't appear.

If I press refresh and don't apply filters, the model is there.

If I first filter and then press refresh, the model is there, but the filter is ignored.

Basically I can only see it in the view that has all the models, in which case I need to carefully scroll through the list and look for it.

Is this expected? The docs say the cache is refreshed once per week - on which day of the week?

Also, it appears not all benchmarks ran. It's not clear to me whether I interrupted the run by mistake, whether there is a script dependency issue, or something else.

Script: https://github.com/embeddings-benchmark/mteb/blob/main/scripts/run_mteb_english.py

Massive Text Embedding Benchmark org

Hello!

> Is this expected?

That is a bit odd indeed, I'm not quite sure what happened there.

> The docs say the cache is refreshed once per week - on which day of the week?

Whenever someone creates an issue exactly like this one. I've restarted the leaderboard.

It indeed looks like not all benchmarks ran:

[screenshot: partial results]

Reranking:

[screenshot: partial Reranking results]

and Retrieval, STS and Summarization did not run at all. Can you check locally whether you have results files for those benchmarks?
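One way to check locally which benchmarks completed is to list the per-task JSON result files the run wrote. A minimal sketch; the `results/gte-micro` folder name is an assumption, so point it at whatever output folder your run actually used:

```python
import json
from pathlib import Path


def completed_tasks(results_dir):
    """Return the task names that left a results JSON file in results_dir."""
    return sorted(p.stem for p in Path(results_dir).glob("*.json"))


# Directory name is an assumption; adjust to your run's output folder.
print(completed_tasks("results/gte-micro"))
```

Any task missing from that list either never ran or failed before its scores were written.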

  • Tom Aarsen

@tomaarsen

Thanks for your response.

The run just stopped at some point, so I assumed it was over. I redirected output with `> /dev/null 2>&1` because there are lots of print statements, so I don't have logs.
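For future runs, redirecting to a log file instead of `/dev/null` keeps the noisy progress output out of the terminal while still preserving any tracebacks. A small sketch with a stand-in command; the helper name `run_with_log` is made up for illustration:

```shell
# Send both stdout and stderr to a log file instead of discarding them.
run_with_log() {
    "$@" > mteb_run.log 2>&1
}

# Demo with a stand-in command; substitute your real invocation, e.g.
#   run_with_log python scripts/run_mteb_english.py
run_with_log sh -c 'echo progress; echo "ERROR: something failed" >&2'

# Afterwards, inspect only the failures:
grep -i error mteb_run.log
```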

But I looked into it: if I set `TASK_LIST = TASK_LIST_STS` in the script, then all is good.

Next I tried `TASK_LIST = TASK_LIST_RERANKING`, and I get the following error when running `run_mteb_english.py`.
Could you please confirm the issue?

```
********************** Evaluating MindSmallReranking **********************
INFO:mteb.evaluation.MTEB:Loading dataset for MindSmallReranking
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
Failed to read file 'gzip://7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8::/root/.cache/huggingface/datasets/downloads/7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8' with error <class 'pyarrow.lib.ArrowInvalid'>: JSON parse error: Invalid value. in row 0
ERROR:datasets.packaged_modules.json.json:Failed to read file 'gzip://7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8::/root/.cache/huggingface/datasets/downloads/7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8' with error <class 'pyarrow.lib.ArrowInvalid'>: JSON parse error: Invalid value. in row 0
ERROR:mteb.evaluation.MTEB:Error while evaluating MindSmallReranking: An error occurred while generating the dataset
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json/json.py", line 145, in _generate_tables
    dataset = json.load(f)
  File "/usr/lib/python3.10/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1995, in _prepare_split_single
    for _, table in generator:
  File "/usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json/json.py", line 148, in _generate_tables
    raise e
  File "/usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json/json.py", line 122, in _generate_tables
    pa_table = paj.read_json(
  File "pyarrow/_json.pyx", line 308, in pyarrow._json.read_json
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: JSON parse error: Invalid value. in row 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/mteb/scripts/run_mteb_english.py", line 112, in <module>
    evaluation.run(
  File "/usr/local/lib/python3.10/dist-packages/mteb/evaluation/MTEB.py", line 324, in run
    raise e
  File "/usr/local/lib/python3.10/dist-packages/mteb/evaluation/MTEB.py", line 288, in run
    task.load_data(eval_splits=task_eval_splits, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mteb/abstasks/AbsTask.py", line 39, in load_data
    self.dataset = datasets.load_dataset(**self.metadata_dict["dataset"])
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2609, in load_dataset
    builder_instance.download_and_prepare(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1027, in download_and_prepare
    self._download_and_prepare(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1122, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1882, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 2038, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset
```
Massive Text Embedding Benchmark org

Hello!

There was a brief Hugging Face outage; perhaps the dataset failed to load as a result: https://status.huggingface.co/
Could you retry?
Also, if you're saving the results into the same output folder as before, it will recognize the existing result files & skip those tasks. In other words, you should be able to just run the full task set again, and it'll speedily skip over everything that you've done already.

  • Tom Aarsen

I retried right before writing that comment. I ran the script for about 4 different models today, and it never produced all the results.

@tomaarsen Besides, the initial run (for gte-micro) was before the outage. I installed mteb with `pip install mteb`. Let me know if I need to use a specific version.

Massive Text Embedding Benchmark org

I'll try and run MindSmallReranking myself

Massive Text Embedding Benchmark org

I have no issues myself.

```
********************** Evaluating MindSmallReranking **********************
INFO:mteb.evaluation.MTEB:Loading dataset for MindSmallReranking
C:\Users\tom\.conda\envs\mteb\lib\site-packages\huggingface_hub\repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
Calling download_and_prepare
Called download_and_prepare
INFO:mteb.evaluation.evaluators.RerankingEvaluator:Encoding queries...
Batches: 100%|██████████| 2308/2308 [01:48<00:00, 21.25it/s]
INFO:mteb.evaluation.evaluators.RerankingEvaluator:Encoding candidates...
Batches: 100%|██████████| 2596/2596 [01:52<00:00, 22.99it/s]
INFO:mteb.evaluation.evaluators.RerankingEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for MindSmallReranking on test took 299.14 seconds
INFO:mteb.evaluation.MTEB:Scores: {'map': 0.30999281853124455, 'mrr': 0.31961836112643854, 'evaluation_time': 299.14}
```

I think that your downloaded file at `/root/.cache/huggingface/datasets/downloads/7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8` may be corrupted, i.e. the JSON can't be loaded. You can have a look at this file and see if you can spot a problem. You might be best off deleting it and retrying.
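A minimal sketch for checking that cached download before deleting it blindly. It is a gzipped JSON-lines file, so if it doesn't gunzip and parse cleanly, removing it forces `datasets` to re-download on the next load; the helper name `is_valid_gzip_jsonl` is made up for illustration, and the path is the one from the traceback above:

```python
import gzip
import json
from pathlib import Path


def is_valid_gzip_jsonl(path):
    """True if the file gunzips and every non-empty line parses as JSON."""
    try:
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    json.loads(line)
        return True
    except (OSError, EOFError, UnicodeDecodeError, json.JSONDecodeError):
        return False


# Path taken from the traceback above.
cached = Path("/root/.cache/huggingface/datasets/downloads/"
              "7a742da40ba0425a72301598ce27d63296c468da48cd98c4ae479b1d88a755a8")
if cached.exists() and not is_valid_gzip_jsonl(cached):
    cached.unlink()  # force a re-download on the next load_dataset call
```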

  • Tom Aarsen

@tomaarsen I killed that pod, but I'll retry. Thank you for running it on your side.

@tomaarsen

I rented a pod on RunPod just for this.
In a clean environment, do the following:

```python
!pip install mteb
from datasets import load_dataset

dataset = load_dataset("mteb/mind_small")
```

And you should get: https://gist.github.com/Mihaiii/6bf3cfb441a01daadf0cba47d7dab6dc

@tomaarsen Could you please confirm? :)

Closing.

For anyone else experiencing this:
This looks like a regression in the `datasets` library when reading tar files.

Just uninstall it and pin an older version:

```shell
!pip uninstall -y datasets
!pip install datasets==2.16.0
```
Mihaiii changed discussion status to closed
