# BEIR v1.0.0 contriever-msmarco

This index was generated on 20230124 using Tevatron with the following command:

```
python -m tevatron.driver.encode \
--output_dir=temp \
--model_name_or_path facebook/contriever-msmarco \
--fp16 \
--tokenizer_name bert-base-uncased \
--per_device_eval_batch_size 156 \
--p_max_len 512 \
--dataset_name Tevatron/beir-corpus:$subdataset \
--encoded_save_path beir_embeddings/corpus_emb.$subdataset.pkl
```

where `$subdataset` is the name of one of the BEIR datasets, e.g. `scifact`.
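
To repeat the encoding step for several datasets, the command can be generated per dataset name. The following is a minimal sketch, not the script that was used to build this index: `build_encode_command` and `SUBDATASETS` are hypothetical names, and every dataset name other than `scifact` is an assumption that should be checked against the configurations available in `Tevatron/beir-corpus`.

```python
import shlex

# Illustrative list only: `scifact` comes from this README, the other
# names are assumptions and may need adjusting.
SUBDATASETS = ["scifact", "nfcorpus", "fiqa"]


def build_encode_command(subdataset):
    """Return the Tevatron encode command from this README for one dataset."""
    return [
        "python", "-m", "tevatron.driver.encode",
        "--output_dir=temp",
        "--model_name_or_path", "facebook/contriever-msmarco",
        "--fp16",
        "--tokenizer_name", "bert-base-uncased",
        "--per_device_eval_batch_size", "156",
        "--p_max_len", "512",
        "--dataset_name", f"Tevatron/beir-corpus:{subdataset}",
        "--encoded_save_path", f"beir_embeddings/corpus_emb.{subdataset}.pkl",
    ]


def print_commands(subdatasets=SUBDATASETS):
    """Dry run: print one shell-quoted command per dataset."""
    for name in subdatasets:
        print(shlex.join(build_encode_command(name)))
```

Calling `print_commands()` only prints the commands. To execute one, pass the list to `subprocess.run(cmd, check=True)` in an environment where Tevatron is installed.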

The embeddings are then converted to the Pyserini index format.
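
The conversion script itself is not included here. As a rough sketch of what such a conversion might look like, the code below assumes that each Tevatron pickle holds an `(embeddings, docids)` tuple and that a Pyserini FAISS index is a directory containing a FAISS file named `index` and a text file named `docid` with one document ID per line. The choice of a flat inner-product index (`faiss.IndexFlatIP`) is also an assumption, and all function names are hypothetical.

```python
import pickle
from pathlib import Path


def load_tevatron_embeddings(pkl_path):
    """Load (embeddings, docids) from a Tevatron pickle (assumed layout)."""
    with open(pkl_path, "rb") as f:
        embeddings, docids = pickle.load(f)
    return embeddings, [str(d) for d in docids]


def write_docids(docids, index_dir):
    """Write one document ID per line to <index_dir>/docid."""
    index_dir = Path(index_dir)
    index_dir.mkdir(parents=True, exist_ok=True)
    (index_dir / "docid").write_text("".join(f"{d}\n" for d in docids))


def write_faiss_index(embeddings, index_dir):
    """Write a flat inner-product FAISS index to <index_dir>/index."""
    # Imported lazily so the helpers above work without these packages.
    import faiss
    import numpy as np

    vectors = np.ascontiguousarray(embeddings, dtype=np.float32)
    index = faiss.IndexFlatIP(vectors.shape[1])
    index.add(vectors)
    faiss.write_index(index, str(Path(index_dir) / "index"))


def convert(pkl_path, index_dir):
    """Convert one Tevatron pickle into an index directory."""
    embeddings, docids = load_tevatron_embeddings(pkl_path)
    write_docids(docids, index_dir)
    write_faiss_index(embeddings, index_dir)
```

For example, `convert("beir_embeddings/corpus_emb.scifact.pkl", "indexes/scifact")` would write both files under the hypothetical output directory `indexes/scifact`.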