The two-click reproduction matrix below provides commands for reproducing the experimental results reported in a number of papers, denoted by the bracketed references. Instructions for programmatic execution appear at the bottom of this page.
|   | TREC 2019 |   |   | TREC 2020 |   |   | dev |   |
|---|---|---|---|---|---|---|---|---|
|   | AP | nDCG@10 | R@1K | AP | nDCG@10 | R@1K | RR@10 | R@1K |
[1] Xueguang Ma, Ronak Pradeep, Rodrigo Nogueira, and Jimmy Lin. Document Expansions and Learned Sparse Lexical Representations for MS MARCO V1 and V2. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022), July 2022.
Numbers in parentheses correspond to rows in Table 1 of the paper.
[2] Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022), July 2022.
[3] Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul N. Bennett, Junaid Ahmed, and Arnold Overwijk. Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. Proceedings of the 9th International Conference on Learning Representations (ICLR 2021), May 2021.
[4] Sebastian Hofstätter, Sophia Althammer, Michael Schröder, Mete Sertkan, and Allan Hanbury. Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation. arXiv:2010.02666, October 2020.
[5] Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, and Allan Hanbury. Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), pages 113-122, July 2021.
[6] Sheng-Chieh Lin, Jheng-Hong Yang, and Jimmy Lin. In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval. Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021), pages 163-173, August 2021.
[7] Minghan Li, Sheng-Chieh Lin, Xueguang Ma, and Jimmy Lin. SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes. arXiv:2302.06587, February 2023.
[8] Sheng-Chieh Lin, Minghan Li, and Jimmy Lin. Aggretriever: A Simple Approach to Aggregate Textual Representation for Robust Dense Passage Retrieval. arXiv:2208.00511, July 2022.
All experimental runs shown in the above table can be executed programmatically using the instructions below. To list all the experimental conditions:
python -m pyserini.2cr.msmarco --collection v1-passage --list-conditions
These conditions correspond to the table rows above.
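Since the condition names are printed one per line, the conditions can also be scripted over individually. A minimal shell sketch (assuming the one-name-per-line output format) that runs each condition in turn and places the run files in runs/:

# Run every listed condition, one at a time, writing run files to runs/
for condition in $(python -m pyserini.2cr.msmarco --collection v1-passage --list-conditions); do
    python -m pyserini.2cr.msmarco --collection v1-passage --condition "$condition" --display-commands --directory runs/
done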
To show the commands for all conditions without executing them (a "dry run"):
python -m pyserini.2cr.msmarco --collection v1-passage --all --display-commands --dry-run
To actually run all the experimental conditions:
python -m pyserini.2cr.msmarco --collection v1-passage --all --display-commands
With the above command, run files will be placed in the current directory. Use the option --directory runs/ to place the runs in a sub-directory.
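For example, combining the options shown above to run all conditions and collect the run files in runs/:

python -m pyserini.2cr.msmarco --collection v1-passage --all --display-commands --directory runs/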
To show the commands for a specific condition:
python -m pyserini.2cr.msmarco --collection v1-passage --condition bm25-default --display-commands --dry-run
This will generate exactly the commands needed for a specific condition (corresponding to a row in the table above).
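For reference, the displayed commands are standard Pyserini retrieval invocations. For bm25-default on the dev queries, they are roughly of the following form (an illustrative sketch; the exact flags, index names, and output file names come from the tool's own output, which is authoritative):

python -m pyserini.search.lucene \
  --index msmarco-v1-passage \
  --topics msmarco-passage-dev-subset \
  --output run.msmarco-v1-passage.bm25-default.dev.txt \
  --bm25 --k1 0.9 --b 0.4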
To actually run a specific condition:
python -m pyserini.2cr.msmarco --collection v1-passage --condition bm25-default --display-commands
Again, with the above command, run files will be placed in the current directory. Use the option --directory runs/ to place the runs in a sub-directory.
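For example, to run bm25-default and write its run files into runs/:

python -m pyserini.2cr.msmarco --collection v1-passage --condition bm25-default --display-commands --directory runs/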
Finally, to generate this page:
python -m pyserini.2cr.msmarco --collection v1-passage --generate-report --output msmarco-v1-passage.html
The output file msmarco-v1-passage.html should be identical to this page.