
metadata
title: Mean Reciprocal Rank
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
tags:
  - evaluate
  - metric
description: >-
  Mean Reciprocal Rank is a statistical measure for evaluating any process that
  produces a list of possible responses to a sample of queries, ordered by
  probability of correctness.

Metric Card for Mean Reciprocal Rank

Mean Reciprocal Rank (MRR) is a statistical measure for evaluating any process that produces a list of possible responses to a sample of queries, ordered by probability of correctness.

Metric Description

The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer: 1 for first place, 1⁄2 for second place, 1⁄3 for third place, and so on. The mean reciprocal rank is the average of the reciprocal ranks of results for a sample of queries Q, where rank_i is the rank of the first correct answer for the i-th query:

$$\text{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\text{rank}_i}$$
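As a quick check of the formula, the sketch below computes MRR directly from one-based ranks, as in the definition above (1 for first place). The function name `mean_reciprocal_rank` is hypothetical and is not this module's API; `fractions.Fraction` is used only to keep the arithmetic exact, and the example ranks are illustrative.

```python
from fractions import Fraction


def mean_reciprocal_rank(ranks):
    """MRR from one-based ranks of the first correct answer (hypothetical helper)."""
    if not ranks:
        raise ValueError("at least one query is required")
    if any(r < 1 for r in ranks):
        raise ValueError("one-based ranks must be >= 1")
    # Sum 1/rank_i over all queries, then divide by |Q|.
    return sum(Fraction(1, r) for r in ranks) / len(ranks)


# Three queries whose first correct answers appear in 3rd, 2nd and 1st place.
print(mean_reciprocal_rank([3, 2, 1]))  # 11/18
```

That is, (1/3 + 1/2 + 1) / 3 = 11/18, or about 0.61.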

How to Use

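Provide a list of gold ranks, one per query, where each item is the rank of the gold (correct) item in that query's ranked list. Ranks are zero-based: the first place is rank 0, the second place is rank 1, and so on.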
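Under the zero-based convention, a gold rank r corresponds to place r + 1, so its reciprocal rank is 1 / (r + 1). A minimal sketch of the resulting computation, assuming plain Python lists; `mrr_from_gold_ranks` is a hypothetical name and not necessarily how this module implements it:

```python
def mrr_from_gold_ranks(gold_ranks):
    """Mean reciprocal rank from zero-based gold ranks (hypothetical helper).

    A gold rank of 0 means the gold item was in first place.
    """
    if not gold_ranks:
        raise ValueError("at least one query is required")
    if any(r < 0 for r in gold_ranks):
        raise ValueError("zero-based ranks must be >= 0")
    # Shift each zero-based rank to a one-based place before inverting it.
    return sum(1.0 / (r + 1) for r in gold_ranks) / len(gold_ranks)


print(mrr_from_gold_ranks([0, 0, 1, 3]))  # 0.6875
```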

Inputs


  • input_field (List[int]): a list of integers, one per query, where each integer is the zero-based rank of the gold item.
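If your system produces a ranked list of candidates for each query, the zero-based gold rank is the index of the gold item in that list. A sketch under that assumption; the helper `gold_rank` and the toy candidate lists are illustrative only:

```python
def gold_rank(ranked_candidates, gold_item):
    """Zero-based position of the gold item in one query's ranked list.

    list.index raises ValueError if the gold item is missing, since this
    input format expects a rank for every query's gold item.
    """
    return ranked_candidates.index(gold_item)


# Each entry: (candidates ordered from most to least likely, gold item).
queries = [
    (["catten", "cati", "cats"], "cats"),
    (["torii", "tori", "toruses"], "tori"),
    (["viruses", "virii", "viri"], "viruses"),
]

input_field = [gold_rank(ranked, gold) for ranked, gold in queries]
print(input_field)  # [2, 1, 0]
```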

Output Values

The metric returns a single score: the mean of the reciprocal ranks over all queries in input_field.

Each reciprocal rank lies in the interval (0, 1], so the score does too. Higher scores are better: a score of 1 means the gold item was ranked first for every query, while a score of 0.5 results, for example, when the gold item is always ranked second.


Examples

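The following sketch illustrates typical and edge-case inputs. It re-implements the computation locally under the zero-based convention described in How to Use, so it shows the expected scores rather than the module's exact output format; the helper name `mrr` is hypothetical and does not call the actual module.

```python
def mrr(gold_ranks):
    # Hypothetical stand-in for the metric: zero-based ranks, 0 = first place.
    return sum(1.0 / (r + 1) for r in gold_ranks) / len(gold_ranks)


# Perfect system: the gold item is always ranked first.
print(mrr([0, 0, 0]))  # 1.0

# Gold item always in second place.
print(mrr([1, 1]))  # 0.5

# One query ranks the gold item 4th (zero-based rank 3), pulling the mean down.
print(mrr([0, 3]))  # 0.625

# Pitfall: one-based ranks passed by mistake. The perfect system above now
# scores 0.5 instead of 1.0, because every rank is shifted by one place.
print(mrr([1, 1, 1]))  # 0.5
```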

Limitations and Bias

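MRR depends only on the rank of the first correct answer for each query, so it ignores how any additional relevant items are ranked. This makes it best suited to tasks where each query has a single correct answer. Since ranks here are zero-based, passing one-based ranks by mistake treats every gold item as one place lower than it actually is and lowers the score.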
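Because only the first correct answer counts, rankings that differ only after that position receive the same reciprocal rank. A minimal sketch of this, assuming a query can have several relevant items (which this metric's single-gold-rank input cannot express); `reciprocal_rank` and the item names are hypothetical:

```python
def reciprocal_rank(ranked_items, relevant):
    """Reciprocal rank of the first relevant item, or 0.0 if none is found."""
    for position, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


relevant = {"a", "b", "c"}
# Both rankings put a relevant item first, so both score 1.0, even though
# the second ranking pushes the other relevant items toward the bottom.
print(reciprocal_rank(["a", "b", "c", "x", "y"], relevant))  # 1.0
print(reciprocal_rank(["a", "x", "y", "b", "c"], relevant))  # 1.0
```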

