update leaderboard
- gen_table.py +2 -0
- meta_data.py +9 -0
gen_table.py CHANGED
```diff
@@ -128,6 +128,8 @@ def BUILD_L2_DF(results, dataset):
         df = df.sort_values('Final Score')
     elif dataset == 'COCO_VAL':
         df = df.sort_values('CIDEr')
+    elif dataset == 'VCR':
+        df = df.sort_values('Overall-Jaccard')
     else:
         df = df.sort_values('Overall')
     df = df.iloc[::-1]
```
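The two added lines extend the metric-selection chain in BUILD_L2_DF: each dataset sorts by its primary metric, and the trailing `iloc[::-1]` reverses the ascending sort so the best-scoring model lands in row 0. Below is a minimal runnable sketch of that pattern with a toy DataFrame; the dispatch table and data are illustrative, and the real function handles more datasets than shown here.

```python
# Sketch (assumed, simplified) of the sorting pattern in BUILD_L2_DF:
# pick the dataset's primary metric, sort ascending, then reverse so
# the best-scoring model ends up first. Toy data for illustration only.
import pandas as pd

SORT_KEYS = {
    'COCO_VAL': 'CIDEr',
    'VCR': 'Overall-Jaccard',  # the key this commit adds
}

def rank_leaderboard(df: pd.DataFrame, dataset: str) -> pd.DataFrame:
    key = SORT_KEYS.get(dataset, 'Overall')  # fall back to 'Overall'
    return df.sort_values(key).iloc[::-1]    # descending: best first

toy = pd.DataFrame({'Model': ['A', 'B'], 'Overall-Jaccard': [52.1, 67.4]})
print(rank_leaderboard(toy, 'VCR'))  # row for model B comes first
```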
meta_data.py CHANGED
```diff
@@ -227,3 +227,12 @@ LEADERBOARD_MD['BLINK'] = """
 - BLINK is a benchmark containing 14 visual perception tasks that can be solved by humans “within a blink”, but pose significant challenges for current multimodal large language models (LLMs).
 - We evaluate BLINK on the test set of the benchmark, which contains 1901 visual questions in multi-choice format.
 """
+
+LEADERBOARD_MD['VCR'] = """
+## VCR Evaluation Results
+
+- VCR challenges models to restore partially obscured text within images, leveraging pixel-level hints and contextual cues. Unlike traditional text-based tasks, VCR necessitates a synergistic understanding of visual image (VI), string text (ST), and text embedded in image (TEI). Our dataset is crafted using a pipeline that generates synthetic images from image-caption pairs with adjustable caption visibility, allowing for varied difficulty levels.
+- We report the Jaccard / Exact Match score for VCR, evaluated on the 500-sample subsets of each track in VCR with VLMEvalKit.
+- The evaluation results are officially provided by the VCR authors; thanks to Tianyu Zhang for his help.
+
+"""
```
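For reference, the new entry reports Jaccard / Exact Match scores. The official VCR scorer (run by the VCR authors via VLMEvalKit) may tokenize and normalize differently; the sketch below only illustrates the generic definitions of the two metrics, with made-up example strings.

```python
# Illustrative definitions of the metrics named in the VCR entry:
# token-set Jaccard = |A ∩ B| / |A ∪ B|, and a plain exact-match check.
# Not the official VCR scoring code; normalization rules are assumed.
def jaccard(pred: str, ref: str) -> float:
    a, b = set(pred.lower().split()), set(ref.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0

def exact_match(pred: str, ref: str) -> float:
    return float(pred.strip().lower() == ref.strip().lower())

print(jaccard('the quick brown fox', 'the quick red fox'))  # 0.6
print(exact_match('Hello world', 'hello world'))            # 1.0
```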