Space: sled-umich/3D-POPE-leaderboard


Madhavan Iyengar committed on Jun 12

Commit 8b5c523 (1 parent: 94f782b)

update about and submit pages


Files changed (2)

app.py +2 -2
src/about.py +5 -5

app.py CHANGED

@@ -282,8 +282,8 @@ with demo:
             with gr.Row():
                     model_name_textbox = gr.Textbox(label="Model name")
-                    model_zip_file = gr.File(label="Upload model ZIP file")
-                    model_link_textbox = gr.Textbox(label="Model link")
             with gr.Row():
                 gr.Column()
                 with gr.Column(scale=2):

             with gr.Row():
                     model_name_textbox = gr.Textbox(label="Model name")
+                    model_zip_file = gr.File(label="Upload model prediction result ZIP file")
+                    model_link_textbox = gr.Textbox(label="Link to model page")
             with gr.Row():
                 gr.Column()
                 with gr.Column(scale=2):

src/about.py CHANGED

@@ -39,7 +39,7 @@ NUM_FEWSHOT = 0 # Change with your few shot
 # Your leaderboard name
-TITLE = """<h1 align="center" id="space-title">3D-POPE Leaderboard</h1>
 <p><center>
 <a href="https://3d-grand.github.io/" target="_blank">[Project Page]</a>
 <a href="https://www.dropbox.com/scl/fo/5p9nb4kalnz407sbqgemg/AG1KcxeIS_SUoJ1hoLPzv84?rlkey=weunabtbiz17jitfv3f4jpmm1&dl=0" target="_blank">[3D-GRAND Data]</a>
@@ -49,7 +49,7 @@ TITLE = """<h1 align="center" id="space-title">3D-POPE Leaderboard</h1>
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-#### This is the official leaderboard for the 3D Polling-based Object Probing Evaluation (3D-POPE) benchmark.
 """
 # Which evaluations are you running? how can people reproduce what you have?
@@ -58,13 +58,13 @@ LLM_BENCHMARKS_TEXT = f"""
 ### To systematically evaluate the hallucination behavior of 3D-LLMs, we introduce the 3D Polling-based Object Probing Evaluation (3D-POPE) benchmark. 3D-POPE is designed to assess a model's ability to accurately identify the presence or absence of objects in a given 3D scene.
 ## Dataset
-To facilitate the 3D-POPE benchmark, we curate a dedicated dataset from the ScanNet dataset, utilizing the semantic classes from ScanNet200. Specifically, we use the ScanNet validation set as the foundation for evaluating 3D-LLMs on the 3D-POPE benchmark.
-Benchmark design. 3D-POPE consists of a set of triples, each comprising a 3D scene, a posed question, and a binary answer (“Yes” or “No”) indicating the presence or absence of an object (Fig. 1 middle). To ensure a balanced dataset, we maintain a 1:1 ratio of existent to nonexistent objects when constructing these triples. For the selection of negative samples (i.e., nonexistent objects), we employ three distinct sampling strategies:
 • Random Sampling: Nonexistent objects are randomly selected from the set of objects not present in the 3D scene.\n
 • Popular Sampling: We select the top-k most frequent objects not present in the 3D scene, where k equals the number of objects currently in the scene.\n
-• Adversarial Sampling: For each positively identified object in the scene, we rank objects that are not present and have not been used as negative samples based on their frequency of co-occurrence with the positive object in the training dataset. The highest-ranking co-occurring object is then selected as the adversarial sample. This approach differs from the original POPE [41] to avoid adversarial samples mirroring popular samples, as indoor scenes often contain similar objects.\n
 These sampling strategies are designed to challenge the model's robustness and assess its susceptibility to different levels of object hallucination.
 ## Metrics

 # Your leaderboard name
+TITLE = """<h1 align="center" id="space-title">🏠💬 3D-POPE Leaderboard 🏅</h1>
 <p><center>
 <a href="https://3d-grand.github.io/" target="_blank">[Project Page]</a>
 <a href="https://www.dropbox.com/scl/fo/5p9nb4kalnz407sbqgemg/AG1KcxeIS_SUoJ1hoLPzv84?rlkey=weunabtbiz17jitfv3f4jpmm1&dl=0" target="_blank">[3D-GRAND Data]</a>
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
+#### This is the official leaderboard for the 3D Polling-based Object Probing Evaluation (3D-POPE) benchmark. 3D-POPE is a benchmark to evaluate object hallucination in 3D LLMs from the work [3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination](https://3d-grand.github.io/).
 """
 # Which evaluations are you running? how can people reproduce what you have?
 ### To systematically evaluate the hallucination behavior of 3D-LLMs, we introduce the 3D Polling-based Object Probing Evaluation (3D-POPE) benchmark. 3D-POPE is designed to assess a model's ability to accurately identify the presence or absence of objects in a given 3D scene.
 ## Dataset
+To facilitate the 3D-POPE benchmark, we curate a dedicated dataset from the [ScanNet](https://arxiv.org/abs/1702.04405) dataset, utilizing the semantic classes from [ScanNet200](https://arxiv.org/abs/2204.07761). Specifically, we use the ScanNet validation set as the foundation for evaluating 3D-LLMs on the 3D-POPE benchmark.
+Benchmark design. 3D-POPE consists of a set of triples, each comprising a 3D scene, a posed question, and a binary answer (“Yes” or “No”) indicating the presence or absence of an object. To ensure a balanced dataset, we maintain a 1:1 ratio of existent to nonexistent objects when constructing these triples. For the selection of negative samples (i.e., nonexistent objects), we employ three distinct sampling strategies:
 • Random Sampling: Nonexistent objects are randomly selected from the set of objects not present in the 3D scene.\n
 • Popular Sampling: We select the top-k most frequent objects not present in the 3D scene, where k equals the number of objects currently in the scene.\n
+• Adversarial Sampling: For each positively identified object in the scene, we rank objects that are not present and have not been used as negative samples based on their frequency of co-occurrence with the positive object in the training dataset. The highest-ranking co-occurring object is then selected as the adversarial sample. This approach differs from the original [POPE](https://arxiv.org/abs/2305.10355) to avoid adversarial samples mirroring popular samples, as indoor scenes often contain similar objects.\n
 These sampling strategies are designed to challenge the model's robustness and assess its susceptibility to different levels of object hallucination.
 ## Metrics
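
Review note on the Dataset text in src/about.py: the three negative-sampling strategies described there can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the `class_freq` and `cooccurrence` dictionaries, and the toy class counts are all hypothetical, and the tie-breaking by class name is an assumption added to keep the output deterministic.

```python
import random

def random_negatives(present, vocabulary, k, rng):
    """Random sampling: draw up to k classes uniformly from those absent in the scene."""
    absent = sorted(set(vocabulary) - set(present))
    return rng.sample(absent, min(k, len(absent)))

def popular_negatives(present, class_freq, k):
    """Popular sampling: the k most frequent classes that are absent from the scene."""
    absent = [(name, count) for name, count in class_freq.items() if name not in present]
    absent.sort(key=lambda item: (-item[1], item[0]))  # frequency desc, then name
    return [name for name, _ in absent[:k]]

def adversarial_negatives(present, cooccurrence):
    """Adversarial sampling: for each present object, pick the absent class that
    co-occurs with it most often and has not already been used as a negative."""
    used = set()
    negatives = []
    for obj in present:
        candidates = [
            (name, count)
            for name, count in cooccurrence.get(obj, {}).items()
            if name not in present and name not in used
        ]
        if not candidates:
            continue  # no unused absent class co-occurs with this object
        best = min(candidates, key=lambda item: (-item[1], item[0]))[0]
        used.add(best)
        negatives.append(best)
    return negatives

def build_triples(scene_id, present, negatives):
    """Pair each object with a binary answer, giving (scene, object, answer) triples."""
    triples = [(scene_id, obj, "Yes") for obj in present]
    triples += [(scene_id, obj, "No") for obj in negatives]
    return triples

# Hypothetical toy data: class counts and co-occurrence counts from a training set.
present = ["chair", "table"]
class_freq = {"chair": 50, "table": 40, "sofa": 30, "lamp": 20, "sink": 10}
cooccurrence = {
    "chair": {"table": 30, "sofa": 12, "lamp": 15},
    "table": {"chair": 30, "lamp": 18, "sofa": 9},
}

k = len(present)  # k equals the number of objects in the scene
rand_neg = random_negatives(present, class_freq.keys(), k, random.Random(0))
pop_neg = popular_negatives(present, class_freq, k)
adv_neg = adversarial_negatives(present, cooccurrence)
triples = build_triples("scene_example", present, adv_neg)
```

With this toy data, popular sampling picks `sofa` and `lamp` (the most frequent absent classes), while adversarial sampling picks `lamp` for `chair` and then falls back to `sofa` for `table` because `lamp` is already used. Choosing as many negatives as there are present objects keeps the 1:1 ratio the text describes.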