Madhavan Iyengar committed on
Commit
8b5c523
1 Parent(s): 94f782b

update about and submit pages

Files changed (2)
  1. app.py +2 -2
  2. src/about.py +5 -5
app.py CHANGED
@@ -282,8 +282,8 @@ with demo:
282
 
283
  with gr.Row():
284
  model_name_textbox = gr.Textbox(label="Model name")
285
- model_zip_file = gr.File(label="Upload model ZIP file")
286
- model_link_textbox = gr.Textbox(label="Model link")
287
  with gr.Row():
288
  gr.Column()
289
  with gr.Column(scale=2):
 
282
 
283
  with gr.Row():
284
  model_name_textbox = gr.Textbox(label="Model name")
285
+ model_zip_file = gr.File(label="Upload model prediction result ZIP file")
286
+ model_link_textbox = gr.Textbox(label="Link to model page")
287
  with gr.Row():
288
  gr.Column()
289
  with gr.Column(scale=2):
src/about.py CHANGED
@@ -39,7 +39,7 @@ NUM_FEWSHOT = 0 # Change with your few shot
39
 
40
 
41
  # Your leaderboard name
42
- TITLE = """<h1 align="center" id="space-title">3D-POPE Leaderboard</h1>
43
  <p><center>
44
  <a href="https://3d-grand.github.io/" target="_blank">[Project Page]</a>
45
  <a href="https://www.dropbox.com/scl/fo/5p9nb4kalnz407sbqgemg/AG1KcxeIS_SUoJ1hoLPzv84?rlkey=weunabtbiz17jitfv3f4jpmm1&dl=0" target="_blank">[3D-GRAND Data]</a>
@@ -49,7 +49,7 @@ TITLE = """<h1 align="center" id="space-title">3D-POPE Leaderboard</h1>
49
 
50
  # What does your leaderboard evaluate?
51
  INTRODUCTION_TEXT = """
52
- #### This is the official leaderboard for the 3D Polling-based Object Probing Evaluation (3D-POPE) benchmark.
53
  """
54
 
55
  # Which evaluations are you running? how can people reproduce what you have?
@@ -58,13 +58,13 @@ LLM_BENCHMARKS_TEXT = f"""
58
  ### To systematically evaluate the hallucination behavior of 3D-LLMs, we introduce the 3D Polling-based Object Probing Evaluation (3D-POPE) benchmark. 3D-POPE is designed to assess a model's ability to accurately identify the presence or absence of objects in a given 3D scene.
59
 
60
  ## Dataset
61
- To facilitate the 3D-POPE benchmark, we curate a dedicated dataset from the ScanNet dataset, utilizing the semantic classes from ScanNet200. Specifically, we use the ScanNet validation set as the foundation for evaluating 3D-LLMs on the 3D-POPE benchmark.
62
 
63
- Benchmark design. 3D-POPE consists of a set of triples, each comprising a 3D scene, a posed question, and a binary answer (“Yes” or “No”) indicating the presence or absence of an object (Fig. 1 middle). To ensure a balanced dataset, we maintain a 1:1 ratio of existent to nonexistent objects when constructing these triples. For the selection of negative samples (i.e., nonexistent objects), we employ three distinct sampling strategies:
64
 
65
  • Random Sampling: Nonexistent objects are randomly selected from the set of objects not present in the 3D scene.\n
66
  • Popular Sampling: We select the top-k most frequent objects not present in the 3D scene, where k equals the number of objects currently in the scene.\n
67
- • Adversarial Sampling: For each positively identified object in the scene, we rank objects that are not present and have not been used as negative samples based on their frequency of co-occurrence with the positive object in the training dataset. The highest-ranking co-occurring object is then selected as the adversarial sample. This approach differs from the original POPE [41] to avoid adversarial samples mirroring popular samples, as indoor scenes often contain similar objects.\n
68
  These sampling strategies are designed to challenge the model's robustness and assess its susceptibility to different levels of object hallucination.
69
 
70
  ## Metrics
 
39
 
40
 
41
  # Your leaderboard name
42
+ TITLE = """<h1 align="center" id="space-title">🏠💬 3D-POPE Leaderboard 🏅</h1>
43
  <p><center>
44
  <a href="https://3d-grand.github.io/" target="_blank">[Project Page]</a>
45
  <a href="https://www.dropbox.com/scl/fo/5p9nb4kalnz407sbqgemg/AG1KcxeIS_SUoJ1hoLPzv84?rlkey=weunabtbiz17jitfv3f4jpmm1&dl=0" target="_blank">[3D-GRAND Data]</a>
 
49
 
50
  # What does your leaderboard evaluate?
51
  INTRODUCTION_TEXT = """
52
+ #### This is the official leaderboard for the 3D Polling-based Object Probing Evaluation (3D-POPE) benchmark. 3D-POPE is a benchmark to evaluate object hallucination in 3D LLMs from the work [3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination](https://3d-grand.github.io/).
53
  """
54
 
55
  # Which evaluations are you running? how can people reproduce what you have?
 
58
  ### To systematically evaluate the hallucination behavior of 3D-LLMs, we introduce the 3D Polling-based Object Probing Evaluation (3D-POPE) benchmark. 3D-POPE is designed to assess a model's ability to accurately identify the presence or absence of objects in a given 3D scene.
59
 
60
  ## Dataset
61
+ To facilitate the 3D-POPE benchmark, we curate a dedicated dataset from the [ScanNet](https://arxiv.org/abs/1702.04405) dataset, utilizing the semantic classes from [ScanNet200](https://arxiv.org/abs/2204.07761). Specifically, we use the ScanNet validation set as the foundation for evaluating 3D-LLMs on the 3D-POPE benchmark.
62
 
63
+ Benchmark design. 3D-POPE consists of a set of triples, each comprising a 3D scene, a posed question, and a binary answer (“Yes” or “No”) indicating the presence or absence of an object. To ensure a balanced dataset, we maintain a 1:1 ratio of existent to nonexistent objects when constructing these triples. For the selection of negative samples (i.e., nonexistent objects), we employ three distinct sampling strategies:
64
 
65
  • Random Sampling: Nonexistent objects are randomly selected from the set of objects not present in the 3D scene.\n
66
  • Popular Sampling: We select the top-k most frequent objects not present in the 3D scene, where k equals the number of objects currently in the scene.\n
67
+ • Adversarial Sampling: For each positively identified object in the scene, we rank objects that are not present and have not been used as negative samples based on their frequency of co-occurrence with the positive object in the training dataset. The highest-ranking co-occurring object is then selected as the adversarial sample. This approach differs from the original [POPE](https://arxiv.org/abs/2305.10355) to avoid adversarial samples mirroring popular samples, as indoor scenes often contain similar objects.\n
68
  These sampling strategies are designed to challenge the model's robustness and assess its susceptibility to different levels of object hallucination.
69
 
70
  ## Metrics
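
Note on the diff above: the three negative-sampling strategies described in the updated `LLM_BENCHMARKS_TEXT` (random, popular, adversarial) can be illustrated with a minimal Python sketch. This is not the benchmark's actual construction code, which this commit does not show. The function names, the `frequency` counter, and the `cooccurrence` dictionary keyed by `(positive, candidate)` pairs are all hypothetical, chosen only to mirror the wording of the about text.

```python
import random
from collections import Counter


def random_negatives(present, vocabulary, rng, k=None):
    """Random sampling: draw k objects that are absent from the scene."""
    k = len(present) if k is None else k
    absent = sorted(set(vocabulary) - set(present))
    return rng.sample(absent, min(k, len(absent)))


def popular_negatives(present, frequency, k=None):
    """Popular sampling: the top-k most frequent objects absent from the scene.

    Following the about text, k defaults to the number of objects in the scene.
    """
    k = len(present) if k is None else k
    ranked = [obj for obj, _ in frequency.most_common() if obj not in present]
    return ranked[:k]


def adversarial_negatives(present, vocabulary, cooccurrence):
    """Adversarial sampling: for each present object, pick the absent object
    that co-occurs with it most often and has not already been used."""
    used = set()
    negatives = []
    for positive in present:
        candidates = [o for o in vocabulary if o not in present and o not in used]
        if not candidates:
            break
        best = max(candidates, key=lambda o: cooccurrence.get((positive, o), 0))
        used.add(best)
        negatives.append(best)
    return negatives


# Toy data (made up for illustration, not taken from ScanNet).
present = ["chair", "table"]
vocabulary = ["chair", "table", "sofa", "lamp", "sink", "bed"]
frequency = Counter({"chair": 50, "table": 40, "sofa": 30, "lamp": 20, "sink": 10, "bed": 5})
cooccurrence = {
    ("chair", "lamp"): 9,
    ("chair", "sofa"): 3,
    ("table", "lamp"): 8,
    ("table", "sink"): 6,
}

print(random_negatives(present, vocabulary, random.Random(0)))
print(popular_negatives(present, frequency))
print(adversarial_negatives(present, vocabulary, cooccurrence))
```

In this toy example the popular strategy returns the globally frequent absent objects, while the adversarial strategy skips `lamp` for `table` because it was already used for `chair`, which matches the "have not been used as negative samples" condition in the text.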