Avijit Ghosh committed
Commit 5bb95f3 • 1 Parent(s): e6e82b8

change type to level

Files changed:
- app.py +9 -9
- configs/crowspairs.yaml +1 -1
- configs/homoglyphbias.yaml +1 -1
- configs/honest.yaml +1 -1
- configs/ieat.yaml +1 -1
- configs/imagedataleak.yaml +1 -1
- configs/notmyvoice.yaml +1 -1
- configs/palms.yaml +1 -1
- configs/stablebias.yaml +1 -1
- configs/stereoset.yaml +1 -1
- configs/tango.yaml +1 -1
- configs/videodiversemisinfo.yaml +1 -1
- configs/weat.yaml +1 -1
app.py
CHANGED
@@ -29,19 +29,19 @@ globaldf['Link'] = '<u>'+globaldf['Link']+'</u>'
 modality_order = ["Text", "Image", "Audio", "Video"]
 type_order = ["Model", "Dataset", "Output", "Taxonomy"]
 
-# Convert Modality and Type columns to categorical with specified order
+# Convert Modality and Level columns to categorical with specified order
 globaldf['Modality'] = pd.Categorical(globaldf['Modality'], categories=modality_order, ordered=True)
-globaldf['Type'] = pd.Categorical(globaldf['Type'], categories=type_order, ordered=True)
+globaldf['Level'] = pd.Categorical(globaldf['Level'], categories=type_order, ordered=True)
 
-# Sort DataFrame by Modality and Type
-globaldf.sort_values(by=['Modality', 'Type'], inplace=True)
+# Sort DataFrame by Modality and Level
+globaldf.sort_values(by=['Modality', 'Level'], inplace=True)
 
 # create a gradio page with tabs and accordions
 
 # Path: taxonomy.py
 
 def filter_modality_type(fulltable, modality_filter, type_filter):
-    filteredtable = fulltable[fulltable['Modality'].isin(modality_filter) & fulltable['Type'].isin(type_filter)]
+    filteredtable = fulltable[fulltable['Modality'].isin(modality_filter) & fulltable['Level'].isin(type_filter)]
     return filteredtable
 
 def showmodal(evt: gr.SelectData):
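To see what this hunk changes in behaviour: pandas ordered categoricals make sort_values follow a declared order instead of alphabetical order, and filter_modality_type then works on the renamed Level column. Below is a minimal, self-contained sketch, not the Space's code; the rows are borrowed from the configs in this commit, but the toy DataFrame itself is invented for illustration.

import pandas as pd

# Toy stand-in for globaldf, with the same column names the app uses.
toy = pd.DataFrame({
    "Modality": ["Image", "Text", "Audio", "Text"],
    "Level": ["Model", "Dataset", "Taxonomy", "Output"],
    "Suggested Evaluation": ["iEAT", "StereoSet", "Not My Voice!", "HONEST"],
})

modality_order = ["Text", "Image", "Audio", "Video"]
type_order = ["Model", "Dataset", "Output", "Taxonomy"]

# ordered=True makes sort_values respect the declared order
# (Text < Image < Audio < Video, Model < Dataset < Output < Taxonomy).
toy["Modality"] = pd.Categorical(toy["Modality"], categories=modality_order, ordered=True)
toy["Level"] = pd.Categorical(toy["Level"], categories=type_order, ordered=True)
toy.sort_values(by=["Modality", "Level"], inplace=True)

# Same shape as filter_modality_type: keep rows whose Modality and Level
# are both among the checked values.
print(toy[toy["Modality"].isin(["Text"]) & toy["Level"].isin(["Dataset", "Output"])])
# StereoSet (Dataset) is listed before HONEST (Output) because Level is ordered.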
@@ -100,7 +100,7 @@ The following categories are high-level, non-exhaustive, and present a synthesis
     with gr.Tabs(elem_classes="tab-buttons") as tabs1:
         with gr.TabItem("Bias/Stereotypes"):
             fulltable = globaldf[globaldf['Group'] == 'BiasEvals']
-            fulltable = fulltable[['Modality','Type', 'Suggested Evaluation', 'What it is evaluating', 'Considerations', 'Link']]
+            fulltable = fulltable[['Modality','Level', 'Suggested Evaluation', 'What it is evaluating', 'Considerations', 'Link']]
 
             gr.Markdown("""
             Generative AI systems can perpetuate harmful biases from various sources, including systemic, human, and statistical biases. These biases, also known as "fairness" considerations, can manifest in the final system due to choices made throughout the development process. They include harmful associations and stereotypes related to protected classes, such as race, gender, and sexuality. Evaluating biases involves assessing correlations, co-occurrences, sentiment, and toxicity across different modalities, both within the model itself and in the outputs of downstream tasks.
@@ -114,7 +114,7 @@ The following categories are high-level, non-exhaustive, and present a synthesis
             )
             type_filter = gr.CheckboxGroup(["Model", "Dataset", "Output", "Taxonomy"],
                                            value=["Model", "Dataset", "Output", "Taxonomy"],
-                                           label="Type",
+                                           label="Level",
                                            show_label=True,
                                            # info="Which modality to show."
                                            )
@@ -138,7 +138,7 @@ The following categories are high-level, non-exhaustive, and present a synthesis
 
         with gr.TabItem("Cultural Values/Sensitive Content"):
             fulltable = globaldf[globaldf['Group'] == 'CulturalEvals']
-            fulltable = fulltable[['Modality','Type', 'Suggested Evaluation', 'What it is evaluating', 'Considerations', 'Link']]
+            fulltable = fulltable[['Modality','Level', 'Suggested Evaluation', 'What it is evaluating', 'Considerations', 'Link']]
 
             gr.Markdown("""Cultural values are specific to groups and sensitive content is normative. Sensitive topics also vary by culture and can include hate speech. What is considered a sensitive topic, such as egregious violence or adult sexual content, can vary widely by viewpoint. Due to norms differing by culture, region, and language, there is no standard for what constitutes sensitive content.
             Distinct cultural values present a challenge for deploying models into a global sphere, as what may be appropriate in one culture may be unsafe in others. Generative AI systems cannot be neutral or objective, nor can they encompass truly universal values. There is no “view from nowhere”; in quantifying anything, a particular frame of reference is imposed.
@@ -152,7 +152,7 @@ The following categories are high-level, non-exhaustive, and present a synthesis
             )
             type_filter = gr.CheckboxGroup(["Model", "Dataset", "Output", "Taxonomy"],
                                            value=["Model", "Dataset", "Output", "Taxonomy"],
-                                           label="Type",
+                                           label="Level",
                                            show_label=True,
                                            # info="Which modality to show."
                                            )
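The label="Level" checkbox groups above are what feed filter_modality_type, but the event wiring is not part of this diff. Below is only a hedged sketch of one way a CheckboxGroup pair can drive that filter in Gradio; the component layout, the gr.Dataframe output, and the functools.partial wiring are illustrative assumptions, not the Space's actual code.

from functools import partial

import gradio as gr
import pandas as pd

# Invented two-row table standing in for one tab's fulltable.
fulltable = pd.DataFrame({
    "Modality": ["Text", "Image"],
    "Level": ["Dataset", "Model"],
    "Suggested Evaluation": ["StereoSet", "iEAT"],
})

def filter_modality_type(fulltable, modality_filter, type_filter):
    return fulltable[fulltable["Modality"].isin(modality_filter)
                     & fulltable["Level"].isin(type_filter)]

with gr.Blocks() as demo:
    modality_filter = gr.CheckboxGroup(["Text", "Image", "Audio", "Video"],
                                       value=["Text", "Image", "Audio", "Video"],
                                       label="Modality", show_label=True)
    type_filter = gr.CheckboxGroup(["Model", "Dataset", "Output", "Taxonomy"],
                                   value=["Model", "Dataset", "Output", "Taxonomy"],
                                   label="Level", show_label=True)  # the label renamed in this commit
    table = gr.Dataframe(value=fulltable)

    # Re-run the filter whenever either checkbox group changes.
    for box in (modality_filter, type_filter):
        box.change(partial(filter_modality_type, fulltable),
                   inputs=[modality_filter, type_filter], outputs=table)

if __name__ == "__main__":
    demo.launch()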
configs/crowspairs.yaml
CHANGED
@@ -14,6 +14,6 @@ Screenshots:
 - Images/CrowsPairs1.png
 - Images/CrowsPairs2.png
 Suggested Evaluation: Crow-S Pairs
-Type: Dataset
+Level: Dataset
 URL: https://arxiv.org/abs/2010.00133
 What it is evaluating: Protected class stereotypes
configs/homoglyphbias.yaml
CHANGED
@@ -9,7 +9,7 @@ Link: Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis
 Modality: Image
 Screenshots: []
 Suggested Evaluation: Effect of different scripts on text-to-image generation
-Type: Output
+Level: Output
 URL: https://arxiv.org/pdf/2209.08891.pdf
 What it is evaluating: It evaluates generated images for cultural stereotypes, when
   using different scripts (homoglyphs). It somewhat measures the suceptibility of
configs/honest.yaml
CHANGED
@@ -11,6 +11,6 @@ Link: 'HONEST: Measuring Hurtful Sentence Completion in Language Models'
 Modality: Text
 Screenshots: []
 Suggested Evaluation: 'HONEST: Measuring Hurtful Sentence Completion in Language Models'
-Type: Output
+Level: Output
 URL: https://aclanthology.org/2021.naacl-main.191.pdf
 What it is evaluating: Protected class stereotypes and hurtful language
configs/ieat.yaml
CHANGED
@@ -12,6 +12,6 @@ Link: Image Representations Learned With Unsupervised Pre-Training Contain Human
 Modality: Image
 Screenshots: []
 Suggested Evaluation: Image Embedding Association Test (iEAT)
-Type: Model
+Level: Model
 URL: https://dl.acm.org/doi/abs/10.1145/3442188.3445932
 What it is evaluating: Embedding associations
configs/imagedataleak.yaml
CHANGED
@@ -10,6 +10,6 @@ Link: 'Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias i
 Modality: Image
 Screenshots: []
 Suggested Evaluation: Dataset leakage and model leakage
-Type: Dataset
+Level: Dataset
 URL: https://arxiv.org/abs/1811.08489
 What it is evaluating: Gender and label bias
configs/notmyvoice.yaml
CHANGED
@@ -11,6 +11,6 @@ Modality: Audio
 Screenshots: []
 Suggested Evaluation: Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech
   Generators
-Type: Taxonomy
+Level: Taxonomy
 URL: https://arxiv.org/pdf/2402.01708.pdf
 What it is evaluating: Lists harms of audio/speech generators
configs/palms.yaml
CHANGED
@@ -9,6 +9,6 @@ Link: 'Process for Adapting Language Models to Society (PALMS) with Values-Targe
 Modality: Text
 Screenshots: .nan
 Suggested Evaluation: Human and Toxicity Evals of Cultural Value Categories
-Type: Output
+Level: Output
 URL: http://arxiv.org/abs/2106.10328
 What it is evaluating: Adherence to defined norms for a set of cultural categories
configs/stablebias.yaml
CHANGED
@@ -9,6 +9,6 @@ Link: 'Stable bias: Analyzing societal representations in diffusion models'
 Modality: Image
 Screenshots: []
 Suggested Evaluation: Characterizing the variation in generated images
-Type: Output
+Level: Output
 URL: https://arxiv.org/abs/2303.11408
 What it is evaluating: .nan
configs/stereoset.yaml
CHANGED
@@ -11,6 +11,6 @@ Link: 'StereoSet: Measuring stereotypical bias in pretrained language models'
 Modality: Text
 Screenshots: []
 Suggested Evaluation: StereoSet
-Type: Dataset
+Level: Dataset
 URL: https://arxiv.org/abs/2004.09456
 What it is evaluating: Protected class stereotypes
configs/tango.yaml
CHANGED
@@ -14,6 +14,6 @@ Screenshots:
 - Images/TANGO1.png
 - Images/TANGO2.png
 Suggested Evaluation: Human and Toxicity Evals of Cultural Value Categories
-Type: Output
+Level: Output
 URL: http://arxiv.org/abs/2106.10328
 What it is evaluating: Bias measurement for trans and nonbinary community via measuring gender non-affirmative language, specifically 1) misgendering 2), negative responses to gender disclosure
configs/videodiversemisinfo.yaml
CHANGED
@@ -13,7 +13,7 @@ Modality: Video
 Screenshots: []
 Suggested Evaluation: 'Diverse Misinformation: Impacts of Human Biases on Detection
   of Deepfakes on Networks'
-Type: Output
+Level: Output
 URL: https://arxiv.org/abs/2210.10026
 What it is evaluating: Human led evaluations of deepfakes to understand susceptibility
   and representational harms (including political violence)
configs/weat.yaml
CHANGED
@@ -36,7 +36,7 @@ Screenshots:
 - Images/WEAT1.png
 - Images/WEAT2.png
 Suggested Evaluation: Word Embedding Association Test (WEAT)
-Type: Model
+Level: Model
 URL: https://researchportal.bath.ac.uk/en/publications/semantics-derived-automatically-from-language-corpora-necessarily
 What it is evaluating: Associations and word embeddings based on Implicit Associations
   Test (IAT)
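Each config now exposes a Level key in place of the old Type key, matching the column that app.py sorts and filters on. The loader that turns these YAML files into globaldf is not shown in this diff, so the snippet below is only a sketch under that assumption: it collects configs/*.yaml with PyYAML and pandas and flags any file the rename might have missed. The glob pattern and the column check are assumptions, not the Space's actual code.

import glob

import pandas as pd
import yaml  # PyYAML

rows = []
for path in sorted(glob.glob("configs/*.yaml")):  # assumed layout: one evaluation per file
    with open(path, encoding="utf-8") as f:
        rows.append(yaml.safe_load(f))

df = pd.DataFrame(rows)

# After this commit every config should define Level rather than Type.
if "Level" not in df.columns:
    print("no config defines Level yet")
else:
    print("configs missing a Level value:", int(df["Level"].isna().sum()))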