Commit by patrickvonplaten: Merge branch 'main' of https://huggingface.co/spaces/OpenGenAI/parti-prompts-leaderboard
app.py CHANGED
@@ -23,8 +23,10 @@ MODEL_KEYS = "-".join(SUBMISSIONS.keys())
 SUBMISSION_ORG = f"results-{MODEL_KEYS}"
 
 submission_names = list(SUBMISSIONS.keys())
-
-
+ds = load_dataset("nateraw/parti-prompts")["train"]
+
+parti_prompt_categories = ds["Category"]
+parti_prompt_challenge = ds["Challenge"]
 
 
 def load_submissions():
@@ -86,15 +88,26 @@ def get_dataframe_all():
 
 TITLE = "# Open Parti Prompts Leaderboard"
 DESCRIPTION = """
-*
+The *Open Parti Prompts Leaderboard* compares state-of-the-art, open-source text-to-image models to each other according to **human preferences**. \n\n
+Text-to-image models are notoriously difficult to evaluate. [FID](https://en.wikipedia.org/wiki/Fr%C3%A9chet_inception_distance) and
+[CLIP Score](https://en.wikipedia.org/wiki/Fr%C3%A9chet_inception_distance) are not enough to accurately state whether a text-to-image model can
+**generate "good" images**. "Good" is extremely difficult to put into numbers. \n\n
+Instead, the **Open Parti Prompts Leaderboard** uses human feedback from the community to compare images from different text-to-image models to each other.
+
+\n\n
+
+❤️ ***Please take 3 minutes to contribute to the benchmark.*** \n
+***Play one round of [Open Parti Prompts Game](https://huggingface.co/spaces/OpenGenAI/open-parti-prompts) to contribute 10 answers.*** 🤗
 """
 
 EXPLANATION = """\n\n
 ## How the is data collected \n\n
 
-In the [
-
-
+In more detail, the [Open Parti Prompts Game](https://huggingface.co/spaces/OpenGenAI/open-parti-prompts) collects human preferences that state which generated image
+best fits a given prompt from the [Parti Prompts](https://huggingface.co/datasets/nateraw/parti-prompts) dataset. Parti Prompts has been designed to challenge
+text-to-image models on prompts of varying categories and difficulty. The images have been pre-generated from the models that are compared in this space.
+For more information of how the images were created, please refer to [Open Parti Prompts](https://huggingface.co/spaces/OpenGenAI/open-parti-prompts).
+The community's answers are then stored and used in this space to give a human-preference-based comparison of the different models. \n\n
 
 Currently the leaderboard includes the following models:
 - [sd-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)
@@ -102,7 +115,8 @@ Currently the leaderboard includes the following models:
 - [if-v1-0](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0)
 - [karlo](https://huggingface.co/kakaobrain/karlo-v1-alpha) \n\n
 
-In the following you can see three result tables. The first shows
+In the following you can see three result tables. The first shows the overall comparison of the 4 models. The score states,
+**the percentage at which images generated from the corresponding model are preferred over the image from all other models**. The second and third tables
 show you a breakdown analysis per category and per type of challenge as defined by [Parti Prompts](https://huggingface.co/datasets/nateraw/parti-prompts).
 """
 
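Note on the score described in EXPLANATION: the diff does not show how app.py actually computes it, so the following is only a minimal sketch of one plausible reading, where each community answer names the single preferred model for a prompt and a model's score is its share of those answers. The function names, the `votes` list, and the sample category labels are hypothetical and are not taken from app.py. The per-group helper illustrates how per-prompt labels such as `parti_prompt_categories` or `parti_prompt_challenge` could drive the second and third tables.

```python
from collections import Counter, defaultdict


def preference_scores(votes):
    """Return {model: percentage of answers that preferred it}, rounded to 0.1."""
    counts = Counter(votes)
    total = sum(counts.values())
    return {model: round(100 * n / total, 1) for model, n in counts.items()}


def preference_scores_by_group(votes, groups):
    """Break the scores down by a per-answer label (e.g. category or challenge)."""
    grouped = defaultdict(list)
    for vote, group in zip(votes, groups):
        grouped[group].append(vote)
    return {group: preference_scores(v) for group, v in grouped.items()}


# Hypothetical answers: the preferred model for each answered prompt,
# plus an illustrative category label for the same prompt.
votes = ["sd-v1-5", "if-v1-0", "if-v1-0", "karlo"]
categories = ["Animals", "Animals", "Abstract", "Abstract"]

overall = preference_scores(votes)
per_category = preference_scores_by_group(votes, categories)
print(overall)
print(per_category)
```

Under this reading the scores in each table sum to 100% (up to rounding), which makes them directly comparable across models within a table but not across tables with different numbers of answers.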