DontPlanToEnd
committed on
Commit: e61cdd9
1 Parent(s): 347b17b
Update app.py
app.py
CHANGED
@@ -218,7 +218,7 @@ with GraInter:
 <h2 style="margin-bottom: 0; font-size: 1.8em;">About</h2>
 <strong>UGI:</strong> Uncensored General Intelligence. A measurement of the amount of uncensored/controversial information an LLM knows and is willing to tell the user. It is calculated from the average score of 5 subjects LLMs commonly refuse to talk about. The leaderboard is made of roughly 65 questions/tasks, measuring both willingness to answer and accuracy in fact-based controversial questions. I'm choosing to keep the questions private so people can't train on them and devalue the leaderboard.

-**W/10:** Willingness/10. A more narrow subset of the UGI questions,
+**W/10:** Willingness/10. A more narrow subset of the UGI questions, creating a 10-point score which measures how far the model can be pushed before going against its instructions, refusing to answer, or adding an ethical disclaimer to its response.
 <br>
 **I/10:** Intelligence/10. A 10-point score made up of the UGI questions with the highest correlation with parameter size. This metric shows how much a model's knowledge and reasoning play a role in its UGI score.
 <br><br>
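The About text in this hunk says UGI is "calculated from the average score of 5 subjects". A minimal sketch of that averaging step follows. It is an illustration only, not code from app.py: the function name `ugi_score`, the subject labels, and the example values are all hypothetical, and the sketch assumes a plain unweighted mean, since the leaderboard's actual subjects, scale, and weighting are not given here.

```python
from statistics import mean


def ugi_score(subject_scores: dict[str, float]) -> float:
    """Return the unweighted mean of the per-subject scores.

    Assumption: exactly 5 subjects, equally weighted, as the About text
    suggests. The real subjects and scoring scale are not public.
    """
    if len(subject_scores) != 5:
        raise ValueError("expected exactly 5 subject scores")
    return mean(list(subject_scores.values()))


# Hypothetical placeholder subjects and values, for illustration only.
example_scores = {
    "subject_a": 40.0,
    "subject_b": 55.0,
    "subject_c": 30.0,
    "subject_d": 65.0,
    "subject_e": 60.0,
}
overall = ugi_score(example_scores)
```

With these made-up inputs the five values sum to 250, so `overall` is 50.0. Passing fewer or more than five subjects raises `ValueError`, which keeps the sketch aligned with the "5 subjects" wording.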