Decorative changes
- templates/about.html +29 -12
- templates/model_detail.html +3 -1
templates/about.html
CHANGED
@@ -65,12 +65,32 @@
     border-top-right-radius: 10px;
 }
 .section{
+    padding-top: 19px;
+    text-align: left;
+}
+
+.section p {
+    padding-left: 150px;
+    padding-right: 150px;
+    text-indent: 2em;
+    margin: auto;
+    margin-bottom: 10px;
+    text-align: left;
+}
+
+.section ol,ul {
     padding-left: 150px;
     padding-right: 150px;
+    margin: auto;
+    margin-bottom: 20px;
+    margin-left: 50px;
     text-align: left;
+    margin-top: 0px;
 }
+
 .citation-section {
-
+    width: 100%;
+    margin-top: 50px;
     text-align: center;
 }
 .citation-box {
@@ -100,7 +120,6 @@
     font-size: 24px;
     font-weight: bold;
     text-align: center;
-    margin-top: 40px;
     margin-bottom: 40px;
     padding: 20px; /* Add padding for more margin around text */
     background-color: #610b5d;
@@ -109,7 +128,7 @@
 }
 .back-button {
     text-align: center;
-    margin-top:
+    margin-top: 50px;
 }
 .custom-button {
     background-color: #610b5d;
@@ -135,33 +154,31 @@
 <div class="section">
     <div class="section-title">Motivation</div>
     <p>
-        Benchmarks usually compare models with MANY QUESTIONS from A SINGLE MINIMAL CONTEXT
+        Benchmarks usually compare models with <b>MANY QUESTIONS</b> from <b>A SINGLE MINIMAL CONTEXT</b>, e.g. as multiple-choice questions.
         This kind of evaluation tells us little about LLMs' behavior in deployment when exposed to new contexts (especially when we consider LLMs' highly context-dependent nature).
-        We argue that CONTEXT-DEPENDENCE can be seen as a PROPERTY of LLMs
-        We evaluate LLMs by asking the SAME QUESTIONS from MANY DIFFERENT CONTEXTS
+        We argue that <b>CONTEXT-DEPENDENCE</b> can be seen as a <b>PROPERTY of LLMs</b>: a dimension of LLM comparison alongside others like size, speed, or knowledge.
+        We evaluate LLMs by asking the <b> SAME QUESTIONS </b> from <b> MANY DIFFERENT CONTEXTS </b>.
     </p>
     <p>
         LLMs are often used to simulate personas and populations.
         We study the coherence of simulated populations over different contexts (conversations on different topics).
         To do that we leverage the psychological methodology to study the interpersonal stability of personal value expression of those simulated populations.
-        We adopt the Schwartz Theory of Basic Personal Values that defines 10 values: Self-Direction, Stimulation, Hedonism, Achievement, Power, Security, Conformity, Tradition, Benevolence, and Universalism
-
+        We adopt the Schwartz Theory of Basic Personal Values that defines 10 values: Self-Direction, Stimulation, Hedonism, Achievement, Power, Security, Conformity, Tradition, Benevolence, and Universalism.
+        To evaluate their expression, we use the associated questionnaires: PVQ-40 and SVS.
     </p>
 </div>
 <div class="section">
     <div class="section-title">Administering a questionnaire in context to a simulated persona</div>
-    <p>To evaluate the stability on a population level we need to be able to evaluate a value profile expressed by a simulated individual in a specific context (conversation topic)
+    <p>To evaluate the stability on a population level we need to be able to evaluate a <b>value profile</b> expressed by a <b>simulated individual</b> in a <b>specific context</b> (conversation topic). To do that we use the following procedure:</p>
     <ol>
         <li> The Tested model is instructed to simulate a persona</li>
         <li> A separate model instance - The Interlocutor - is instructed to simulate a “human using a chatbot”
         <li> A conversation topic is induced by manually setting the first Interlocutor’s message (e.g. Tell me a
         joke)
         <li> A conversation is simulated
-        <li> A question from the questionnaire is set as the last Interlocutor’s message and The Tested model’s
+        <li> A question from the questionnaire is set as the Interlocutor’s last message and The Tested model’s
         response is recorded (this is repeated for every item in the questionnaire)
         <li> The questionnaire is scored to obtain scores for the 10 personal values
-        <li> The whole process is repeated for each persona with five different conversation topics
-        <li> Rank-Order and Ipsative stability are estimated between pairs of contexts and then averaged
     </ol>
     <div class="image-container">
         <a href="{{ url_for('static', filename='figures/admin_questionnaire.svg') }}" target="_blank">
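For readers of this diff, the numbered procedure in the about.html text can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the `(speaker, text)` history format, the toy stand-in models, and the "mean rating per value" scoring key are all hypothetical. The real PVQ-40 and SVS scoring rules are not given in this diff.

```python
from collections import defaultdict
from statistics import mean


def simulate_conversation(tested_model, interlocutor, persona, first_message, n_turns=2):
    """Steps 1-4: seed the topic with a fixed first message, then alternate turns."""
    history = [("interlocutor", first_message)]
    for turn in range(n_turns):
        history.append(("tested", tested_model(persona, history)))
        if turn < n_turns - 1:  # end on the Tested model's reply
            history.append(("interlocutor", interlocutor(history)))
    return history


def administer(tested_model, persona, history, items):
    """Step 5: pose each item as the Interlocutor's last message and record the reply."""
    return [tested_model(persona, history + [("interlocutor", item)]) for item in items]


def score(replies, item_values):
    """Step 6: average numeric replies per value (assumed, simplified scoring key)."""
    by_value = defaultdict(list)
    for reply, value in zip(replies, item_values):
        by_value[value].append(int(reply))
    return {value: mean(ratings) for value, ratings in by_value.items()}


# Toy stand-ins for the two model instances (assumptions, not real LLM calls).
RATINGS = {"Q1": "2", "Q2": "4", "Q3": "5"}


def toy_tested_model(persona, history):
    last_text = history[-1][1]
    # Answer questionnaire items with a fixed rating; otherwise produce chat text.
    return RATINGS.get(last_text, f"{persona} replies to: {last_text}")


def toy_interlocutor(history):
    return "Tell me more."


history = simulate_conversation(toy_tested_model, toy_interlocutor, "a teacher", "Tell me a joke")
replies = administer(toy_tested_model, "a teacher", history, ["Q1", "Q2", "Q3"])
profile = score(replies, ["Power", "Power", "Security"])
print(profile)
```

Each questionnaire item is appended to a copy of the same simulated conversation, so the items do not see one another's answers. That is one plausible reading of "repeated for every item", not something the diff confirms.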
templates/model_detail.html
CHANGED
@@ -44,10 +44,12 @@
     margin-bottom: 20px;
 }
 .image-section p {
-    width: 80%;
     margin: auto;
+    padding-left: 150px;
+    padding-right: 150px;
     margin-bottom: 20px;
     text-align: left;
+    text-indent: 2em;
 }
 .image-container {
     width: 100%;