Decorative changes: uppercase to lowercase for some paragraphs

- templates/about.html +3 -3
- templates/index.html +3 -3
templates/about.html
CHANGED
@@ -182,10 +182,10 @@
 <div class="section">
 <div class="section-title">Motivation</div>
 <p>
-Benchmarks usually compare models with <b>
+Benchmarks usually compare models with <b>many questions</b> from <b>a single minimal context</b>, e.g. as multiple-choice questions.
 This kind of evaluation is little informative of LLMs' behavior in deployment when exposed to new contexts (especially when we consider the LLMs highly context-dependant nature).
-We argue that <b>
-We evaluate LLMs by asking the <b>
+We argue that <b>context-dependence</b> can be seen as a <b>property of LLMs</b>: a dimension of LLM comparison alongside others like size, speed, or knowledge.
+We evaluate LLMs by asking the <b>same questions</b> from <b>many different contexts</b>.
 </p>
 <p>
 LLMs are often used to simulate personas and populations.
templates/index.html
CHANGED
@@ -203,8 +203,8 @@
 As proposed in our <a href="https://arxiv.org/abs/2402.14846">paper</a>,
 unwanted context-dependence should be seen as a <b>property of LLMs</b> - a dimension of LLM comparison (alongside others such as model size speed or expressed knowledge).
 This leaderboard aims to provide such a comparison and extends our paper with a more focused and elaborate experimental setup.
-Standard benchmarks present <b>
-we present <b>
+Standard benchmarks present <b>many</b> questions from the <b>same minimal contexts</b> (e.g. multiple-choice questions),
+whereas we present the <b>same</b> questions from <b>many different contexts</b>.
 </p>
 <div class="table-responsive main-table">
 <!-- Render the table HTML here -->
@@ -282,7 +282,7 @@
 <li>Contact: <a href="mailto: [email protected]">[email protected]</a></li>
 <li>See the <a href="https://sites.google.com/view/llmvaluestability">Project website<a/></li>
 <li>See the Flowers team <a href="http://developmentalsystems.org">blog</a> and <a href="https://flowers.inria.fr/">website</a></li>
-<li>See Grgur's website and other projects: <a href="https://grgkovac.github.io
+<li>See Grgur's website and other projects: <a href="https://grgkovac.github.io">https://grgkovac.github.io</a></li>
 </ul>
 </div>