grg committed
Commit fa96c3a
1 Parent(s): e0d1b89

Decorative changes: uppercase to lowercase for some paragraphs

Files changed (2):
  1. templates/about.html +3 -3
  2. templates/index.html +3 -3
templates/about.html CHANGED
@@ -182,10 +182,10 @@
 <div class="section">
 <div class="section-title">Motivation</div>
 <p>
-Benchmarks usually compare models with <b>MANY QUESTIONS</b> from <b>A SINGLE MINIMAL CONTEXT</b>, e.g. as multiple choices questions.
+Benchmarks usually compare models with <b>many questions</b> from <b>a single minimal context</b>, e.g. as multiple choices questions.
 This kind of evaluation is little informative of LLMs' behavior in deployment when exposed to new contexts (especially when we consider the LLMs highly context-dependant nature).
-We argue that <b>CONTEXT-DEPENDENCE</b> can be seen as a <b>PROPERTY of LLMs</b>: a dimension of LLM comparison alongside others like size, speed, or knowledge.
-We evaluate LLMs by asking the <b> SAME QUESTIONS </b> from <b> MANY DIFFERENT CONTEXTS </b>.
+We argue that <b>context-dependence</b> can be seen as a <b>property of LLMs</b>: a dimension of LLM comparison alongside others like size, speed, or knowledge.
+We evaluate LLMs by asking the <b> same questions </b> from <b> many different contexts </b>.
 </p>
 <p>
 LLMs are often used to simulate personas and populations.
templates/index.html CHANGED
@@ -203,8 +203,8 @@
 As proposed in our <a href="https://arxiv.org/abs/2402.14846">paper</a>,
 unwanted context-dependence should be seen as a <b>property of LLMs</b> - a dimension of LLM comparison (alongside others such as model size speed or expressed knowledge).
 This leaderboard aims to provide such a comparison and extends our paper with a more focused and elaborate experimental setup.
-Standard benchmarks present <b>MANY</b> questions from the <b>SAME MINIMAL contexts</b> (e.g. multiple choice questions),
-we present <b>SAME</b> questions from <b>MANY different contexts</b>.
+Standard benchmarks present <b>many</b> questions from the <b>same minimal contexts</b> (e.g. multiple choice questions),
+we present <b>same</b> questions from <b>many different contexts</b>.
 </p>
 <div class="table-responsive main-table">
 <!-- Render the table HTML here -->
@@ -282,7 +282,7 @@
 <li>Contact: <a href="mailto: [email protected]">[email protected]</a></li>
 <li>See the <a href="https://sites.google.com/view/llmvaluestability">Project website<a/></li>
 <li>See the Flowers team <a href="http://developmentalsystems.org">blog</a> and <a href="https://flowers.inria.fr/">website</a></li>
-<li>See Grgur's website and other projects: <a href="https://grgkovac.github.io/">https://grgkovac.github.io/</a></li>
+<li>See Grgur's website and other projects: <a href="https://grgkovac.github.io">https://grgkovac.github.io</a></li>
 </ul>
 </div>
 