Decorative changes: uppercase to lowercase for some paragraphs

- templates/about.html +3 -3
- templates/index.html +3 -3
templates/about.html
CHANGED
@@ -182,10 +182,10 @@
 <div class="section">
 <div class="section-title">Motivation</div>
 <p>
-Benchmarks usually compare models with <b>
+Benchmarks usually compare models with <b>many questions</b> from <b>a single minimal context</b>, e.g. as multiple-choice questions.
 This kind of evaluation is little informative of LLMs' behavior in deployment when exposed to new contexts (especially when we consider the LLMs highly context-dependant nature).
-We argue that <b>
-We evaluate LLMs by asking the <b>
+We argue that <b>context-dependence</b> can be seen as a <b>property of LLMs</b>: a dimension of LLM comparison alongside others like size, speed, or knowledge.
+We evaluate LLMs by asking the <b>same questions</b> from <b>many different contexts</b>.
 </p>
 <p>
 LLMs are often used to simulate personas and populations.
templates/index.html
CHANGED
@@ -203,8 +203,8 @@
 As proposed in our <a href="https://arxiv.org/abs/2402.14846">paper</a>,
 unwanted context-dependence should be seen as a <b>property of LLMs</b> - a dimension of LLM comparison (alongside others such as model size speed or expressed knowledge).
 This leaderboard aims to provide such a comparison and extends our paper with a more focused and elaborate experimental setup.
-Standard benchmarks present <b>
-we present <b>
+Standard benchmarks present <b>many</b> questions from the <b>same minimal contexts</b> (e.g. multiple-choice questions),
+whereas we present the <b>same</b> questions from <b>many different contexts</b>.
 </p>
 <div class="table-responsive main-table">
 <!-- Render the table HTML here -->
@@ -282,7 +282,7 @@
 <li>Contact: <a href="mailto: [email protected]">[email protected]</a></li>
 <li>See the <a href="https://sites.google.com/view/llmvaluestability">Project website<a/></li>
 <li>See the Flowers team <a href="http://developmentalsystems.org">blog</a> and <a href="https://flowers.inria.fr/">website</a></li>
-<li>See Grgur's website and other projects: <a href="https://grgkovac.github.io
+<li>See Grgur's website and other projects: <a href="https://grgkovac.github.io">https://grgkovac.github.io</a></li>
 </ul>
 </div>