grg committed on
Commit e0d1b89
1 Parent(s): 1c7b1ad

Adding the metric list in index

Files changed (1):
  1. templates/index.html +13 -0
templates/index.html CHANGED
@@ -239,6 +239,19 @@
  We <b>aggregate</b> Rank-Order stability and validation metrics to rank the models. We do so in two ways: <b>Cardinal</b> and <b>Ordinal</b>.
  Following <a href="https://arxiv.org/abs/2405.01719">this paper</a>, we compute the stability and diversity of those rankings. See <a href="{{ url_for('about', _anchor='aggregate_metrics') }}">here</a> for more details.
  </p>
+ <p>
+ To sum up, here are the metrics used:
+ </p>
+ <ul>
+ <li><b>RO-stability</b>: the correlation in the ordering of simulated participants (ranked by their expression of the same values) across different contexts</li>
+ <!-- Validation metrics: -->
+ <li><b>Stress</b>: the MDS fit of the observed value structure to the theoretical circular structure. A stress of 0 indicates 'perfect' fit, 0.025 excellent, 0.05 good, 0.1 fair, and 0.2 poor.</li>
+ <li><b>Separability</b>: the extent to which questions corresponding to different values are linearly separable in the 2D MDS space (linear multi-label classifier accuracy)</li>
+ <li><b>CFI, SRMR, RMSEA</b>: common Confirmatory Factor Analysis (CFA) metrics showing the fit between the data and the posited model relating items (questions) to factors (values), applied here with Magnifying Glass CFA. A CFI above .90 is considered acceptable fit; for SRMR and RMSEA, values below .05 indicate good fit and below .08 reasonable fit.</li>
+ <!-- Aggregate metrics: -->
+ <li><b>Cardinal - Score</b>: the score averaged over all metrics (with descending metrics inverted), context pairs (for stability), and contexts (for validity metrics)</li>
+ <li><b>Ordinal - Win Rate</b>: the percentage of games won against all other models, where a game is a comparison between two models on one metric, for each context pair (for stability) and each context (for validity metrics)</li>
+ </ul>
  <div class="table-responsive full-table">
  <!-- Render the table HTML here -->
  {{ full_table_html|safe }}
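To make the RO-stability metric from the list above concrete, it can be sketched as the mean pairwise Spearman correlation of the per-context value orderings. This is a minimal illustrative sketch, not the leaderboard's actual code: the function names, the dict-of-scores input shape, and the plain-Python Spearman implementation are all our assumptions.

```python
from itertools import combinations


def _ranks(xs):
    # Rank scores (1 = lowest); tied scores share the average of their ranks.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tied run i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def _pearson(a, b):
    # Pearson correlation; Spearman = Pearson computed on ranks.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)


def ro_stability(scores_by_context):
    """Mean pairwise Spearman correlation of value orderings across contexts.

    `scores_by_context` maps each context to a list of scores for the same
    ordered set of values (an assumed input format, for illustration only).
    """
    pairs = list(combinations(scores_by_context.values(), 2))
    return sum(_pearson(_ranks(a), _ranks(b)) for a, b in pairs) / len(pairs)
```

A stability of 1.0 means the values are expressed in the same order in every context; values near 0 mean the ordering changes arbitrarily from context to context.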