ferran-espuna
committed on
Commit
•
ba0d108
1
Parent(s):
5b61f57
Update README.md
Added the correct robustness to LLM as a Judge table
README.md
CHANGED
@@ -927,20 +927,20 @@ Further details on all tasks and criteria, a full list of results compared to ot
 
 | **Category** | **Dataset** | **Metric** | **es** | **ca** | **gl** | **eu** | **en** |
 |---------|---------|-----------|-------|-------|-------|-------|-------|
-| **Commonsense Reasoning** | **XStoryCloze** | Ending Coherence (1 to 5) | 2.36/0.
-| **Paraphrasing** | **PAWS** | Paraphrase Completeness (0/1) | 0.60/0.
-| | | Paraphrase Generation (1 to 5) | 2.89/
-| | | Paraphrase Grammatical Correctness (0/1) | 0.74/0.
-| **Reading Comprehension** | **Belebele** | Passage Comprehension (1 to 5) | 3.05/0.
-| | | Answer Relevance (0/1) | 0.74/0.
-| **Extreme Summarization** | **XLSum & caBreu & summarization_gl** | Extreme Summarization Informativeness (1 to 5) | 3.07/0.
-| | | Extreme Summarization Conciseness (1 to 5) | 2.92/0.
-| **Mathematics** | **mgsm** | Reasoning Capability (1 to 5) | 1.89/0.
-| | | Mathematical Correctness (0/1) | 0.24/0.
-| **Translation form Language** | **FLoRes** | Translation Fluency (1 to 5) | 3.74/0.
-| | | Translation Accuracy (1 to 5) | 4.01/0.
-| **Translation to Language** | **FLoRes** | Translation Fluency (1 to 5) | 3.75/0.
-| | | Translation Accuracy (1 to 5) | 4.08/0.
+| **Commonsense Reasoning** | **XStoryCloze** | Ending Coherence (1 to 5) | 2.36/0.66 | 2.49/0.76 | 2.45/0.68 | 2.30/0.67 | 3.06/0.77 |
+| **Paraphrasing** | **PAWS** | Paraphrase Completeness (0/1) | 0.60/0.15 | 0.54/0.17 | 0.64/0.14 | ----/---- | 0.79/0.11 |
+| | | Paraphrase Generation (1 to 5) | 2.89/1.46 | 2.71/1.70 | 2.80/1.21 | ----/---- | 3.64/0.80 |
+| | | Paraphrase Grammatical Correctness (0/1) | 0.74/0.13 | 0.68/0.15 | 0.78/0.10 | ----/---- | 0.89/0.07 |
+| **Reading Comprehension** | **Belebele** | Passage Comprehension (1 to 5) | 3.05/0.60 | 2.81/0.66 | 2.74/0.78 | 2.52/0.46 | 3.11/0.71 |
+| | | Answer Relevance (0/1) | 0.74/0.09 | 0.66/0.11 | 0.65/0.12 | 0.59/0.12 | 0.75/0.09 |
+| **Extreme Summarization** | **XLSum & caBreu & summarization_gl** | Extreme Summarization Informativeness (1 to 5) | 3.07/0.39 | 3.33/0.43 | 3.11/0.36 | ----/---- | 3.06/0.35 |
+| | | Extreme Summarization Conciseness (1 to 5) | 2.92/0.42 | 2.67/0.54 | 2.93/0.39 | ----/---- | 3.13/0.31 |
+| **Mathematics** | **mgsm** | Reasoning Capability (1 to 5) | 1.89/0.47 | 1.91/0.45 | 1.97/0.43 | 2.17/0.44 | 2.16/0.56 |
+| | | Mathematical Correctness (0/1) | 0.24/0.10 | 0.28/0.11 | 0.27/0.11 | 0.44/0.13 | 0.27/0.10 |
+| **Translation form Language** | **FLoRes** | Translation Fluency (1 to 5) | 3.74/0.15 | 3.69/0.22 | ----/---- | ----/---- | 3.69/0.18 |
+| | | Translation Accuracy (1 to 5) | 4.01/0.24 | 3.98/0.31 | ----/---- | ----/---- | 3.98/0.25 |
+| **Translation to Language** | **FLoRes** | Translation Fluency (1 to 5) | 3.75/0.14 | 3.69/0.17 | ----/---- | ----/---- | 4.09/0.16 |
+| | | Translation Accuracy (1 to 5) | 4.08/0.22 | 3.98/0.24 | ----/---- | ----/---- | 4.47/0.18 |
 
 ---
 