javier-ab-bsc committed on
Commit
a788ce9
1 Parent(s): ba0d108

Update README.md

Files changed (1)
  1. README.md +147 -16
README.md CHANGED
@@ -925,22 +925,153 @@ Here, we present results for seven categories of tasks in Spanish, Catalan, Basq
 
  Further details on all tasks and criteria, a full list of results compared to other baselines, a discussion of the model's performance across tasks and its implications, and details regarding problem-solving with task implementation will soon be available in the technical report.
 
- | **Category** | **Dataset** | **Metric** | **es** | **ca** | **gl** | **eu** | **en** |
- |---------|---------|-----------|-------|-------|-------|-------|-------|
- | **Commonsense Reasoning** | **XStoryCloze** | Ending Coherence (1 to 5) | 2.36/0.66 | 2.49/0.76 | 2.45/0.68 | 2.30/0.67 | 3.06/0.77 |
- | **Paraphrasing** | **PAWS** | Paraphrase Completeness (0/1) | 0.60/0.15 | 0.54/0.17 | 0.64/0.14 | ----/---- | 0.79/0.11 |
- | | | Paraphrase Generation (1 to 5) | 2.89/1.46 | 2.71/1.70 | 2.80/1.21 | ----/---- | 3.64/0.80 |
- | | | Paraphrase Grammatical Correctness (0/1) | 0.74/0.13 | 0.68/0.15 | 0.78/0.10 | ----/---- | 0.89/0.07 |
- | **Reading Comprehension** | **Belebele** | Passage Comprehension (1 to 5) | 3.05/0.60 | 2.81/0.66 | 2.74/0.78 | 2.52/0.46 | 3.11/0.71 |
- | | | Answer Relevance (0/1) | 0.74/0.09 | 0.66/0.11 | 0.65/0.12 | 0.59/0.12 | 0.75/0.09 |
- | **Extreme Summarization** | **XLSum & caBreu & summarization_gl** | Extreme Summarization Informativeness (1 to 5) | 3.07/0.39 | 3.33/0.43 | 3.11/0.36 | ----/---- | 3.06/0.35 |
- | | | Extreme Summarization Conciseness (1 to 5) | 2.92/0.42 | 2.67/0.54 | 2.93/0.39 | ----/---- | 3.13/0.31 |
- | **Mathematics** | **mgsm** | Reasoning Capability (1 to 5) | 1.89/0.47 | 1.91/0.45 | 1.97/0.43 | 2.17/0.44 | 2.16/0.56 |
- | | | Mathematical Correctness (0/1) | 0.24/0.10 | 0.28/0.11 | 0.27/0.11 | 0.44/0.13 | 0.27/0.10 |
- | **Translation from Language** | **FLoRes** | Translation Fluency (1 to 5) | 3.74/0.15 | 3.69/0.22 | ----/---- | ----/---- | 3.69/0.18 |
- | | | Translation Accuracy (1 to 5) | 4.01/0.24 | 3.98/0.31 | ----/---- | ----/---- | 3.98/0.25 |
- | **Translation to Language** | **FLoRes** | Translation Fluency (1 to 5) | 3.75/0.14 | 3.69/0.17 | ----/---- | ----/---- | 4.09/0.16 |
- | | | Translation Accuracy (1 to 5) | 4.08/0.22 | 3.98/0.24 | ----/---- | ----/---- | 4.47/0.18 |
+ <style type="text/css">
+ .tg {border-collapse:collapse;border-spacing:0;}
+ .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
+ overflow:hidden;padding:10px 5px;word-break:normal;}
+ .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
+ font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
+ .tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
+ </style>
+ <table class="tg"><thead>
+ <tr>
+ <th class="tg-0pky"><span style="font-weight:bold">Category</span></th>
+ <th class="tg-0pky"><span style="font-weight:bold">Dataset</span></th>
+ <th class="tg-0pky"><span style="font-weight:bold">Criteria</span></th>
+ <th class="tg-0pky"><span style="font-weight:bold">es</span></th>
+ <th class="tg-0pky"><span style="font-weight:bold">ca</span></th>
+ <th class="tg-0pky"><span style="font-weight:bold">gl</span></th>
+ <th class="tg-0pky"><span style="font-weight:bold">eu</span></th>
+ <th class="tg-0pky"><span style="font-weight:bold">en</span></th>
+ </tr></thead>
+ <tbody>
+ <tr>
+ <td class="tg-0pky">Commonsense Reasoning</td>
+ <td class="tg-0pky">XStoryCloze</td>
+ <td class="tg-0pky">Ending coherence</td>
+ <td class="tg-0pky">2.36/0.66</td>
+ <td class="tg-0pky">2.49/0.76</td>
+ <td class="tg-0pky">2.45/0.68</td>
+ <td class="tg-0pky">2.30/0.67</td>
+ <td class="tg-0pky">3.06/0.77</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky" rowspan="3">Paraphrasing</td>
+ <td class="tg-0pky" rowspan="3">PAWS</td>
+ <td class="tg-0pky">Completeness `(B)`</td>
+ <td class="tg-0pky">0.60/0.15</td>
+ <td class="tg-0pky">0.54/0.17</td>
+ <td class="tg-0pky">0.64/0.14</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">0.79/0.11</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky">Paraphrase generation</td>
+ <td class="tg-0pky">2.89/1.46</td>
+ <td class="tg-0pky">2.71/1.70</td>
+ <td class="tg-0pky">2.80/1.21</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">3.64/0.80</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky">Grammatical correctness `(B)`</td>
+ <td class="tg-0pky">0.74/0.13</td>
+ <td class="tg-0pky">0.68/0.15</td>
+ <td class="tg-0pky">0.78/0.10</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">0.89/0.07</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky" rowspan="2">Reading Comprehension</td>
+ <td class="tg-0pky" rowspan="2">Belebele</td>
+ <td class="tg-0pky">Passage comprehension</td>
+ <td class="tg-0pky">3.05/0.60</td>
+ <td class="tg-0pky">2.81/0.66</td>
+ <td class="tg-0pky">2.74/0.78</td>
+ <td class="tg-0pky">2.52/0.46</td>
+ <td class="tg-0pky">3.11/0.71</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky">Answer relevance `(B)`</td>
+ <td class="tg-0pky">0.74/0.09</td>
+ <td class="tg-0pky">0.66/0.11</td>
+ <td class="tg-0pky">0.65/0.12</td>
+ <td class="tg-0pky">0.59/0.12</td>
+ <td class="tg-0pky">0.75/0.09</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky" rowspan="2">Extreme Summarization</td>
+ <td class="tg-0pky" rowspan="2">XLSum &amp; caBreu &amp; summarization_gl</td>
+ <td class="tg-0pky">Informativeness</td>
+ <td class="tg-0pky">3.07/0.39</td>
+ <td class="tg-0pky">3.33/0.43</td>
+ <td class="tg-0pky">3.11/0.36</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">3.06/0.35</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky">Conciseness</td>
+ <td class="tg-0pky">2.92/0.42</td>
+ <td class="tg-0pky">2.67/0.54</td>
+ <td class="tg-0pky">2.93/0.39</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">3.13/0.31</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky" rowspan="2">Math</td>
+ <td class="tg-0pky" rowspan="2">MGSM</td>
+ <td class="tg-0pky">Reasoning capability</td>
+ <td class="tg-0pky">1.89/0.47</td>
+ <td class="tg-0pky">1.91/0.45</td>
+ <td class="tg-0pky">1.97/0.43</td>
+ <td class="tg-0pky">2.17/0.44</td>
+ <td class="tg-0pky">2.16/0.56</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky">Mathematical correctness `(B)`</td>
+ <td class="tg-0pky">0.24/0.10</td>
+ <td class="tg-0pky">0.28/0.11</td>
+ <td class="tg-0pky">0.27/0.11</td>
+ <td class="tg-0pky">0.44/0.13</td>
+ <td class="tg-0pky">0.27/0.10</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky" rowspan="2">Translation from Language</td>
+ <td class="tg-0pky" rowspan="2">FLORES-200</td>
+ <td class="tg-0pky">Fluency</td>
+ <td class="tg-0pky">3.74/0.15</td>
+ <td class="tg-0pky">3.69/0.22</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">3.69/0.18</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky">Accuracy</td>
+ <td class="tg-0pky">4.01/0.24</td>
+ <td class="tg-0pky">3.98/0.31</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">3.98/0.25</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky" rowspan="2">Translation to Language</td>
+ <td class="tg-0pky" rowspan="2">FLORES-200</td>
+ <td class="tg-0pky">Fluency</td>
+ <td class="tg-0pky">3.75/0.14</td>
+ <td class="tg-0pky">3.69/0.17</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">4.09/0.16</td>
+ </tr>
+ <tr>
+ <td class="tg-0pky">Accuracy</td>
+ <td class="tg-0pky">4.08/0.22</td>
+ <td class="tg-0pky">3.98/0.24</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">-- / --</td>
+ <td class="tg-0pky">4.47/0.18</td>
+ </tr>
+ </tbody></table>
 
  ---