Commit a788ce9 by javier-ab-bsc
Parent(s): ba0d108
Update README.md
README.md CHANGED
@@ -925,22 +925,153 @@ Here, we present results for seven categories of tasks in Spanish, Catalan, Basq
|
|
925 |
|
926 |
Further details on all tasks and criteria, a full list of results compared to other baselines, a discussion of the model's performance across tasks and its implications, and details regarding problem-solving with task implementation will soon be available in the technical report.
|
927 |
|
928 |
-
|
929 |
-
|
930 |
-
|
931 |
-
|
932 |
-
|
933 |
-
|
934 |
-
|
935 |
-
|
936 |
-
|
937 |
-
|
938 |
-
|
939 |
-
|
940 |
-
|
941 |
-
|
942 |
-
|
943 |
-
|
944 |
|
945 |
---
|
946 |
|
|
|
925 |
|
926 |
Further details on all tasks and criteria, a full list of results compared to other baselines, a discussion of the model's performance across tasks and its implications, and details regarding problem-solving with task implementation will soon be available in the technical report.
|
927 |
|
928 |
+
<style type="text/css">
|
929 |
+
.tg {border-collapse:collapse;border-spacing:0;}
|
930 |
+
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
|
931 |
+
overflow:hidden;padding:10px 5px;word-break:normal;}
|
932 |
+
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
|
933 |
+
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
|
934 |
+
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
|
935 |
+
</style>
|
936 |
+
<table class="tg"><thead>
|
937 |
+
<tr>
|
938 |
+
<th class="tg-0pky"><span style="font-weight:bold">Category</span></th>
|
939 |
+
<th class="tg-0pky"><span style="font-weight:bold">Dataset</span></th>
|
940 |
+
<th class="tg-0pky"><span style="font-weight:bold">Criteria</span></th>
|
941 |
+
<th class="tg-0pky"><span style="font-weight:bold">es</span></th>
|
942 |
+
<th class="tg-0pky"><span style="font-weight:bold">ca</span></th>
|
943 |
+
<th class="tg-0pky"><span style="font-weight:bold">gl</span></th>
|
944 |
+
<th class="tg-0pky"><span style="font-weight:bold">eu</span></th>
|
945 |
+
<th class="tg-0pky"><span style="font-weight:bold">en</span></th>
|
946 |
+
</tr></thead>
|
947 |
+
<tbody>
|
948 |
+
<tr>
|
949 |
+
<td class="tg-0pky">Commonsense Reasoning</td>
|
950 |
+
<td class="tg-0pky">XStoryCloze</td>
|
951 |
+
<td class="tg-0pky">Ending coherence</td>
|
952 |
+
<td class="tg-0pky">2.36/0.66</td>
|
953 |
+
<td class="tg-0pky">2.49/0.76</td>
|
954 |
+
<td class="tg-0pky">2.45/0.68</td>
|
955 |
+
<td class="tg-0pky">2.30/0.67</td>
|
956 |
+
<td class="tg-0pky">3.06/0.77</td>
|
957 |
+
</tr>
|
958 |
+
<tr>
|
959 |
+
<td class="tg-0pky" rowspan="3">Paraphrasing</td>
|
960 |
+
<td class="tg-0pky" rowspan="3">PAWS</td>
|
961 |
+
<td class="tg-0pky">Completeness `(B)`</td>
|
962 |
+
<td class="tg-0pky">0.60/0.15</td>
|
963 |
+
<td class="tg-0pky">0.54/0.17</td>
|
964 |
+
<td class="tg-0pky">0.64/0.14</td>
|
965 |
+
<td class="tg-0pky">-- / --</td>
|
966 |
+
<td class="tg-0pky">0.79/0.11</td>
|
967 |
+
</tr>
|
968 |
+
<tr>
|
969 |
+
<td class="tg-0pky">Paraphrase generation</td>
|
970 |
+
<td class="tg-0pky">2.89/1.46</td>
|
971 |
+
<td class="tg-0pky">2.71/1.70</td>
|
972 |
+
<td class="tg-0pky">2.80/1.21</td>
|
973 |
+
<td class="tg-0pky">-- / --</td>
|
974 |
+
<td class="tg-0pky">3.64/0.80</td>
|
975 |
+
</tr>
|
976 |
+
<tr>
|
977 |
+
<td class="tg-0pky">Grammatical correctness `(B)`</td>
|
978 |
+
<td class="tg-0pky">0.74/0.13</td>
|
979 |
+
<td class="tg-0pky">0.68/0.15</td>
|
980 |
+
<td class="tg-0pky">0.78/0.10</td>
|
981 |
+
<td class="tg-0pky">-- / --</td>
|
982 |
+
<td class="tg-0pky">0.89/0.07</td>
|
983 |
+
</tr>
|
984 |
+
<tr>
|
985 |
+
<td class="tg-0pky" rowspan="2">Reading Comprehension</td>
|
986 |
+
<td class="tg-0pky" rowspan="2">Belebele</td>
|
987 |
+
<td class="tg-0pky">Passage comprehension</td>
|
988 |
+
<td class="tg-0pky">3.05/0.60</td>
|
989 |
+
<td class="tg-0pky">2.81/0.66</td>
|
990 |
+
<td class="tg-0pky">2.74/0.78</td>
|
991 |
+
<td class="tg-0pky">2.52/0.46</td>
|
992 |
+
<td class="tg-0pky">3.11/0.71</td>
|
993 |
+
</tr>
|
994 |
+
<tr>
|
995 |
+
<td class="tg-0pky">Answer relevance `(B)`</td>
|
996 |
+
<td class="tg-0pky">0.74/0.09</td>
|
997 |
+
<td class="tg-0pky">0.66/0.11</td>
|
998 |
+
<td class="tg-0pky">0.65/0.12</td>
|
999 |
+
<td class="tg-0pky">0.59/0.12</td>
|
1000 |
+
<td class="tg-0pky">0.75/0.09</td>
|
1001 |
+
</tr>
|
1002 |
+
<tr>
|
1003 |
+
<td class="tg-0pky" rowspan="2">Extreme Summarization</td>
|
1004 |
+
<td class="tg-0pky" rowspan="2">XLSum &amp; caBreu &amp; summarization_gl</td>
|
1005 |
+
<td class="tg-0pky">Informativeness</td>
|
1006 |
+
<td class="tg-0pky">3.07/0.39</td>
|
1007 |
+
<td class="tg-0pky">3.33/0.43</td>
|
1008 |
+
<td class="tg-0pky">3.11/0.36</td>
|
1009 |
+
<td class="tg-0pky">-- / --</td>
|
1010 |
+
<td class="tg-0pky">3.06/0.35</td>
|
1011 |
+
</tr>
|
1012 |
+
<tr>
|
1013 |
+
<td class="tg-0pky">Conciseness</td>
|
1014 |
+
<td class="tg-0pky">2.92/0.42</td>
|
1015 |
+
<td class="tg-0pky">2.67/0.54</td>
|
1016 |
+
<td class="tg-0pky">2.93/0.39</td>
|
1017 |
+
<td class="tg-0pky">-- / --</td>
|
1018 |
+
<td class="tg-0pky">3.13/0.31</td>
|
1019 |
+
</tr>
|
1020 |
+
<tr>
|
1021 |
+
<td class="tg-0pky" rowspan="2">Math</td>
|
1022 |
+
<td class="tg-0pky" rowspan="2">MGSM</td>
|
1023 |
+
<td class="tg-0pky">Reasoning capability</td>
|
1024 |
+
<td class="tg-0pky">1.89/0.47</td>
|
1025 |
+
<td class="tg-0pky">1.91/0.45</td>
|
1026 |
+
<td class="tg-0pky">1.97/0.43</td>
|
1027 |
+
<td class="tg-0pky">2.17/0.44</td>
|
1028 |
+
<td class="tg-0pky">2.16/0.56</td>
|
1029 |
+
</tr>
|
1030 |
+
<tr>
|
1031 |
+
<td class="tg-0pky">Mathematical correctness `(B)`</td>
|
1032 |
+
<td class="tg-0pky">0.24/0.10</td>
|
1033 |
+
<td class="tg-0pky">0.28/0.11</td>
|
1034 |
+
<td class="tg-0pky">0.27/0.11</td>
|
1035 |
+
<td class="tg-0pky">0.44/0.13</td>
|
1036 |
+
<td class="tg-0pky">0.27/0.10</td>
|
1037 |
+
</tr>
|
1038 |
+
<tr>
|
1039 |
+
<td class="tg-0pky" rowspan="2">Translation from Language</td>
|
1040 |
+
<td class="tg-0pky" rowspan="2">FLORES-200</td>
|
1041 |
+
<td class="tg-0pky">Fluency</td>
|
1042 |
+
<td class="tg-0pky">3.74/0.15</td>
|
1043 |
+
<td class="tg-0pky">3.69/0.22</td>
|
1044 |
+
<td class="tg-0pky">-- / --</td>
|
1045 |
+
<td class="tg-0pky">-- / --</td>
|
1046 |
+
<td class="tg-0pky">3.69/0.18</td>
|
1047 |
+
</tr>
|
1048 |
+
<tr>
|
1049 |
+
<td class="tg-0pky">Accuracy</td>
|
1050 |
+
<td class="tg-0pky">4.01/0.24</td>
|
1051 |
+
<td class="tg-0pky">3.98/0.31</td>
|
1052 |
+
<td class="tg-0pky">-- / --</td>
|
1053 |
+
<td class="tg-0pky">-- / --</td>
|
1054 |
+
<td class="tg-0pky">3.98/0.25</td>
|
1055 |
+
</tr>
|
1056 |
+
<tr>
|
1057 |
+
<td class="tg-0pky" rowspan="2">Translation to Language</td>
|
1058 |
+
<td class="tg-0pky" rowspan="2">FLORES-200</td>
|
1059 |
+
<td class="tg-0pky">Fluency</td>
|
1060 |
+
<td class="tg-0pky">3.75/0.14</td>
|
1061 |
+
<td class="tg-0pky">3.69/0.17</td>
|
1062 |
+
<td class="tg-0pky">-- / --</td>
|
1063 |
+
<td class="tg-0pky">-- / --</td>
|
1064 |
+
<td class="tg-0pky">4.09/0.16</td>
|
1065 |
+
</tr>
|
1066 |
+
<tr>
|
1067 |
+
<td class="tg-0pky">Accuracy</td>
|
1068 |
+
<td class="tg-0pky">4.08/0.22</td>
|
1069 |
+
<td class="tg-0pky">3.98/0.24</td>
|
1070 |
+
<td class="tg-0pky">-- / --</td>
|
1071 |
+
<td class="tg-0pky">-- / --</td>
|
1072 |
+
<td class="tg-0pky">4.47/0.18</td>
|
1073 |
+
</tr>
|
1074 |
+
</tbody></table>
|
1075 |
|
1076 |
---
|
1077 |
|