added results on grammatical error correction
README.md CHANGED
@@ -93,7 +93,7 @@ We release [our codebase here](https://github.com/ltgoslo/norallm). We compare a
We use the binary formulation of this task (positive vs. negative).

<details>
-<summary>Method</summary>
+<summary>Method (click to expand)</summary>

* Evaluation setting: zero-shot and few-shot perplexity-based evaluation.
* Prompt: ```"Tekst: {text}\nSentiment:{label}"```, where the ```label``` is either "positiv" or "negativ".
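The perplexity-based setting above scores each candidate label as a continuation of the prompt and picks the more likely one. Below is a minimal sketch of that idea, not the official evaluation code from the norallm repository; the checkpoint name, the candidate label strings, and the assumption that the prompt tokenization is a prefix of the prompt-plus-label tokenization are all illustrative.

```python
# Sketch of perplexity-based (label log-likelihood) sentiment classification.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "norallm/normistral-7b-warm"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()


def label_score(text: str, label: str) -> float:
    """Total log-likelihood of the label tokens given the prompt."""
    prompt = f"Tekst: {text}\nSentiment:"
    # Assumption: tokenizing the prompt alone yields a prefix of the full sequence.
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + label, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probabilities for each next-token prediction, aligned with the targets.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    token_log_probs = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    # Sum only over the label tokens (everything after the prompt).
    n_prompt = prompt_ids.shape[1]
    return token_log_probs[n_prompt - 1:].sum().item()


def classify(text: str) -> str:
    return max(("positiv", "negativ"), key=lambda lab: label_score(text, lab))


print(classify("Denne filmen var helt fantastisk!"))
```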
@@ -126,7 +126,7 @@ We use the binary formulation of this task (positive vs. negative).
[NorQuAD](https://huggingface.co/datasets/ltg/norquad) ([Ivanova et al., 2023](https://aclanthology.org/2023.nodalida-1.17/)) is a dataset for extractive question answering in Norwegian designed similarly to [SQuAD (Rajpurkar et al., 2016)](https://aclanthology.org/D16-1264/).

<details>
-<summary>Method</summary>
+<summary>Method (click to expand)</summary>

* Evaluation setting: zero-shot and few-shot settings via natural language generation using the greedy decoding strategy.
* Prompt: ```"Tittel: {title}\n\nTekst: {text}\n\nSpørsmål: {question}\n\nSvar:{answer}"``` Based on [Brown et al. (2020)](https://arxiv.org/abs/2005.14165).
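The generative tasks (NorQuAD, grammatical error correction, and machine translation below) instead let the model continue the prompt with greedy decoding and take the first generated line as the prediction. A rough sketch of that setting follows; the checkpoint name and the NorQuAD-style passage are made up for illustration, and the official scripts live in the norallm repository.

```python
# Sketch of the generative evaluation setting with greedy decoding.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "norallm/normistral-7b-warm"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical example formatted with the NorQuAD prompt template above.
prompt = (
    "Tittel: Nidarosdomen\n\n"
    "Tekst: Nidarosdomen er en katedral i Trondheim og Norges nasjonalhelligdom.\n\n"
    "Spørsmål: I hvilken by ligger Nidarosdomen?\n\n"
    "Svar:"
)
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)  # greedy decoding

# Decode only the continuation and keep its first line as the predicted answer.
completion = tokenizer.decode(
    output_ids[0, inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(completion.split("\n")[0].strip())
```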
@@ -153,13 +153,45 @@ We use the binary formulation of this task (positive vs. negative).
</details>


+### Grammatical error correction
+
+[ASK-RAW](https://huggingface.co/datasets/ltg/ask-gec) is a dataset for Norwegian grammatical error correction (GEC) created by [Matias Jentoft (2023)](https://www.duo.uio.no/handle/10852/103885).
+
+<details>
+<summary>Method (click to expand)</summary>
+
+* Evaluation setting: zero-shot and few-shot settings via natural language generation using the greedy decoding strategy.
+* Prompt: ```"Her er eksempler på perfekt korrigering av grammatiske feil:\n\nTekst: {source_text}\nKorreksjon:{target_text}"```
+* Few-shot results show the average scores across 5 repetitions.
+* Evaluation script: https://github.com/ltgoslo/norallm/blob/main/initial_evaluation/gec.py
+* Performance metrics: the evaluation uses [ERRANT](https://github.com/chrisjbryant/errant/tree/main), which identifies edit spans and then calculates the F_{0.5} score between the gold edits and the predicted edits.
+
+</details>
+
+<details open>
+<summary>Results on [the ASK corpus](https://huggingface.co/datasets/ltg/ask-gec) (ERRANT F_{0.5})</summary>
+
+|Model|0-shot (F0.5)|1-shot (F0.5)|32-shot (F0.5)|
+|---|---|---|---|
+|NorMistral-7b-warm|**40.8**|41.8|48.5|
+|NorMistral-7b-scratch|22.1|28.8|42.1|
+|NorBLOOM-7b|8.7|24.5|32.0|
+|NB-GPT-J|9.1|28.2|30.6|
+|GPT-Sw3-6.7B|30.5|42.9|**50.6**|
+|GPT-Sw3-6.7B-v2|40.6|**43.4**|49.8|
+|Falcon-7B|10.8|12.4|15.5|
+|Mistral-7B-v0.1|26.0|27.4|30.6|
+
+</details>
+
+

### Machine translation

[Tatoeba](https://huggingface.co/datasets/Helsinki-NLP/tatoeba_mt) [(Tiedemann, 2020)](https://aclanthology.org/2020.wmt-1.139/) is a benchmark for machine translation, which includes hundreds of language pairs. We consider six language pairs (English <-> Bokmål, English <-> Nynorsk, and Bokmål <-> Nynorsk).

<details>
-<summary>Method</summary>
+<summary>Method (click to expand)</summary>

* Evaluation setting: zero-shot and few-shot settings via natural language generation using the greedy decoding strategy.
* Prompt: ```"{source_language}: {source_text}\n{target_language}:{target_text}"```, where the ```source_language``` and ```target_language``` are ```Engelsk```, ```Bokmål```, or ```Nynorsk```. Based on [Garcia et al. (2023)](https://arxiv.org/abs/2302.01398).
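For the new GEC task, ERRANT aligns each source sentence with its correction, extracts edit spans, and scores the predicted edits against the gold edits with F_{0.5}, which weights precision twice as heavily as recall. The sketch below only illustrates that final scoring step on toy edit tuples; the actual edit extraction and alignment are done by ERRANT inside the referenced gec.py script, so the tuple format here is a simplification.

```python
# Sketch of ERRANT-style F_{0.5} scoring, given already-extracted edit spans.
# An "edit" is simplified here to a (start, end, replacement) tuple over source tokens.


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Precision-weighted F-score; beta=0.5 favours precision over recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


def count_matches(gold_edits: set, predicted_edits: set) -> tuple:
    """True positives, false positives and false negatives for one sentence."""
    tp = len(gold_edits & predicted_edits)
    fp = len(predicted_edits - gold_edits)
    fn = len(gold_edits - predicted_edits)
    return tp, fp, fn


# Toy example: the system recovers one of two gold edits and adds a spurious one.
gold = {(1, 2, "har"), (4, 5, "hunden")}
pred = {(1, 2, "har"), (6, 7, "på")}
tp, fp, fn = count_matches(gold, pred)
# Corpus-level scoring accumulates tp/fp/fn over all sentences before computing F_{0.5}.
print(round(f_beta(tp, fp, fn), 3))  # 0.5
```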