Commit a1b89ac, committed by davda54 (1 parent: c24decd)

added results on grammatical error correction

Files changed (1)
  1. README.md +35 -3
README.md CHANGED
@@ -93,7 +93,7 @@ We release [our codebase here](https://github.com/ltgoslo/norallm). We compare a
 We use the binary formulation of this task (positive vs. negative).
 
 <details>
-<summary>Method</summary>
+<summary>Method (click to expand)</summary>
 
 * Evaluation setting: zero-shot and few-shot perplexity-based evaluation.
 * Prompt: ```"Tekst: {text}\nSentiment:{label}"```, where the ```label``` is either "positiv" or "negativ".
@@ -126,7 +126,7 @@ We use the binary formulation of this task (positive vs. negative).
 [NorQuAD](https://huggingface.co/datasets/ltg/norquad) ([Ivanova et al., 2023](https://aclanthology.org/2023.nodalida-1.17/)) is a dataset for extractive question answering in Norwegian designed similarly to [SQuAD (Rajpurkar et al., 2016)](https://aclanthology.org/D16-1264/).
 
 <details>
-<summary>Method</summary>
+<summary>Method (click to expand)</summary>
 
 * Evaluation setting: zero-shot and few-shot settings via natural language generation using the greedy decoding strategy.
 * Prompt: ```"Tittel: {title}\n\nTekst: {text}\n\nSpørsmål: {question}\n\nSvar:{answer}"``` Based on [Brown et al. (2020)](https://arxiv.org/abs/2005.14165).
@@ -153,13 +153,45 @@ We use the binary formulation of this task (positive vs. negative).
 </details>
 
 
+### Grammatical error correction
+
+[ASK-RAW](https://huggingface.co/datasets/ltg/ask-gec) is dataset for Norwegian grammatical error correction (GEC) created by [Matias Jentoft (2023)](https://www.duo.uio.no/handle/10852/103885).
+
+<details>
+<summary>Method (click to expand)</summary>
+
+* Evaluation setting: zero-shot and few-shot settings via natural language generation using the greedy decoding strategy.
+* Prompt: ```"Her er eksempler på perfekt korrigering av grammatiske feil:\n\nTekst: {source_text}\nKorreksjon:{target_text}"```
+* Few-shot results show the average scores across 5 repetitions
+* Evaluation script: https://github.com/ltgoslo/norallm/blob/main/initial_evaluation/gec.py
+* Performance metrics: the evaluation metric uses [ERRANT](https://github.com/chrisjbryant/errant/tree/main), which identifies edit-spans and then calculates the F_{0.5} scores between the gold edits and predicted edits.
+
+</details>
+
+<details open>
+<summary>Results on [the ASK corpus](https://huggingface.co/datasets/ltg/ask-gec) (ERRANT F_{0.5})</summary>
+
+|Model|0-shot (F0.5)|1-shot (F0.5)|32-shot (F0.5)|
+|---|---|---|---|
+|NorMistral-7b-warm|**40.8**|41.8|48.5|
+|NorMistral-7b-scratch|22.1|28.8|42.1|
+|NorBLOOM-7b|8.7|24.5|32.0|
+|NB-GPT-J|9.1|28.2|30.6|
+|GPT-Sw3-6.7B|30.5|42.9|**50.6**|
+|GPT-Sw3-6.7B-v2|40.6|**43.4**|49.8|
+|Falcon-7B|10.8|12.4|15.5|
+|Mistral-7B-v0.1|26.0|27.4|30.6|
+
+</details>
+
+
 
 ### Machine translation
 
 [Tatoeba](https://huggingface.co/datasets/Helsinki-NLP/tatoeba_mt) [(Tiedemann, 2020)](https://aclanthology.org/2020.wmt-1.139/) is a benchmark for machine translation, which includes hundreds of language pairs. We consider six language pairs (English <-> Bokmål, English <-> Nynorsk, and Bokmål <-> Nynorsk).
 
 <details>
-<summary>Method</summary>
+<summary>Method (click to expand)</summary>
 
 * Evaluation setting: zero-shot and few-shot settings via natural language generation using the greedy decoding strategy.
 * Prompt: ```"{source_language}: {source_text}\n{target_language}:{target_text}"```, where the ```source_language``` and ```target_language``` are ```Engelsk```, ```Bokmål```, or ```Nynorsk```. Based on [Garcia et al. (2023)](https://arxiv.org/abs/2302.01398).
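The GEC metrics bullet in this commit scores models with ERRANT F_{0.5} between gold and predicted edits. The sketch below covers only that final scoring step, under several assumptions. Edits are taken as already extracted; ERRANT does this by aligning source and corrected sentences, and that step is not reproduced here. Each edit is represented as a hypothetical `(start, end, correction)` tuple, and there is a single gold reference per sentence. Matches are summed over the whole corpus before precision and recall are computed. This is an illustration, not the code in `gec.py`.

```python
from typing import Iterable, Set, Tuple

# Hypothetical edit representation: (start token, end token, corrected text).
# Real ERRANT edits also carry an error type; it is ignored in this sketch.
Edit = Tuple[int, int, str]


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from edit counts; beta < 1 weights precision above recall."""
    # Convention assumed here: with no predicted (or no gold) edits the
    # corresponding ratio defaults to 1.0 instead of dividing by zero.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def corpus_f05(gold: Iterable[Set[Edit]], pred: Iterable[Set[Edit]]) -> float:
    """Micro-averaged F0.5: sum edit matches over all sentences, then score."""
    tp = fp = fn = 0
    for gold_edits, pred_edits in zip(gold, pred):
        tp += len(gold_edits & pred_edits)  # predicted edits that match gold
        fp += len(pred_edits - gold_edits)  # predicted edits not in gold
        fn += len(gold_edits - pred_edits)  # gold edits the model missed
    return f_beta(tp, fp, fn, beta=0.5)


if __name__ == "__main__":
    gold = [{(1, 2, "er"), (4, 4, "en")}, {(0, 1, "Jeg")}]
    pred = [{(1, 2, "er")}, {(0, 1, "Jeg"), (3, 4, "hus")}]
    # tp = 2, fp = 1, fn = 1, so P = R = 2/3 and F0.5 = 2/3
    print(round(corpus_f05(gold, pred), 4))  # 0.6667
    # Same precision/recall values, swapped between the two calls:
    print(round(f_beta(1, 0, 1), 4))  # P = 1.0, R = 0.5 -> 0.8333
    print(round(f_beta(1, 1, 0), 4))  # P = 0.5, R = 1.0 -> 0.5556
```

The last two calls show why beta = 0.5 matters for GEC. A system that makes fewer but correct edits (high precision) scores higher than one that finds every error but also introduces spurious edits, even when the two precision/recall values are simply swapped.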