language:
- nl
---

The dataset raalst/squad_v2_dutch was kindly provided by Henryk Borzymowski. It is a translated version of SQuAD v2, which I converted from json to jsonl format. It contains train and validation splits but no test split, so I declared 20% of the train split to serve as the test set in my finetuning run.
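
A minimal sketch of how such an 80/20 split can be made with the datasets library (the ratio comes from the text above; the exact call, variable names, and seed are my assumptions, not recorded code):

```python
from datasets import load_dataset

squad_nl = load_dataset("raalst/squad_v2_dutch")

# Carve a test set out of the train split; the 80/20 ratio comes from the
# text above, the seed is an arbitrary choice for reproducibility.
split = squad_nl["train"].train_test_split(test_size=0.2, seed=42)
squad_nl["train"] = split["train"]
squad_nl["test"] = split["test"]
```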

When using raalst/squad_v2_dutch, be sure to escape the single and double quotes in the contexts:

```python
def cleanup(mylist):
    # Escape double and single quotes in each context string, in place.
    for item in mylist:
        if '"' in item["context"]:
            item["context"] = item["context"].replace('"', '\\"')
        if "'" in item["context"]:
            item["context"] = item["context"].replace("'", "\\'")
```
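
A quick check of what this does, on a made-up example:

```python
examples = [{"context": 'He said "hallo" and it\'s fine.'}]
cleanup(examples)
print(examples[0]["context"])  # He said \"hallo\" and it\'s fine.
```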
 
The pretrained model was pdelobelle/robbert-v2-dutch-base, a Dutch RoBERTa model.

The results obtained in training, measured with the squad_v2 metric from the evaluate library, are:

```python
from evaluate import load

# squad_v2 is a boolean flag: True because this dataset has unanswerable questions.
metric = load("evaluate-metric/squad_v2" if squad_v2 else "evaluate-metric/squad")
```
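
A sketch of how the metric is then applied (the prediction/reference schema is the squad_v2 metric's documented format; the id and texts here are made up):

```python
predictions = [{"id": "1", "prediction_text": "Normandië", "no_answer_probability": 0.0}]
references = [{"id": "1", "answers": {"text": ["Normandië"], "answer_start": [159]}}]
print(metric.compute(predictions=predictions, references=references))
```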

```
{'exact': 61.75389109958193,
 'f1': 66.89717170237417,
 'total': 19853,
 'HasAns_exact': 48.967182330322814,
 'HasAns_f1': 58.09796564493008,
 'HasAns_total': 11183,
 'NoAns_exact': 78.24682814302192,
 'NoAns_f1': 78.24682814302192,
 'NoAns_total': 8670,
 'best_exact': 61.75389109958193,
 'best_exact_thresh': 0.0,
 'best_f1': 66.89717170237276,
 'best_f1_thresh': 0.0}
```

This seems mediocre to me.
 
Settings (until I figured out how to report them properly):

```
DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 79412
    })
    ...
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 9669
    })
})
```
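
The Trainer call below uses tokenized_squad and data_collator, which this README does not define. A minimal sketch of what they could look like, following the standard Hugging Face extractive-QA preprocessing pattern (my reconstruction, not the author's recorded code; squad_nl is the hypothetical DatasetDict from the loading sketch above, and the tokenizer is the one created in the code below):

```python
from transformers import DefaultDataCollator

def preprocess(examples):
    # Tokenize question/context pairs, truncating only the context.
    inputs = tokenizer(
        examples["question"],
        examples["context"],
        max_length=384,
        truncation="only_second",
        return_offsets_mapping=True,
        padding="max_length",
    )
    start_positions, end_positions = [], []
    for i, offsets in enumerate(inputs.pop("offset_mapping")):
        answers = examples["answers"][i]
        if not answers["text"]:
            # Unanswerable question: point both labels at the CLS token.
            start_positions.append(0)
            end_positions.append(0)
            continue
        start_char = answers["answer_start"][0]
        end_char = start_char + len(answers["text"][0])
        sequence_ids = inputs.sequence_ids(i)
        # Locate the span of context tokens in the tokenized input.
        context_start = sequence_ids.index(1)
        context_end = len(sequence_ids) - 1 - sequence_ids[::-1].index(1)
        if offsets[context_start][0] > start_char or offsets[context_end][1] < end_char:
            # The answer was truncated away: also label as unanswerable.
            start_positions.append(0)
            end_positions.append(0)
        else:
            idx = context_start
            while idx <= context_end and offsets[idx][0] <= start_char:
                idx += 1
            start_positions.append(idx - 1)
            idx = context_end
            while idx >= context_start and offsets[idx][1] >= end_char:
                idx -= 1
            end_positions.append(idx + 1)
    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    return inputs

tokenized_squad = squad_nl.map(preprocess, batched=True,
                               remove_columns=squad_nl["train"].column_names)
data_collator = DefaultDataCollator()
```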
 
```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, TrainingArguments, Trainer

tokenizer = AutoTokenizer.from_pretrained("pdelobelle/robbert-v2-dutch-base")
model = AutoModelForQuestionAnswering.from_pretrained("pdelobelle/robbert-v2-dutch-base")

training_args = TrainingArguments(
    output_dir="./qa_model",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    push_to_hub=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_squad["train"],
    eval_dataset=tokenized_squad["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()
```

Training ran for 15198 steps in 2:57:03 over 3 epochs:

| Epoch | Training Loss | Validation Loss |
|------:|--------------:|----------------:|
|     1 |      1.380700 |        1.177431 |
|     2 |      1.093000 |        1.052601 |
|     3 |      0.849700 |        1.143632 |

```
TrainOutput(global_step=15198, training_loss=1.1917077029499668, metrics={'train_runtime': 10623.9565,
'train_samples_per_second': 22.886, 'train_steps_per_second': 1.431, 'total_flos': 4.764955396486349e+16,
'train_loss': 1.1917077029499668, 'epoch': 3.0})
```

Trained on Ubuntu with a GTX 1080 Ti.
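
For inference with the finetuned model, a hedged example using the question-answering pipeline (the model id is a placeholder for this repository's actual id, and the question/context are made up):

```python
from transformers import pipeline

# Placeholder model id: substitute the actual id of this repository.
qa = pipeline("question-answering", model="raalst/<this-model>")
qa(question="Waar ligt Normandië?",
   context="Normandië is een regio in het noorden van Frankrijk.",
   handle_impossible_answer=True)
```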