language:
- nl
---

The dataset raalst/squad_v2_dutch was kindly provided by Henryk Borzymowski. It is a translated version of SQuAD v2, which I converted from json to jsonl format. It contains train and validation splits but no test split, so I declared 20% of the train split to serve as the test set in my finetuning run.
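
A minimal sketch of how such an 80/20 split can be made with the datasets library (the ratio comes from the text above; the exact call, variable names, and seed are my assumptions, not recorded code):

```python
from datasets import load_dataset

squad_nl = load_dataset("raalst/squad_v2_dutch")

# Carve a test set out of the train split; the 80/20 ratio comes from the
# text above, the seed is an arbitrary choice for reproducibility.
split = squad_nl["train"].train_test_split(test_size=0.2, seed=42)
squad_nl["train"] = split["train"]
squad_nl["test"] = split["test"]
```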

When using raalst/squad_v2_dutch, be sure to escape the single and double quotes in the contexts:

```python
def cleanup(mylist):
    # Escape double and single quotes in each context string, in place.
    for item in mylist:
        if '"' in item["context"]:
            item["context"] = item["context"].replace('"', '\\"')
        if "'" in item["context"]:
            item["context"] = item["context"].replace("'", "\\'")
```
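
A quick check of what this does, on a made-up example:

```python
examples = [{"context": 'He said "hallo" and it\'s fine.'}]
cleanup(examples)
print(examples[0]["context"])  # He said \"hallo\" and it\'s fine.
```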
 
The pretrained model was pdelobelle/robbert-v2-dutch-base, a Dutch RoBERTa model.

The results obtained in training, measured with the squad_v2 metric from the evaluate library, are:

```python
from evaluate import load

# squad_v2 is a boolean flag: True because this dataset has unanswerable questions.
metric = load("evaluate-metric/squad_v2" if squad_v2 else "evaluate-metric/squad")
```
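
A sketch of how the metric is then applied (the prediction/reference schema is the squad_v2 metric's documented format; the id and texts here are made up):

```python
predictions = [{"id": "1", "prediction_text": "Normandië", "no_answer_probability": 0.0}]
references = [{"id": "1", "answers": {"text": ["Normandië"], "answer_start": [159]}}]
print(metric.compute(predictions=predictions, references=references))
```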

```
{'exact': 61.75389109958193,
 'f1': 66.89717170237417,
 'total': 19853,
 'HasAns_exact': 48.967182330322814,
 'HasAns_f1': 58.09796564493008,
 'HasAns_total': 11183,
 'NoAns_exact': 78.24682814302192,
 'NoAns_f1': 78.24682814302192,
 'NoAns_total': 8670,
 'best_exact': 61.75389109958193,
 'best_exact_thresh': 0.0,
 'best_f1': 66.89717170237276,
 'best_f1_thresh': 0.0}
```

This seems mediocre to me.
 
Settings (until I figured out how to report them properly):

```
DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 79412
    })
    ...
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 9669
    })
})
```
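
The Trainer call below uses tokenized_squad and data_collator, which this README does not define. A minimal sketch of what they could look like, following the standard Hugging Face extractive-QA preprocessing pattern (my reconstruction, not the author's recorded code; squad_nl is the hypothetical DatasetDict from the loading sketch above, and the tokenizer is the one created in the code below):

```python
from transformers import DefaultDataCollator

def preprocess(examples):
    # Tokenize question/context pairs, truncating only the context.
    inputs = tokenizer(
        examples["question"],
        examples["context"],
        max_length=384,
        truncation="only_second",
        return_offsets_mapping=True,
        padding="max_length",
    )
    start_positions, end_positions = [], []
    for i, offsets in enumerate(inputs.pop("offset_mapping")):
        answers = examples["answers"][i]
        if not answers["text"]:
            # Unanswerable question: point both labels at the CLS token.
            start_positions.append(0)
            end_positions.append(0)
            continue
        start_char = answers["answer_start"][0]
        end_char = start_char + len(answers["text"][0])
        sequence_ids = inputs.sequence_ids(i)
        # Locate the span of context tokens in the tokenized input.
        context_start = sequence_ids.index(1)
        context_end = len(sequence_ids) - 1 - sequence_ids[::-1].index(1)
        if offsets[context_start][0] > start_char or offsets[context_end][1] < end_char:
            # The answer was truncated away: also label as unanswerable.
            start_positions.append(0)
            end_positions.append(0)
        else:
            idx = context_start
            while idx <= context_end and offsets[idx][0] <= start_char:
                idx += 1
            start_positions.append(idx - 1)
            idx = context_end
            while idx >= context_start and offsets[idx][1] >= end_char:
                idx -= 1
            end_positions.append(idx + 1)
    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    return inputs

tokenized_squad = squad_nl.map(preprocess, batched=True,
                               remove_columns=squad_nl["train"].column_names)
data_collator = DefaultDataCollator()
```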
 
```python
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, TrainingArguments, Trainer

tokenizer = AutoTokenizer.from_pretrained("pdelobelle/robbert-v2-dutch-base")
model = AutoModelForQuestionAnswering.from_pretrained("pdelobelle/robbert-v2-dutch-base")

training_args = TrainingArguments(
    output_dir="./qa_model",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    push_to_hub=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_squad["train"],
    eval_dataset=tokenized_squad["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()
```

Training ran for 15198 steps in 2:57:03 over 3 epochs:

| Epoch | Training Loss | Validation Loss |
|------:|--------------:|----------------:|
|     1 |      1.380700 |        1.177431 |
|     2 |      1.093000 |        1.052601 |
|     3 |      0.849700 |        1.143632 |

```
TrainOutput(global_step=15198, training_loss=1.1917077029499668, metrics={'train_runtime': 10623.9565,
'train_samples_per_second': 22.886, 'train_steps_per_second': 1.431, 'total_flos': 4.764955396486349e+16,
'train_loss': 1.1917077029499668, 'epoch': 3.0})
```

Trained on Ubuntu with a GTX 1080 Ti.
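
For inference with the finetuned model, a hedged example using the question-answering pipeline (the model id is a placeholder for this repository's actual id, and the question/context are made up):

```python
from transformers import pipeline

# Placeholder model id: substitute the actual id of this repository.
qa = pipeline("question-answering", model="raalst/<this-model>")
qa(question="Waar ligt Normandië?",
   context="Normandië is een regio in het noorden van Frankrijk.",
   handle_impossible_answer=True)
```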