Joeran Bosma commited on
Commit
ae6f68f
1 Parent(s): 95305e9

Initial release

Browse files
README.md CHANGED
@@ -1,3 +1,106 @@
- ---
- license: cc-by-nc-sa-4.0
- ---
+ ---
+ license: cc-by-nc-sa-4.0
+ ---
+
+ # DRAGON Longformer large domain-specific
+
+ Pretrained model on Dutch clinical reports using a masked language modeling (MLM) objective. It was introduced in [this](#pending) paper. The model was pretrained from scratch on domain-specific data (i.e., clinical reports). The architecture is the same as [`allenai/longformer-large-4096`](https://huggingface.co/allenai/longformer-large-4096) from HuggingFace. The tokenizer was fitted to the dataset of Dutch medical reports, using the same tokenizer settings as [`roberta-base`](https://huggingface.co/FacebookAI/roberta-base).
+
+ ## Model description
+ Longformer is a transformer model that was pretrained on a large corpus of Dutch clinical reports in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labeling them in any way; an automatic process generates the inputs and labels from those texts.
+
+ This way, the model learns an inner representation of the Dutch medical language that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled reports, for instance, you can train a standard classifier using the features produced by the Longformer model as inputs.
+
+ ## Model variations
+ Multiple architectures were pretrained for the DRAGON challenge.
+
+ | Model | #params | Language |
+ |---|---|---|
+ | [`joeranbosma/dragon-bert-base-mixed-domain`](https://huggingface.co/joeranbosma/dragon-bert-base-mixed-domain) | 109M | Dutch → Dutch |
+ | [`joeranbosma/dragon-roberta-base-mixed-domain`](https://huggingface.co/joeranbosma/dragon-roberta-base-mixed-domain) | 278M | Multiple → Dutch |
+ | [`joeranbosma/dragon-roberta-large-mixed-domain`](https://huggingface.co/joeranbosma/dragon-roberta-large-mixed-domain) | 560M | Multiple → Dutch |
+ | [`joeranbosma/dragon-longformer-base-mixed-domain`](https://huggingface.co/joeranbosma/dragon-longformer-base-mixed-domain) | 149M | English → Dutch |
+ | [`joeranbosma/dragon-longformer-large-mixed-domain`](https://huggingface.co/joeranbosma/dragon-longformer-large-mixed-domain) | 435M | English → Dutch |
+ | [`joeranbosma/dragon-bert-base-domain-specific`](https://huggingface.co/joeranbosma/dragon-bert-base-domain-specific) | 109M | Dutch |
+ | [`joeranbosma/dragon-roberta-base-domain-specific`](https://huggingface.co/joeranbosma/dragon-roberta-base-domain-specific) | 278M | Dutch |
+ | [`joeranbosma/dragon-roberta-large-domain-specific`](https://huggingface.co/joeranbosma/dragon-roberta-large-domain-specific) | 560M | Dutch |
+ | [`joeranbosma/dragon-longformer-base-domain-specific`](https://huggingface.co/joeranbosma/dragon-longformer-base-domain-specific) | 149M | Dutch |
+ | [`joeranbosma/dragon-longformer-large-domain-specific`](https://huggingface.co/joeranbosma/dragon-longformer-large-domain-specific) | 435M | Dutch |
+
+
+ ## Intended uses & limitations
+ You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task.
+
+ Note that this model is primarily aimed at being fine-tuned on tasks that use the whole text (e.g., a clinical report) to make decisions, such as sequence classification, token classification, or question answering. For tasks such as text generation you should look at models like GPT-2.
+
+ ## How to use
+ You can use this model directly with a pipeline for masked language modeling:
+
+ ```python
+ from transformers import pipeline
+ unmasker = pipeline("fill-mask", model="joeranbosma/dragon-longformer-large-domain-specific")
+ unmasker("Dit onderzoek geen aanwijzingen voor significant carcinoom. PIRADS <mask>.")
+ ```
+
+ Here is how to use this model to get the features of a given text in PyTorch:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModel
+ tokenizer = AutoTokenizer.from_pretrained("joeranbosma/dragon-longformer-large-domain-specific")
+ model = AutoModel.from_pretrained("joeranbosma/dragon-longformer-large-domain-specific")
+ text = "Replace me by any text you'd like."
+ encoded_input = tokenizer(text, return_tensors="pt")
+ output = model(**encoded_input)
+ ```
+
+ ## Limitations and bias
+ Even if the training data used for this model could be characterized as fairly neutral, this model can still make biased predictions. This bias will also affect all fine-tuned versions of this model.
+
+ ## Training data
+ For pretraining, 4,333,201 clinical reports (466,351 consecutive patients) were selected from Ziekenhuisgroep Twente from patients with a diagnostic or interventional visit between 13 July 2000 and 25 April 2023. 180,439 duplicate clinical reports (179,808 patients) were excluded, resulting in 4,152,762 included reports (463,692 patients). These reports were split into training (80%, 3,322,209 reports), validation (10%, 415,276 reports), and testing (10%, 415,277 reports). The testing reports were set aside for future analysis and are not used for pretraining.
+
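The report counts above are internally consistent, as a quick arithmetic check confirms:

```python
# Sanity check of the report counts stated in the Training data section.
selected = 4_333_201    # reports initially selected
duplicates = 180_439    # duplicate reports excluded
included = selected - duplicates
assert included == 4_152_762  # matches the reported number of included reports

# The 80/10/10 split covers exactly all included reports.
train, val, test = 3_322_209, 415_276, 415_277
assert train + val + test == included
```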
+ ## Training procedure
+
+ ### Pretraining
+ The model was pretrained using masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs), which usually see the words one after the other, and from autoregressive models like GPT, which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence.
+
+ The details of the masking procedure for each sentence are the following:
+ - 15% of the tokens are masked.
+ - In 80% of the cases, the masked tokens are replaced by `<mask>`.
+ - In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace).
+ - In the remaining 10% of cases, the masked tokens are left as is.
+
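The 80/10/10 replacement rule above can be sketched as follows. This is a minimal illustration, not the exact `run_mlm.py` implementation; the token IDs, mask ID, and the `-100` ignore-label convention follow common HuggingFace practice:

```python
import random

def mask_tokens(token_ids, mask_id, vocab_size, mlm_prob=0.15, seed=None):
    """Apply the MLM masking rule: select ~15% of tokens; of those,
    80% -> mask token, 10% -> random token, 10% -> left unchanged."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)  # -100 = position ignored by the loss
    for i, tok in enumerate(token_ids):
        if rng.random() >= mlm_prob:
            continue  # token not selected for masking
        labels[i] = tok  # the model must predict the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif r < 0.9:
            # 10%: random token (may occasionally equal the original in this sketch)
            inputs[i] = rng.randrange(vocab_size)
        # remaining 10%: leave the token as is
    return inputs, labels
```

The loss is then computed only at the positions where `labels` is not `-100`, so the model is graded exclusively on the selected tokens.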
+ The HuggingFace implementation was used for pretraining: [`run_mlm.py`](https://github.com/huggingface/transformers/blob/7c6ec195adbfcd22cb6baeee64dd3c24a4b80c74/examples/pytorch/language-modeling/run_mlm.py).
+
+ ### Pretraining hyperparameters
+
+ The following hyperparameters were used during pretraining:
+ - `learning_rate`: 1e-4
+ - `train_batch_size`: 4
+ - `eval_batch_size`: 4
+ - `seed`: 42
+ - `gradient_accumulation_steps`: 64
+ - `total_train_batch_size`: 256
+ - `optimizer`: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - `lr_scheduler_type`: linear
+ - `num_epochs`: 10.0
+ - `max_seq_length`: 4096
+
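The effective batch size follows from the per-device batch size and gradient accumulation (assuming a single device, which these numbers imply):

```python
train_batch_size = 4             # per-device batch size
gradient_accumulation_steps = 64
num_devices = 1                  # assumption: the listed values imply one GPU

effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)  # 256, matching the reported total_train_batch_size
```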
+ ### Framework versions
+
+ - Transformers 4.29.0.dev0
+ - Pytorch 2.0.0+cu117
+ - Datasets 2.11.0
+ - Tokenizers 0.13.3
+
+ ## Evaluation results
+
+ Pending evaluation on the DRAGON benchmark.
+
+ ### BibTeX entry and citation info
+
+ ```bibtex
+ @article{PENDING}
+ ```
all_results.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.8085033547768835,
+ "eval_loss": 0.829306423664093,
+ "eval_runtime": 6124.6567,
+ "eval_samples": 103187,
+ "eval_samples_per_second": 16.848,
+ "eval_steps_per_second": 4.212,
+ "perplexity": 2.2917287001260904,
+ "train_loss": 1.5035098029663845,
+ "train_runtime": 2595975.6547,
+ "train_samples": 824387,
+ "train_samples_per_second": 3.176,
+ "train_steps_per_second": 0.012
+ }
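The reported perplexity is simply the exponential of the evaluation loss, which is the convention used by HuggingFace's `run_mlm.py`:

```python
import math

eval_loss = 0.829306423664093          # from all_results.json
perplexity = math.exp(eval_loss)       # perplexity = exp(cross-entropy loss)
print(perplexity)  # ≈ 2.2917, matching the reported value
```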
config.json ADDED
@@ -0,0 +1,56 @@
+ {
+ "_name_or_path": "allenai/longformer-large-4096",
+ "architectures": [
+ "LongformerForMaskedLM"
+ ],
+ "attention_mode": "longformer",
+ "attention_probs_dropout_prob": 0.1,
+ "attention_window": [
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512,
+ 512
+ ],
+ "bos_token_id": 0,
+ "eos_token_id": 2,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 1024,
+ "ignore_attention_mask": false,
+ "initializer_range": 0.02,
+ "intermediate_size": 4096,
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 4098,
+ "model_type": "longformer",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 24,
+ "onnx_export": false,
+ "pad_token_id": 1,
+ "position_embedding_type": "absolute",
+ "sep_token_id": 2,
+ "torch_dtype": "float32",
+ "transformers_version": "4.29.0.dev0",
+ "type_vocab_size": 1,
+ "vocab_size": 50265
+ }
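Two quick structural checks on this config: Longformer expects one `attention_window` entry per transformer layer, and the position embedding count is the maximum sequence length plus two extra positions (an assumption based on the RoBERTa-style padding offset this architecture inherits):

```python
# Values copied from the config.json above.
config = {
    "attention_window": [512] * 24,
    "num_hidden_layers": 24,
    "max_position_embeddings": 4098,
}

# One local-attention window size per transformer layer.
assert len(config["attention_window"]) == config["num_hidden_layers"]

# 4096 usable positions + 2 offset positions (RoBERTa-style padding convention).
assert config["max_position_embeddings"] == 4096 + 2
```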
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.8085033547768835,
+ "eval_loss": 0.829306423664093,
+ "eval_runtime": 6124.6567,
+ "eval_samples": 103187,
+ "eval_samples_per_second": 16.848,
+ "eval_steps_per_second": 4.212,
+ "perplexity": 2.2917287001260904
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:29608550946e78eb00ee9da1a41384f7b6d56e4907f6ca0bba645cfdcb6a025d
+ size 1738801909
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "bos_token": "<s>",
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "unk_token": "<unk>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+ "add_prefix_space": false,
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "errors": "replace",
+ "mask_token": {
+ "__type": "AddedToken",
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "model_max_length": 512,
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "tokenizer_class": "RobertaTokenizer",
+ "trim_offsets": true,
+ "unk_token": "<unk>"
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 10.0,
+ "train_loss": 1.5035098029663845,
+ "train_runtime": 2595975.6547,
+ "train_samples": 824387,
+ "train_samples_per_second": 3.176,
+ "train_steps_per_second": 0.012
+ }
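The trainer's final `global_step` of 32,200 (reported in `trainer_state.json`) is consistent with these numbers, assuming the usual drop-last batching behaviour:

```python
train_samples = 824_387          # training samples (4096-token chunks), from train_results.json
total_train_batch_size = 256     # effective batch size from the model card
num_epochs = 10

steps_per_epoch = train_samples // total_train_batch_size  # incomplete final batch dropped
print(steps_per_epoch * num_epochs)  # 32200, matching global_step in trainer_state.json
```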
trainer_state.json ADDED
@@ -0,0 +1,1327 @@
+ {
+ "best_metric": 0.8284289836883545,
+ "best_model_checkpoint": "/output/longformer-large-4096-scratch-mlm-zgt-radpat/checkpoint-31300",
+ "epoch": 9.999175145683829,
+ "global_step": 32200,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.1,
+ "eval_accuracy": 0.12266339212733007,
+ "eval_loss": 6.975634574890137,
+ "eval_runtime": 6116.0321,
+ "eval_samples_per_second": 16.872,
+ "eval_steps_per_second": 4.218,
+ "step": 313
+ },
+ {
+ "epoch": 0.16,
+ "learning_rate": 1.5527950310559007e-05,
+ "loss": 7.8221,
+ "step": 500
+ },
+ {
+ "epoch": 0.19,
+ "eval_accuracy": 0.15375473317214908,
+ "eval_loss": 6.221883773803711,
+ "eval_runtime": 6119.4463,
+ "eval_samples_per_second": 16.862,
+ "eval_steps_per_second": 4.216,
+ "step": 626
+ },
+ {
+ "epoch": 0.29,
+ "eval_accuracy": 0.16615663705726413,
+ "eval_loss": 6.070300102233887,
+ "eval_runtime": 6123.864,
+ "eval_samples_per_second": 16.85,
+ "eval_steps_per_second": 4.213,
+ "step": 939
+ },
+ {
+ "epoch": 0.31,
+ "learning_rate": 3.1055900621118014e-05,
+ "loss": 6.2078,
+ "step": 1000
+ },
+ {
+ "epoch": 0.39,
+ "eval_accuracy": 0.17121056433657572,
+ "eval_loss": 5.859891414642334,
+ "eval_runtime": 6127.1504,
+ "eval_samples_per_second": 16.841,
+ "eval_steps_per_second": 4.21,
+ "step": 1252
+ },
+ {
+ "epoch": 0.47,
+ "learning_rate": 4.658385093167702e-05,
+ "loss": 5.8885,
+ "step": 1500
+ },
+ {
+ "epoch": 0.49,
+ "eval_accuracy": 0.2018564191205144,
+ "eval_loss": 5.480071544647217,
+ "eval_runtime": 6125.2208,
+ "eval_samples_per_second": 16.846,
+ "eval_steps_per_second": 4.212,
+ "step": 1565
+ },
+ {
+ "epoch": 0.58,
+ "eval_accuracy": 0.28117280447637577,
+ "eval_loss": 4.865741729736328,
+ "eval_runtime": 6125.2116,
+ "eval_samples_per_second": 16.846,
+ "eval_steps_per_second": 4.212,
+ "step": 1878
+ },
+ {
+ "epoch": 0.62,
+ "learning_rate": 6.211180124223603e-05,
+ "loss": 5.222,
+ "step": 2000
+ },
+ {
+ "epoch": 0.68,
+ "eval_accuracy": 0.3429139079332977,
+ "eval_loss": 4.355594158172607,
+ "eval_runtime": 6130.4382,
+ "eval_samples_per_second": 16.832,
+ "eval_steps_per_second": 4.208,
+ "step": 2191
+ },
+ {
+ "epoch": 0.78,
+ "learning_rate": 7.763975155279503e-05,
+ "loss": 4.4722,
+ "step": 2500
+ },
+ {
+ "epoch": 0.78,
+ "eval_accuracy": 0.40118303502111985,
+ "eval_loss": 3.8668248653411865,
+ "eval_runtime": 6127.9717,
+ "eval_samples_per_second": 16.839,
+ "eval_steps_per_second": 4.21,
+ "step": 2504
+ },
+ {
+ "epoch": 0.87,
+ "eval_accuracy": 0.5023100325479124,
+ "eval_loss": 3.0883595943450928,
+ "eval_runtime": 6125.5473,
+ "eval_samples_per_second": 16.845,
+ "eval_steps_per_second": 4.211,
+ "step": 2817
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 9.316770186335404e-05,
+ "loss": 3.4756,
+ "step": 3000
+ },
+ {
+ "epoch": 0.97,
+ "eval_accuracy": 0.5704291572279797,
+ "eval_loss": 2.500981569290161,
+ "eval_runtime": 6130.9045,
+ "eval_samples_per_second": 16.831,
+ "eval_steps_per_second": 4.208,
+ "step": 3130
+ },
+ {
+ "epoch": 1.07,
+ "eval_accuracy": 0.6179390492627002,
+ "eval_loss": 2.098602771759033,
+ "eval_runtime": 6130.0855,
+ "eval_samples_per_second": 16.833,
+ "eval_steps_per_second": 4.208,
+ "step": 3443
+ },
+ {
+ "epoch": 1.09,
+ "learning_rate": 9.903381642512077e-05,
+ "loss": 2.473,
+ "step": 3500
+ },
+ {
+ "epoch": 1.17,
+ "eval_accuracy": 0.6461125725768823,
+ "eval_loss": 1.8769867420196533,
+ "eval_runtime": 6125.6206,
+ "eval_samples_per_second": 16.845,
+ "eval_steps_per_second": 4.211,
+ "step": 3756
+ },
+ {
+ "epoch": 1.24,
+ "learning_rate": 9.730848861283644e-05,
+ "loss": 1.9842,
+ "step": 4000
+ },
+ {
+ "epoch": 1.26,
+ "eval_accuracy": 0.6658163018931873,
+ "eval_loss": 1.7306807041168213,
+ "eval_runtime": 6126.2778,
+ "eval_samples_per_second": 16.843,
+ "eval_steps_per_second": 4.211,
+ "step": 4069
+ },
+ {
+ "epoch": 1.36,
+ "eval_accuracy": 0.6793036581603035,
+ "eval_loss": 1.6312057971954346,
+ "eval_runtime": 6129.7016,
+ "eval_samples_per_second": 16.834,
+ "eval_steps_per_second": 4.209,
+ "step": 4382
+ },
+ {
+ "epoch": 1.4,
+ "learning_rate": 9.558316080055211e-05,
+ "loss": 1.7588,
+ "step": 4500
+ },
+ {
+ "epoch": 1.46,
+ "eval_accuracy": 0.6910826606245232,
+ "eval_loss": 1.5486171245574951,
+ "eval_runtime": 6126.9734,
+ "eval_samples_per_second": 16.841,
+ "eval_steps_per_second": 4.21,
+ "step": 4695
+ },
+ {
+ "epoch": 1.55,
+ "learning_rate": 9.385783298826778e-05,
+ "loss": 1.6227,
+ "step": 5000
+ },
+ {
+ "epoch": 1.56,
+ "eval_accuracy": 0.7005288313548309,
+ "eval_loss": 1.4852144718170166,
+ "eval_runtime": 6125.5835,
+ "eval_samples_per_second": 16.845,
+ "eval_steps_per_second": 4.211,
+ "step": 5008
+ },
+ {
+ "epoch": 1.65,
+ "eval_accuracy": 0.7085754746399128,
+ "eval_loss": 1.4299465417861938,
+ "eval_runtime": 6127.5749,
+ "eval_samples_per_second": 16.84,
+ "eval_steps_per_second": 4.21,
+ "step": 5321
+ },
+ {
+ "epoch": 1.71,
+ "learning_rate": 9.213250517598345e-05,
+ "loss": 1.5262,
+ "step": 5500
+ },
+ {
+ "epoch": 1.75,
+ "eval_accuracy": 0.714780717522465,
+ "eval_loss": 1.3879172801971436,
+ "eval_runtime": 6127.0567,
+ "eval_samples_per_second": 16.841,
+ "eval_steps_per_second": 4.21,
+ "step": 5634
+ },
+ {
+ "epoch": 1.85,
+ "eval_accuracy": 0.7206512153561178,
+ "eval_loss": 1.3517948389053345,
+ "eval_runtime": 6124.3738,
+ "eval_samples_per_second": 16.849,
+ "eval_steps_per_second": 4.212,
+ "step": 5947
+ },
+ {
+ "epoch": 1.86,
+ "learning_rate": 9.04071773636991e-05,
+ "loss": 1.4504,
+ "step": 6000
+ },
+ {
+ "epoch": 1.94,
+ "eval_accuracy": 0.7259066376655939,
+ "eval_loss": 1.3164656162261963,
+ "eval_runtime": 6118.789,
+ "eval_samples_per_second": 16.864,
+ "eval_steps_per_second": 4.216,
+ "step": 6260
+ },
+ {
+ "epoch": 2.02,
+ "learning_rate": 8.868184955141477e-05,
+ "loss": 1.3953,
+ "step": 6500
+ },
+ {
+ "epoch": 2.04,
+ "eval_accuracy": 0.730818673639555,
+ "eval_loss": 1.285917043685913,
+ "eval_runtime": 6120.6477,
+ "eval_samples_per_second": 16.859,
+ "eval_steps_per_second": 4.215,
+ "step": 6573
+ },
+ {
+ "epoch": 2.14,
+ "eval_accuracy": 0.734559869460255,
+ "eval_loss": 1.2613903284072876,
+ "eval_runtime": 6121.6933,
+ "eval_samples_per_second": 16.856,
+ "eval_steps_per_second": 4.214,
+ "step": 6886
+ },
+ {
+ "epoch": 2.17,
+ "learning_rate": 8.695652173913044e-05,
+ "loss": 1.3444,
+ "step": 7000
+ },
+ {
+ "epoch": 2.24,
+ "eval_accuracy": 0.7384604131666711,
+ "eval_loss": 1.236000895500183,
+ "eval_runtime": 6122.4897,
+ "eval_samples_per_second": 16.854,
+ "eval_steps_per_second": 4.213,
+ "step": 7199
+ },
+ {
+ "epoch": 2.33,
+ "learning_rate": 8.523119392684611e-05,
+ "loss": 1.3047,
+ "step": 7500
+ },
+ {
+ "epoch": 2.33,
+ "eval_accuracy": 0.7415128788148121,
+ "eval_loss": 1.2168104648590088,
+ "eval_runtime": 6126.7013,
+ "eval_samples_per_second": 16.842,
+ "eval_steps_per_second": 4.211,
+ "step": 7512
+ },
+ {
+ "epoch": 2.43,
+ "eval_accuracy": 0.7450029545595697,
+ "eval_loss": 1.1964406967163086,
+ "eval_runtime": 6123.0501,
+ "eval_samples_per_second": 16.852,
+ "eval_steps_per_second": 4.213,
+ "step": 7825
+ },
+ {
+ "epoch": 2.48,
+ "learning_rate": 8.350586611456177e-05,
+ "loss": 1.2713,
+ "step": 8000
+ },
+ {
+ "epoch": 2.53,
+ "eval_accuracy": 0.7467751766581295,
+ "eval_loss": 1.1841331720352173,
+ "eval_runtime": 6122.3999,
+ "eval_samples_per_second": 16.854,
+ "eval_steps_per_second": 4.214,
+ "step": 8138
+ },
+ {
+ "epoch": 2.62,
+ "eval_accuracy": 0.750416850580808,
+ "eval_loss": 1.1633927822113037,
+ "eval_runtime": 6127.4486,
+ "eval_samples_per_second": 16.84,
+ "eval_steps_per_second": 4.21,
+ "step": 8451
+ },
+ {
+ "epoch": 2.64,
+ "learning_rate": 8.178053830227743e-05,
+ "loss": 1.2431,
+ "step": 8500
+ },
+ {
+ "epoch": 2.72,
+ "eval_accuracy": 0.7527193981891372,
+ "eval_loss": 1.146986722946167,
+ "eval_runtime": 6131.9044,
+ "eval_samples_per_second": 16.828,
+ "eval_steps_per_second": 4.207,
+ "step": 8764
+ },
+ {
+ "epoch": 2.79,
+ "learning_rate": 8.00552104899931e-05,
+ "loss": 1.2164,
+ "step": 9000
+ },
+ {
+ "epoch": 2.82,
+ "eval_accuracy": 0.7551538736391906,
+ "eval_loss": 1.132608413696289,
+ "eval_runtime": 6121.9035,
+ "eval_samples_per_second": 16.855,
+ "eval_steps_per_second": 4.214,
+ "step": 9077
+ },
+ {
+ "epoch": 2.92,
+ "eval_accuracy": 0.7571211907517355,
+ "eval_loss": 1.1203465461730957,
+ "eval_runtime": 6121.527,
+ "eval_samples_per_second": 16.856,
+ "eval_steps_per_second": 4.214,
+ "step": 9390
+ },
+ {
+ "epoch": 2.95,
+ "learning_rate": 7.832988267770877e-05,
+ "loss": 1.1951,
+ "step": 9500
+ },
+ {
+ "epoch": 3.01,
+ "eval_accuracy": 0.7589963980672606,
+ "eval_loss": 1.1114239692687988,
+ "eval_runtime": 6126.4612,
+ "eval_samples_per_second": 16.843,
+ "eval_steps_per_second": 4.211,
+ "step": 9703
+ },
+ {
+ "epoch": 3.11,
+ "learning_rate": 7.660455486542444e-05,
+ "loss": 1.1705,
+ "step": 10000
+ },
+ {
+ "epoch": 3.11,
+ "eval_accuracy": 0.7612426818924412,
+ "eval_loss": 1.0974253416061401,
+ "eval_runtime": 6122.547,
+ "eval_samples_per_second": 16.854,
+ "eval_steps_per_second": 4.213,
+ "step": 10016
+ },
+ {
+ "epoch": 3.21,
+ "eval_accuracy": 0.7631302412738202,
+ "eval_loss": 1.0867012739181519,
+ "eval_runtime": 6126.2709,
+ "eval_samples_per_second": 16.843,
+ "eval_steps_per_second": 4.211,
+ "step": 10329
+ },
+ {
+ "epoch": 3.26,
+ "learning_rate": 7.48792270531401e-05,
+ "loss": 1.1516,
+ "step": 10500
+ },
+ {
+ "epoch": 3.3,
+ "eval_accuracy": 0.7646139267496522,
+ "eval_loss": 1.0770790576934814,
+ "eval_runtime": 6130.429,
+ "eval_samples_per_second": 16.832,
+ "eval_steps_per_second": 4.208,
+ "step": 10642
+ },
+ {
+ "epoch": 3.4,
+ "eval_accuracy": 0.7660438596581639,
+ "eval_loss": 1.0668072700500488,
+ "eval_runtime": 6129.5434,
+ "eval_samples_per_second": 16.834,
+ "eval_steps_per_second": 4.209,
+ "step": 10955
+ },
+ {
+ "epoch": 3.42,
+ "learning_rate": 7.315389924085577e-05,
+ "loss": 1.1345,
+ "step": 11000
+ },
+ {
+ "epoch": 3.5,
+ "eval_accuracy": 0.7675726293257004,
+ "eval_loss": 1.05952787399292,
+ "eval_runtime": 6126.4998,
+ "eval_samples_per_second": 16.843,
+ "eval_steps_per_second": 4.211,
+ "step": 11268
+ },
+ {
+ "epoch": 3.57,
+ "learning_rate": 7.142857142857143e-05,
+ "loss": 1.1192,
+ "step": 11500
+ },
+ {
+ "epoch": 3.6,
+ "eval_accuracy": 0.7694602055551931,
+ "eval_loss": 1.0479472875595093,
+ "eval_runtime": 6127.4827,
+ "eval_samples_per_second": 16.84,
+ "eval_steps_per_second": 4.21,
+ "step": 11581
+ },
+ {
+ "epoch": 3.69,
+ "eval_accuracy": 0.7707531140981431,
+ "eval_loss": 1.0423223972320557,
+ "eval_runtime": 6131.6585,
+ "eval_samples_per_second": 16.829,
+ "eval_steps_per_second": 4.207,
+ "step": 11894
+ },
+ {
+ "epoch": 3.73,
+ "learning_rate": 6.970324361628709e-05,
+ "loss": 1.106,
+ "step": 12000
+ },
+ {
+ "epoch": 3.79,
+ "eval_accuracy": 0.7719773558500885,
+ "eval_loss": 1.0328373908996582,
+ "eval_runtime": 6128.1273,
+ "eval_samples_per_second": 16.838,
+ "eval_steps_per_second": 4.21,
+ "step": 12207
+ },
+ {
+ "epoch": 3.88,
+ "learning_rate": 6.797791580400277e-05,
+ "loss": 1.0916,
+ "step": 12500
+ },
+ {
+ "epoch": 3.89,
+ "eval_accuracy": 0.7731614368018522,
+ "eval_loss": 1.0272808074951172,
+ "eval_runtime": 6120.3326,
+ "eval_samples_per_second": 16.86,
+ "eval_steps_per_second": 4.215,
+ "step": 12520
+ },
+ {
+ "epoch": 3.99,
+ "eval_accuracy": 0.7742511503011699,
+ "eval_loss": 1.0189120769500732,
+ "eval_runtime": 6131.1757,
+ "eval_samples_per_second": 16.83,
+ "eval_steps_per_second": 4.208,
+ "step": 12833
+ },
+ {
+ "epoch": 4.04,
+ "learning_rate": 6.625258799171843e-05,
+ "loss": 1.0789,
+ "step": 13000
+ },
+ {
+ "epoch": 4.08,
+ "eval_accuracy": 0.7757384860054987,
+ "eval_loss": 1.0113306045532227,
+ "eval_runtime": 6133.3354,
+ "eval_samples_per_second": 16.824,
+ "eval_steps_per_second": 4.206,
+ "step": 13146
+ },
+ {
+ "epoch": 4.18,
+ "eval_accuracy": 0.776816006797112,
+ "eval_loss": 1.0058414936065674,
+ "eval_runtime": 6130.4902,
+ "eval_samples_per_second": 16.832,
+ "eval_steps_per_second": 4.208,
+ "step": 13459
+ },
+ {
+ "epoch": 4.19,
+ "learning_rate": 6.45272601794341e-05,
+ "loss": 1.0631,
+ "step": 13500
+ },
+ {
+ "epoch": 4.28,
+ "eval_accuracy": 0.7777869709950421,
+ "eval_loss": 1.000064730644226,
+ "eval_runtime": 6129.8863,
+ "eval_samples_per_second": 16.833,
+ "eval_steps_per_second": 4.208,
+ "step": 13772
+ },
+ {
+ "epoch": 4.35,
+ "learning_rate": 6.280193236714976e-05,
+ "loss": 1.0557,
+ "step": 14000
+ },
+ {
+ "epoch": 4.37,
+ "eval_accuracy": 0.778843659908514,
+ "eval_loss": 0.993532121181488,
+ "eval_runtime": 6126.5895,
+ "eval_samples_per_second": 16.842,
+ "eval_steps_per_second": 4.211,
+ "step": 14085
+ },
+ {
+ "epoch": 4.47,
+ "eval_accuracy": 0.7797456195039035,
+ "eval_loss": 0.9887062311172485,
+ "eval_runtime": 6127.2121,
+ "eval_samples_per_second": 16.841,
+ "eval_steps_per_second": 4.21,
+ "step": 14398
+ },
+ {
+ "epoch": 4.5,
+ "learning_rate": 6.107660455486542e-05,
+ "loss": 1.0438,
+ "step": 14500
+ },
+ {
+ "epoch": 4.57,
+ "eval_accuracy": 0.7807731355140578,
+ "eval_loss": 0.9825865030288696,
+ "eval_runtime": 6126.4985,
+ "eval_samples_per_second": 16.843,
+ "eval_steps_per_second": 4.211,
+ "step": 14711
+ },
+ {
+ "epoch": 4.66,
+ "learning_rate": 5.9351276742581096e-05,
+ "loss": 1.0361,
+ "step": 15000
+ },
+ {
+ "epoch": 4.67,
+ "eval_accuracy": 0.7818996676870377,
+ "eval_loss": 0.9763655662536621,
+ "eval_runtime": 6127.4496,
+ "eval_samples_per_second": 16.84,
+ "eval_steps_per_second": 4.21,
+ "step": 15024
+ },
+ {
+ "epoch": 4.76,
+ "eval_accuracy": 0.782940768919716,
+ "eval_loss": 0.9697893857955933,
+ "eval_runtime": 6126.6806,
+ "eval_samples_per_second": 16.842,
+ "eval_steps_per_second": 4.211,
+ "step": 15337
+ },
+ {
+ "epoch": 4.81,
+ "learning_rate": 5.762594893029676e-05,
+ "loss": 1.0264,
+ "step": 15500
+ },
+ {
+ "epoch": 4.86,
+ "eval_accuracy": 0.7841247808628483,
+ "eval_loss": 0.9644368290901184,
+ "eval_runtime": 6128.2176,
+ "eval_samples_per_second": 16.838,
+ "eval_steps_per_second": 4.21,
+ "step": 15650
+ },
+ {
+ "epoch": 4.96,
+ "eval_accuracy": 0.7846301810721098,
+ "eval_loss": 0.9614962339401245,
+ "eval_runtime": 6132.4257,
+ "eval_samples_per_second": 16.826,
+ "eval_steps_per_second": 4.207,
+ "step": 15963
+ },
+ {
+ "epoch": 4.97,
+ "learning_rate": 5.590062111801242e-05,
+ "loss": 1.0176,
+ "step": 16000
+ },
+ {
+ "epoch": 5.05,
+ "eval_accuracy": 0.7858738693048405,
+ "eval_loss": 0.9536014795303345,
+ "eval_runtime": 6134.6669,
+ "eval_samples_per_second": 16.82,
+ "eval_steps_per_second": 4.205,
+ "step": 16276
+ },
+ {
+ "epoch": 5.12,
+ "learning_rate": 5.417529330572809e-05,
+ "loss": 1.007,
+ "step": 16500
+ },
+ {
+ "epoch": 5.15,
+ "eval_accuracy": 0.7867571419814423,
+ "eval_loss": 0.9484899044036865,
+ "eval_runtime": 6130.582,
+ "eval_samples_per_second": 16.832,
+ "eval_steps_per_second": 4.208,
+ "step": 16589
+ },
+ {
+ "epoch": 5.25,
+ "eval_accuracy": 0.7867586112749555,
+ "eval_loss": 0.9482876658439636,
+ "eval_runtime": 6124.9513,
+ "eval_samples_per_second": 16.847,
+ "eval_steps_per_second": 4.212,
+ "step": 16902
+ },
+ {
+ "epoch": 5.28,
+ "learning_rate": 5.244996549344375e-05,
+ "loss": 0.9965,
+ "step": 17000
+ },
+ {
+ "epoch": 5.35,
+ "eval_accuracy": 0.7880718537015102,
+ "eval_loss": 0.9402521848678589,
+ "eval_runtime": 6133.8455,
+ "eval_samples_per_second": 16.823,
+ "eval_steps_per_second": 4.206,
+ "step": 17215
+ },
+ {
+ "epoch": 5.43,
+ "learning_rate": 5.072463768115943e-05,
+ "loss": 0.9911,
+ "step": 17500
+ },
+ {
+ "epoch": 5.44,
+ "eval_accuracy": 0.7888353320614213,
+ "eval_loss": 0.9360187649726868,
+ "eval_runtime": 6131.2854,
+ "eval_samples_per_second": 16.83,
+ "eval_steps_per_second": 4.207,
+ "step": 17528
+ },
+ {
+ "epoch": 5.54,
+ "eval_accuracy": 0.7896846598862644,
+ "eval_loss": 0.9315310120582581,
+ "eval_runtime": 6130.221,
+ "eval_samples_per_second": 16.833,
+ "eval_steps_per_second": 4.208,
+ "step": 17841
+ },
+ {
+ "epoch": 5.59,
+ "learning_rate": 4.899930986887509e-05,
+ "loss": 0.9861,
+ "step": 18000
+ },
+ {
+ "epoch": 5.64,
+ "eval_accuracy": 0.7902251551194575,
+ "eval_loss": 0.9286208152770996,
+ "eval_runtime": 6135.9888,
+ "eval_samples_per_second": 16.817,
+ "eval_steps_per_second": 4.204,
+ "step": 18154
+ },
+ {
+ "epoch": 5.73,
+ "eval_accuracy": 0.7910160835517881,
+ "eval_loss": 0.9242651462554932,
+ "eval_runtime": 6134.4232,
+ "eval_samples_per_second": 16.821,
+ "eval_steps_per_second": 4.205,
+ "step": 18467
+ },
+ {
+ "epoch": 5.74,
+ "learning_rate": 4.727398205659075e-05,
+ "loss": 0.9787,
+ "step": 18500
+ },
+ {
+ "epoch": 5.83,
+ "eval_accuracy": 0.7916774902149969,
+ "eval_loss": 0.9199575185775757,
+ "eval_runtime": 6127.6258,
+ "eval_samples_per_second": 16.84,
+ "eval_steps_per_second": 4.21,
+ "step": 18780
+ },
+ {
+ "epoch": 5.9,
+ "learning_rate": 4.554865424430642e-05,
775
+ "loss": 0.972,
776
+ "step": 19000
777
+ },
778
+ {
779
+ "epoch": 5.93,
780
+ "eval_accuracy": 0.7921690081239334,
781
+ "eval_loss": 0.9167630076408386,
782
+ "eval_runtime": 6121.9416,
783
+ "eval_samples_per_second": 16.855,
784
+ "eval_steps_per_second": 4.214,
785
+ "step": 19093
786
+ },
787
+ {
788
+ "epoch": 6.03,
789
+ "eval_accuracy": 0.7929045391491827,
790
+ "eval_loss": 0.9131466150283813,
791
+ "eval_runtime": 6136.433,
792
+ "eval_samples_per_second": 16.815,
793
+ "eval_steps_per_second": 4.204,
794
+ "step": 19406
795
+ },
796
+ {
797
+ "epoch": 6.06,
798
+ "learning_rate": 4.382332643202209e-05,
799
+ "loss": 0.9642,
800
+ "step": 19500
801
+ },
802
+ {
803
+ "epoch": 6.12,
804
+ "eval_accuracy": 0.7933599893983608,
805
+ "eval_loss": 0.9112694263458252,
806
+ "eval_runtime": 6128.5893,
807
+ "eval_samples_per_second": 16.837,
808
+ "eval_steps_per_second": 4.209,
809
+ "step": 19719
810
+ },
811
+ {
812
+ "epoch": 6.21,
813
+ "learning_rate": 4.209799861973775e-05,
814
+ "loss": 0.9576,
815
+ "step": 20000
816
+ },
817
+ {
818
+ "epoch": 6.22,
819
+ "eval_accuracy": 0.7940601199523715,
820
+ "eval_loss": 0.9060889482498169,
821
+ "eval_runtime": 6120.6148,
822
+ "eval_samples_per_second": 16.859,
823
+ "eval_steps_per_second": 4.215,
824
+ "step": 20032
825
+ },
826
+ {
827
+ "epoch": 6.32,
828
+ "eval_accuracy": 0.7948685797545274,
829
+ "eval_loss": 0.9030121564865112,
830
+ "eval_runtime": 6124.1894,
831
+ "eval_samples_per_second": 16.849,
832
+ "eval_steps_per_second": 4.212,
833
+ "step": 20345
834
+ },
835
+ {
836
+ "epoch": 6.37,
837
+ "learning_rate": 4.0372670807453414e-05,
838
+ "loss": 0.9514,
839
+ "step": 20500
840
+ },
841
+ {
842
+ "epoch": 6.41,
843
+ "eval_accuracy": 0.7954765058682228,
844
+ "eval_loss": 0.8997820615768433,
845
+ "eval_runtime": 6126.3307,
846
+ "eval_samples_per_second": 16.843,
847
+ "eval_steps_per_second": 4.211,
848
+ "step": 20658
849
+ },
850
+ {
851
+ "epoch": 6.51,
852
+ "eval_accuracy": 0.7961196847197146,
853
+ "eval_loss": 0.8957119584083557,
854
+ "eval_runtime": 6121.3143,
855
+ "eval_samples_per_second": 16.857,
856
+ "eval_steps_per_second": 4.214,
857
+ "step": 20971
858
+ },
859
+ {
860
+ "epoch": 6.52,
861
+ "learning_rate": 3.864734299516908e-05,
862
+ "loss": 0.9457,
863
+ "step": 21000
864
+ },
865
+ {
866
+ "epoch": 6.61,
867
+ "eval_accuracy": 0.7966353338873807,
868
+ "eval_loss": 0.8925579190254211,
869
+ "eval_runtime": 6121.7054,
870
+ "eval_samples_per_second": 16.856,
871
+ "eval_steps_per_second": 4.214,
872
+ "step": 21284
873
+ },
874
+ {
875
+ "epoch": 6.68,
876
+ "learning_rate": 3.692201518288475e-05,
877
+ "loss": 0.9411,
878
+ "step": 21500
879
+ },
880
+ {
881
+ "epoch": 6.71,
882
+ "eval_accuracy": 0.7968278874690401,
883
+ "eval_loss": 0.8926752805709839,
884
+ "eval_runtime": 6123.2773,
885
+ "eval_samples_per_second": 16.852,
886
+ "eval_steps_per_second": 4.213,
887
+ "step": 21597
888
+ },
889
+ {
890
+ "epoch": 6.8,
891
+ "eval_accuracy": 0.7974544355055755,
892
+ "eval_loss": 0.8880347609519958,
893
+ "eval_runtime": 6121.4872,
894
+ "eval_samples_per_second": 16.857,
895
+ "eval_steps_per_second": 4.214,
896
+ "step": 21910
897
+ },
898
+ {
899
+ "epoch": 6.83,
900
+ "learning_rate": 3.519668737060042e-05,
901
+ "loss": 0.9349,
902
+ "step": 22000
903
+ },
904
+ {
905
+ "epoch": 6.9,
906
+ "eval_accuracy": 0.7982437294026129,
907
+ "eval_loss": 0.8834199905395508,
908
+ "eval_runtime": 6123.2699,
909
+ "eval_samples_per_second": 16.852,
910
+ "eval_steps_per_second": 4.213,
911
+ "step": 22223
912
+ },
913
+ {
914
+ "epoch": 6.99,
915
+ "learning_rate": 3.347135955831608e-05,
916
+ "loss": 0.9319,
917
+ "step": 22500
918
+ },
919
+ {
920
+ "epoch": 7.0,
921
+ "eval_accuracy": 0.7990805845521158,
922
+ "eval_loss": 0.8799129724502563,
923
+ "eval_runtime": 6120.2145,
924
+ "eval_samples_per_second": 16.86,
925
+ "eval_steps_per_second": 4.215,
926
+ "step": 22536
927
+ },
928
+ {
929
+ "epoch": 7.1,
930
+ "eval_accuracy": 0.7991272231482186,
931
+ "eval_loss": 0.879518449306488,
932
+ "eval_runtime": 6125.0222,
933
+ "eval_samples_per_second": 16.847,
934
+ "eval_steps_per_second": 4.212,
935
+ "step": 22849
936
+ },
937
+ {
938
+ "epoch": 7.14,
939
+ "learning_rate": 3.1746031746031745e-05,
940
+ "loss": 0.9235,
941
+ "step": 23000
942
+ },
943
+ {
944
+ "epoch": 7.19,
945
+ "eval_accuracy": 0.7999484030167242,
946
+ "eval_loss": 0.8756560683250427,
947
+ "eval_runtime": 6127.211,
948
+ "eval_samples_per_second": 16.841,
949
+ "eval_steps_per_second": 4.21,
950
+ "step": 23162
951
+ },
952
+ {
953
+ "epoch": 7.29,
954
+ "eval_accuracy": 0.8001440250718516,
955
+ "eval_loss": 0.8739376068115234,
956
+ "eval_runtime": 6134.261,
957
+ "eval_samples_per_second": 16.821,
958
+ "eval_steps_per_second": 4.205,
959
+ "step": 23475
960
+ },
961
+ {
962
+ "epoch": 7.3,
963
+ "learning_rate": 3.0020703933747414e-05,
964
+ "loss": 0.9198,
965
+ "step": 23500
966
+ },
967
+ {
968
+ "epoch": 7.39,
969
+ "eval_accuracy": 0.8010690824018636,
970
+ "eval_loss": 0.8693613409996033,
971
+ "eval_runtime": 6132.2846,
972
+ "eval_samples_per_second": 16.827,
973
+ "eval_steps_per_second": 4.207,
974
+ "step": 23788
975
+ },
976
+ {
977
+ "epoch": 7.45,
978
+ "learning_rate": 2.829537612146308e-05,
979
+ "loss": 0.9158,
980
+ "step": 24000
981
+ },
982
+ {
983
+ "epoch": 7.48,
984
+ "eval_accuracy": 0.8011952977602468,
985
+ "eval_loss": 0.8689371943473816,
986
+ "eval_runtime": 6129.2095,
987
+ "eval_samples_per_second": 16.835,
988
+ "eval_steps_per_second": 4.209,
989
+ "step": 24101
990
+ },
991
+ {
992
+ "epoch": 7.58,
993
+ "eval_accuracy": 0.8017360324487328,
994
+ "eval_loss": 0.8663704991340637,
995
+ "eval_runtime": 6128.5565,
996
+ "eval_samples_per_second": 16.837,
997
+ "eval_steps_per_second": 4.209,
998
+ "step": 24414
999
+ },
1000
+ {
1001
+ "epoch": 7.61,
1002
+ "learning_rate": 2.6570048309178748e-05,
1003
+ "loss": 0.9125,
1004
+ "step": 24500
1005
+ },
1006
+ {
1007
+ "epoch": 7.68,
1008
+ "eval_accuracy": 0.8020007406811046,
1009
+ "eval_loss": 0.8649431467056274,
1010
+ "eval_runtime": 6132.8666,
1011
+ "eval_samples_per_second": 16.825,
1012
+ "eval_steps_per_second": 4.206,
1013
+ "step": 24727
1014
+ },
1015
+ {
1016
+ "epoch": 7.76,
1017
+ "learning_rate": 2.484472049689441e-05,
1018
+ "loss": 0.9099,
1019
+ "step": 25000
1020
+ },
1021
+ {
1022
+ "epoch": 7.78,
1023
+ "eval_accuracy": 0.8026024276561983,
1024
+ "eval_loss": 0.8605436086654663,
1025
+ "eval_runtime": 6126.7586,
1026
+ "eval_samples_per_second": 16.842,
1027
+ "eval_steps_per_second": 4.211,
1028
+ "step": 25040
1029
+ },
1030
+ {
1031
+ "epoch": 7.87,
1032
+ "eval_accuracy": 0.80301129412462,
1033
+ "eval_loss": 0.8582573533058167,
1034
+ "eval_runtime": 6127.3341,
1035
+ "eval_samples_per_second": 16.84,
1036
+ "eval_steps_per_second": 4.21,
1037
+ "step": 25353
1038
+ },
1039
+ {
1040
+ "epoch": 7.92,
1041
+ "learning_rate": 2.311939268461008e-05,
1042
+ "loss": 0.9054,
1043
+ "step": 25500
1044
+ },
1045
+ {
1046
+ "epoch": 7.97,
1047
+ "eval_accuracy": 0.8034071794966846,
1048
+ "eval_loss": 0.8573377132415771,
1049
+ "eval_runtime": 6131.9465,
1050
+ "eval_samples_per_second": 16.828,
1051
+ "eval_steps_per_second": 4.207,
1052
+ "step": 25666
1053
+ },
1054
+ {
1055
+ "epoch": 8.07,
1056
+ "eval_accuracy": 0.8038572222331624,
1057
+ "eval_loss": 0.8544816374778748,
1058
+ "eval_runtime": 6128.9922,
1059
+ "eval_samples_per_second": 16.836,
1060
+ "eval_steps_per_second": 4.209,
1061
+ "step": 25979
1062
+ },
1063
+ {
1064
+ "epoch": 8.07,
1065
+ "learning_rate": 2.139406487232574e-05,
1066
+ "loss": 0.8998,
1067
+ "step": 26000
1068
+ },
1069
+ {
1070
+ "epoch": 8.16,
1071
+ "eval_accuracy": 0.8044058818938022,
1072
+ "eval_loss": 0.8519273400306702,
1073
+ "eval_runtime": 6124.6473,
1074
+ "eval_samples_per_second": 16.848,
1075
+ "eval_steps_per_second": 4.212,
1076
+ "step": 26292
1077
+ },
1078
+ {
1079
+ "epoch": 8.23,
1080
+ "learning_rate": 1.966873706004141e-05,
1081
+ "loss": 0.8939,
1082
+ "step": 26500
1083
+ },
1084
+ {
1085
+ "epoch": 8.26,
1086
+ "eval_accuracy": 0.8044216416179728,
1087
+ "eval_loss": 0.8512473702430725,
1088
+ "eval_runtime": 6126.8526,
1089
+ "eval_samples_per_second": 16.842,
1090
+ "eval_steps_per_second": 4.21,
1091
+ "step": 26605
1092
+ },
1093
+ {
1094
+ "epoch": 8.36,
1095
+ "eval_accuracy": 0.804752442721678,
1096
+ "eval_loss": 0.8492391705513,
1097
+ "eval_runtime": 6127.2647,
1098
+ "eval_samples_per_second": 16.841,
1099
+ "eval_steps_per_second": 4.21,
1100
+ "step": 26918
1101
+ },
1102
+ {
1103
+ "epoch": 8.38,
1104
+ "learning_rate": 1.7943409247757076e-05,
1105
+ "loss": 0.8942,
1106
+ "step": 27000
1107
+ },
1108
+ {
1109
+ "epoch": 8.46,
1110
+ "eval_accuracy": 0.8051816524786768,
1111
+ "eval_loss": 0.8468219637870789,
1112
+ "eval_runtime": 6124.9306,
1113
+ "eval_samples_per_second": 16.847,
1114
+ "eval_steps_per_second": 4.212,
1115
+ "step": 27231
1116
+ },
1117
+ {
1118
+ "epoch": 8.54,
1119
+ "learning_rate": 1.621808143547274e-05,
1120
+ "loss": 0.8904,
1121
+ "step": 27500
1122
+ },
1123
+ {
1124
+ "epoch": 8.55,
1125
+ "eval_accuracy": 0.8055019757141467,
1126
+ "eval_loss": 0.8458420634269714,
1127
+ "eval_runtime": 6124.8245,
1128
+ "eval_samples_per_second": 16.847,
1129
+ "eval_steps_per_second": 4.212,
1130
+ "step": 27544
1131
+ },
1132
+ {
1133
+ "epoch": 8.65,
1134
+ "eval_accuracy": 0.8057308816675628,
1135
+ "eval_loss": 0.8443206548690796,
1136
+ "eval_runtime": 6129.9291,
1137
+ "eval_samples_per_second": 16.833,
1138
+ "eval_steps_per_second": 4.208,
1139
+ "step": 27857
1140
+ },
1141
+ {
1142
+ "epoch": 8.69,
1143
+ "learning_rate": 1.4492753623188407e-05,
1144
+ "loss": 0.8862,
1145
+ "step": 28000
1146
+ },
1147
+ {
1148
+ "epoch": 8.75,
1149
+ "eval_accuracy": 0.805897348183967,
1150
+ "eval_loss": 0.843222439289093,
1151
+ "eval_runtime": 6128.1919,
1152
+ "eval_samples_per_second": 16.838,
1153
+ "eval_steps_per_second": 4.21,
1154
+ "step": 28170
1155
+ },
1156
+ {
1157
+ "epoch": 8.84,
1158
+ "eval_accuracy": 0.8064984673341041,
1159
+ "eval_loss": 0.84042888879776,
1160
+ "eval_runtime": 6116.6369,
1161
+ "eval_samples_per_second": 16.87,
1162
+ "eval_steps_per_second": 4.218,
1163
+ "step": 28483
1164
+ },
1165
+ {
1166
+ "epoch": 8.85,
1167
+ "learning_rate": 1.276742581090407e-05,
1168
+ "loss": 0.8842,
1169
+ "step": 28500
1170
+ },
1171
+ {
1172
+ "epoch": 8.94,
1173
+ "eval_accuracy": 0.806853518328651,
1174
+ "eval_loss": 0.8381487727165222,
1175
+ "eval_runtime": 6118.9718,
1176
+ "eval_samples_per_second": 16.863,
1177
+ "eval_steps_per_second": 4.216,
1178
+ "step": 28796
1179
+ },
1180
+ {
1181
+ "epoch": 9.01,
1182
+ "learning_rate": 1.1042097998619738e-05,
1183
+ "loss": 0.8812,
1184
+ "step": 29000
1185
+ },
1186
+ {
1187
+ "epoch": 9.04,
1188
+ "eval_accuracy": 0.8070338579198731,
1189
+ "eval_loss": 0.8374488353729248,
1190
+ "eval_runtime": 6118.7308,
1191
+ "eval_samples_per_second": 16.864,
1192
+ "eval_steps_per_second": 4.216,
1193
+ "step": 29109
1194
+ },
1195
+ {
1196
+ "epoch": 9.14,
1197
+ "eval_accuracy": 0.8068436046687713,
1198
+ "eval_loss": 0.8375363945960999,
1199
+ "eval_runtime": 6128.5918,
1200
+ "eval_samples_per_second": 16.837,
1201
+ "eval_steps_per_second": 4.209,
1202
+ "step": 29422
1203
+ },
1204
+ {
1205
+ "epoch": 9.16,
1206
+ "learning_rate": 9.316770186335403e-06,
1207
+ "loss": 0.8774,
1208
+ "step": 29500
1209
+ },
1210
+ {
1211
+ "epoch": 9.23,
1212
+ "eval_accuracy": 0.8077565106716271,
1213
+ "eval_loss": 0.8336867094039917,
1214
+ "eval_runtime": 6119.8095,
1215
+ "eval_samples_per_second": 16.861,
1216
+ "eval_steps_per_second": 4.215,
1217
+ "step": 29735
1218
+ },
1219
+ {
1220
+ "epoch": 9.32,
1221
+ "learning_rate": 7.591442374051071e-06,
1222
+ "loss": 0.8752,
1223
+ "step": 30000
1224
+ },
1225
+ {
1226
+ "epoch": 9.33,
1227
+ "eval_accuracy": 0.8081288482769053,
1228
+ "eval_loss": 0.8320378661155701,
1229
+ "eval_runtime": 6119.7341,
1230
+ "eval_samples_per_second": 16.861,
1231
+ "eval_steps_per_second": 4.215,
1232
+ "step": 30048
1233
+ },
1234
+ {
1235
+ "epoch": 9.43,
1236
+ "eval_accuracy": 0.8082356261550239,
1237
+ "eval_loss": 0.8310965299606323,
1238
+ "eval_runtime": 6119.6431,
1239
+ "eval_samples_per_second": 16.862,
1240
+ "eval_steps_per_second": 4.215,
1241
+ "step": 30361
1242
+ },
1243
+ {
1244
+ "epoch": 9.47,
1245
+ "learning_rate": 5.866114561766736e-06,
1246
+ "loss": 0.8732,
1247
+ "step": 30500
1248
+ },
1249
+ {
1250
+ "epoch": 9.53,
1251
+ "eval_accuracy": 0.8083999448820824,
1252
+ "eval_loss": 0.8303462266921997,
1253
+ "eval_runtime": 6118.4989,
1254
+ "eval_samples_per_second": 16.865,
1255
+ "eval_steps_per_second": 4.216,
1256
+ "step": 30674
1257
+ },
1258
+ {
1259
+ "epoch": 9.62,
1260
+ "eval_accuracy": 0.8084419046833061,
1261
+ "eval_loss": 0.8290849328041077,
1262
+ "eval_runtime": 6127.7892,
1263
+ "eval_samples_per_second": 16.839,
1264
+ "eval_steps_per_second": 4.21,
1265
+ "step": 30987
1266
+ },
1267
+ {
1268
+ "epoch": 9.63,
1269
+ "learning_rate": 4.140786749482402e-06,
1270
+ "loss": 0.8715,
1271
+ "step": 31000
1272
+ },
1273
+ {
1274
+ "epoch": 9.72,
1275
+ "eval_accuracy": 0.8088197529604327,
1276
+ "eval_loss": 0.8284289836883545,
1277
+ "eval_runtime": 6124.7156,
1278
+ "eval_samples_per_second": 16.848,
1279
+ "eval_steps_per_second": 4.212,
1280
+ "step": 31300
1281
+ },
1282
+ {
1283
+ "epoch": 9.78,
1284
+ "learning_rate": 2.4154589371980677e-06,
1285
+ "loss": 0.8705,
1286
+ "step": 31500
1287
+ },
1288
+ {
1289
+ "epoch": 9.82,
1290
+ "eval_accuracy": 0.8085015827448934,
1291
+ "eval_loss": 0.8298270106315613,
1292
+ "eval_runtime": 6120.6207,
1293
+ "eval_samples_per_second": 16.859,
1294
+ "eval_steps_per_second": 4.215,
1295
+ "step": 31613
1296
+ },
1297
+ {
1298
+ "epoch": 9.91,
1299
+ "eval_accuracy": 0.8086080278025564,
1300
+ "eval_loss": 0.8285703659057617,
1301
+ "eval_runtime": 6122.3492,
1302
+ "eval_samples_per_second": 16.854,
1303
+ "eval_steps_per_second": 4.214,
1304
+ "step": 31926
1305
+ },
1306
+ {
1307
+ "epoch": 9.94,
1308
+ "learning_rate": 6.901311249137336e-07,
1309
+ "loss": 0.8676,
1310
+ "step": 32000
1311
+ },
1312
+ {
1313
+ "epoch": 10.0,
1314
+ "step": 32200,
1315
+ "total_flos": 9.597056792179405e+18,
1316
+ "train_loss": 1.5035098029663845,
1317
+ "train_runtime": 2595975.6547,
1318
+ "train_samples_per_second": 3.176,
1319
+ "train_steps_per_second": 0.012
1320
+ }
1321
+ ],
1322
+ "max_steps": 32200,
1323
+ "num_train_epochs": 10,
1324
+ "total_flos": 9.597056792179405e+18,
1325
+ "trial_name": null,
1326
+ "trial_params": null
1327
+ }
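
The log above follows the Hugging Face `Trainer` state format, where `log_history` interleaves training entries (`loss`, `learning_rate`) with evaluation entries (`eval_accuracy`, `eval_loss`). A minimal sketch of how one might select the best checkpoint from such a file; the excerpt below copies a few of the final evaluation entries shown above rather than reading `trainer_state.json` from disk:

```python
# Sketch: pick the checkpoint with the best masked-LM eval accuracy from a
# Trainer-style log history. The dict below is a small excerpt of the
# trainer_state.json shown above; in practice one would json.load() the file.
trainer_state = {
    "log_history": [
        {"epoch": 9.72, "eval_accuracy": 0.8088197529604327,
         "eval_loss": 0.8284289836883545, "step": 31300},
        {"epoch": 9.82, "eval_accuracy": 0.8085015827448934,
         "eval_loss": 0.8298270106315613, "step": 31613},
        {"epoch": 9.91, "eval_accuracy": 0.8086080278025564,
         "eval_loss": 0.8285703659057617, "step": 31926},
    ],
}

# Keep only evaluation entries (training entries lack "eval_accuracy"),
# then take the entry with the highest accuracy.
evals = [e for e in trainer_state["log_history"] if "eval_accuracy" in e]
best = max(evals, key=lambda e: e["eval_accuracy"])
print(best["step"], round(best["eval_accuracy"], 4))  # → 31300 0.8088
```

On this excerpt the best evaluation accuracy (≈0.8088) occurs at step 31300, matching the epoch-9.72 entry in the full log.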
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:be42605e93b49d362df1bfcd9f27e4c4f1ad3b23e2cb448f454ac7a56ecc7792
+ size 3963
vocab.json ADDED
The diff for this file is too large to render. See raw diff