Add files

Files changed (13)

README.md +203 -0
added_tokens.json +1 -0
config.gin +150 -0
config.json +34 -0
events.out.tfevents.1673453219.t1v-n-c82e3785-w-0.4133.0.v2 +3 -0
flax_model.msgpack +3 -0
pytorch_model.bin +3 -0
run_s2s_ul2-base-nl36-neddx2-en-nl.sh +75 -0
special_tokens_map.json +107 -0
spiece.model +3 -0
spiece.vocab +0 -0
tokenizer_config.json +113 -0
training_state.json +1 -0

README.md ADDED

	@@ -0,0 +1,203 @@

+---
+language:
+- nl
+- en
+- multilingual
+license: apache-2.0
+tags:
+- dutch
+- english
+- t5
+- t5x
+- ul2
+- seq2seq
+- translation
+datasets:
+- yhavinga/mc4_nl_cleaned
+- yhavinga/nedd_wiki_news
+pipeline_tag: translation
+widget:
+  - text: >-
+      Redistricting and West Virginia’s shrinking population forced the state’s
+      Republican Legislature to pit Mr. McKinley, a six-term Republican with a
+      pragmatic bent, against Mr. Mooney, who has served four terms marked more
+      by conservative rhetoric than legislative achievements.
+  - text: >-
+      It is a painful and tragic spectacle that rises before me: I have drawn
+      back the curtain from the rottenness of man. This word, in my mouth, is at
+      least free from one suspicion: that it involves a moral accusation against
+      humanity.
+  - text: >-
+      Young Wehling was hunched in his chair, his head in his hand. He was so
+      rumpled, so still and colorless as to be virtually invisible. His
+      camouflage was perfect, since the waiting room had a disorderly and
+      demoralized air, too. Chairs and ashtrays had been moved away from the
+      walls. The floor was paved with spattered dropcloths.
+---
+# ul2-base-nl36-en-nl for English to Dutch translation
+A T5 model pre-trained on Dutch with the UL2 (Mixture-of-Denoisers) objective and fine-tuned for English-to-Dutch translation.
+The T5 model was introduced in
+[this paper](https://arxiv.org/abs/1910.10683)
+and first released at [this page](https://github.com/google-research/text-to-text-transfer-transformer).
+The UL2 objective was introduced in
+[this paper](https://arxiv.org/abs/2205.05131)
+and first released at [this page](https://github.com/google-research/google-research/tree/master/ul2).
+## Model description
+T5 is an encoder-decoder model and treats all NLP problems in a text-to-text format.
+`ul2-base-nl36-en-nl` is a T5 model fine-tuned on parallel sentence and paragraph pairs
+sampled from books.
+During pre-training, this model used the following [T5 v1.1](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) improvements over the original T5 model:
+- GEGLU activation in the feed-forward hidden layer, rather than ReLU - see [here](https://arxiv.org/abs/2002.05202)
+- Dropout was turned off during pre-training. Dropout should be re-enabled during fine-tuning
+- Pre-trained on self-supervised objective only without mixing in the downstream tasks
+- No parameter sharing between embedding and classifier layer
+The findings of the "efficient" T5 architecture study presented in [this paper](https://arxiv.org/abs/2109.10686) were also applied.
+They suggest that a Deep-Narrow model architecture is favorable for downstream performance compared to other model
+architectures of similar parameter count. Here, model depth is defined as the number of transformer blocks
+that are stacked sequentially.
+This model uses the [t5-efficient-base-nl36](https://huggingface.co/google/t5-efficient-base-nl36) architecture's
+layer depth, which means both the encoder and the decoder have 36 transformer layers compared to the original T5 "base"
+model's architecture of 12 transformer layers.
+### UL2 pretraining objective
+This model was pretrained with UL2's Mixture-of-Denoisers (MoD) objective, which combines diverse pre-training
+paradigms together. UL2 frames different objective functions for training language models as denoising tasks, where
+the model has to recover missing sub-sequences of a given input. During pre-training it uses a novel mixture-of-denoisers
+that samples from a varied set of such objectives, each with different configurations. UL2 is trained using a mixture of
+three denoising tasks:
+1. R-denoising (or regular span corruption), which emulates the standard T5 span corruption objective;
+2. X-denoising (or extreme span corruption); and
+3. S-denoising (or sequential PrefixLM).
+During pre-training, we sample from the available denoising tasks based on user-specified ratios.
+UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with a specific pre-training
+denoising task. During pre-training, a paradigm token is inserted into the input
+(`[NLU]` for R-denoising, `[NLG]` for X-denoising, or `[S2S]` for S-denoising) to indicate the denoising task at hand.
+Then, during fine-tuning, the corresponding token should be inserted to get the best performance on the downstream
+task.
+## Intended uses & limitations
+This model was fine-tuned on parallel sentence and paragraph pairs and can be used
+for machine translation.
+### How to use
+Here is how to use this model in PyTorch:
+```python
+import torch
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
+
+model_name = "yhavinga/ul2-base-nl36-en-nl"
+
+# Use the first GPU when available; the pipeline expects -1 for CPU.
+device_num = 0 if torch.cuda.is_available() else -1
+device = "cpu" if device_num < 0 else f"cuda:{device_num}"
+
+# Load the slow (SentencePiece) tokenizer, as set in tokenizer_config.json.
+tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
+# use_auth_token=True sends the locally stored Hugging Face token;
+# it can be dropped when the repository is public.
+model = AutoModelForSeq2SeqLM.from_pretrained(model_name, use_auth_token=True).to(device)
+
+# Generation settings: beam search with 4 beams, at most 370 tokens.
+params = {"max_length": 370, "num_beams": 4, "early_stopping": True}
+translator = pipeline("translation", tokenizer=tokenizer, model=model, device=device_num)
+
+text = (
+    "Young Wehling was hunched in his chair, his head in his hand. "
+    "He was so rumpled, so still and colorless as to be virtually invisible."
+)
+print(translator(text, **params)[0]["translation_text"])
+```
+### Limitations and bias
+The training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral.
+Therefore, the model can have biased predictions. This bias will also affect all fine-tuned versions of this model.
+## Training data
+The `ul2-base-nl36-en-nl` T5 model was pre-trained simultaneously on a combination of several datasets,
+including the `full` config of the "mc4_nl_cleaned" dataset, which is a cleaned version of Common Crawl's web
+crawl corpus, Dutch books, the Dutch subset of Wikipedia (2022-03-20), and a subset of "mc4_nl_cleaned"
+containing only texts from Dutch and Belgian newspapers. This last dataset is oversampled to bias the model
+towards descriptions of events in the Netherlands and Belgium.
+After pre-training, the model was
+fine-tuned on a translation dataset containing 13 million sentence and paragraph pairs
+sampled from books.
+## Training procedure
+### Preprocessing
+The ul2-base-nl36-en-nl T5 model uses a SentencePiece unigram tokenizer with a vocabulary of 32,000 tokens.
+The tokenizer includes the special tokens `<pad>`, `</s>` and `<unk>`, known from the original T5 paper,
+`[NLU]`, `[NLG]` and `[S2S]` for the MoD pre-training, and `<n>` for newline.
+During pre-training with the UL2 objective, input and output sequences consist of 512 consecutive tokens.
+The tokenizer does not lowercase texts and is therefore case-sensitive; it distinguishes
+between `dutch` and `Dutch`.
+Additionally, 100+28 extra tokens were added for pre-training tasks, resulting in a total of 32,128 tokens.
+### Fine-tuning
+This model was fine-tuned on a dataset containing 13M sentence and paragraph translation pairs sampled from books.
+* Pre-trained model used as starting point: yhavinga/ul2-base-nl36-dutch
+* Number of fine-tuning steps: 43415
+* Batch size: 512 (gradient accumulation steps: 16)
+* Sequence length: 370 tokens
+* Model dtype: bfloat16
+* z_loss: 0.0001
+* Optimizer: adamw_hf (beta1: 0.9, beta2: 0.9969, eps: 1e-08)
+* Dropout rate: 0.01
+* Learning rate: 0.0009 with linear decay to 0 and warmup for 500 steps
+* Label smoothing factor: 0.11
+* BLEU score: 44.2
+### Model list
+Models in this series:
+|                      | ul2-base-en-nl   | ul2-base-nl36-en-nl   | ul2-large-en-nl   |
+|:---------------------|:-----------------|:----------------------|:------------------|
+| model_type           | t5               | t5                    | t5                |
+| _pipeline_tag        | translation      | translation           | translation       |
+| d_model              | 768              | 768                   | 1024              |
+| d_ff                 | 2048             | 3072                  | 2816              |
+| num_heads            | 12               | 12                    | 16                |
+| d_kv                 | 64               | 64                    | 64                |
+| num_layers           | 12               | 36                    | 24                |
+| num_decoder_layers   | 12               | 36                    | 24                |
+| feed_forward_proj    | gated-silu       | gated-silu            | gated-silu        |
+| dense_act_fn         | silu             | silu                  | silu              |
+| vocab_size           | 32128            | 32128                 | 32128             |
+| tie_word_embeddings  | 0                | 0                     | 0                 |
+| torch_dtype          | float32          | float32               | float32           |
+| _gin_batch_size      | 128              | 64                    | 64                |
+| _gin_z_loss          | 0.0001           | 0.0001                | 0.0001            |
+| _gin_t5_config_dtype | 'bfloat16'       | 'bfloat16'            | 'bfloat16'        |
+## Evaluation results
+See the evaluation section in the interactive [Pre-training Dutch T5 Models](https://huggingface.co/spaces/yhavinga/pre-training-dutch-t5-models) blog.
+## Acknowledgements
+This project would not have been possible without compute generously provided by Google through the
+[TPU Research Cloud](https://sites.research.google/trc/).
+Thanks to the [Finnish-NLP](https://huggingface.co/Finnish-NLP) authors for releasing their code for the UL2 objective and associated task definitions.
+Thanks to [Stephenn Fernandes](https://huggingface.co/StephennFernandes) for helping me get started with the t5x framework.
+Created by [Yeb Havinga](https://www.linkedin.com/in/yeb-havinga-86530825/)
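
Note on the README's "UL2 pretraining objective" section: the ratio-based sampling of denoisers and the paradigm tokens it describes can be sketched in plain Python. This is only an illustration, not the t5x task definition used for pre-training. The ratio values in `DENOISER_RATIOS`, the helper names `sample_denoiser` and `add_paradigm_token`, and the space after the token are all hypothetical. The README also does not say whether this fine-tuned translation checkpoint expects such a prefix.

```python
import random

# Paradigm tokens as listed in the README (denoiser -> token).
PARADIGM_TOKENS = {"R": "[NLU]", "X": "[NLG]", "S": "[S2S]"}

# ASSUMPTION: illustrative ratios; the README only says they are user-specified.
DENOISER_RATIOS = {"R": 0.25, "X": 0.5, "S": 0.25}


def sample_denoiser(rng: random.Random) -> str:
    """Pick a denoising task according to the configured ratios."""
    names = list(DENOISER_RATIOS)
    weights = [DENOISER_RATIOS[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def add_paradigm_token(text: str, denoiser: str) -> str:
    """Prepend the paradigm token that marks the denoising task."""
    return f"{PARADIGM_TOKENS[denoiser]} {text}"


rng = random.Random(0)
task = sample_denoiser(rng)
example = add_paradigm_token("Dit is een voorbeeldzin.", task)
print(task, "->", example)
```

A seeded `random.Random` keeps the draw reproducible; a real input pipeline would also apply the span corruption that belongs to the sampled denoiser.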

added_tokens.json ADDED

	@@ -0,0 +1 @@

+ {"[new_id_17]": 32117, "[new_id_20]": 32120, "[new_id_13]": 32113, "[new_id_2]": 32102, "[new_id_16]": 32116, "[new_id_7]": 32107, "[new_id_5]": 32105, "[new_id_1]": 32101, "[new_id_15]": 32115, "[new_id_12]": 32112, "[new_id_0]": 32100, "[new_id_11]": 32111, "[new_id_25]": 32125, "[new_id_24]": 32124, "[new_id_10]": 32110, "[new_id_27]": 32127, "[new_id_23]": 32123, "[new_id_14]": 32114, "[new_id_22]": 32122, "[new_id_21]": 32121, "[new_id_19]": 32119, "[new_id_3]": 32103, "[new_id_4]": 32104, "[new_id_18]": 32118, "[new_id_9]": 32109, "[new_id_8]": 32108, "[new_id_26]": 32126, "[new_id_6]": 32106}
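
Note on `added_tokens.json`: the 28 `[new_id_*]` entries account for the "+28" in the README's "100+28 extra tokens". The check below is a small sketch that rebuilds the mapping (each `[new_id_i]` maps to `32100 + i` in the file above) and confirms that the ids are contiguous and that the counts add up to the `vocab_size` in `config.json`. The constant names are chosen here for illustration.

```python
# Rebuild the mapping from added_tokens.json: "[new_id_i]" -> 32100 + i.
added_tokens = {f"[new_id_{i}]": 32100 + i for i in range(28)}

BASE_VOCAB = 32_000  # SentencePiece unigram vocabulary size (README)
EXTRA_IDS = 100      # "extra_ids" in tokenizer_config.json
VOCAB_SIZE = 32_128  # "vocab_size" in config.json

ids = sorted(added_tokens.values())
# The ids should form one unbroken range.
is_contiguous = ids == list(range(ids[0], ids[0] + len(ids)))
total = BASE_VOCAB + EXTRA_IDS + len(added_tokens)
print(is_contiguous, ids[0], ids[-1], total)
```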

config.gin ADDED

	@@ -0,0 +1,150 @@

+from __gin__ import dynamic_registration
+import __main__ as train_script
+import seqio
+import t5.data.mixtures
+from t5x import adafactor
+from t5x.examples.t5 import network
+from t5x import gin_utils
+from t5x import models
+from t5x import partitioning
+from t5x import trainer
+from t5x import utils
+import tasks.nedd_tasks
+import tasks.ul2_tasks as tasks2
+# Macros:
+# ==============================================================================
+BATCH_SIZE = 64
+DROPOUT_RATE = 0.0
+LABEL_SMOOTHING = 0.0
+LOSS_NORMALIZING_FACTOR = None
+MIXTURE_OR_TASK_MODULE = None
+MIXTURE_OR_TASK_NAME = 'ul2_mc4_nedd_wiki_news_mix_1'
+MODEL = @models.EncoderDecoderModel()
+MODEL_DIR = 'ul2_base_nl36_mc4_nedd_wiki_news_nl'
+OPTIMIZER = @adafactor.Adafactor()
+RANDOM_SEED = None
+SHUFFLE_TRAIN_EXAMPLES = True
+TASK_FEATURE_LENGTHS = {'inputs': 512, 'targets': 512}
+TRAIN_STEPS = 2000000
+USE_CACHED_TASKS = False
+USE_HARDWARE_RNG = False
+VOCABULARY = @seqio.SentencePieceVocabulary()
+Z_LOSS = 0.0001
+# Parameters for adafactor.Adafactor:
+# ==============================================================================
+adafactor.Adafactor.decay_rate = 0.8
+adafactor.Adafactor.logical_factor_rules = \
+    @adafactor.standard_logical_factor_rules()
+adafactor.Adafactor.step_offset = 0
+# Parameters for utils.CheckpointConfig:
+# ==============================================================================
+utils.CheckpointConfig.restore = @utils.RestoreCheckpointConfig()
+utils.CheckpointConfig.save = @utils.SaveCheckpointConfig()
+# Parameters for utils.create_learning_rate_scheduler:
+# ==============================================================================
+utils.create_learning_rate_scheduler.base_learning_rate = 1.0
+utils.create_learning_rate_scheduler.factors = 'constant * rsqrt_decay'
+utils.create_learning_rate_scheduler.warmup_steps = 10000
+# Parameters for train/utils.DatasetConfig:
+# ==============================================================================
+train/utils.DatasetConfig.batch_size = %BATCH_SIZE
+train/utils.DatasetConfig.mixture_or_task_name = %MIXTURE_OR_TASK_NAME
+train/utils.DatasetConfig.module = %MIXTURE_OR_TASK_MODULE
+train/utils.DatasetConfig.pack = True
+train/utils.DatasetConfig.seed = None
+train/utils.DatasetConfig.shuffle = %SHUFFLE_TRAIN_EXAMPLES
+train/utils.DatasetConfig.split = 'train'
+train/utils.DatasetConfig.task_feature_lengths = %TASK_FEATURE_LENGTHS
+train/utils.DatasetConfig.use_cached = %USE_CACHED_TASKS
+# Parameters for train_eval/utils.DatasetConfig:
+# ==============================================================================
+train_eval/utils.DatasetConfig.batch_size = %BATCH_SIZE
+train_eval/utils.DatasetConfig.mixture_or_task_name = %MIXTURE_OR_TASK_NAME
+train_eval/utils.DatasetConfig.module = %MIXTURE_OR_TASK_MODULE
+train_eval/utils.DatasetConfig.pack = True
+train_eval/utils.DatasetConfig.seed = 42
+train_eval/utils.DatasetConfig.shuffle = False
+train_eval/utils.DatasetConfig.split = 'validation'
+train_eval/utils.DatasetConfig.task_feature_lengths = %TASK_FEATURE_LENGTHS
+train_eval/utils.DatasetConfig.use_cached = %USE_CACHED_TASKS
+# Parameters for models.EncoderDecoderModel:
+# ==============================================================================
+models.EncoderDecoderModel.input_vocabulary = %VOCABULARY
+models.EncoderDecoderModel.label_smoothing = %LABEL_SMOOTHING
+models.EncoderDecoderModel.loss_normalizing_factor = %LOSS_NORMALIZING_FACTOR
+models.EncoderDecoderModel.module = @network.Transformer()
+models.EncoderDecoderModel.optimizer_def = %OPTIMIZER
+models.EncoderDecoderModel.output_vocabulary = %VOCABULARY
+models.EncoderDecoderModel.z_loss = %Z_LOSS
+# Parameters for partitioning.PjitPartitioner:
+# ==============================================================================
+partitioning.PjitPartitioner.logical_axis_rules = \
+    @partitioning.standard_logical_axis_rules()
+partitioning.PjitPartitioner.model_parallel_submesh = None
+partitioning.PjitPartitioner.num_partitions = 1
+# Parameters for utils.RestoreCheckpointConfig:
+# ==============================================================================
+utils.RestoreCheckpointConfig.path = []
+# Parameters for utils.SaveCheckpointConfig:
+# ==============================================================================
+utils.SaveCheckpointConfig.dtype = 'float32'
+utils.SaveCheckpointConfig.keep = 4
+utils.SaveCheckpointConfig.period = 50000
+utils.SaveCheckpointConfig.save_dataset = False
+utils.SaveCheckpointConfig.use_gda = False
+# Parameters for seqio.SentencePieceVocabulary:
+# ==============================================================================
+seqio.SentencePieceVocabulary.sentencepiece_model_file = \
+    'gs://t5-dutch-english/vocabs/nedd.32000.128extra/spiece.model'
+# Parameters for network.T5Config:
+# ==============================================================================
+network.T5Config.dropout_rate = %DROPOUT_RATE
+network.T5Config.dtype = 'bfloat16'
+network.T5Config.emb_dim = 768
+network.T5Config.head_dim = 64
+network.T5Config.logits_via_embedding = False
+network.T5Config.mlp_activations = ('gelu', 'linear')
+network.T5Config.mlp_dim = 3072
+network.T5Config.num_decoder_layers = 36
+network.T5Config.num_encoder_layers = 36
+network.T5Config.num_heads = 12
+network.T5Config.vocab_size = 32128
+# Parameters for train_script.train:
+# ==============================================================================
+train_script.train.checkpoint_cfg = @utils.CheckpointConfig()
+train_script.train.eval_period = 2000
+train_script.train.eval_steps = 20
+train_script.train.infer_eval_dataset_cfg = None
+train_script.train.model = %MODEL
+train_script.train.model_dir = %MODEL_DIR
+train_script.train.partitioner = @partitioning.PjitPartitioner()
+train_script.train.random_seed = %RANDOM_SEED
+train_script.train.stats_period = 100
+train_script.train.summarize_config_fn = @gin_utils.summarize_gin_config
+train_script.train.total_steps = %TRAIN_STEPS
+train_script.train.train_dataset_cfg = @train/utils.DatasetConfig()
+train_script.train.train_eval_dataset_cfg = @train_eval/utils.DatasetConfig()
+train_script.train.trainer_cls = @trainer.Trainer
+train_script.train.use_hardware_rng = %USE_HARDWARE_RNG
+# Parameters for trainer.Trainer:
+# ==============================================================================
+trainer.Trainer.learning_rate_fn = @utils.create_learning_rate_scheduler()
+trainer.Trainer.num_microbatches = None
+# Parameters for network.Transformer:
+# ==============================================================================
+network.Transformer.config = @network.T5Config()
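
Note on the pre-training learning rate: `config.gin` sets `factors = 'constant * rsqrt_decay'` with a base rate of 1.0 and 10000 warmup steps. The sketch below shows the resulting curve under the assumption that `rsqrt_decay` divides by `sqrt(max(step, warmup_steps))`; that reading should be checked against the t5x source before relying on it. The function name `learning_rate` is hypothetical.

```python
import math

BASE_LEARNING_RATE = 1.0  # utils.create_learning_rate_scheduler.base_learning_rate
WARMUP_STEPS = 10_000     # utils.create_learning_rate_scheduler.warmup_steps


def learning_rate(step: int) -> float:
    """ASSUMED form of 'constant * rsqrt_decay': flat until warmup ends, then 1/sqrt(step)."""
    return BASE_LEARNING_RATE / math.sqrt(max(step, WARMUP_STEPS))


for step in (0, 10_000, 40_000, 2_000_000):
    print(step, learning_rate(step))
```

Under this assumption the rate stays at 0.01 for the first 10000 steps and halves each time the step count quadruples.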

config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "_name_or_path": "./",
+  "architectures": [
+    "T5ForConditionalGeneration"
+  ],
+  "d_ff": 3072,
+  "d_kv": 64,
+  "d_model": 768,
+  "decoder_start_token_id": 0,
+  "dense_act_fn": "silu",
+  "dropout_rate": 0.01,
+  "early_stopping": true,
+  "eos_token_id": 1,
+  "feed_forward_proj": "gated-silu",
+  "initializer_factor": 1.0,
+  "is_encoder_decoder": true,
+  "is_gated_act": true,
+  "layer_norm_epsilon": 1e-06,
+  "max_length": 370,
+  "model_type": "t5",
+  "num_beams": 4,
+  "num_decoder_layers": 36,
+  "num_heads": 12,
+  "num_layers": 36,
+  "output_past": true,
+  "pad_token_id": 0,
+  "relative_attention_max_distance": 128,
+  "relative_attention_num_buckets": 32,
+  "tie_word_embeddings": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.24.0",
+  "use_cache": true,
+  "vocab_size": 32128
+}
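
Note on model size: the values in `config.json` are enough for a rough parameter count. The estimate below assumes the usual T5 v1.1 layout (no bias terms, a gated feed-forward block with three weight matrices, untied input and output embeddings, and one relative-attention bias table per stack). It is a back-of-the-envelope sketch, not a count read from the checkpoint.

```python
cfg = {
    "d_model": 768, "d_ff": 3072, "d_kv": 64, "num_heads": 12,
    "num_layers": 36, "num_decoder_layers": 36, "vocab_size": 32128,
    "relative_attention_num_buckets": 32,
}

inner = cfg["num_heads"] * cfg["d_kv"]
attention = 4 * cfg["d_model"] * inner           # q, k, v and output projections
feed_forward = 3 * cfg["d_model"] * cfg["d_ff"]  # gated FF: two input matrices, one output
norm = cfg["d_model"]                            # layer norm holds a scale vector only

encoder_layer = attention + feed_forward + 2 * norm
decoder_layer = 2 * attention + feed_forward + 3 * norm  # adds cross-attention
embeddings = 2 * cfg["vocab_size"] * cfg["d_model"]      # untied input and output
rel_bias = 2 * cfg["relative_attention_num_buckets"] * cfg["num_heads"]
final_norms = 2 * norm

total = (cfg["num_layers"] * encoder_layer
         + cfg["num_decoder_layers"] * decoder_layer
         + embeddings + rel_bias + final_norms)
print(f"{total:,} parameters, ~{total * 4 / 1e9:.2f} GB in float32")
```

At four bytes per parameter this comes within a fraction of a percent of the 3255881749-byte `pytorch_model.bin` pointer listed in this commit, which is consistent with the `float32` `torch_dtype`.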

events.out.tfevents.1673453219.t1v-n-c82e3785-w-0.4133.0.v2 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6b6e252bece32e07b67707a9eb56c2bd1599dfda1084432a03d0f9f0d746f74b
+size 1941504

flax_model.msgpack ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0db5c8d7d9b492a2d2fe68a2197442fe1f12709055f0da7b85d8fec2cb08a34e
+size 1677466902

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:952bd79ab8ec1a7c8fc294ecb8cd05851a4fd570b000166ed5bc36331afe44ce
+size 3255881749

run_s2s_ul2-base-nl36-neddx2-en-nl.sh ADDED

	@@ -0,0 +1,75 @@

+export CORES=`grep -c ^processor /proc/cpuinfo`
+export CORES=`echo "scale=0; ${CORES} * 0.8 / 1" | bc`
+#export XLA_PYTHON_CLIENT_PREALLOCATE=false
+export SOURCE_LANG="en"
+export TARGET_LANG="nl"
+export HF_PROJECT="ul2-base-nl36-neddx2-en-nl"
+#
+export DATASET="/home/yeb/data/nedd_x_dataset/nedd_x_dataset.py"
+#export DATASET_CONFIG="dict"
+export DATASET_CONFIG="voc8k_beta_3buf"
+export MODEL_NAME_OR_PATH="yhavinga/ul2-base-nl36-dutch"
+export TOKENIZER_NAME="yhavinga/ul2-base-nl36-dutch"
+export MODEL_PATH="${HOME}/data/${HF_PROJECT}" # Path to the model
+export HF_DATASETS_CACHE=/mnt/ramdisk
+#         52k       8k     32ksp
+#l        472       500
+#b0       328       352
+#b1       472       480     370
+#b2       1920      1984
+mkdir -p ${MODEL_PATH}
+python ../run_s2s_flax_pmap_multiseq.py \
+  --output_dir="${MODEL_PATH}" \
+  --model_name_or_path ${MODEL_NAME_OR_PATH} \
+  --tokenizer_name ${TOKENIZER_NAME} \
+  --use_fast_tokenizer="False" \
+  --use_auth_token="True" \
+  --dataset_name_list ${DATASET}\
+  --dataset_config_name_list "${DATASET_CONFIG}"\
+  --id_filter_list "<not>-b2-" \
+  --max_train_samples_list "0"  \
+  --max_eval_samples_list "2000" \
+  --max_predict_samples_list "128" \
+  --preprocessing_num_workers="${CORES}" \
+  --source_lang="${SOURCE_LANG}" \
+  --target_lang="${TARGET_LANG}" \
+  --metric_name="sacrebleu" \
+  --do_train --do_eval --do_predict \
+  --predict_with_generate \
+  --learning_rate="0.0009" \
+  --adam_beta1="0.9" \
+  --adam_beta2="0.9969" \
+  --adam_epsilon="1e-8" \
+  --weight_decay="0.001" \
+  --label_smoothing_factor="0.11" \
+  --length_penalty="1.3" \
+  --warmup_steps 500 \
+  --dropout_rate="0.01" \
+  --dtype "bfloat16" \
+  --z_loss "1e-4" \
+  --dynamic_loss_scaling="False" \
+  --per_device_train_batch_size 4 \
+  --per_device_eval_batch_size 4 \
+  --gradient_accumulation_steps 16 \
+  --overwrite_output_dir \
+  --max_source_length_list 370 \
+  --max_target_length_list 370 \
+  --num_beams 5 \
+  --overwrite_output_dir \
+  --logging_steps 5 \
+  --save_steps 800 \
+  --eval_steps 800 \
+  --num_train_epochs 2 \
+  --max_eval_samples 512 \
+  --validation_split_count 2000 \
+  --wandb_project="${HF_PROJECT}" \
+  --wandb_job_type="pmap"
+#  --resume_from_checkpoint="${MODEL_PATH}" \
+#  --max_train_samples="1_064_886" \
+#  --max_eval_samples 256 \
+#  --max_predict_samples 256 \
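
Note on batch size: the README reports a batch size of 512, while this script passes a per-device batch of 4 with 16 gradient accumulation steps. The two figures agree if eight devices are assumed. The device count is not stated anywhere in this commit, so `NUM_DEVICES` below is an assumption, as is treating each reported step as one optimizer update.

```python
PER_DEVICE_TRAIN_BATCH_SIZE = 4   # --per_device_train_batch_size
GRADIENT_ACCUMULATION_STEPS = 16  # --gradient_accumulation_steps
NUM_DEVICES = 8                   # ASSUMPTION: not stated in the commit
TRAIN_STEPS = 43_415              # fine-tuning steps reported in the README

# Sequences contributing to one optimizer update.
effective_batch_size = (
    PER_DEVICE_TRAIN_BATCH_SIZE * NUM_DEVICES * GRADIENT_ACCUMULATION_STEPS
)
# Total sequences processed if every step is a full update.
examples_seen = effective_batch_size * TRAIN_STEPS
print(effective_batch_size, examples_seen)
```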

special_tokens_map.json ADDED

	@@ -0,0 +1,107 @@

+{
+  "additional_special_tokens": [
+    "<extra_id_0>",
+    "<extra_id_1>",
+    "<extra_id_2>",
+    "<extra_id_3>",
+    "<extra_id_4>",
+    "<extra_id_5>",
+    "<extra_id_6>",
+    "<extra_id_7>",
+    "<extra_id_8>",
+    "<extra_id_9>",
+    "<extra_id_10>",
+    "<extra_id_11>",
+    "<extra_id_12>",
+    "<extra_id_13>",
+    "<extra_id_14>",
+    "<extra_id_15>",
+    "<extra_id_16>",
+    "<extra_id_17>",
+    "<extra_id_18>",
+    "<extra_id_19>",
+    "<extra_id_20>",
+    "<extra_id_21>",
+    "<extra_id_22>",
+    "<extra_id_23>",
+    "<extra_id_24>",
+    "<extra_id_25>",
+    "<extra_id_26>",
+    "<extra_id_27>",
+    "<extra_id_28>",
+    "<extra_id_29>",
+    "<extra_id_30>",
+    "<extra_id_31>",
+    "<extra_id_32>",
+    "<extra_id_33>",
+    "<extra_id_34>",
+    "<extra_id_35>",
+    "<extra_id_36>",
+    "<extra_id_37>",
+    "<extra_id_38>",
+    "<extra_id_39>",
+    "<extra_id_40>",
+    "<extra_id_41>",
+    "<extra_id_42>",
+    "<extra_id_43>",
+    "<extra_id_44>",
+    "<extra_id_45>",
+    "<extra_id_46>",
+    "<extra_id_47>",
+    "<extra_id_48>",
+    "<extra_id_49>",
+    "<extra_id_50>",
+    "<extra_id_51>",
+    "<extra_id_52>",
+    "<extra_id_53>",
+    "<extra_id_54>",
+    "<extra_id_55>",
+    "<extra_id_56>",
+    "<extra_id_57>",
+    "<extra_id_58>",
+    "<extra_id_59>",
+    "<extra_id_60>",
+    "<extra_id_61>",
+    "<extra_id_62>",
+    "<extra_id_63>",
+    "<extra_id_64>",
+    "<extra_id_65>",
+    "<extra_id_66>",
+    "<extra_id_67>",
+    "<extra_id_68>",
+    "<extra_id_69>",
+    "<extra_id_70>",
+    "<extra_id_71>",
+    "<extra_id_72>",
+    "<extra_id_73>",
+    "<extra_id_74>",
+    "<extra_id_75>",
+    "<extra_id_76>",
+    "<extra_id_77>",
+    "<extra_id_78>",
+    "<extra_id_79>",
+    "<extra_id_80>",
+    "<extra_id_81>",
+    "<extra_id_82>",
+    "<extra_id_83>",
+    "<extra_id_84>",
+    "<extra_id_85>",
+    "<extra_id_86>",
+    "<extra_id_87>",
+    "<extra_id_88>",
+    "<extra_id_89>",
+    "<extra_id_90>",
+    "<extra_id_91>",
+    "<extra_id_92>",
+    "<extra_id_93>",
+    "<extra_id_94>",
+    "<extra_id_95>",
+    "<extra_id_96>",
+    "<extra_id_97>",
+    "<extra_id_98>",
+    "<extra_id_99>"
+  ],
+  "eos_token": "</s>",
+  "pad_token": "<pad>",
+  "unk_token": "<unk>"
+}

spiece.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:caa6e2f21aeec181276ab80273e3f869ce303ccb8602d68e0524783c3581092d
+size 800223

spiece.vocab ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,113 @@

+{
+  "additional_special_tokens": [
+    "<extra_id_0>",
+    "<extra_id_1>",
+    "<extra_id_2>",
+    "<extra_id_3>",
+    "<extra_id_4>",
+    "<extra_id_5>",
+    "<extra_id_6>",
+    "<extra_id_7>",
+    "<extra_id_8>",
+    "<extra_id_9>",
+    "<extra_id_10>",
+    "<extra_id_11>",
+    "<extra_id_12>",
+    "<extra_id_13>",
+    "<extra_id_14>",
+    "<extra_id_15>",
+    "<extra_id_16>",
+    "<extra_id_17>",
+    "<extra_id_18>",
+    "<extra_id_19>",
+    "<extra_id_20>",
+    "<extra_id_21>",
+    "<extra_id_22>",
+    "<extra_id_23>",
+    "<extra_id_24>",
+    "<extra_id_25>",
+    "<extra_id_26>",
+    "<extra_id_27>",
+    "<extra_id_28>",
+    "<extra_id_29>",
+    "<extra_id_30>",
+    "<extra_id_31>",
+    "<extra_id_32>",
+    "<extra_id_33>",
+    "<extra_id_34>",
+    "<extra_id_35>",
+    "<extra_id_36>",
+    "<extra_id_37>",
+    "<extra_id_38>",
+    "<extra_id_39>",
+    "<extra_id_40>",
+    "<extra_id_41>",
+    "<extra_id_42>",
+    "<extra_id_43>",
+    "<extra_id_44>",
+    "<extra_id_45>",
+    "<extra_id_46>",
+    "<extra_id_47>",
+    "<extra_id_48>",
+    "<extra_id_49>",
+    "<extra_id_50>",
+    "<extra_id_51>",
+    "<extra_id_52>",
+    "<extra_id_53>",
+    "<extra_id_54>",
+    "<extra_id_55>",
+    "<extra_id_56>",
+    "<extra_id_57>",
+    "<extra_id_58>",
+    "<extra_id_59>",
+    "<extra_id_60>",
+    "<extra_id_61>",
+    "<extra_id_62>",
+    "<extra_id_63>",
+    "<extra_id_64>",
+    "<extra_id_65>",
+    "<extra_id_66>",
+    "<extra_id_67>",
+    "<extra_id_68>",
+    "<extra_id_69>",
+    "<extra_id_70>",
+    "<extra_id_71>",
+    "<extra_id_72>",
+    "<extra_id_73>",
+    "<extra_id_74>",
+    "<extra_id_75>",
+    "<extra_id_76>",
+    "<extra_id_77>",
+    "<extra_id_78>",
+    "<extra_id_79>",
+    "<extra_id_80>",
+    "<extra_id_81>",
+    "<extra_id_82>",
+    "<extra_id_83>",
+    "<extra_id_84>",
+    "<extra_id_85>",
+    "<extra_id_86>",
+    "<extra_id_87>",
+    "<extra_id_88>",
+    "<extra_id_89>",
+    "<extra_id_90>",
+    "<extra_id_91>",
+    "<extra_id_92>",
+    "<extra_id_93>",
+    "<extra_id_94>",
+    "<extra_id_95>",
+    "<extra_id_96>",
+    "<extra_id_97>",
+    "<extra_id_98>",
+    "<extra_id_99>"
+  ],
+  "eos_token": "</s>",
+  "extra_ids": 100,
+  "name_or_path": "yhavinga/ul2-base-nl36-dutch",
+  "pad_token": "<pad>",
+  "sp_model_kwargs": {},
+  "special_tokens_map_file": null,
+  "tokenizer_class": "T5Tokenizer",
+  "unk_token": "<unk>",
+  "use_fast_tokenizer": false
+}

training_state.json ADDED

	@@ -0,0 +1 @@


+ {"step": 691215}