---
tags:
- generated_from_trainer
- summarization
- book summary
datasets:
- kmfoda/booksum
metrics:
- rouge
model-index:
- name: long-t5-tglobal-large-booksum-WIP
  results: []
---

# tglobal-large-booksum-WIP

> This is a WIP checkpoint that has been fine-tuned from the vanilla (original) `google/long-t5-tglobal-large` for roughly 10 epochs. It is **not ready to be used for inference**.

This model is a fine-tuned version of [google/long-t5-tglobal-large](https://huggingface.co/google/long-t5-tglobal-large) on the `kmfoda/booksum` dataset.

It achieves the following results on the evaluation set:

- Loss: 4.9519
- Rouge1: 21.8058
- Rouge2: 2.9343
- Rougel: 10.3717
- Rougelsum: 20.1537
- Gen Len: 106.055
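
For context, scores like these can be reproduced with the `evaluate` library; the snippet below is a minimal sketch with placeholder texts rather than this model's actual predictions (the card reports the values scaled by 100).

```python
# Minimal sketch of computing ROUGE with the `evaluate` library.
# The predictions/references below are placeholders, not model output.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["a generated chapter summary goes here"]
references = ["the reference chapter summary goes here"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # rouge1 / rouge2 / rougeL / rougeLsum, typically in the 0-1 range
```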
## Model description

This checkpoint tests fine-tuning on booksum only, using a 16384-token input / 1024-token output configuration for the whole run (unlike a previous large WIP checkpoint, which started from a partially-trained `pubmed` checkpoint).
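
For reference only (again, this checkpoint is not ready for inference), here is a minimal loading/generation sketch using the standard `transformers` seq2seq API; the repo id and generation settings below are placeholders, and the 16384/1024 lengths mirror the configuration above.

```python
# Hedged sketch of loading this checkpoint for summarization.
# The repo id is a placeholder; generation settings are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "long-t5-tglobal-large-booksum-WIP"  # replace with this repo's full hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

text = "..."  # a long book chapter to summarize
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=16384)
summary_ids = model.generate(**inputs, max_new_tokens=1024, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```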
## Intended uses & limitations

This is a WIP checkpoint that has been fine-tuned from the vanilla (original) `google/long-t5-tglobal-large` for roughly 10 epochs. It is **not ready to be used for inference**.

## Training and evaluation data

This model is fine-tuned **only** on the `kmfoda/booksum` dataset (unlike a previous large WIP checkpoint, which started from a partially-trained `pubmed` checkpoint).
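
A minimal sketch of loading the data with the `datasets` library; the column names used below are assumptions based on the public `kmfoda/booksum` schema and should be double-checked against the dataset card.

```python
# Sketch of loading the fine-tuning data; the column names used below
# ("chapter", "summary_text") are assumed and may need to be verified.
from datasets import load_dataset

booksum = load_dataset("kmfoda/booksum")
print(booksum)  # train / validation / test splits

example = booksum["train"][0]
print(example["chapter"][:300])       # source text (book chapter)
print(example["summary_text"][:300])  # target summary
```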
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0004
- train_batch_size: 1
- eval_batch_size: 1
- seed: 31060
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 3.0
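
For orientation, the settings above map roughly onto the following `Seq2SeqTrainingArguments`; this is a reconstruction from the reported values rather than the exact launch configuration (the output path and the bf16 training flag are assumptions).

```python
# Rough reconstruction of the reported hyperparameters; not the exact launch
# configuration (the output path and bf16 training flag are assumptions).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./long-t5-tglobal-large-booksum-WIP",  # assumed
    learning_rate=4e-4,
    per_device_train_batch_size=1,   # 4 GPUs x 1 x 32 accumulation = 128 effective
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=32,
    lr_scheduler_type="cosine",
    num_train_epochs=3.0,
    seed=31060,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumed; evaluation was run in bf16 (see below)
    predict_with_generate=True,
)
```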
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2 | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:------:|:-------:|:---------:|:-------:|
| 5.0389        | 0.99  | 37   | 5.1884          | 29.995  | 4.4045 | 12.8837 | 27.557    | 219.03  |
| 4.8986        | 1.0   | 75   | 5.1286          | 26.921  | 3.7193 | 11.3605 | 25.3492   | 276.005 |
| 4.5928        | 2.0   | 150  | 4.9900          | 26.6667 | 3.7342 | 11.8223 | 24.7087   | 178.775 |
| 4.6159        | 3.0   | 225  | 4.9519          | 21.8058 | 2.9343 | 10.3717 | 20.1537   | 106.055 |

#### Eval in bf16

```
***** eval metrics *****
epoch = 3.0
eval_gen_len = 103.075
eval_loss = 4.9501
eval_rouge1 = 21.6345
eval_rouge2 = 2.877
eval_rougeL = 10.386
eval_rougeLsum = 20.0148
eval_runtime = 0:06:02.75
eval_samples = 200
eval_samples_per_second = 0.551
eval_steps_per_second = 0.138
```

### Framework versions

- Transformers 4.25.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.6.1
- Tokenizers 0.13.1
|