---
license: apache-2.0
base_model: google/long-t5-tglobal-base
tags:
- generated_from_trainer
- synthsumm
metrics:
- rouge
datasets:
- pszemraj/synthsumm
language:
- en
pipeline_tag: summarization
inference:
  parameters:
    max_length: 64
    min_length: 8
    no_repeat_ngram_size: 3
    early_stopping: true
    repetition_penalty: 3.5
    encoder_no_repeat_ngram_size: 4
    num_beams: 3
---
# long-t5-tglobal-base-synthsumm_direct
Fine-tuned on a synthetic dataset of curated long-context text paired with `GPT-3.5-turbo-1106` summaries, spanning multiple domains plus "random" long-context examples drawn from pretraining datasets.

- Note: this model has not been fine-tuned on any other summarization datasets, only the `synthsumm` data.
Try it: gradio demo | free HF Inference API via `requests` (sketch below) | `.md` with example outputs (gauntlet)
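A minimal sketch of calling the free Inference API with `requests`, assuming the standard `api-inference.huggingface.co/models/<repo>` endpoint pattern and the usual summarization response shape; the token placeholder is hypothetical and should be replaced with your own:

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/pszemraj/long-t5-tglobal-base-synthsumm_direct"
HEADERS = {"Authorization": "Bearer hf_xxx"}  # replace with your (free) HF token


def summarize(text: str) -> str:
    # the hosted endpoint applies the generation parameters from the card metadata
    response = requests.post(API_URL, headers=HEADERS, json={"inputs": text})
    response.raise_for_status()
    return response.json()[0]["summary_text"]


print(summarize("put a long document here"))
```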
## Usage
It's recommended to use this model with beam search decoding. If interested, you can also use the `textsum` util repo to have most of this abstracted out for you:

```bash
pip install -U textsum
```
```python
from textsum.summarize import Summarizer

model_name = "pszemraj/long-t5-tglobal-base-synthsumm_direct"
summarizer = Summarizer(model_name)  # GPU auto-detected
text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
```
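If you prefer plain `transformers`, a minimal sketch using the `pipeline` API with the beam-search settings taken from the inference config in the card metadata (the input text is a placeholder):

```python
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="pszemraj/long-t5-tglobal-base-synthsumm_direct",
)

text = "put the text you don't want to read here"
result = summarizer(
    text,
    # generation settings mirror the `inference.parameters` block above
    max_length=64,
    min_length=8,
    num_beams=3,
    early_stopping=True,
    no_repeat_ngram_size=3,
    encoder_no_repeat_ngram_size=4,
    repetition_penalty=3.5,
)
print(result[0]["summary_text"])
```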
## Details
This model is a fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `pszemraj/synthsumm` dataset. It achieves the following results on the evaluation set:
- Loss: 1.4378
- Rouge1: 48.0918
- Rouge2: 21.2531
- Rougel: 34.4307
- Rougelsum: 43.0271
- Gen Len: 84.5231
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch of the equivalent trainer config follows the list):
- learning_rate: 0.0003
- train_batch_size: 1
- eval_batch_size: 1
- seed: 26605
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 2.0
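For readers who want to reproduce a similar run, a hedged sketch of how these settings map onto `Seq2SeqTrainingArguments`; the `output_dir` is hypothetical, and the Adam betas/epsilon listed above are the library defaults:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./long-t5-tglobal-base-synthsumm",  # hypothetical path
    learning_rate=3e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # total effective train batch size: 8
    seed=26605,
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.03,
    num_train_epochs=2.0,
    predict_with_generate=True,  # needed to compute ROUGE / Gen Len during eval
)
```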
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len  |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:--------:|
| 1.9183        | 0.38  | 125  | 1.5762          | 38.7221 | 15.0873 | 28.3123 | 34.9655   | 129.2154 |
| 1.8815        | 0.77  | 250  | 1.5230          | 44.3531 | 17.9384 | 31.7417 | 39.5563   | 87.3538  |
| 1.7264        | 1.15  | 375  | 1.4735          | 45.7781 | 20.102  | 33.329  | 41.4737   | 101.9231 |
| 1.8545        | 1.54  | 500  | 1.4505          | 47.0134 | 20.6159 | 33.6118 | 41.6579   | 88.2308  |
| 1.7444        | 1.92  | 625  | 1.4378          | 48.0918 | 21.2531 | 34.4307 | 43.0271   | 84.5231  |
### Framework versions
- Transformers 4.36.0.dev0
- Pytorch 2.1.0
- Datasets 2.15.0
- Tokenizers 0.15.0