Update README.md

value: 214.9692
verified: true
---

# long-t5-tglobal-base-16384 + BookSum

<a href="https://colab.research.google.com/gist/pszemraj/d9a0495861776168fd5cdcd7731bc4ee/example-long-t5-tglobal-base-16384-book-summary.ipynb">

Summarize long text and get a SparkNotes-esque summary of arbitrary topics!

- Generalizes reasonably well to academic & narrative text.
- A simple example/use case on ASR is [here](https://longt5-booksum-example.netlify.app/).
- Example notebook in Colab (_click on the icon above_).

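For a quick start, the snippet below is a minimal sketch of loading the checkpoint through the `transformers` pipeline API; the generation settings shown are illustrative placeholders rather than this card's tuned parameters.

```python
import torch
from transformers import pipeline

# load the checkpoint as a summarization pipeline (GPU if available)
summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-base-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)

long_text = "Replace this with the long document you want summarized."
result = summarizer(long_text, max_length=256, no_repeat_ngram_size=3)  # example settings
print(result[0]["summary_text"])
```
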
## Cheeky Proof-of-Concept

- [Intended uses & limitations](#intended-uses--limitations)
- [Training and evaluation data](#training-and-evaluation-data)
- [FAQ](#faq)
  - [How to run inference over a very long (30k+ tokens) document in batches?](#how-to-run-inference-over-a-very-long-30k-tokens-document-in-batches)
  - [How to fine-tune further?](#how-to-fine-tune-further)
  - [Are there simpler ways to run this?](#are-there-simpler-ways-to-run-this)
- [Training procedure](#training-procedure)
  - [Updates:](#updates)
  - [Training hyperparameters](#training-hyperparameters)
  - [Framework versions](#framework-versions)
- [Citation info](#citation-info)

A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `kmfoda/booksum` dataset:

- 30+ epochs of fine-tuning from the base model on V100/A100 GPUs
- Training used a 16384-token input length and a 1024-token maximum output

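As a concrete illustration of those input/output limits, here is a hedged sketch using the generic `AutoTokenizer`/`AutoModelForSeq2SeqLM` API; the beam settings and the `long_document` placeholder are illustrative only.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "pszemraj/long-t5-tglobal-base-16384-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

long_document = "..."  # placeholder for the text to summarize

# encode up to 16384 input tokens, generate at most 1024 summary tokens
inputs = tokenizer(long_document, truncation=True, max_length=16384, return_tensors="pt")
summary_ids = model.generate(**inputs, max_new_tokens=1024, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
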
Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)

## Intended uses & limitations

- The current checkpoint is fairly well converged but will be updated if further improvements can be made.
- Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (API generation parameters are the same).
- While this model seems to improve factual consistency, **do not take summaries to be foolproof; check anything that seems odd**.

## Training and evaluation data

`kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209). Summaries longer than 1024 LongT5 tokens were filtered out to prevent the model from learning to generate "partial" summaries.

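The length filter described above could be reproduced roughly as in the sketch below; the `summary_text` column name and the exact thresholding are assumptions, not the original preprocessing script.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
booksum = load_dataset("kmfoda/booksum")

def summary_fits(example):
    # keep examples whose reference summary is at most 1024 LongT5 tokens (column name assumed)
    return len(tokenizer(example["summary_text"]).input_ids) <= 1024

booksum = booksum.filter(summary_fits)
```
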
* * *

## FAQ

This model was originally tuned on Google Colab with a heavily modified variant of the [longformer training notebook](https://github.com/patrickvonplaten/notebooks/blob/master/Fine_tune_Longformer_Encoder_Decoder_(LED)_for_Summarization_on_pubmed.ipynb), the key enabler being DeepSpeed. You can try this as an alternate route to fine-tuning the model without using the command line.

### Are there simpler ways to run this?

For this reason, I created a Python package utility. It's called [textsum](https://github.com/pszemraj/textsum), and you can use it to load models and summarize things in a few lines of code.

```sh
pip install textsum
```

```python
from textsum.summarize import Summarizer

summarizer = Summarizer(
    model_name_or_path="pszemraj/long-t5-tglobal-base-16384-book-summary"
)

long_string = "This is a long string of text that will be summarized."
out_str = summarizer.summarize_string(long_string)
print(f"summary: {out_str}")
```

This package provides easy-to-use interfaces for applying summarization models to text documents of arbitrary length. Currently implemented interfaces include a Python API, a CLI, and a shareable demo application.

For details, explanations, and documentation, see the README (_linked above_) or the [wiki](https://github.com/pszemraj/textsum/wiki).

* * *

### Updates:

- July 22, 2022: updated to a fairly converged checkpoint.
- July 3, 2022: added a new version with several epochs of additional general training that is more performant.

### Training hyperparameters

The following hyperparameters were used during the **most recent** training round\*:

- learning_rate: 0.0005
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 128
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 2

\* Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train.

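For reference, the list above maps roughly onto standard `transformers` `Seq2SeqTrainingArguments` as sketched below; the `output_dir` and any omitted options are placeholders, not the original training script.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-tglobal-base-booksum",  # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=128,  # effective batch size of 128 with batch size 1
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    num_train_epochs=2,
)
```
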
### Framework versions

- Transformers 4.20.1
- Pytorch 1.10.0+cu113
- Datasets 2.3.2
- Tokenizers 0.12.1

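To sanity-check a local environment against those versions, a quick (illustrative) check:

```python
# print installed versions to compare against the list above
import datasets, tokenizers, torch, transformers

print("transformers", transformers.__version__)
print("torch", torch.__version__)
print("datasets", datasets.__version__)
print("tokenizers", tokenizers.__version__)
```
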
## Citation info