pszemraj committed
Commit ddb8588
1 Parent(s): 2e06713
Files changed (1)
  1. README.md +25 -0
README.md CHANGED
@@ -468,6 +468,8 @@ A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/
468
 
469
  > The narrator tells us that he's graduated from the Navy seals and has been involved in many secret raids. He's also one of the best snipers in the entire U.S. military. He promises to "wipe you out with precision" when they meet again.
470
 
471
+ ---
472
+
473
  ## Model description
474
 
475
  A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `kmfoda/booksum` dataset:
@@ -512,6 +514,29 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl
514
 
515
  _NOTE: early checkpoints of this model were trained on a "smaller" subset of the dataset, as it was filtered to summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens**, and the model was then trained for 10+ additional epochs._
516
 
517
+ ---
518
+
519
+ ---
520
+
521
+ ## FAQ
522
+
523
+ ### How to run inference over a very long (30k+ tokens) document in batches?
524
+
525
+ See `summarize.py` in [the code for my hf space Document Summarization](https://huggingface.co/spaces/pszemraj/document-summarization/blob/main/summarize.py) :)
526
+
527
+ You can also use the same code to split a document into batches of 4096 tokens, etc., and run the model over each batch. This is useful when CUDA memory is limited.
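
Below is a minimal sketch of that chunked approach with plain `transformers` (the repo id, chunk size, and generation settings here are illustrative assumptions, not the exact logic of `summarize.py`):

```python
# Minimal sketch of chunked long-document summarization.
# The repo id, chunk size, and generation settings below are assumptions for
# illustration; see summarize.py in the linked space for the full logic.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "pszemraj/long-t5-tglobal-base-16384-book-summary"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)


def summarize_long(text: str, chunk_tokens: int = 4096) -> list[str]:
    """Split `text` into chunks of ~chunk_tokens tokens and summarize each one."""
    ids = tokenizer(text, truncation=False, return_tensors="pt").input_ids[0]
    chunks = [ids[i : i + chunk_tokens] for i in range(0, len(ids), chunk_tokens)]
    summaries = []
    for chunk in chunks:
        outputs = model.generate(
            chunk.unsqueeze(0).to(device),
            max_length=512,
            num_beams=4,
            no_repeat_ngram_size=3,
            early_stopping=True,
        )
        summaries.append(tokenizer.decode(outputs[0], skip_special_tokens=True))
    return summaries
```

The per-chunk summaries can then be joined, or concatenated and summarized a second time if a single short summary is needed.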
528
+
529
+ ### How to fine-tune further?
530
+
531
+ See [train with a script](https://huggingface.co/docs/transformers/run_scripts) and [the summarization scripts](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization).
532
+
533
+ This model was originally tuned on Google Colab with a heavily modified variant of the [longformer training notebook](https://github.com/patrickvonplaten/notebooks/blob/master/Fine_tune_Longformer_Encoder_Decoder_(LED)_for_Summarization_on_pubmed.ipynb); the key enabler was DeepSpeed. You can try this as an alternative route to fine-tuning the model without the command line.
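
For reference, here is a minimal `Seq2SeqTrainer` sketch for continued fine-tuning on `kmfoda/booksum` (column names, sequence lengths, and hyperparameters are assumptions for illustration; the linked scripts and notebook are the more complete routes):

```python
# Minimal sketch of continued fine-tuning on kmfoda/booksum with Seq2SeqTrainer.
# Column names, sequence lengths, and hyperparameters are assumptions for
# illustration; the linked example scripts cover evaluation, logging, etc.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_id = "pszemraj/long-t5-tglobal-base-16384-book-summary"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

dataset = load_dataset("kmfoda/booksum")


def preprocess(batch):
    # "chapter" / "summary_text" are assumed column names; check the dataset card.
    model_inputs = tokenizer(batch["chapter"], max_length=16384, truncation=True)
    labels = tokenizer(text_target=batch["summary_text"], max_length=1024, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


tokenized = dataset.map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)

args = Seq2SeqTrainingArguments(
    output_dir="long-t5-booksum-further-ft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
    learning_rate=1e-4,
    num_train_epochs=2,
    bf16=True,
    logging_steps=10,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

Gradient checkpointing plus gradient accumulation (or DeepSpeed, as noted above) can help the 16,384-token input length fit in limited GPU memory.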
534
+
535
+
536
+
537
+ ---
538
+
539
+
540
  ## Training procedure
541
 
542
  ### Updates: