pszemraj committed
Commit ddb8588
1 Parent(s): 2e06713
Files changed (1)
  1. README.md +25 -0
README.md CHANGED
@@ -468,6 +468,8 @@ A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/
468
 
469
  > The narrator tells us that he's graduated from the Navy seals and has been involved in many secret raids. He's also one of the best snipers in the entire U.S. military. He promises to "wipe you out with precision" when they meet again.
470
 
471
+ ---
472
+
473
  ## Model description
474
 
475
  A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `kmfoda/booksum` dataset:
@@ -512,6 +514,29 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl
514
 
515
  _NOTE: early checkpoints of this model were trained on a "smaller" subset of the dataset, as it was filtered to summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens**, and the model was then trained for 10+ additional epochs._
516
 
517
+ ---
518
+
519
+ ---
520
+
521
+ ## FAQ
522
+
523
+ ### How to run inference over a very long (30k+ tokens) document in batches?
524
+
525
+ See `summarize.py` in [the code for my hf space Document Summarization](https://huggingface.co/spaces/pszemraj/document-summarization/blob/main/summarize.py) :)
526
+
527
+ You can also use the same code to split a document into batches of 4096 tokens, etc., and run the model over each batch. This is useful when CUDA memory is limited.
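
Below is a minimal sketch of that chunked approach with plain `transformers` (the repo id, chunk size, and generation settings here are illustrative assumptions, not the exact logic of `summarize.py`):

```python
# Minimal sketch of chunked long-document summarization.
# The repo id, chunk size, and generation settings below are assumptions for
# illustration; see summarize.py in the linked space for the full logic.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "pszemraj/long-t5-tglobal-base-16384-book-summary"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)


def summarize_long(text: str, chunk_tokens: int = 4096) -> list[str]:
    """Split `text` into chunks of ~chunk_tokens tokens and summarize each one."""
    ids = tokenizer(text, truncation=False, return_tensors="pt").input_ids[0]
    chunks = [ids[i : i + chunk_tokens] for i in range(0, len(ids), chunk_tokens)]
    summaries = []
    for chunk in chunks:
        outputs = model.generate(
            chunk.unsqueeze(0).to(device),
            max_length=512,
            num_beams=4,
            no_repeat_ngram_size=3,
            early_stopping=True,
        )
        summaries.append(tokenizer.decode(outputs[0], skip_special_tokens=True))
    return summaries
```

The per-chunk summaries can then be joined, or concatenated and summarized a second time if a single short summary is needed.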
528
+
529
+ ### How to fine-tune further?
530
+
531
+ See [train with a script](https://huggingface.co/docs/transformers/run_scripts) and [the summarization scripts](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization).
532
+
533
+ This model was originally tuned on Google Colab with a heavily modified variant of the [longformer training notebook](https://github.com/patrickvonplaten/notebooks/blob/master/Fine_tune_Longformer_Encoder_Decoder_(LED)_for_Summarization_on_pubmed.ipynb); the key enabler was DeepSpeed. You can try this as an alternative route to fine-tuning the model without the command line.
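
For reference, here is a minimal `Seq2SeqTrainer` sketch for continued fine-tuning on `kmfoda/booksum` (column names, sequence lengths, and hyperparameters are assumptions for illustration; the linked scripts and notebook are the more complete routes):

```python
# Minimal sketch of continued fine-tuning on kmfoda/booksum with Seq2SeqTrainer.
# Column names, sequence lengths, and hyperparameters are assumptions for
# illustration; the linked example scripts cover evaluation, logging, etc.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_id = "pszemraj/long-t5-tglobal-base-16384-book-summary"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

dataset = load_dataset("kmfoda/booksum")


def preprocess(batch):
    # "chapter" / "summary_text" are assumed column names; check the dataset card.
    model_inputs = tokenizer(batch["chapter"], max_length=16384, truncation=True)
    labels = tokenizer(text_target=batch["summary_text"], max_length=1024, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


tokenized = dataset.map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)

args = Seq2SeqTrainingArguments(
    output_dir="long-t5-booksum-further-ft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
    learning_rate=1e-4,
    num_train_epochs=2,
    bf16=True,
    logging_steps=10,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

Gradient checkpointing plus gradient accumulation (or DeepSpeed, as noted above) can help the 16,384-token input length fit in limited GPU memory.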
534
+
535
+
536
+
537
+ ---
538
+
539
+
540
  ## Training procedure
541
 
542
  ### Updates: