metadata
base_model: google/pegasus-x-base
tags:
  - generated_from_trainer
datasets:
  - ccdv/arxiv-summarization
model-index:
  - name: Paper-Summarization-ArXiv
    results:
      - task:
          name: Summarization
          type: summarization
        dataset:
          name: ccdv/arxiv-summarization
          type: ccdv/arxiv-summarization
          config: section
          split: test
          args: section
        metrics:
          - name: ROUGE-1
            type: rouge
            value: 43.2305
          - name: ROUGE-2
            type: rouge
            value: 16.6571
          - name: ROUGE-L
            type: rouge
            value: 24.4315
          - name: ROUGE-LSum
            type: rouge
            value: 33.9399
license: bigscience-openrail-m
language:
  - en
metrics:
  - rouge
library_name: transformers
pipeline_tag: summarization

Paper-Summarization-ArXiv

This model is a fine-tuned version of google/pegasus-x-base on the arxiv-summarization dataset.

Base Model: PEGASUS-X-base (a PEGASUS extension for long-input summarization, supporting inputs of up to 16K tokens)

Finetuning Dataset:

  • We used the full ArXiv dataset (Cohan et al., NAACL-HLT 2018).
    • The full dataset contains more than 200,000 documents.

GPU: 1 x RTX A6000

Training time: about 120 hours for 5 epochs

Evaluation time: about 8 hours on the test set

Intended uses & limitations

  • Research Paper Summarization
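A minimal usage sketch for this intended use is given below. It assumes the checkpoint loads through the standard `AutoTokenizer` / `AutoModelForSeq2SeqLM` classes; `model_id` is a placeholder for this model's Hub identifier, and `max_input_tokens=4096` is an illustrative default rather than a value stated in this card. The decoding settings reuse the beam-search parameters of the first configuration reported under the baseline comparison. The `top_k` and `top_p` values from that configuration are left out because sampling parameters normally take effect only when `do_sample=True`.

```python
# Beam-search settings taken from the first decoding configuration in this card.
GENERATION_KWARGS = {
    "length_penalty": 1.0,
    "num_beams": 2,
    "max_length": 128 * 4,
    "min_length": 150,
    "no_repeat_ngram_size": 3,
}


def summarize(text, model_id, device="cpu", max_input_tokens=4096):
    """Summarize one paper. `model_id` and `max_input_tokens` are assumptions."""
    # Imported lazily so the settings above can be inspected without these packages.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id).to(device)
    model.eval()

    # Truncate long papers to the chosen input budget.
    inputs = tokenizer(
        text, truncation=True, max_length=max_input_tokens, return_tensors="pt"
    )
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"].to(device),
            attention_mask=inputs["attention_mask"].to(device),
            **GENERATION_KWARGS,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(paper_text, model_id="<hub id of this model>", device="cuda")` would return a single summary string. Loading the model inside the function keeps the sketch short; for batch use, load it once and reuse it.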

Comparison to Baseline

  • Pegasus-X-base zero-shot Performance:

    • R-1 | R-2 | R-L | R-LSUM : 6.2269 | 0.7894 | 4.6905 | 5.4591
  • This model:

    • R-1 | R-2 | R-L | R-LSUM : 43.2305 | 16.6571 | 24.4315 | 33.9399 with
    model.generate(
        input_ids=inputs["input_ids"].to(device),
        attention_mask=inputs["attention_mask"].to(device),
        length_penalty=1, num_beams=2, max_length=128*4, min_length=150,
        no_repeat_ngram_size=3, top_k=25, top_p=0.95,
    )
      
    
    • R-1 | R-2 | R-L | R-LSUM : 40.8486 | 16.3717 | 25.2937 | 33.6923 (decoding settings chosen with reference to the PEGASUS-X paper) with
    model.generate(
        input_ids=inputs["input_ids"].to(device),
        attention_mask=inputs["attention_mask"].to(device),
        length_penalty=1, num_beams=1, max_length=128*2, top_p=1,
    )
    
    • R-1 | R-2 | R-L | R-LSUM : 38.1317 | 15.0357 | 23.0286 | 30.9938 (diverse beam search decoding) with
    model.generate(
        input_ids=inputs["input_ids"].to(device),
        attention_mask=inputs["attention_mask"].to(device),
        num_beam_groups=5, diversity_penalty=1.0, num_beams=5,
        min_length=150, max_length=128*4,
    )
    
    • R-1 | R-2 | R-L | R-LSUM : 43.3017 | 16.6023 | 24.1867 | 33.7019 with
    model.generate(
        input_ids=inputs["input_ids"].to(device),
        attention_mask=inputs["attention_mask"].to(device),
        length_penalty=1.2, num_beams=4, max_length=128*4, min_length=150,
        no_repeat_ngram_size=3, temperature=0.9, top_k=50, top_p=0.92,
    )
     
    
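To make the trade-offs between the four decoding configurations easier to see, the reported scores can be collected in a small table and queried per metric. The sketch below only restates the numbers above. The configuration names (`beam-2`, `greedy-256`, and so on) are labels invented here for readability and do not appear in the original evaluation.

```python
# ROUGE scores as reported above; the "name" labels are illustrative only.
RESULTS = [
    {"name": "beam-2", "r1": 43.2305, "r2": 16.6571, "rl": 24.4315, "rlsum": 33.9399},
    {"name": "greedy-256", "r1": 40.8486, "r2": 16.3717, "rl": 25.2937, "rlsum": 33.6923},
    {"name": "diverse-beam-5", "r1": 38.1317, "r2": 15.0357, "rl": 23.0286, "rlsum": 30.9938},
    {"name": "beam-4", "r1": 43.3017, "r2": 16.6023, "rl": 24.1867, "rlsum": 33.7019},
]


def best_by(metric):
    """Return the name of the configuration with the highest score for `metric`."""
    return max(RESULTS, key=lambda row: row[metric])["name"]


if __name__ == "__main__":
    for metric in ("r1", "r2", "rl", "rlsum"):
        print(metric, best_by(metric))
```

No single configuration wins on every metric: 4-beam search has the highest ROUGE-1, 2-beam search the highest ROUGE-2 and ROUGE-LSum, and greedy decoding with the shorter length limit the highest ROUGE-L.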

Training procedure

We trained in the Hugging Face ecosystem, using the datasets library and the Trainer API, among other tools.
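A data-preparation step for such a setup might look like the sketch below. It is not the authors' script. It assumes the `section` config of `ccdv/arxiv-summarization` exposes `article` and `abstract` columns, and the token budgets (`max_input_tokens`, `max_target_tokens`) are illustrative values that this card does not specify.

```python
def make_preprocess_fn(tokenizer, max_input_tokens=4096, max_target_tokens=512):
    """Build a batched preprocessing function (token budgets are assumptions)."""

    def preprocess(batch):
        # Tokenize the paper body as the encoder input.
        model_inputs = tokenizer(
            batch["article"], max_length=max_input_tokens, truncation=True
        )
        # Tokenize the abstract as the decoder target.
        labels = tokenizer(
            text_target=batch["abstract"], max_length=max_target_tokens, truncation=True
        )
        model_inputs["labels"] = labels["input_ids"]
        return model_inputs

    return preprocess


def load_tokenized_dataset(tokenizer):
    """Load the dataset and tokenize every split (requires the `datasets` package)."""
    from datasets import load_dataset

    raw = load_dataset("ccdv/arxiv-summarization", "section")
    return raw.map(
        make_preprocess_fn(tokenizer),
        batched=True,
        remove_columns=["article", "abstract"],
    )
```

The tokenized splits could then be passed to a `Seq2SeqTrainer` together with a `DataCollatorForSeq2Seq`, which pads inputs and labels per batch.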

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 64
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1586
  • num_epochs: 5
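The relationship between these values can be checked with a little arithmetic. The sketch below maps the hyperparameters onto the keyword names used by `Seq2SeqTrainingArguments`, derives the effective batch size, and reimplements a linear warmup-then-decay schedule in plain Python. The total of 15,860 optimizer steps comes from the training results table. The helper functions are illustrative and not part of the original training code.

```python
# Hyperparameters from this card, keyed by Seq2SeqTrainingArguments argument names.
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "seed": 42,
    "gradient_accumulation_steps": 64,
    "lr_scheduler_type": "linear",
    "warmup_steps": 1586,
    "num_train_epochs": 5,
}
TOTAL_STEPS = 15860  # final step in the training results table
STEPS_PER_EPOCH = 3172


def effective_batch_size(kwargs, num_devices=1):
    """Examples consumed per optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def linear_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    batch = effective_batch_size(TRAINING_KWARGS)
    print("effective batch size:", batch)
    print("approx. examples per epoch:", batch * STEPS_PER_EPOCH)
    for step in (0, 793, 1586, 8723, 15860):
        lr = linear_lr(
            step, TRAINING_KWARGS["learning_rate"], TRAINING_KWARGS["warmup_steps"], TOTAL_STEPS
        )
        print(step, lr)
```

With one GPU, the effective batch size is 1 x 64 = 64, so 3,172 steps per epoch correspond to roughly 203,000 training examples, consistent with the dataset size stated above. The 1,586 warmup steps are 10% of the 15,860 total steps, or half of the first epoch.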

Training results

Training Loss | Epoch | Step  | Validation Loss
2.6153        | 1.0   | 3172  | 2.1045
2.202         | 2.0   | 6344  | 2.0511
2.1547        | 3.0   | 9516  | 2.0282
2.132         | 4.0   | 12688 | 2.0164
2.1222        | 5.0   | 15860 | 2.0127
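One way to read these numbers is to convert the losses to perplexities and look at how much each epoch still improves the validation loss. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual Trainer behavior but is not stated in this card.

```python
import math

# (epoch, step, training loss, validation loss) from the table above.
HISTORY = [
    (1, 3172, 2.6153, 2.1045),
    (2, 6344, 2.2020, 2.0511),
    (3, 9516, 2.1547, 2.0282),
    (4, 12688, 2.1320, 2.0164),
    (5, 15860, 2.1222, 2.0127),
]


def perplexity(loss):
    """Per-token perplexity, assuming `loss` is mean cross-entropy in nats."""
    return math.exp(loss)


def validation_gains(history):
    """Drop in validation loss from each epoch to the next."""
    losses = [row[3] for row in history]
    return [round(prev - cur, 4) for prev, cur in zip(losses, losses[1:])]


if __name__ == "__main__":
    for epoch, _, _, val_loss in HISTORY:
        print(epoch, round(perplexity(val_loss), 3))
    print(validation_gains(HISTORY))
```

The per-epoch gains shrink steadily and the final epoch lowers the validation loss by less than 0.004, which suggests training had largely converged by epoch 5.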

Framework versions

  • Transformers 4.32.1
  • PyTorch 2.0.1
  • Datasets 2.12.0
  • Tokenizers 0.13.2