BART-Large-CNN-scratch

The BART-Large-CNN-scratch model is a re-trained version of the facebook/bart-large model. Starting from that pretrained checkpoint, the summarization fine-tuning was performed from scratch on the CNN/DailyMail dataset to reproduce the performance of the facebook/bart-large-cnn model.

  • Developed by: phanerozoic
  • Model type: BartForConditionalGeneration
  • Source model: facebook/bart-large
  • License: cc-by-nc-4.0
  • Languages: English

Model Details

BART-Large-CNN-scratch uses a transformer-based sequence-to-sequence architecture tailored for text summarization. It builds on the strengths of the original BART architecture, with the CNN/DailyMail summarization training redone from scratch rather than reusing the released facebook/bart-large-cnn weights.
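
The model can be loaded through the standard transformers summarization pipeline. The snippet below is a minimal usage sketch; the generation settings shown are illustrative, not values prescribed by this card.

```python
from transformers import pipeline

# Load this checkpoint through the standard summarization pipeline.
summarizer = pipeline("summarization", model="phanerozoic/BART-Large-CNN-Scratch")

article = (
    "The tower is 324 metres (1,063 ft) tall, about the same height as an "
    "81-storey building. Its base is square, measuring 125 metres (410 ft) "
    "on each side."
)

# max_length / min_length are illustrative generation settings.
summary = summarizer(article, max_length=128, min_length=30, do_sample=False)
print(summary[0]["summary_text"])
```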

Configuration

  • Max input length: 1024 tokens
  • Max target length: 128 tokens
  • Learning rate: 4e-5
  • Batch size: 32
  • Epochs: 1
  • Hardware used: NVIDIA RTX 6000 Ada Lovelace
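
As a sketch of how the length limits above are typically enforced, the preprocessing below truncates articles to 1024 input tokens and summaries to 128 target tokens. The tokenizer call is standard transformers usage, and the article/highlights field names follow the CNN/DailyMail schema.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")

def preprocess(examples):
    # Truncate articles to the 1024-token input limit listed above.
    model_inputs = tokenizer(examples["article"], max_length=1024, truncation=True)
    # Truncate reference summaries to the 128-token target limit.
    labels = tokenizer(text_target=examples["highlights"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```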

Training and Evaluation Data

The model was trained for a single epoch on the CNN/DailyMail dataset, a large collection of news articles paired with human-written summaries. The dataset is widely used as a benchmark for text summarization models because of its size and the quality of its reference summaries.
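
The dataset can be loaded from the Hugging Face Hub. The 3.0.0 configuration below is the version commonly used for summarization benchmarks; it is an assumption here, since the card does not state which version was used.

```python
from datasets import load_dataset

# Assumption: the standard 3.0.0 configuration of CNN/DailyMail.
dataset = load_dataset("cnn_dailymail", "3.0.0")
print(dataset)  # train (~287k), validation (~13k), test (~11.5k) article/summary pairs
```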

Training Procedure

Training started from the pretrained facebook/bart-large checkpoint, with the summarization fine-tuning performed from scratch using the following settings:

  • Epochs: 1
  • Batch size: 32
  • Learning rate: 4e-5
  • Training time: 7 hours
  • Loss: 0.65

During training, the model was optimized to reduce the loss function, enhancing its ability to generate summaries that are both concise and informative.
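
A sketch of how these settings map onto a standard Seq2SeqTrainer run is shown below. Only the epoch count, batch size, and learning rate come from this card; everything else (mixed precision, output directory, collator choice) is an assumption.

```python
from transformers import (
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")

# `dataset`, `tokenizer`, and preprocess() come from the sketches above.
tokenized = dataset.map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)

args = Seq2SeqTrainingArguments(
    output_dir="bart-large-cnn-scratch",
    num_train_epochs=1,              # from the settings above
    per_device_train_batch_size=32,  # from the settings above
    learning_rate=4e-5,              # from the settings above
    predict_with_generate=True,
    fp16=True,                       # assumption: mixed precision on the RTX 6000 Ada
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```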

Performance

Evaluation after training produced the following ROUGE scores:

  • ROUGE-1: 44.07
  • ROUGE-2: 21.06
  • ROUGE-L: 30.65
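
Scores in this range can be reproduced with the evaluate library's ROUGE implementation (which wraps the rouge_score package). The sketch below scales the returned F-measures to the 0-100 convention used here.

```python
import evaluate  # also requires the rouge_score package

rouge = evaluate.load("rouge")

predictions = ["The tower is 324 metres tall."]         # model-generated summaries
references  = ["The Eiffel Tower is 324 metres tall."]  # reference highlights

scores = rouge.compute(predictions=predictions, references=references)
# Scale to the 0-100 convention used in this card.
print({k: round(v * 100, 2) for k, v in scores.items()})
```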

Comparing Performance to Base Model

The performance of BART-Large-CNN-scratch is compared against Facebook's original bart-large-cnn model:

  Model                     ROUGE-1   ROUGE-2   ROUGE-L
  Facebook BART-large-cnn   42.949    20.815    30.619
  BART-Large-CNN-scratch    44.070    21.060    30.650

Analysis of Summaries

Eiffel Tower Article Summary Comparison

Facebook BART-Large-CNN Summary:

"The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world."

BART-Large-CNN-scratch Summary:

"The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building. Its base is square, measuring 125 metres (410 ft) on each side. It is the second tallest free-standing structure in France after the Millau Viaduct."

  • Comparison:
    • Both summaries start with identical descriptions of the Eiffel Tower's height and base dimensions.
    • The Facebook summary mentions the historical significance of the Eiffel Tower surpassing the Washington Monument.
    • The scratch summary includes the detail of the tower being the second tallest free-standing structure in France, providing a different historical context.
    • The scratch summary omits the name of the tower, indicating that the replication of Facebook's model is close but not exact.

Paper Clip Article Summary Comparison

Facebook BART-Large-CNN Summary:

"The earliest form of the paper clip dates back to the 13th century. The most widely recognized design is attributed to the Norwegian inventor Johan Vaaler. The design of paper clips has continued to evolve, with various shapes and sizes available on the market. During World War II, paper clips became a symbol of resistance in Norway."

BART-Large-CNN-scratch Summary:

"The paper clip dates back to the 13th century, when a device made of a bent metal wire was used to hold sheets of paper together. The most widely recognized design is attributed to the Norwegian inventor Johan Vaaler, who received a patent for his paper clip design in 1899. During World War II, the paper clip became a symbol of resistance in Norway."

  • Comparison:
    • Both summaries start with descriptions of the origins of the paper clip and Johan Vaaler's contributions.
    • The Facebook summary briefly mentions the evolution of paper clip designs and their availability in various shapes and sizes.
    • The scratch summary includes additional historical details about the use of bent metal wires in the 13th century and Vaaler's patent, providing a richer historical context.

Implications

  1. Reproducibility: The BART-Large-CNN-scratch model closely reproduces the performance of the Facebook BART-large-cnn model, capturing key historical points and producing concise summaries. It prioritizes some details differently, however, so the reproduction is effective but not exact.
  2. Model Training from Scratch: Redoing the summarization fine-tuning from scratch proved effective, with the model achieving competitive ROUGE scores. Its summaries still differ in detail from the Facebook model's, suggesting room for further fine-tuning.
  3. Practical Applications: Both models are effective for summarizing historical and technical articles. BART-Large-CNN-scratch excels at concise overviews, while the Facebook model provides more comprehensive context.

Conclusion

The BART-Large-CNN-scratch model demonstrates strong performance, capturing essential historical points and producing concise summaries. While it does not exactly reproduce the Facebook model's summaries, it achieves similar quality and even exceeds the original on ROUGE scores. This makes it a robust tool for text summarization applications.

Acknowledgments

Special thanks to the developers of the BART architecture and the Hugging Face team. Their tools and frameworks were instrumental in the development and fine-tuning of this model. The NVIDIA RTX 6000 Ada Lovelace hardware provided the necessary computational power to achieve these results.
