llama2-7bb-tweet-summarization-gradnorm-0.3-warmupratio-0.05

This model is a fine-tuned version of NousResearch/Llama-2-7b-hf on the dialogstudio dataset. It achieves the following results on the evaluation set at the final checkpoint (epoch 7, step 1540):

  • Loss: 2.8360
  • Rouge Scores: {'rouge1': 93.92075452911438, 'rouge2': 78.28015883656892, 'rougeL': 64.88738306318788, 'rougeLsum': 93.91572652306441}
  • Bleu Scores: [0.9489359800839542, 0.9362845242017266, 0.908851614503138, 0.877219164400539]
  • Gen Len: 463.0182

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 7
  • mixed_precision_training: Native AMP
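
As a rough illustration of what the cosine scheduler and warmup ratio listed above imply, the sketch below computes the learning rate at a given optimizer step. It assumes 1540 total steps (the final step in the training results), a linear warmup whose length is the ceiling of ratio times total steps, and a single half-cosine decay to zero; `lr_at` is a hypothetical helper written for this example, not code from the training run.

```python
import math

BASE_LR = 1e-4        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.05   # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 1540    # assumption: last step reported in the training results

# Assumption: warmup length is the ceiling of ratio * total steps.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Print the schedule at the end of each epoch (220 steps per epoch).
    for step in range(0, TOTAL_STEPS + 1, 220):
        print(step, lr_at(step))
```

Under these assumptions the rate peaks at the end of warmup and decays for the remaining steps, so most of the seven epochs run on the decaying part of the curve.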

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge Scores | Bleu Scores | Gen Len |
|---|---|---|---|---|---|---|
| 1.9797 | 1.0 | 220 | 1.8468 | {'rouge1': 92.58753211785933, 'rouge2': 78.56005630365634, 'rougeL': 67.90431046147724, 'rougeLsum': 92.60062258669173} | [0.9044331868078556, 0.8935689766031644, 0.8701723917264629, 0.8432837507770929] | 463.0182 |
| 1.6875 | 2.0 | 440 | 1.8317 | {'rouge1': 93.57745806777827, 'rouge2': 79.20734399292829, 'rougeL': 68.03949913123978, 'rougeLsum': 93.56573169703795} | [0.9260232753232301, 0.9151369981183058, 0.8909296719649512, 0.8630015430201563] | 463.0182 |
| 1.3609 | 3.0 | 660 | 1.9440 | {'rouge1': 93.64149561116312, 'rouge2': 78.9369863604149, 'rougeL': 67.28929677118091, 'rougeLsum': 93.6354094969574} | [0.933089623937576, 0.9213474707086045, 0.8961256783117583, 0.8671119660431741] | 463.0182 |
| 0.9973 | 4.0 | 880 | 2.1479 | {'rouge1': 93.77098210043772, 'rouge2': 78.72676191106424, 'rougeL': 66.61685782420736, 'rougeLsum': 93.77132525696588} | [0.9407524092990022, 0.9287706231907287, 0.9027163186452807, 0.8726978389893866] | 463.0182 |
| 0.6828 | 5.0 | 1100 | 2.3624 | {'rouge1': 93.76850681087201, 'rouge2': 78.54959646542315, 'rougeL': 65.96739684743356, 'rougeLsum': 93.76918986163282} | [0.9447432833130077, 0.9323421216849288, 0.9057018192399795, 0.8750402029132044] | 463.0182 |
| 0.4662 | 6.0 | 1320 | 2.6675 | {'rouge1': 93.85846920408349, 'rouge2': 78.2490547871314, 'rougeL': 65.29853641567857, 'rougeLsum': 93.85718380561036} | [0.9469562557636547, 0.9345062357610694, 0.9072329702581828, 0.8757492563086158] | 463.0182 |
| 0.3594 | 7.0 | 1540 | 2.8360 | {'rouge1': 93.92075452911438, 'rouge2': 78.28015883656892, 'rougeL': 64.88738306318788, 'rougeLsum': 93.91572652306441} | [0.9489359800839542, 0.9362845242017266, 0.908851614503138, 0.877219164400539] | 463.0182 |
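
In the results above, training loss falls every epoch while validation loss is lowest at epoch 2 and rises afterwards, a pattern that is consistent with overfitting even though ROUGE-1 and BLEU continue to edge upward. The sketch below copies the two loss columns and reports the epoch with the lowest validation loss and the per-epoch gap between validation and training loss. `best_epoch` and `generalization_gaps` are hypothetical helpers for this example, and choosing a checkpoint by validation loss alone is an assumption, not something the card prescribes.

```python
# (epoch, training_loss, validation_loss) copied from the training results.
HISTORY = [
    (1, 1.9797, 1.8468),
    (2, 1.6875, 1.8317),
    (3, 1.3609, 1.9440),
    (4, 0.9973, 2.1479),
    (5, 0.6828, 2.3624),
    (6, 0.4662, 2.6675),
    (7, 0.3594, 2.8360),
]


def best_epoch(history):
    """Return the epoch whose validation loss is lowest."""
    return min(history, key=lambda row: row[2])[0]


def generalization_gaps(history):
    """Return (epoch, validation_loss - training_loss) for every epoch."""
    return [(epoch, round(val - train, 4)) for epoch, train, val in history]


if __name__ == "__main__":
    print("lowest validation loss at epoch", best_epoch(HISTORY))
    for epoch, gap in generalization_gaps(HISTORY):
        print(f"epoch {epoch}: validation minus training loss = {gap:+.4f}")
```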

Framework versions

  • PEFT 0.8.2.dev0
  • Transformers 4.38.0.dev0
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.2.dev0
  • Tokenizers 0.15.1
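
Since PEFT appears among the framework versions, the weights are presumably published as an adapter to be applied on top of the base model. The following is a minimal loading sketch under that assumption. The adapter repository ID is assumed to be the author namespace `DrishtiSharma` plus the model name, and `load_adapter` is a hypothetical helper; the prompt format used during fine-tuning is not documented here, so no generation example is given.

```python
BASE_MODEL_ID = "NousResearch/Llama-2-7b-hf"
# Assumption: the adapter lives under this repository ID on the Hub.
ADAPTER_ID = "DrishtiSharma/llama2-7bb-tweet-summarization-gradnorm-0.3-warmupratio-0.05"


def load_adapter(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (downloads the weights)."""
    # Imported lazily so the constants above can be used without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_adapter()` fetches a 7B-parameter base model, so it needs substantial disk space and memory.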