
paligemma_newslakeandmadvqa_conbime

This model is a fine-tuned version of google/paligemma-3b-pt-224 on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0349

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 10
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 40
  • optimizer: AdamW (adamw_hf) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 900
  • num_epochs: 3
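Assuming the standard Hugging Face Trainer API, the hyperparameters above roughly correspond to the TrainingArguments below. This is a sketch, not the original training script (which is not published); the `output_dir` value is a placeholder.

```python
from transformers import TrainingArguments

# Approximate reconstruction of the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="paligemma_newslakeandmadvqa_conbime",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # effective train batch size: 10 * 4 = 40
    optim="adamw_hf",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=900,
    num_train_epochs=3,
)
```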

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 5.6592        | 0.1768 | 100  | 5.9832          |
| 5.1792        | 0.3535 | 200  | 4.5564          |
| 3.6303        | 0.5303 | 300  | 3.2205          |
| 2.8756        | 0.7070 | 400  | 2.5592          |
| 2.3211        | 0.8838 | 500  | 2.1059          |
| 1.9172        | 1.0605 | 600  | 1.8195          |
| 1.7352        | 1.2373 | 700  | 1.6310          |
| 1.621         | 1.4141 | 800  | 1.4883          |
| 1.4855        | 1.5908 | 900  | 1.3832          |
| 1.4496        | 1.7676 | 1000 | 1.2959          |
| 1.2769        | 1.9443 | 1100 | 1.2350          |
| 1.2199        | 2.1211 | 1200 | 1.1831          |
| 1.2619        | 2.2978 | 1300 | 1.1391          |
| 1.1412        | 2.4746 | 1400 | 1.0943          |
| 1.0869        | 2.6513 | 1500 | 1.0589          |
| 1.123         | 2.8281 | 1600 | 1.0349          |

Framework versions

  • PEFT 0.13.0
  • Transformers 4.46.0.dev0
  • Pytorch 2.4.1+cu121
  • Datasets 3.0.1
  • Tokenizers 0.20.0
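
Since the framework versions above list PEFT, this repository ships an adapter rather than full model weights. One plausible way to use it is to attach it to the google/paligemma-3b-pt-224 base checkpoint; the sketch below assumes the repository id RoyRoyRpy/paligemma_newslakeandmadvqa_conbime and uses the standard transformers and peft loading APIs.

```python
BASE_MODEL_ID = "google/paligemma-3b-pt-224"  # base checkpoint from this card
ADAPTER_ID = "RoyRoyRpy/paligemma_newslakeandmadvqa_conbime"  # this adapter repo


def load_model():
    # Imports are local so the snippet can be read and parsed without the
    # (large) dependencies installed; calling this downloads the weights.
    from transformers import PaliGemmaForConditionalGeneration, AutoProcessor
    from peft import PeftModel

    base = PaliGemmaForConditionalGeneration.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(base, ADAPTER_ID)  # attach adapter weights
    processor = AutoProcessor.from_pretrained(BASE_MODEL_ID)
    return model, processor
```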

Model tree for RoyRoyRpy/paligemma_newslakeandmadvqa_conbime
