byt5-large_ocr / README.md
license: apache-2.0
base_model: google/byt5-large
tags:
  - generated_from_trainer
model-index:
  - name: models
    results: []

models

This model is a fine-tuned version of google/byt5-large. The training dataset was not recorded in the card metadata. It achieves the following results on the evaluation set:

  • Loss: 0.0735

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
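
As a rough illustration of what `lr_scheduler_type: linear` implies for the values listed above, the sketch below computes the learning rate at a given optimizer step. It rests on two assumptions that the card does not state: there are no warmup steps, and the run lasts about 14308 optimizer steps (an estimate derived from the step and epoch columns of the results table). The names `linear_lr`, `NUM_WARMUP_STEPS`, and `TOTAL_STEPS` are hypothetical and are not taken from the training script.

```python
BASE_LR = 2e-05        # learning_rate from the hyperparameter list
NUM_WARMUP_STEPS = 0   # assumption: the card lists no warmup
TOTAL_STEPS = 14308    # assumption: estimated from the results table, not reported


def linear_lr(step, base_lr=BASE_LR, warmup=NUM_WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup)
    # Decay linearly from base_lr (end of warmup) to 0 (end of training).
    remaining = max(0, total - step)
    return base_lr * remaining / max(1, total - warmup)


if __name__ == "__main__":
    for step in (0, 500, 7000, 14000):
        print(f"step {step:>5}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the learning rate starts at 2e-05 and shrinks toward zero by the final logged step, which is consistent with the validation loss flattening late in the second epoch.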

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|---------------|--------|-------|-----------------|
| 1.0178        | 0.0699 | 500   | 0.1299          |
| 0.199         | 0.1398 | 1000  | 0.1156          |
| 0.1327        | 0.2097 | 1500  | 0.1020          |
| 0.1241        | 0.2796 | 2000  | 0.0964          |
| 0.1115        | 0.3495 | 2500  | 0.0919          |
| 0.1109        | 0.4193 | 3000  | 0.0897          |
| 0.1016        | 0.4892 | 3500  | 0.0870          |
| 0.1081        | 0.5591 | 4000  | 0.0847          |
| 0.0946        | 0.6290 | 4500  | 0.0826          |
| 0.0996        | 0.6989 | 5000  | 0.0822          |
| 0.0962        | 0.7688 | 5500  | 0.0809          |
| 0.09          | 0.8387 | 6000  | 0.0799          |
| 0.0901        | 0.9086 | 6500  | 0.0785          |
| 0.098         | 0.9785 | 7000  | 0.0773          |
| 0.0866        | 1.0484 | 7500  | 0.0772          |
| 0.0877        | 1.1183 | 8000  | 0.0776          |
| 0.0865        | 1.1881 | 8500  | 0.0761          |
| 0.0912        | 1.2580 | 9000  | 0.0757          |
| 0.0786        | 1.3279 | 9500  | 0.0762          |
| 0.085         | 1.3978 | 10000 | 0.0744          |
| 0.0818        | 1.4677 | 10500 | 0.0750          |
| 0.078         | 1.5376 | 11000 | 0.0751          |
| 0.0791        | 1.6075 | 11500 | 0.0746          |
| 0.0848        | 1.6774 | 12000 | 0.0743          |
| 0.0755        | 1.7473 | 12500 | 0.0738          |
| 0.0785        | 1.8172 | 13000 | 0.0739          |
| 0.0773        | 1.8871 | 13500 | 0.0736          |
| 0.0792        | 1.9569 | 14000 | 0.0735          |
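
The step and epoch columns above make it possible to estimate the length of one epoch, and from that the approximate size of the training set, which the card does not report. The sketch below is only back-of-the-envelope arithmetic. It assumes a single device and no gradient accumulation, so that one step consumes exactly `train_batch_size` examples; neither assumption is confirmed by the card. The function names are hypothetical.

```python
# (epoch, step) taken from the last row of the results table
LAST_EPOCH, LAST_STEP = 1.9569, 14000
TRAIN_BATCH_SIZE = 8   # train_batch_size from the hyperparameter list
NUM_EPOCHS = 2         # num_epochs from the hyperparameter list


def steps_per_epoch(epoch, step):
    """Optimizer steps in one epoch, inferred from a logged (epoch, step) pair."""
    return step / epoch


def estimate_run(epoch=LAST_EPOCH, step=LAST_STEP,
                 batch_size=TRAIN_BATCH_SIZE, num_epochs=NUM_EPOCHS):
    """Return (steps per epoch, total steps, approximate number of training examples)."""
    per_epoch = steps_per_epoch(epoch, step)
    total_steps = per_epoch * num_epochs
    # Assumption: one device, no gradient accumulation.
    approx_examples = per_epoch * batch_size
    return round(per_epoch), round(total_steps), round(approx_examples)


if __name__ == "__main__":
    per_epoch, total_steps, examples = estimate_run()
    print(f"steps per epoch ~ {per_epoch}")
    print(f"total steps     ~ {total_steps}")
    print(f"train examples  ~ {examples}")
```

Because the epoch column is rounded to four decimals, the result is an estimate rather than an exact count, and it would scale with the number of devices or accumulation steps if either differed from the assumption.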

Framework versions

  • Transformers 4.40.2
  • PyTorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.19.1