---
library_name: transformers
license: apache-2.0
base_model: MubarakB/mt5_small_lg_en
tags:
  - generated_from_trainer
metrics:
  - bleu
model-index:
  - name: mt5_small_lg_inf_en_v1
    results: []
---

mt5_small_lg_inf_en_v1

This model is a fine-tuned version of MubarakB/mt5_small_lg_en on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4187
  • Bleu: 0.2171
  • Gen Len: 9.0204
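
A minimal inference sketch for loading this checkpoint follows. The hub ID is assumed from the model name above, and the Luganda-to-English direction is only inferred from the "lg ... en" naming, not stated in this card:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hub ID assumed from this card's model name; adjust if the checkpoint lives elsewhere.
model_id = "MubarakB/mt5_small_lg_inf_en_v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Example Luganda input ("How are you?"); direction inferred from the name, not confirmed.
inputs = tokenizer("Oli otya?", return_tensors="pt")

# Evaluation generations averaged ~9 tokens, so a modest max_new_tokens suffices.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```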

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
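
As a sketch, these settings map onto `Seq2SeqTrainingArguments` as follows; `output_dir` and `predict_with_generate` are assumptions, and only the bullets above come from the original run:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5_small_lg_inf_en_v1",  # placeholder, not from the original run
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    predict_with_generate=True,  # assumed: needed for BLEU / Gen Len during eval
)
```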

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
|:---:|:---:|:---:|:---:|:---:|:---:|
| No log | 1.0 | 138 | 0.4669 | 0.0658 | 9.3837 |
| No log | 2.0 | 276 | 0.4559 | 0.132 | 8.0245 |
| No log | 3.0 | 414 | 0.4507 | 0.2112 | 8.1592 |
| 0.4726 | 4.0 | 552 | 0.4472 | 0.2144 | 8.0367 |
| 0.4726 | 5.0 | 690 | 0.4445 | 0.2134 | 8.0082 |
| 0.4726 | 6.0 | 828 | 0.4425 | 0.3274 | 7.8612 |
| 0.4726 | 7.0 | 966 | 0.4405 | 0.3378 | 7.5959 |
| 0.447 | 8.0 | 1104 | 0.4390 | 0.3304 | 7.3918 |
| 0.447 | 9.0 | 1242 | 0.4378 | 0.3285 | 7.3673 |
| 0.447 | 10.0 | 1380 | 0.4362 | 0.3147 | 7.6694 |
| 0.4398 | 11.0 | 1518 | 0.4350 | 0.3181 | 7.4163 |
| 0.4398 | 12.0 | 1656 | 0.4341 | 0.3166 | 7.5224 |
| 0.4398 | 13.0 | 1794 | 0.4330 | 0.3178 | 7.5592 |
| 0.4398 | 14.0 | 1932 | 0.4318 | 0.2157 | 7.8204 |
| 0.4313 | 15.0 | 2070 | 0.4312 | 0.3169 | 8.1388 |
| 0.4313 | 16.0 | 2208 | 0.4307 | 0.3169 | 7.9633 |
| 0.4313 | 17.0 | 2346 | 0.4297 | 0.3064 | 8.2245 |
| 0.4313 | 18.0 | 2484 | 0.4293 | 0.2045 | 8.2776 |
| 0.4262 | 19.0 | 2622 | 0.4286 | 0.3027 | 8.4367 |
| 0.4262 | 20.0 | 2760 | 0.4280 | 0.2042 | 8.5061 |
| 0.4262 | 21.0 | 2898 | 0.4274 | 0.3033 | 8.5633 |
| 0.4214 | 22.0 | 3036 | 0.4272 | 0.3019 | 8.7714 |
| 0.4214 | 23.0 | 3174 | 0.4264 | 0.3051 | 8.649 |
| 0.4214 | 24.0 | 3312 | 0.4263 | 0.3021 | 8.8367 |
| 0.4214 | 25.0 | 3450 | 0.4254 | 0.2981 | 8.8204 |
| 0.4161 | 26.0 | 3588 | 0.4251 | 0.2992 | 8.8776 |
| 0.4161 | 27.0 | 3726 | 0.4248 | 0.3044 | 8.8571 |
| 0.4161 | 28.0 | 3864 | 0.4246 | 0.3 | 8.8776 |
| 0.4124 | 29.0 | 4002 | 0.4246 | 0.2998 | 8.8163 |
| 0.4124 | 30.0 | 4140 | 0.4239 | 0.2983 | 9.0857 |
| 0.4124 | 31.0 | 4278 | 0.4234 | 0.2988 | 9.0163 |
| 0.4124 | 32.0 | 4416 | 0.4233 | 0.2996 | 8.8816 |
| 0.4087 | 33.0 | 4554 | 0.4232 | 0.298 | 8.9714 |
| 0.4087 | 34.0 | 4692 | 0.4226 | 0.3003 | 8.9796 |
| 0.4087 | 35.0 | 4830 | 0.4224 | 0.2992 | 9.1796 |
| 0.4087 | 36.0 | 4968 | 0.4225 | 0.3005 | 9.0571 |
| 0.4053 | 37.0 | 5106 | 0.4224 | 0.2994 | 8.8571 |
| 0.4053 | 38.0 | 5244 | 0.4220 | 0.3 | 9.1143 |
| 0.4053 | 39.0 | 5382 | 0.4216 | 0.3019 | 9.102 |
| 0.4006 | 40.0 | 5520 | 0.4215 | 0.3016 | 8.9714 |
| 0.4006 | 41.0 | 5658 | 0.4212 | 0.3011 | 8.9224 |
| 0.4006 | 42.0 | 5796 | 0.4211 | 0.2982 | 9.2816 |
| 0.4006 | 43.0 | 5934 | 0.4210 | 0.2985 | 9.1633 |
| 0.3986 | 44.0 | 6072 | 0.4210 | 0.2994 | 9.0776 |
| 0.3986 | 45.0 | 6210 | 0.4209 | 0.308 | 9.3265 |
| 0.3986 | 46.0 | 6348 | 0.4208 | 0.2963 | 9.1714 |
| 0.3986 | 47.0 | 6486 | 0.4205 | 0.3093 | 9.0531 |
| 0.3953 | 48.0 | 6624 | 0.4205 | 0.3068 | 9.4449 |
| 0.3953 | 49.0 | 6762 | 0.4202 | 0.3075 | 8.9918 |
| 0.3953 | 50.0 | 6900 | 0.4203 | 0.3071 | 9.1306 |
| 0.3929 | 51.0 | 7038 | 0.4200 | 0.3052 | 9.3143 |
| 0.3929 | 52.0 | 7176 | 0.4200 | 0.306 | 9.1796 |
| 0.3929 | 53.0 | 7314 | 0.4200 | 0.3058 | 9.2204 |
| 0.3929 | 54.0 | 7452 | 0.4200 | 0.3076 | 8.8367 |
| 0.391 | 55.0 | 7590 | 0.4196 | 0.3078 | 8.8776 |
| 0.391 | 56.0 | 7728 | 0.4197 | 0.3041 | 9.0449 |
| 0.391 | 57.0 | 7866 | 0.4198 | 0.3041 | 8.8776 |
| 0.3887 | 58.0 | 8004 | 0.4201 | 0.3171 | 8.9224 |
| 0.3887 | 59.0 | 8142 | 0.4192 | 0.3074 | 9.0449 |
| 0.3887 | 60.0 | 8280 | 0.4197 | 0.318 | 8.8571 |
| 0.3887 | 61.0 | 8418 | 0.4194 | 0.3167 | 9.1469 |
| 0.3871 | 62.0 | 8556 | 0.4194 | 0.3186 | 8.8612 |
| 0.3871 | 63.0 | 8694 | 0.4192 | 0.3181 | 8.8245 |
| 0.3871 | 64.0 | 8832 | 0.4192 | 0.3178 | 9.0449 |
| 0.3871 | 65.0 | 8970 | 0.4194 | 0.3168 | 8.9673 |
| 0.3849 | 66.0 | 9108 | 0.4191 | 0.3159 | 8.9184 |
| 0.3849 | 67.0 | 9246 | 0.4192 | 0.3191 | 8.7347 |
| 0.3849 | 68.0 | 9384 | 0.4189 | 0.3173 | 8.8367 |
| 0.3841 | 69.0 | 9522 | 0.4189 | 0.3198 | 8.7633 |
| 0.3841 | 70.0 | 9660 | 0.4189 | 0.3168 | 8.9306 |
| 0.3841 | 71.0 | 9798 | 0.4187 | 0.3182 | 8.9837 |
| 0.3841 | 72.0 | 9936 | 0.4191 | 0.3179 | 8.9918 |
| 0.3823 | 73.0 | 10074 | 0.4189 | 0.3173 | 8.951 |
| 0.3823 | 74.0 | 10212 | 0.4188 | 0.3158 | 8.9551 |
| 0.3823 | 75.0 | 10350 | 0.4188 | 0.3184 | 8.9061 |
| 0.3823 | 76.0 | 10488 | 0.4187 | 0.3174 | 8.9347 |
| 0.3809 | 77.0 | 10626 | 0.4186 | 0.2163 | 9.1061 |
| 0.3809 | 78.0 | 10764 | 0.4189 | 0.2173 | 8.8531 |
| 0.3809 | 79.0 | 10902 | 0.4187 | 0.3156 | 9.0776 |
| 0.3798 | 80.0 | 11040 | 0.4187 | 0.3166 | 8.9796 |
| 0.3798 | 81.0 | 11178 | 0.4187 | 0.3172 | 8.9796 |
| 0.3798 | 82.0 | 11316 | 0.4187 | 0.3177 | 9.0 |
| 0.3798 | 83.0 | 11454 | 0.4187 | 0.3167 | 9.0204 |
| 0.3799 | 84.0 | 11592 | 0.4187 | 0.3166 | 8.9837 |
| 0.3799 | 85.0 | 11730 | 0.4187 | 0.3174 | 9.0776 |
| 0.3799 | 86.0 | 11868 | 0.4187 | 0.2174 | 9.1469 |
| 0.3789 | 87.0 | 12006 | 0.4188 | 0.2167 | 8.9143 |
| 0.3789 | 88.0 | 12144 | 0.4187 | 0.2171 | 9.0327 |
| 0.3789 | 89.0 | 12282 | 0.4187 | 0.217 | 9.0531 |
| 0.3789 | 90.0 | 12420 | 0.4186 | 0.3176 | 9.1102 |
| 0.378 | 91.0 | 12558 | 0.4186 | 0.3182 | 9.0531 |
| 0.378 | 92.0 | 12696 | 0.4186 | 0.3186 | 9.1102 |
| 0.378 | 93.0 | 12834 | 0.4187 | 0.2177 | 9.0163 |
| 0.378 | 94.0 | 12972 | 0.4187 | 0.2172 | 9.0204 |
| 0.3768 | 95.0 | 13110 | 0.4186 | 0.2171 | 9.0204 |
| 0.3768 | 96.0 | 13248 | 0.4186 | 0.2171 | 9.0367 |
| 0.3768 | 97.0 | 13386 | 0.4187 | 0.2173 | 8.9959 |
| 0.3769 | 98.0 | 13524 | 0.4187 | 0.2172 | 8.9959 |
| 0.3769 | 99.0 | 13662 | 0.4187 | 0.2172 | 9.0 |
| 0.3769 | 100.0 | 13800 | 0.4187 | 0.2171 | 9.0204 |
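
The Bleu and Gen Len columns are the kind of values a `compute_metrics` callback returns during `Seq2SeqTrainer` evaluation. Below is a sketch using the `evaluate` library's `bleu` metric; the actual metric code for this run is not included in the card, so treat this as an illustration:

```python
import numpy as np
import evaluate

bleu = evaluate.load("bleu")

def compute_metrics(eval_preds, tokenizer):
    preds, labels = eval_preds
    # Labels use -100 as padding; swap it for the pad token before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = bleu.compute(
        predictions=decoded_preds,
        references=[[label] for label in decoded_labels],
    )
    # "Gen Len": mean count of non-pad tokens per generated sequence.
    gen_len = np.mean(
        [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    )
    return {"bleu": result["bleu"], "gen_len": gen_len}
```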

Framework versions

  • Transformers 4.45.1
  • Pytorch 2.4.0
  • Datasets 3.0.1
  • Tokenizers 0.20.0