# Fine-tuned on the base IAM handwritten TrOCR model

Datasets used were:

  • Imgur5k
  • English Handwritten Characters from Kaggle

Outliers were removed using a Z-score filter on image width and an IQR filter on image height. Original dataset: 210122 examples. Cleaned dataset: 208141 examples (1981 removed).
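The filtering step above can be sketched roughly as follows. This is a minimal illustration, not the code used for this model: the cutoffs (`|z| <= 3` and `1.5 * IQR`), the function names, and the decision to keep only rows that pass both filters are all assumptions, since the card does not state them.

```python
import statistics

def zscore_mask(values, threshold=3.0):
    """Return True for each value whose |z-score| is within the threshold (assumed cutoff)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [True] * len(values)
    return [abs((v - mean) / std) <= threshold for v in values]

def iqr_mask(values, k=1.5):
    """Return True for each value inside [Q1 - k*IQR, Q3 + k*IQR] (assumed multiplier)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= v <= high for v in values]

def remove_outliers(sizes, z_threshold=3.0, iqr_k=1.5):
    """sizes is a list of (width, height) pairs; keep rows that pass both filters."""
    widths = [w for w, _ in sizes]
    heights = [h for _, h in sizes]
    keep_w = zscore_mask(widths, z_threshold)   # Z-score on image width
    keep_h = iqr_mask(heights, iqr_k)           # IQR on image height
    return [s for s, kw, kh in zip(sizes, keep_w, keep_h) if kw and kh]

# Toy data: 20 ordinary crops, one very wide image, one very tall image.
sizes = [(100 + i, 30 + i) for i in range(20)] + [(5000, 40), (110, 900)]
cleaned = remove_outliers(sizes)
print(len(sizes), len(cleaned))
```

In the toy data both extreme rows are dropped and the 20 ordinary rows are kept.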

Note that only 20 percent of the cleaned data was used, drawn by random sampling with a seed value of 42. Number of training examples: 33302. Number of validation examples: 8326.
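The counts above are consistent with taking 20 percent of the 208141 cleaned examples and then splitting that subset 80/20 into training and validation. The sketch below reproduces the arithmetic under that reading. The 80/20 ratio is inferred from the numbers, not stated in the card, and `subsample_and_split` is a hypothetical helper, not the code actually used.

```python
import random

def subsample_and_split(items, fraction=0.2, train_ratio=0.8, seed=42):
    """Draw a seeded random subset, then split it into train and validation parts."""
    rng = random.Random(seed)                 # seed 42, as mentioned in the card
    n_subset = int(len(items) * fraction)     # 20 percent of the data
    subset = rng.sample(items, n_subset)      # sampling without replacement
    n_train = int(n_subset * train_ratio)     # assumed 80/20 split
    return subset[:n_train], subset[n_train:]

# Stand-in for the cleaned dataset: one integer id per example.
train, val = subsample_and_split(list(range(208141)))
print(len(train), len(val))  # 33302 8326
```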

I chose these training arguments on GPT's suggestion, because running the original configuration on 100 percent of the data would have been too expensive for me.

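    from transformers import Seq2SeqTrainingArguments

    training_args = Seq2SeqTrainingArguments(
        predict_with_generate=True,   # decode with generate() during evaluation
        eval_strategy="epoch",        # evaluate once at the end of each epoch
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        fp16=True,                    # mixed-precision training
        output_dir="./",
        logging_steps=500,
        save_steps=5000,
        eval_steps=1000,              # only used when eval_strategy="steps"
        num_train_epochs=2,
    )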
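To see how the step-based settings relate to the length of this run, the number of optimizer steps can be estimated from the figures given in this card. This is a back-of-the-envelope sketch that assumes a single device, no gradient accumulation, and that the last partial batch is kept. None of those are stated in the card, and `optimizer_steps` is a hypothetical helper.

```python
import math

def optimizer_steps(num_examples, batch_size, epochs, devices=1, grad_accum=1):
    """Estimate steps per epoch and total steps for a run (assumed single-device setup)."""
    batches_per_epoch = math.ceil(num_examples / (batch_size * devices))
    steps_per_epoch = batches_per_epoch // grad_accum
    return steps_per_epoch, steps_per_epoch * epochs

steps_per_epoch, total_steps = optimizer_steps(33302, 16, 2)
print(steps_per_epoch, total_steps)  # 2082 4164
print(total_steps // 500)            # 8 logging events at logging_steps=500
print(total_steps // 5000)           # 0 multiples of save_steps=5000 are reached
```

Under these assumptions the run ends at step 4164, so the 5000-step save interval is never reached during training. A smaller `save_steps` would be needed for intermediate checkpoints.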

CER: 0.082, WER: 0.184
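For readers unfamiliar with these metrics: CER and WER are edit-distance error rates computed over characters and words respectively. The sketch below shows one common way to compute them in plain Python. It is not the evaluation code used for this model. The corpus-level aggregation (total edits divided by total reference length) and whitespace tokenization are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(references, predictions):
    """Character error rate: character edits divided by reference characters."""
    errors = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    return errors / sum(len(r) for r in references)

def wer(references, predictions):
    """Word error rate: word edits divided by reference words."""
    errors = sum(edit_distance(r.split(), p.split())
                 for r, p in zip(references, predictions))
    return errors / sum(len(r.split()) for r in references)

refs = ["hello world"]
preds = ["hallo world"]
print(cer(refs, preds), wer(refs, preds))
```

In this toy example one of eleven characters and one of two words is wrong, which illustrates why WER is usually higher than CER for the same predictions.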


license: mit
datasets:
- staghado/IMGUR-dataset
language:
- en
metrics:
- cer
base_model:
- microsoft/trocr-base-handwritten

Model size: 334M params (Safetensors, tensor type F32)