trocr-base-printed-synthetic_dataset_ocr

This model is a fine-tuned version of microsoft/trocr-base-printed on a 20,000-sample synthetic OCR dataset from Kaggle (linked under Training and evaluation data).

Model description

Here is the link to my code for this model: https://github.com/DunnBC22/Vision_Audio_and_Multimodal_Projects/tree/main/Optical%20Character%20Recognition%20(OCR)/20%2C000%20Synthetic%20Samples%20Dataset

Intended uses & limitations

This model could be used to read labels with printed text.
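As an illustration of that use, here is a minimal inference sketch. It rests on some assumptions: the checkpoint is published on the Hugging Face Hub as DunnBC22/trocr-base-printed-synthetic_dataset_ocr, the processor of the base model (microsoft/trocr-base-printed) is compatible with it, and the input is a cropped image of a single line of printed text. The function name read_text and the image path in the usage note are hypothetical.

```python
# Hub IDs: the fine-tuned checkpoint ID is an assumption based on this card's title.
MODEL_ID = "DunnBC22/trocr-base-printed-synthetic_dataset_ocr"
BASE_PROCESSOR_ID = "microsoft/trocr-base-printed"


def read_text(image_path, model_id=MODEL_ID, processor_id=BASE_PROCESSOR_ID):
    """Return the text recognized in a single-line printed-text image."""
    # Heavy dependencies are imported lazily so this module loads without them.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(processor_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    # TrOCR expects RGB input; the processor resizes and normalizes the image.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Autoregressive decoding, then conversion of token IDs back to a string.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

Calling read_text("label.png") (a hypothetical file) would download the weights on first use and return the predicted string. Multi-line images would likely need to be split into lines first, since TrOCR is designed for single text lines.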

Training and evaluation data

Here is the link to the dataset that I used for this model: https://www.kaggle.com/datasets/ravi02516/20k-synthetic-ocr-dataset

[Figure: Character Length for Training Dataset]

[Figure: Character Length for Evaluation Dataset]

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 1
mixed_precision_training: Native AMP
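
The hyperparameters above could be expressed as Transformers training arguments roughly as follows. This is a sketch, not the original training script (see the code link under Model description for that). It assumes training ran on a single device, so that train_batch_size maps to per_device_train_batch_size, and that "Native AMP" corresponds to fp16=True. The output directory name and the helper build_training_args are hypothetical.

```python
# Hyperparameters from this card, keyed by Seq2SeqTrainingArguments field names.
TRAINING_KWARGS = {
    "output_dir": "trocr-finetune-output",  # hypothetical path, not from the card
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,  # assumes a single training device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
    "fp16": True,  # mixed precision; requires a CUDA device
}


def build_training_args(**overrides):
    """Build Seq2SeqTrainingArguments from the card's values plus any overrides."""
    # Imported lazily so the configuration can be inspected without Transformers.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**{**TRAINING_KWARGS, **overrides})
```

On a machine without a GPU, build_training_args(fp16=False) avoids the mixed-precision requirement.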

Training results

CER = 0.003 (rounded from 0.002896524170994806)
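
For readers unfamiliar with the metric, CER is commonly defined as the character-level edit distance between prediction and reference, divided by the number of reference characters. The sketch below is a plain-Python illustration of that definition, aggregated over a corpus. It is an assumption that this matches the exact implementation used to produce the number above; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, predictions):
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    return total_edits / total_chars
```

For example, cer(["abcd"], ["abxd"]) is 0.25: one substitution over four reference characters.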

Framework versions

Transformers 4.26.1
Pytorch 1.13.1+cu116
Datasets 2.10.1
Tokenizers 0.13.2

Note: Please give proper credit to the owner(s) of the data and the developers of the base model (microsoft/trocr-base-printed).

Model Checkpoint

@misc{li2021trocr,
  title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
  author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
  year={2021},
  eprint={2109.10282},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

Metric (Character Error Rate [CER])

@inproceedings{morris2004,
  author = {Morris, Andrew and Maier, Viktoria and Green, Phil},
  year = {2004},
  month = {01},
  pages = {},
  title = {From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition.}
}
