OCR-LayoutLMv3
This model is a fine-tuned version of microsoft/layoutlmv3-base on the funsd-layoutlmv3 dataset. It achieves the following results on the evaluation set:
- Loss: 0.9788
- Precision: 0.8989
- Recall: 0.9051
- F1: 0.9020
- Accuracy: 0.8404
Model description
LayoutLMv3 is a pre-trained multimodal Transformer for Document AI with unified text and image masking. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model. For example, LayoutLMv3 can be fine-tuned for both text-centric tasks, including form understanding, receipt understanding, and document visual question answering, and image-centric tasks such as document image classification and document layout analysis.
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei, Preprint 2022.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- training_steps: 2000
Training results
Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
---|---|---|---|---|---|---|---|
No log | 1.33 | 100 | 0.6966 | 0.7418 | 0.8063 | 0.7727 | 0.7801 |
No log | 2.67 | 200 | 0.5767 | 0.8104 | 0.8644 | 0.8365 | 0.8117 |
No log | 4.0 | 300 | 0.5355 | 0.8246 | 0.8852 | 0.8539 | 0.8295 |
No log | 5.33 | 400 | 0.5240 | 0.8706 | 0.8922 | 0.8813 | 0.8427 |
0.5326 | 6.67 | 500 | 0.6337 | 0.8528 | 0.8778 | 0.8651 | 0.8260 |
0.5326 | 8.0 | 600 | 0.6870 | 0.8698 | 0.8828 | 0.8762 | 0.8240 |
0.5326 | 9.33 | 700 | 0.6584 | 0.8723 | 0.9061 | 0.8889 | 0.8342 |
0.5326 | 10.67 | 800 | 0.7186 | 0.8868 | 0.9031 | 0.8949 | 0.8335 |
0.5326 | 12.0 | 900 | 0.6822 | 0.9040 | 0.9076 | 0.9058 | 0.8526 |
0.1248 | 13.33 | 1000 | 0.7042 | 0.8872 | 0.9021 | 0.8946 | 0.8511 |
0.1248 | 14.67 | 1100 | 0.7920 | 0.9027 | 0.9036 | 0.9032 | 0.8480 |
0.1248 | 16.0 | 1200 | 0.8052 | 0.8964 | 0.9151 | 0.9056 | 0.8389 |
0.1248 | 17.33 | 1300 | 0.8932 | 0.8995 | 0.9066 | 0.9030 | 0.8329 |
0.1248 | 18.67 | 1400 | 0.8728 | 0.8950 | 0.9061 | 0.9005 | 0.8398 |
0.0442 | 20.0 | 1500 | 0.9051 | 0.8960 | 0.9116 | 0.9037 | 0.8347 |
0.0442 | 21.33 | 1600 | 0.9587 | 0.8947 | 0.9031 | 0.8989 | 0.8401 |
0.0442 | 22.67 | 1700 | 0.9822 | 0.9042 | 0.9046 | 0.9044 | 0.8389 |
0.0442 | 24.0 | 1800 | 0.9734 | 0.9043 | 0.9061 | 0.9052 | 0.8391 |
0.0442 | 25.33 | 1900 | 0.9842 | 0.9042 | 0.9091 | 0.9066 | 0.8410 |
0.0225 | 26.67 | 2000 | 0.9788 | 0.8989 | 0.9051 | 0.9020 | 0.8404 |
Framework versions
- Transformers 4.25.0.dev0
- Pytorch 1.12.1
- Datasets 2.6.1
- Tokenizers 0.13.1
- Downloads last month
- 36
Space using jinhybr/OCR-LayoutLMv3 1
Evaluation results
- Precision on funsd-layoutlmv3self-reported0.899
- Recall on funsd-layoutlmv3self-reported0.905
- F1 on funsd-layoutlmv3self-reported0.902
- Accuracy on funsd-layoutlmv3self-reported0.840