Commit 9492734 by arxyzan
1 Parent(s): 820b26f

Update README.md

  1. README.md +10 -5
README.md CHANGED
@@ -8,10 +8,15 @@ tags:
  - image-to-text
  pipeline_tag: image-to-text
  ---
- A CRNN model for Persian OCR. This model is based on a simple CNN + LSTM architecture inspired by [this paper](https://arxiv.org/abs/1507.05717). This is a successor model to
- our previous model [hezarai/crnn-base-fa-64x256](https://huggingface.co/hezarai/crnn-base-fa-64x256). The dataset for training this model was almost 5 times larger and the
- maximum output length supported by this model has been increased from 32 to 48 characters. (The model can actually output 96 characters including blank but to tackle CTC decoding challenges no samples
- longer than 48 characters have been fed to the model).
+ A CRNN model for Persian OCR. This model is based on a simple CNN + LSTM architecture inspired by [this paper](https://arxiv.org/abs/1507.05717).
 
- Note that this model is only optimized for printed/scanned documents and supports up to 50-ish characters. (For an end-to-end OCR pipeline, use a text detector model first to
+ This is a successor model to our previous model [hezarai/crnn-base-fa-64x256](https://huggingface.co/hezarai/crnn-base-fa-64x256).
+ The improvements include:
+ - 5X larger dataset
+ - Change input image size from 64x256 to 32x384
+ - Increase max output length from 64 to 96 (Max length of the samples in the dataset was 48 to handle CTC loss issues)
+ - Support numbers and special characters (see id2label in `model_config.yaml`)
+ - Auto-handling of LTR characters like digits in between the text
+
+ Note that this model is only optimized for printed/scanned documents and works best on texts with a length of up to 50-ish characters. (For an end-to-end OCR pipeline, use a text detector model first to
  extract text boxes preferrably in word-level and then use this model), but it can be used to be fine-tuned on other domains like license plate or handwritten texts.
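
The README text in this diff pairs a 96-step output with training samples of at most 48 characters, citing CTC issues. The sketch below is one plausible reading of that choice, not the author's stated reasoning or the model's actual decoder. It shows generic greedy CTC decoding (collapse repeats, then drop blanks) and the minimum number of output frames CTC needs for a given label sequence. The blank index `0` and both function names are assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed blank index; the real one is defined by the model's id2label


def ctc_greedy_decode(frame_ids, blank=BLANK):
    """Collapse consecutive duplicate ids, then remove blanks."""
    collapsed = [key for key, _ in groupby(frame_ids)]
    return [key for key in collapsed if key != blank]


def min_ctc_frames(label_ids):
    """Fewest frames from which CTC can emit label_ids.

    Each label needs one frame, and every pair of adjacent identical
    labels needs one extra blank frame to keep them from merging.
    """
    repeats = sum(1 for a, b in zip(label_ids, label_ids[1:]) if a == b)
    return len(label_ids) + repeats


# Per-frame argmax ids for a toy 9-frame output (hypothetical values).
frames = [0, 3, 3, 0, 3, 5, 5, 0, 0]
print(ctc_greedy_decode(frames))  # [3, 3, 5]

# Worst case for a 48-character label: the same character repeated 48 times.
print(min_ctc_frames([7] * 48))  # 95
```

Under these assumptions, any label of up to 48 characters fits within 96 frames even in the worst case (48 labels plus 47 separating blanks), which would keep the CTC loss finite for every training sample.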