Mar 23, 2023

Hello,

You said you trained on your own proprietary dataset. Could you describe how that dataset was structured?
I want to train something similar to your model, maybe with more fields, but I'm not quite sure how to structure my training dataset.

Kind regards
lars

to-be

Owner Mar 23, 2023

Duplicate of https://huggingface.co/spaces/to-be/invoice_document_headers_extraction_with_donut/discussions/2

to-be changed discussion status to closed Mar 23, 2023

Altimis

Feb 13

Hello @to-be. Thank you for the demo. Since the dataset is confidential, I have two questions about the training you did:

What hyperparameters did you choose for your training? (I have a similar dataset, judging by your 3 test invoices.)
How were you able to use a different input size? (I get an error indicating a mismatch when I change the default input size used in Donut.)

Thank you in advance.

to-be

Owner Feb 19

train_batch_sizes:
  - 1
val_batch_sizes:
  - 2
input_size:
  - 1600
  - 1280
max_length: 256
align_long_axis: False
num_nodes: 1
seed: 2022
lr: 3e-05
warmup_steps: 300
num_training_samples_per_epoch: 1200
max_epochs: 100
max_steps: -1
num_workers: 4
val_check_interval: 1.0
check_val_every_n_epoch: 3
gradient_clip_val: 1.0
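
For anyone reusing these values, here is a rough sketch of what the config above implies for the length of a training run. It is only arithmetic on the numbers listed. The `num_devices` value and the absence of gradient accumulation are my assumptions, since the post does not state them, and `training_schedule` is a hypothetical helper name.

```python
import math


def training_schedule(num_samples, batch_size, max_epochs, warmup_steps,
                      check_val_every_n_epoch, num_devices=1, num_nodes=1):
    """Derive step counts from the hyperparameters listed in the config."""
    # Assumption: one optimizer step per batch (no gradient accumulation).
    effective_batch = batch_size * num_devices * num_nodes
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * max_epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # Share of the run spent ramping the learning rate up to lr.
        "warmup_fraction": warmup_steps / total_steps,
        # Validation only runs on every n-th epoch.
        "validation_runs": max_epochs // check_val_every_n_epoch,
    }


# Values copied from the config; num_devices=1 is an assumption.
schedule = training_schedule(
    num_samples=1200,
    batch_size=1,
    max_epochs=100,
    warmup_steps=300,
    check_val_every_n_epoch=3,
    num_devices=1,
    num_nodes=1,
)
print(schedule)
```

Under those assumptions the warmup covers only a small fraction of the run, and with `max_steps: -1` the run length is bounded by `max_epochs` alone.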

from transformers import VisionEncoderDecoderConfig

max_length = 768
image_size = [1920, 1280]
#image_size = [1280, 960]

# update image_size of the encoder
# during pre-training, a larger image size was used
config = VisionEncoderDecoderConfig.from_pretrained("naver-clova-ix/donut-base")
config.encoder.image_size = image_size # (height, width)

# update max_length of the decoder (for generation)
config.decoder.max_length = max_length
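
On the input-size mismatch: the config edit above only helps if the model is loaded with that config and the processor resizes images to the same size. Below is a minimal sketch of how that could look, not necessarily what was done for this model. It assumes a recent `transformers` release in which `DonutProcessor` exposes `image_processor` and its `size` is a dict with `height` and `width` keys. `to_processor_size` and `load_donut` are hypothetical helper names.

```python
def to_processor_size(image_size):
    """Convert (height, width), as used by config.encoder.image_size,
    into the height/width dict form used for the image processor size."""
    height, width = image_size
    return {"height": height, "width": width}


def load_donut(image_size, max_length, checkpoint="naver-clova-ix/donut-base"):
    """Load the model and processor so that both agree on the input size."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import (
        DonutProcessor,
        VisionEncoderDecoderConfig,
        VisionEncoderDecoderModel,
    )

    config = VisionEncoderDecoderConfig.from_pretrained(checkpoint)
    config.encoder.image_size = image_size  # (height, width)
    config.decoder.max_length = max_length

    # Pass the edited config so the encoder is configured for the new size.
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint, config=config)

    processor = DonutProcessor.from_pretrained(checkpoint)
    # The processor must resize images to the size the encoder now expects.
    processor.image_processor.size = to_processor_size(image_size)
    # Mirrors align_long_axis: False from the training config.
    processor.image_processor.do_align_long_axis = False
    return model, processor
```

Calling `model, processor = load_donut([1920, 1280], 768)` downloads the base checkpoint, so it needs network access and enough memory for the model.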

