---
license: apache-2.0
tags:
- generated_from_trainer
model-index:
- name: model_5M
  results: []
---

# model_5M

This model is a fine-tuned version of [gsarti/it5-large](https://huggingface.co/gsarti/it5-large) on a dataset of Common Procurement Vocabulary (CPV) codes.

## Model description

The model was trained on 3.2M pairs of Italian tender descriptions and their corresponding CPV codes. Here is an example:

> {"source": "lavori lavori di pavimentazione delle vie san martino e santa Maddalena", "target": "45262321-7 - lavori di pavimentazione"}

## Intended uses & limitations

This model generates a CPV code from an Italian tender description.

## Training and evaluation data

Training data are taken from the [ANAC website](https://dati.anticorruzione.it/opendata).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0

### Training results

### Framework versions

- Transformers 4.26.1
- Pytorch 1.13.1+cu117
- Datasets 2.9.0
- Tokenizers 0.13.2
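
## Example usage

The card states that the model generates a CPV code from an Italian tender description, and the training example shows targets of the form `code - label`. The following is a minimal sketch of that workflow, not the author's own code. `MODEL_ID` is a hypothetical placeholder (the card does not state the Hub repository id), `parse_cpv_target` and `predict_cpv` are illustrative names, and the regular expression is an assumption based on the single example in this card.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical placeholder: replace with the actual Hub repository id or a local path.
MODEL_ID = "your-namespace/model_5M"


class CPVPrediction(NamedTuple):
    code: str   # e.g. "45262321-7"
    label: str  # e.g. "lavori di pavimentazione"


# Assumed output format, inferred from the example target in this card:
# eight digits, a hyphen, one digit, then " - " and a free-text label.
_CPV_PATTERN = re.compile(r"^\s*(\d{8}-\d)\s+-\s+(.*\S)\s*$")


def parse_cpv_target(text: str) -> Optional[CPVPrediction]:
    """Split a generated string into code and label; return None if it does not match."""
    match = _CPV_PATTERN.match(text)
    if match is None:
        return None
    return CPVPrediction(code=match.group(1), label=match.group(2))


def predict_cpv(
    description: str,
    model_id: str = MODEL_ID,
    max_new_tokens: int = 64,
) -> Optional[CPVPrediction]:
    """Generate a CPV code for an Italian tender description (sketch)."""
    # Imported lazily so that parse_cpv_target works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    inputs = tokenizer(description, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    generated = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return parse_cpv_target(generated)
```

Calling `predict_cpv("lavori di pavimentazione delle vie san martino e santa Maddalena")` would load the checkpoint and return a `CPVPrediction`, or `None` when the generated text does not follow the assumed format. Checking for `None` is worthwhile because a sequence-to-sequence model is not guaranteed to emit a well-formed code.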