---
base_model: huggingface/CodeBERTa-small-v1
tags:
- generated_from_trainer
model-index:
- name: training
  results: []
---

# training

This model is a fine-tuned version of [huggingface/CodeBERTa-small-v1](https://huggingface.co/huggingface/CodeBERTa-small-v1) on [a dataset curated from The Technical Debt Dataset](https://huggingface.co/datasets/davidgaofc/techdebt).

# dataset citation

Valentina Lenarduzzi, Nyyti Saarimäki, Davide Taibi. The Technical Debt Dataset. Proceedings for the 15th Conference on Predictive Models and Data Analytics in Software Engineering. Brazil. 2019.

## Model description

Classifies cleaned code diffs into one of two labels:

* 1: exhibits possible technical debt
* 0: is probably clean

## Intended uses & limitations

This model likely has many limitations that have not yet been characterized, so use it with caution. Improvements are in progress.

## Training and evaluation data

Results on the test split of the dataset above:

* Accuracy: ~95%
* F1 score: ~0.94

## Training procedure

The model was trained for one epoch on the dataset above.

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 30
- eval_batch_size: 30
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1

### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.0+cu118
- Datasets 2.14.6
- Tokenizers 0.14.1
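
For readers who want to reproduce the accuracy and F1 figures reported for the test split, the following is a minimal sketch of how those two metrics are conventionally computed for the 0/1 labels described above. It makes several assumptions: the card does not say how F1 was averaged, so the sketch treats label 1 (possible technical debt) as the positive class, and the label lists are hypothetical toy data, not the model's actual predictions. The function names `accuracy` and `binary_f1` are illustrative only.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for one class: harmonic mean of precision and recall.

    Assumption: label 1 (possible technical debt) is the positive class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    # Return 0.0 when the positive class never appears in either list.
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels (NOT the real test split).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(f"accuracy={accuracy(y_true, y_pred):.2f}")
print(f"f1={binary_f1(y_true, y_pred):.2f}")
```

On real data, `y_true` would hold the test-split labels and `y_pred` the model's predicted class for each cleaned diff.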