---
license: apache-2.0
datasets:
- yongchao/gptgen_text_detection
metrics:
- accuracy
pipeline_tag: text-classification
---

# BERT-based Classification Model for AI-Generated Text Detection

## Model Overview

This BERT-based model is fine-tuned to detect AI-generated text, with a focus on the text-to-SQL scenario. Note that the model is still in a testing phase; its validity has not been fully evaluated.

## Model Details

- **Architecture**: BERT (bert-base-uncased)
- **Training Data**: The model was trained on a dataset of 2,000 questions labeled as human-written or AI-generated.
- **Training Procedure**:
  - **Epochs**: 10
  - **Batch Size**: 16
  - **Learning Rate**: 2e-5
  - **Warmup Steps**: 500
  - **Weight Decay**: 0.01
- **Model Performance**:
  - **Accuracy**: 85.7%
  - **Precision**: 82.4%
  - **Recall**: 91.0%
  - **F1 Score**: 86.5%

## Limitations and Ethical Considerations

### Limitations

The model may not perform well on text that differs significantly from the training data.

### Ethical Considerations

Be aware of potential biases in the training data that could affect the model's predictions, and ensure that the model is used in a fair and unbiased manner.

## References

- **BERT Paper**: Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT 2019.
- **Dataset**: [yongchao/gptgen_text_detection](https://huggingface.co/datasets/yongchao/gptgen_text_detection)
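
## How to Use

The card does not include usage instructions, so the snippet below is a minimal inference sketch using the Hugging Face `transformers` library. The repository id is a placeholder for this model's actual path, and the label order (0 = human-written, 1 = AI-generated) is an assumption; check the model's `config.json` `id2label` mapping before relying on it.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder repository id -- substitute this model's actual path.
MODEL_ID = "path/to/this-model"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

question = "List the names of all employees hired after 2020."
inputs = tokenizer(question, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# Label order is an assumed convention (0 = human-written, 1 = AI-generated);
# verify against the model's id2label mapping.
probs = torch.softmax(logits, dim=-1)[0]
label = probs.argmax().item()
print(f"Predicted class {label} (p = {probs[label]:.3f})")
```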
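
## Fine-Tuning Sketch

The hyperparameters listed under **Training Procedure** map directly onto `transformers.TrainingArguments`. The sketch below shows one way the fine-tuning run could be reproduced, not the authors' exact script: the dataset column names (`text`, `label`), the split name (`train`), and the output directory are assumptions.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # binary: human-written vs. AI-generated
)

# Column names "text" and "label" are assumptions; check the dataset schema.
dataset = load_dataset("yongchao/gptgen_text_detection")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

# Hyperparameters below are the ones stated in the card.
args = TrainingArguments(
    output_dir="bert-ai-text-detector",  # hypothetical output path
    num_train_epochs=10,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    warmup_steps=500,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],  # split name is an assumption
    data_collator=DataCollatorWithPadding(tokenizer),  # pad per batch
)
trainer.train()
```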