---
license: apache-2.0
datasets:
- yongchao/gptgen_text_detection
metrics:
- accuracy
pipeline_tag: text-classification
---

# BERT-based Classification Model for AI-Generated Text Detection

## Model Overview

This BERT-based model is fine-tuned for AI-generated text detection, particularly in text-to-SQL scenarios.

Please note that this model is still in a testing phase; its validity has not been fully evaluated.
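
As a quick usage sketch, the model can be loaded with the standard `transformers` text-classification pipeline. The repository ID below is a placeholder, since the exact model ID is not stated here; substitute the actual name of this repository.

```python
from transformers import pipeline

# Placeholder repository ID -- replace with this model's actual Hugging Face repo name.
MODEL_ID = "your-username/your-finetuned-bert-detector"

# The model is a binary sequence classifier (human-written vs. AI-generated).
classifier = pipeline("text-classification", model=MODEL_ID)

question = "List the names of all customers who placed more than three orders in 2023."
print(classifier(question))
# Example output (label names depend on how the model was saved):
# [{'label': 'LABEL_1', 'score': 0.93}]
```

By default the labels are `LABEL_0` and `LABEL_1` unless an `id2label` mapping was saved with the model checkpoint.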

## Model Details

- **Architecture**: BERT (bert-base-uncased)
- **Training Data**: The model was trained on a dataset of 2,000 labeled human-written and AI-generated questions.
- **Training Procedure** (reproduced in the fine-tuning sketch below):
  - **Epochs**: 10
  - **Batch Size**: 16
  - **Learning Rate**: 2e-5
  - **Warmup Steps**: 500
  - **Weight Decay**: 0.01
- **Model Performance**:
  - **Accuracy**: 85.7%
  - **Precision**: 82.4%
  - **Recall**: 91.0%
  - **F1 Score**: 86.5%
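
For reference, here is a minimal fine-tuning sketch using the hyperparameters listed above. It assumes the dataset exposes `text` and `label` columns and a `train` split, and uses an illustrative maximum sequence length of 128; adjust the column names, splits, and evaluation setup to match the actual data.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Dataset referenced in this card; the "text" and "label" column names are assumptions.
dataset = load_dataset("yongchao/gptgen_text_detection")

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

# Binary classification head: human-written vs. AI-generated.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Hyperparameters taken from the Training Procedure section above.
training_args = TrainingArguments(
    output_dir="bert-ai-text-detector",
    num_train_epochs=10,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    warmup_steps=500,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
)

trainer.train()
```

The reported metrics (accuracy, precision, recall, F1) can then be computed on a held-out split, for example via a `compute_metrics` function passed to the `Trainer`.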

## Limitations and Ethical Considerations

### Limitations

The model may not perform well on text that is significantly different from the training data.

### Ethical Considerations

Be aware of potential biases in the training data that could affect the model's predictions. Ensure that the model is used in a fair and unbiased manner.

## References

- **BERT Paper**: Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL-HLT 2019.
- **Dataset**: [Link to the dataset](https://huggingface.co/datasets/yongchao/gptgen_text_detection)
|