---
license: apache-2.0
datasets:
- dmitva/human_ai_generated_text
---
## 0xnu/AGTD-v0.1
The "0xnu/AGTD-v0.1" model distinguishes text written by humans from text generated by Artificial Intelligence (AI). It offers accurate and efficient text analysis and classification. The full details are in the study, accessible [here](https://arxiv.org/abs/2311.15565).
### Training Details
```sh
Precision: 0.6269
Recall: 1.0000
F1-score: 0.7707
Accuracy: 0.7028
Confusion Matrix:
[[197 288]
[ 0 484]]
```
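The reported metrics can be derived directly from the confusion matrix. The sketch below assumes the usual layout (rows = actual class, columns = predicted class, with AI-generated as the positive class), which is consistent with the recall of 1.0000 implied by the zero in the bottom-left cell:

```python
# Derive the reported metrics from the confusion matrix above.
# Rows = actual class, columns = predicted class; class 1 = AI-generated.
tn, fp = 197, 288   # human texts: correctly / incorrectly classified
fn, tp = 0, 484     # AI texts: incorrectly / correctly classified

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tn + fp + fn + tp)

print(f"Precision: {precision:.4f}")  # 0.6269
print(f"Recall:    {recall:.4f}")     # 1.0000
print(f"F1-score:  {f1:.4f}")         # 0.7707
print(f"Accuracy:  {accuracy:.4f}")   # 0.7028
```

In other words, the model never misses AI-generated text (perfect recall) but flags a substantial share of human-written text as AI-generated, which is what drags precision down to 0.6269.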
![Training History](training_history.png "Training History")
### Run the Model
```python
import os
os.environ["KERAS_BACKEND"] = "tensorflow"

import pickle

import keras
import tensorflow as tf
from huggingface_hub import hf_hub_download

# Hugging Face repository details
REPO_ID = "0xnu/AGTD-v0.1"
MODEL_FILENAME = "human_ai_text_classification_model.keras"
TOKENIZER_FILENAME = "tokenizer.pkl"

# Download the model and tokenizer
model_path = hf_hub_download(repo_id=REPO_ID, filename=MODEL_FILENAME)
tokenizer_path = hf_hub_download(repo_id=REPO_ID, filename=TOKENIZER_FILENAME)

# Load the model
model = keras.models.load_model(model_path)

# Load the tokenizer
with open(tokenizer_path, "rb") as tokenizer_file:
    tokenizer = pickle.load(tokenizer_file)

# Input text
text = (
    "This model trains on a diverse dataset and serves functions in "
    "applications requiring a mechanism for distinguishing between "
    "human and AI-generated text."
)

# Must match the sequence length used during training
MAX_LENGTH = 100000

def tokenize_text(text, tokenizer, max_length):
    """Convert raw text into a padded integer sequence."""
    sequences = tokenizer.texts_to_sequences([text])
    return tf.keras.preprocessing.sequence.pad_sequences(
        sequences, maxlen=max_length, padding="post", truncating="post"
    )

def predict_text(text, model, tokenizer, max_length):
    """Return the model's probability that the text is AI-generated."""
    processed_text = tokenize_text(text, tokenizer, max_length)
    return model.predict(processed_text)[0][0]

# Make a prediction and interpret the result
prediction = predict_text(text, model, tokenizer, MAX_LENGTH)
if prediction >= 0.5:
    print(f"The text is likely AI-generated (confidence: {prediction:.2f})")
else:
    print(f"The text is likely human-written (confidence: {1 - prediction:.2f})")
print(f"Raw prediction value: {prediction}")
```
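Classifying several documents is a small extension of the single-text flow above. The following is a minimal sketch; `classify_texts` is a hypothetical helper, and the scoring function is stubbed with fixed values so the example runs standalone. In practice, pass a closure over `predict_text(text, model, tokenizer, MAX_LENGTH)` from the snippet above:

```python
def classify_texts(texts, predict_fn, threshold=0.5):
    """Label each text as AI-generated or human-written from its score."""
    results = []
    for text in texts:
        score = float(predict_fn(text))
        label = "ai" if score >= threshold else "human"
        results.append({"text": text, "score": score, "label": label})
    return results

# Stub scores stand in for real model predictions
fake_scores = {"First sample.": 0.91, "Second sample.": 0.12}
report = classify_texts(fake_scores, fake_scores.get)
for row in report:
    print(f"{row['label']:>5}  {row['score']:.2f}  {row['text']}")
```

The threshold of 0.5 matches the interpretation used in the main snippet; given the model's high false-positive rate on human text, raising it may be worthwhile when precision matters more than recall.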
### Citation
```tex
@misc{agtd2024,
  author    = {Oketunji, A.F.},
  title     = {Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text},
  year      = {2023},
  version   = {v3},
  publisher = {arXiv},
  doi       = {10.48550/arXiv.2311.15565},
  url       = {https://arxiv.org/abs/2311.15565}
}
```
### Copyright
(c) 2024 [Finbarrs Oketunji](https://finbarrs.eu). All Rights Reserved.