---
license: apache-2.0
datasets:
- dmitva/human_ai_generated_text
---
## 0xnu/AGTD-v0.1
The `0xnu/AGTD-v0.1` model is a binary classifier that distinguishes human-written text from text generated by Artificial Intelligence (AI). On the metrics reported below it reaches a recall of 1.00 for AI-generated text, with a precision of 0.63 and an overall accuracy of 0.70. The approach is described in the accompanying study, available [here](https://arxiv.org/abs/2311.15565).
### Training Details
```sh
Precision: 0.6269
Recall: 1.0000
F1-score: 0.7707
Accuracy: 0.7028
Confusion Matrix:
[[197 288]
[ 0 484]]
```
![Training History](training_history.png "Training History")
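The reported scores can be reproduced from the confusion matrix alone. The sketch below assumes the common layout in which rows are true labels and columns are predicted labels, with class 1 meaning AI-generated; the model card does not state this, but the reported precision and recall are consistent with it. The variable names are illustrative.

```python
# Confusion matrix as reported above.
# Assumed layout: rows = true class, columns = predicted class,
# class 0 = human-written, class 1 = AI-generated.
confusion = [[197, 288],
             [0, 484]]

(tn, fp), (fn, tp) = confusion

precision = tp / (tp + fp)                  # share of "AI" predictions that are correct
recall = tp / (tp + fn)                     # share of AI texts that are caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)
specificity = tn / (tn + fp)                # share of human texts correctly kept as human

print(f"Precision:   {precision:.4f}")
print(f"Recall:      {recall:.4f}")
print(f"F1-score:    {f1:.4f}")
print(f"Accuracy:    {accuracy:.4f}")
print(f"Specificity: {specificity:.4f}")
```

Under that assumed layout, no AI-generated sample is missed, while 288 of the 485 human-written samples are labelled as AI-generated, which is what pulls precision and accuracy down.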
### Run the model
```Python
import os
os.environ["KERAS_BACKEND"] = "tensorflow"
import keras
import tensorflow as tf
import pickle
import numpy as np
from huggingface_hub import hf_hub_download
# Hugging Face repository details
REPO_ID = "0xnu/AGTD-v0.1"
MODEL_FILENAME = "human_ai_text_classification_model.keras"
TOKENIZER_FILENAME = "tokenizer.pkl"
# Download the model and tokenizer
model_path = hf_hub_download(repo_id=REPO_ID, filename=MODEL_FILENAME)
tokenizer_path = hf_hub_download(repo_id=REPO_ID, filename=TOKENIZER_FILENAME)
# Load the model
model = keras.models.load_model(model_path)
# Load the tokenizer
with open(tokenizer_path, 'rb') as tokenizer_file:
    tokenizer = pickle.load(tokenizer_file)
# Input text
text = "This model trains on a diverse dataset and serves functions in applications requiring a mechanism for distinguishing between human and AI-generated text."
# Parameters (these should match the training parameters)
MAX_LENGTH = 100000
# Tokenization function
def tokenize_text(text, tokenizer, max_length):
    sequences = tokenizer.texts_to_sequences([text])
    padded_sequence = tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=max_length, padding='post', truncating='post')
    return padded_sequence
# Prediction function
def predict_text(text, model, tokenizer, max_length):
    processed_text = tokenize_text(text, tokenizer, max_length)
    prediction = model.predict(processed_text)[0][0]
    return prediction
# Make prediction
prediction = predict_text(text, model, tokenizer, MAX_LENGTH)
# Interpret results
if prediction >= 0.5:
    print(f"The text is likely AI-generated (confidence: {prediction:.2f})")
else:
    print(f"The text is likely human-written (confidence: {1-prediction:.2f})")
print(f"Raw prediction value: {prediction}")
```
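The interpretation step above hard-codes a 0.5 cut-off. As a minimal sketch, the same logic can be wrapped in a helper that takes the threshold as a parameter, which makes it easier to experiment with a stricter cut-off. `label_prediction` and its `threshold` argument are hypothetical names, not part of the released model or its repository.

```python
def label_prediction(score, threshold=0.5):
    """Map a raw model score in [0, 1] to a (label, confidence) pair.

    Hypothetical helper: mirrors the if/else interpretation step,
    but with the cut-off exposed as a parameter.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= threshold:
        return "AI-generated", score
    return "human-written", 1.0 - score


# Example scores chosen for illustration only; they are not model outputs.
for example_score in (0.75, 0.25):
    label, confidence = label_prediction(example_score)
    print(f"{label} (confidence: {confidence:.2f})")
```

In practice the `score` argument would be the value returned by `predict_text`. Whether a threshold other than 0.5 improves results for this model is not stated in the model card and would need to be checked on held-out data.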
### Citation
```tex
@misc{agtd2024,
  author = {Oketunji, A.F.},
  title = {Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text},
  year = 2023,
  version = {v3},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2311.15565},
  url = {https://arxiv.org/abs/2311.15565}
}
```
### Copyright
(c) 2024 [Finbarrs Oketunji](https://finbarrs.eu). All Rights Reserved.