---
license: apache-2.0
datasets:
- dmitva/human_ai_generated_text
---

## 0xnu/AGTD-v0.1

The "0xnu/AGTD-v0.1" model is a binary classifier that distinguishes text written by humans from text generated by Artificial Intelligence (AI). It is a Keras model trained on the `dmitva/human_ai_generated_text` dataset: it takes tokenized text and returns a single score, and scores of 0.5 or above are treated as AI-generated. The hybrid deep learning approach behind it and its evaluation are described in the accompanying study, available [here](https://arxiv.org/abs/2311.15565).

### Training Details

```text
Precision: 0.6269
Recall: 1.0000
F1-score: 0.7707
Accuracy: 0.7028
Confusion Matrix:
[[197 288]
 [  0 484]]
```
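The reported scores can be re-derived from the confusion matrix, which is a quick way to check how they relate. The sketch below assumes the scikit-learn layout (rows are true labels, columns are predicted labels, label 0 = human-written, label 1 = AI-generated); the card does not state this convention, so treat it as an assumption. Under that reading the matrix holds 197 true negatives, 288 false positives, 0 false negatives and 484 true positives.

```python
# Confusion matrix as reported above.
# ASSUMED layout (scikit-learn convention): rows = true label, columns = predicted label,
# with label 0 = human-written and label 1 = AI-generated.
confusion = [[197, 288],
             [0, 484]]

tn, fp = confusion[0]
fn, tp = confusion[1]

precision = tp / (tp + fp)                           # share of "AI" predictions that were correct
recall = tp / (tp + fn)                              # share of AI-generated texts that were caught
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
accuracy = (tp + tn) / (tn + fp + fn + tp)           # share of all texts classified correctly

print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-score: {f1:.4f}")
print(f"Accuracy: {accuracy:.4f}")
```

Under this reading the four printed values match the ones listed above, and the matrix also shows the trade-off behind them: a recall of 1.0000 means no AI-generated text was missed, while 288 of the 485 human-written texts were flagged as AI-generated.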

![Training History](training_history.png "Training History")

### Run the model

```Python
import os
os.environ["KERAS_BACKEND"] = "tensorflow"

import keras
import tensorflow as tf
import pickle
import numpy as np
from huggingface_hub import hf_hub_download

# Hugging Face repository details
REPO_ID = "0xnu/AGTD-v0.1"
MODEL_FILENAME = "human_ai_text_classification_model.keras"
TOKENIZER_FILENAME = "tokenizer.pkl"

# Download the model and tokenizer
model_path = hf_hub_download(repo_id=REPO_ID, filename=MODEL_FILENAME)
tokenizer_path = hf_hub_download(repo_id=REPO_ID, filename=TOKENIZER_FILENAME)

# Load the model
model = keras.models.load_model(model_path)

# Load the tokenizer
with open(tokenizer_path, 'rb') as tokenizer_file:
    tokenizer = pickle.load(tokenizer_file)

# Input text
text = "This model trains on a diverse dataset and serves functions in applications requiring a mechanism for distinguishing between human and AI-generated text."

# Parameters (these should match the training parameters)
MAX_LENGTH = 100000

# Tokenization function
def tokenize_text(text, tokenizer, max_length):
    sequences = tokenizer.texts_to_sequences([text])
    padded_sequence = tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=max_length, padding='post', truncating='post')
    return padded_sequence

# Prediction function
def predict_text(text, model, tokenizer, max_length):
    processed_text = tokenize_text(text, tokenizer, max_length)
    # predict() returns one row per input; take the single score for our single text
    prediction = model.predict(processed_text)[0][0]
    return prediction

# Make prediction
prediction = predict_text(text, model, tokenizer, MAX_LENGTH)

# Interpret results
if prediction >= 0.5:
    print(f"The text is likely AI-generated (confidence: {prediction:.2f})")
else:
    print(f"The text is likely human-written (confidence: {1-prediction:.2f})")

print(f"Raw prediction value: {prediction}")
```
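To make the two helpers in the script easier to reason about, here is a dependency-free sketch of what they do to a single input. `pad_post` and `interpret` are hypothetical names introduced for illustration and are not part of the repository. `pad_post` mirrors `pad_sequences` with `padding='post'` and `truncating='post'` (keep the first `max_length` token IDs, then append zeros), and `interpret` applies the same 0.5 threshold as the script, assuming, as its `if` branch implies, that a score near 1 means AI-generated.

```python
def pad_post(sequence, max_length, value=0):
    """Mimic pad_sequences(..., padding='post', truncating='post') for one sequence of token IDs."""
    truncated = sequence[:max_length]                             # drop tokens beyond max_length from the end
    return truncated + [value] * (max_length - len(truncated))   # append padding at the end


def interpret(score, threshold=0.5):
    """Apply the script's cut-off (assumed: high score = AI-generated); return (label, confidence)."""
    if score >= threshold:
        return "AI-generated", score
    return "human-written", 1.0 - score


print(pad_post([5, 8, 2], 5))            # [5, 8, 2, 0, 0]
print(pad_post([5, 8, 2, 9, 4, 7], 4))   # [5, 8, 2, 9]
print(interpret(0.83))
print(interpret(0.2))
```

With `MAX_LENGTH = 100000`, a short input such as the example sentence is almost entirely zero padding, which is why the value has to match the one used in training.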

### Citation

```tex
@misc{agtd2024,
  author       = {Oketunji, A.F.},
  title        = {Evaluating the Efficacy of Hybrid Deep Learning Models in Distinguishing AI-Generated Text},
  year         = 2023,
  version      = {v3},
  publisher    = {arXiv},
  doi          = {10.48550/arXiv.2311.15565},
  url          = {https://arxiv.org/abs/2311.15565}
}
```

### Copyright

(c) 2024 [Finbarrs Oketunji](https://finbarrs.eu). All Rights Reserved.