A DistilBERT-Based Phishing Email Detection Model
Model Overview
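This model is based on DistilBERT and has been fine-tuned to classify emails and URLs as legitimate or phishing. It outputs a probability distribution over four classes and predicts the single most likely one (multi-class classification).
Key Specifications
- Base Architecture: DistilBERT (distilbert/distilbert-base-uncased)
- Task: Multi-class Classification (4 labels)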
- Fine-tuning Framework: Hugging Face Trainer API
- Training Duration: 3 epochs
Performance Metrics
- F1-score: 97.717%
- Accuracy: 97.716%
- Precision: 97.736%
- Recall: 97.717%
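The card does not say how these scores were computed. With the Trainer API, metrics are usually produced by a `compute_metrics` function that receives the evaluation logits and labels. The sketch below is one possible version, assuming support-weighted averaging over classes (the averaging scheme is an assumption, not something the card states) and reporting percentages to match the figures above. It uses only NumPy so it can run without a model.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Weighted accuracy/precision/recall/F1 in percent.

    ASSUMPTION: support-weighted averaging; the card does not state
    which averaging produced its reported scores.
    """
    logits, labels = eval_pred              # Trainer-style (predictions, label_ids)
    preds = np.argmax(logits, axis=-1)      # most likely class per example
    labels = np.asarray(labels)

    precisions, recalls, f1s, supports = [], [], [], []
    for c in np.unique(labels):             # classes present in the true labels
        tp = np.sum((preds == c) & (labels == c))
        fp = np.sum((preds == c) & (labels != c))
        fn = np.sum((preds != c) & (labels == c))
        p = tp / (tp + fp) if tp + fp > 0 else 0.0
        r = tp / (tp + fn) if tp + fn > 0 else 0.0
        f = 2 * p * r / (p + r) if p + r > 0 else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
        supports.append(tp + fn)            # number of true examples of class c

    weights = np.asarray(supports, dtype=float) / len(labels)
    return {
        "accuracy": 100 * float(np.mean(preds == labels)),
        "precision": 100 * float(np.dot(weights, precisions)),
        "recall": 100 * float(np.dot(weights, recalls)),
        "f1": 100 * float(np.dot(weights, f1s)),
    }
```

Under weighted averaging, recall always equals accuracy (each class's recall is weighted by its share of the data, so the sum reduces to correct predictions over all examples), which may explain why the reported recall and accuracy are nearly identical.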
Dataset Details
The model was trained on a custom dataset of emails and URLs labeled as legitimate or phishing. The dataset is available on the Hugging Face Hub as cybersectony/PhishingEmailDetection.
Usage Guide
Installation
pip install transformers
pip install torch
Quick Start
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer (model ID from this card's model tree)
MODEL_ID = "cybersectony/phishing-email-detection-distilbert_v2.1"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

def predict_email(email_text):
    # Preprocess and tokenize; inputs longer than 512 tokens are truncated
    inputs = tokenizer(
        email_text,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )
    # Get prediction without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    # Get probabilities for each class (first and only item in the batch)
    probs = predictions[0].tolist()
    # Map class indices to label names; confirm the order with model.config.id2label
    labels = {
        "legitimate_email": probs[0],
        "phishing_url": probs[1],
        "legitimate_url": probs[2],
        "phishing_url_alt": probs[3]
    }
    # Determine the most likely classification
    max_label = max(labels.items(), key=lambda x: x[1])
    return {
        "prediction": max_label[0],
        "confidence": max_label[1],
        "all_probabilities": labels
    }
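The post-processing in `predict_email` (a softmax over the raw logits, then picking the label with the highest probability) can be illustrated without PyTorch or a model download. This is a minimal sketch: the logits are made-up numbers, not real model output, and the label list simply mirrors the order used in `predict_email`.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)                          # subtract the max to avoid overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the four labels (illustrative values only)
LABELS = ["legitimate_email", "phishing_url", "legitimate_url", "phishing_url_alt"]
logits = [2.0, 0.0, 0.0, 0.0]

probs = softmax(logits)                      # probabilities summing to 1
best = max(zip(LABELS, probs), key=lambda pair: pair[1])
print(best)  # label 'legitimate_email' with probability about 0.711
```

Because softmax only rescales the logits, the top label is the same one `argmax` over the logits would return; the probabilities add an interpretable confidence value.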
Example Usage
# Example usage
email = """
Dear User,
Your account security needs immediate attention. Please verify your credentials.
Click here: http://suspicious-link.com
"""
result = predict_email(email)
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.2%}")
print("\nAll probabilities:")
for label, prob in result['all_probabilities'].items():
    print(f"{label}: {prob:.2%}")
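Since phishing is split across two labels, the single top label can be a legitimate class even when the phishing labels together carry more probability (for example 0.40 for `legitimate_email` versus 0.35 and 0.25 for the two phishing labels). A minimal sketch of a combined phishing score follows. The helper names `phishing_score` and `is_phishing`, the choice of which labels count as phishing, and the 0.5 threshold are all hypothetical; the label set is inferred only from the names used in `predict_email`.

```python
# ASSUMPTION: labels treated as phishing are inferred from their names in
# predict_email; verify them against model.config.id2label before use.
PHISHING_LABELS = {"phishing_url", "phishing_url_alt"}

def phishing_score(all_probabilities, phishing_labels=PHISHING_LABELS):
    """Sum the probabilities of every phishing-type label."""
    return sum(p for label, p in all_probabilities.items() if label in phishing_labels)

def is_phishing(all_probabilities, threshold=0.5):
    """Flag a message when its total phishing probability reaches `threshold`.

    The 0.5 default is an illustrative choice, not a value from this card.
    """
    return phishing_score(all_probabilities) >= threshold

# Made-up probabilities in the same shape as result['all_probabilities']
example = {"legitimate_email": 0.10, "phishing_url": 0.55,
           "legitimate_url": 0.05, "phishing_url_alt": 0.30}
print(round(phishing_score(example), 2))  # 0.85
print(is_phishing(example))               # True
```

In practice, `is_phishing(result['all_probabilities'])` could be called on the output of `predict_email`; the threshold would need tuning on held-out data to trade false alarms against missed phishing.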
Model Tree
- Model: cybersectony/phishing-email-detection-distilbert_v2.1
- Base model: distilbert/distilbert-base-uncased