
Indo Spam Chatbot

Model Overview

Indo Spam Chatbot is a fine-tuned spam detection model based on the Gemma 2 2B architecture. It is designed to identify spam messages in WhatsApp chatbot interactions and was fine-tuned on a dataset of 40,000 messages collected over a year, annotated with two labels:

  • Spam
  • Non-spam

The model supports detecting spam across multiple categories, such as:

  • Offensive and abusive words
  • Profane language
  • Gibberish words and numbers
  • Spam links
  • And more

How To Use

Using this model is straightforward once you have transformers installed (along with torch, which the example below relies on):

pip install -U transformers torch

Then you can use the model like this:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Example spam sentences (gibberish, a spam link, and short noise)
sentences = ["adsfwcasdfad", 
             "kak bisa depo di link ini: http://dewa.site/dewa/dewi", 
             "p", 
             "1234"]

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('kasyfilalbar/indo-spam-chatbot')
model = AutoModelForSequenceClassification.from_pretrained('kasyfilalbar/indo-spam-chatbot', device_map="auto")

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Run inference without tracking gradients
with torch.no_grad():
    # Move inputs to the same device the model was placed on by device_map="auto"
    encoded_input = encoded_input.to(model.device)
    model_output = model(**encoded_input)
    model_output = model_output.logits
    label = torch.argmax(model_output, dim=1)

# One predicted class index per input sentence
print(label.tolist())
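
The printout above is just a list of class indices. As a minimal follow-up sketch (continuing from the variables defined in the example), the logits can be turned into probabilities and readable label names; note that the fallback mapping {0: "non-spam", 1: "spam"} is an assumption about the training label order and only applies if the checkpoint's config does not carry meaningful id2label entries:

import torch.nn.functional as F

# Convert logits to class probabilities
probs = F.softmax(model_output, dim=1)

# Prefer the label names stored in the checkpoint config; the fallback
# {0: "non-spam", 1: "spam"} is an assumption about the training label order.
id2label = model.config.id2label or {0: "non-spam", 1: "spam"}

for sentence, idx, p in zip(sentences, label.tolist(), probs.max(dim=1).values.tolist()):
    print(f"{sentence!r} -> {id2label.get(idx, str(idx))} (score {p:.2f})")

Because the inputs are padded and batched, a single forward pass scores all sentences at once, which is usually faster than looping over them one by one.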

Repository

For more details about the code, visit https://github.com/Kasyfil97/indo-spam-chatbot
