import streamlit as st
from joblib import load

# Student and project details shown at the top of the page.
info = [
    {"title": "NAME", "detail": "AKINBITAN TAIWO EMMANUEL"},
    {"title": "MATRIC NO", "detail": "HNDCOM/22/032"},
    {"title": "CLASS", "detail": "HND2"},
    {"title": "LEVEL", "detail": "400L"},
    {"title": "PROJECT SUPERVISOR", "detail": ""},
]

st.title("Project Information")
for item in info:
    st.write(f"{item['title']}: {item['detail']}")

st.image('fcahpt.jpg', caption='Federal College of Animal Health and Production Technology')

st.header('Spam Detection using Naive Bayes Classifier')
st.write('This is a spam detection app developed in Python using a Naive Bayes classifier.')


@st.cache_resource
def load_artifacts():
    # Load the fitted TF-IDF vectorizer and the trained model once and reuse
    # them across reruns instead of reading them from disk on every interaction.
    vectorizer = load('tfidf_vectorizer.joblib')
    model = load('Naive_Bayes_Spam_Detection.joblib')
    return vectorizer, model


vectorizer, model = load_artifacts()

user_input = st.text_area("Enter some text:", "")

# st.text_area returns an empty string (never None) when nothing is typed,
# so only predict once the user has entered non-blank text.
if user_input.strip():
    x = vectorizer.transform([user_input])  # same TF-IDF features used in training
    pred = model.predict(x)
    # Label convention used by the app: 1 = spam, 0 = not spam (ham).
    if pred[0] == 1:
        st.markdown("Prediction: The entered text is likely to be spam. Be careful!")
    elif pred[0] == 0:
        st.markdown("Prediction: The entered text does not appear to be spam.")
    else:
        st.write('Error, try again.')

st.header("Project Description")
st.markdown("""
Spam detection using a Naive Bayes classifier is a classic and effective approach for automatically identifying spam emails or messages. Here is an overview of how it works:
""")

st.header("1. Data Collection and Preprocessing:")
st.markdown("""
- The process begins with collecting a dataset of emails or messages labeled as spam or non-spam (ham).
- Each message undergoes preprocessing steps such as removing HTML tags, punctuation, and stopwords (commonly occurring words like "and", "the", etc.).
- The text is then tokenized and transformed into numerical representations using techniques like TF-IDF (Term Frequency-Inverse Document Frequency) or count vectorization.
""")

st.header("2. Understanding Naive Bayes Classifier:")
st.markdown("""
- Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem, which gives the probability of an event given that another event has occurred. Here, it gives the probability that a message is spam given the words it contains: P(spam | words) = P(words | spam) × P(spam) / P(words).
- The "naive" assumption in Naive Bayes is that the features (words) are conditionally independent given the class label. This simplifies the calculation and makes the algorithm computationally efficient.
""")

st.header("3. Training the Naive Bayes Model:")
st.markdown("""
- The dataset is split into training and testing sets.
- During training, the Naive Bayes classifier learns the probability distribution of words or features given each class (spam or ham).
- It calculates the prior probabilities of spam and ham messages and the likelihood of each word occurring in spam and ham messages.
- These probabilities are estimated from word counts in the training data using maximum likelihood estimation, usually combined with additive (Laplace) smoothing so that a word never seen in one class does not receive a probability of zero.
""")

st.header("4. Classification:")
st.markdown("""
- Once the model is trained, it can classify new, unseen messages.
- Given a new message, the classifier calculates the probability that it belongs to each class (spam or ham) using Bayes' theorem.
- The message is assigned to the class with the highest probability. With two classes, this is the same as labeling a message spam when its spam probability exceeds 0.5; a different threshold can be chosen, for example a higher one to reduce the number of legitimate messages flagged as spam.
""")

st.header("5. Model Evaluation:")
st.markdown("""
- The performance of the Naive Bayes classifier is evaluated using metrics such as accuracy, precision, recall, and F1-score on a separate test dataset.
- These metrics help assess how well the model generalizes to unseen data and how effectively it distinguishes between spam and non-spam messages.
""")

st.header("6. Deployment and Fine-Tuning:")
st.markdown("""
- Once the model is trained and evaluated, it can be deployed for real-world use.
- Deployment may involve integrating the model into email systems or messaging platforms to automatically filter spam messages.
- Periodic updates and fine-tuning of the model may be necessary to adapt to changing spamming techniques and patterns.
""")
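

# ---------------------------------------------------------------------------
# Illustrative sketch (defined here but never called by the app above).
# A minimal multinomial Naive Bayes written with the Python standard library
# only, to make the steps in sections 2 to 5 concrete: class priors, smoothed
# word likelihoods, scoring with Bayes' theorem in log space, a decision
# threshold, and precision/recall/F1. Assumptions: messages are already
# tokenized into lists of words, labels are the strings "spam" and "ham", and
# the names TinyMultinomialNB and precision_recall_f1 are hypothetical. The
# saved model loaded above was presumably trained with scikit-learn on TF-IDF
# features, so this is a sketch of the technique, not the project's actual
# training code.
# ---------------------------------------------------------------------------
import math
from collections import Counter


class TinyMultinomialNB:
    """Multinomial Naive Bayes on word counts with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; alpha=1.0 is Laplace smoothing

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.word_counts, self.total_words = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # Prior P(c): share of training messages that carry label c.
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(word for d in class_docs for word in d)
            self.word_counts[c] = counts
            self.total_words[c] = sum(counts.values())
            self.vocab.update(counts)
        return self

    def _log_likelihood(self, word, c):
        # Smoothed estimate of P(word | c): a word unseen in class c gets a small
        # non-zero probability instead of zero (Counter returns 0 for missing keys).
        numerator = self.word_counts[c][word] + self.alpha
        denominator = self.total_words[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict_proba(self, doc):
        # Naive independence assumption:
        # log P(c | doc) = log P(c) + sum over tokens of log P(w | c) + constant.
        # Words never seen during training are ignored.
        scores = {
            c: self.log_prior[c]
            + sum(self._log_likelihood(w, c) for w in doc if w in self.vocab)
            for c in self.classes
        }
        # Normalize with the log-sum-exp trick to recover posterior probabilities.
        top = max(scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exp_scores.values())
        return {c: v / total for c, v in exp_scores.items()}

    def predict(self, doc, spam_threshold=0.5):
        # With two classes, a 0.5 threshold matches picking the most probable class.
        return "spam" if self.predict_proba(doc)["spam"] > spam_threshold else "ham"


def precision_recall_f1(y_true, y_pred, positive="spam"):
    # Treat "spam" as the positive class when counting hits and misses.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Example usage (hypothetical data):
# TinyMultinomialNB().fit([["win", "money"], ["lunch", "at", "noon"]], ["spam", "ham"]).predict(["win"])
# returns "spam"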