This repository contains a machine learning project focused on the detection of spam messages using the Naive Bayes algorithm. By employing statistical techniques and natural language processing (NLP), we build and evaluate a model capable of classifying text data as spam or not spam.
- Scikit-learn
- NumPy
The project uses a collection of labeled emails, with a binary classification of 'Spam' or 'Not Spam'.
The process of creating the spam detection model includes:
- Vectorizing the text data using
CountVectorizer
to convert text data into numerical data that can be used by the machine learning algorithm. - Training a
MultinomialNB
classifier on the processed data. - Evaluating the model's performance using various metrics such as accuracy, precision, recall, and F1 score.
A simple yet effective Naive Bayes classifier is trained with the following steps:
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_text)
After training, the model is evaluated to determine its effectiveness at spam detection. The performance is quantified using the following metrics:
- Accuracy
- Precision
- Recall
- F1 Score
The results section should provide an overview of the model's classification abilities and potential areas for improvement.
SpamDetection.ipynb
: The Jupyter notebook containing the step-by-step process for training the spam detection model and evaluating its performance.
Clone this repository and run the Jupyter notebook to view the code and analysis. Ensure that you have all the required libraries installed on your machine.
The Spam Detection with Naive Bayes project demonstrates a practical application of machine learning in text classification. The model can be further refined and potentially integrated into email clients to help filter out unwanted messages.