CLICK HERE to see the notebook in Jupyter nbviewer
The model specification tags every legitimate SMS message correctly while filtering out over 99% of spam messages.
The dataset was downloaded from Kaggle, and can also be found at the UCI Machine Learning Repository. Acknowledgements to Tiago A. Almeida and José María Gómez Hidalgo, creators of the original dataset. More information can be found here.
The GloVe encodings used are the 50-dimensional vectors with a 400K-word vocabulary from glove.6B.zip. More information and a download link can be found here.
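As a minimal sketch of how such encodings can be read, the snippet below parses a GloVe text file (for example `glove.6B.50d.txt` from the archive) into a dictionary. The function names `load_glove` and `sentence_vector` are hypothetical, and averaging word vectors is only one simple way to encode a message; it is not necessarily what the notebook does.

```python
def load_glove(path):
    """Read a GloVe text file into a {word: [float, ...]} dict.

    Each line is expected to hold a token followed by its
    space-separated vector components.
    """
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            embeddings[parts[0]] = [float(v) for v in parts[1:]]
    return embeddings


def sentence_vector(text, embeddings, dim=50):
    """Average the vectors of known tokens (an illustrative assumption).

    Returns a zero vector when no token is in the vocabulary.
    """
    vectors = [embeddings[t] for t in text.lower().split() if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

With the real file, `load_glove("glove.6B.50d.txt")` would return roughly 400K entries of length 50, and `sentence_vector(message, embeddings)` would give one 50-dimensional vector per message.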
| | precision | recall | f1-score |
|---|---|---|---|
| not spam | 1.00 | 1.00 | 1.00 |
| spam | 1.00 | 0.99 | 0.99 |
| avg | 1.00 | 1.00 | 1.00 |
AUC = 0.993
Train Accuracy = 0.998
Test Accuracy = 0.998
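For readers unfamiliar with the metrics reported above, here is a small sketch of how precision, recall, F1 and accuracy are defined for a binary classifier. The helper `binary_metrics` and the toy labels are hypothetical and are not taken from the notebook or the dataset.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy labels (1 = spam, 0 = not spam), made up for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

In this toy case no legitimate message is flagged, so precision for the spam class is 1.0, while one missed spam message lowers recall to 0.75.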
- glove_sms.ipynb: Jupyter Notebook containing the classification models.
- glove_sms_utils.py: Python script containing some auxiliary functions.
- cm_heat_plots.py: Python script containing functions to create confusion matrix plots.
- spelling_v2.py: Python script containing a spelling corrector.
- data: Auxiliary data files used for the spelling corrector and the corrector's output. NOTE: it does not contain the data for the analysis; that data can be downloaded here (direct link).
- media: Output and auxiliary images.
- README.md: this file.