In this project, we use the Kaggle Spooky Author dataset to build a few sklearn models that classify sentences or topics. The project also demonstrates a general procedure for tackling an NLP project. Its highlights are:
- We use a word cloud to visualize the most frequently occurring words.
- We structure the project using functional programming.
- We customize the sklearn CountVectorizer class to include a lemmatizer from NLTK, so that the text is preprocessed in a single call.
- We evaluate the performance of three sklearn models.
- Deep learning algorithms can be applied with small modifications to the notebook.
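
As a rough illustration of the lemmatizing vectorizer and the model comparison listed above, here is a minimal sketch, not the notebook's actual code. The class name `LemmaCountVectorizer`, the stand-in `toy_lemmatize` function, the toy sentences and labels, and the choice of the three classifiers are all assumptions made for this example. In the real project the lemmatizer comes from NLTK (for instance `WordNetLemmatizer().lemmatize`, which requires the `wordnet` corpus to be downloaded).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def toy_lemmatize(token):
    """Hypothetical stand-in for an NLTK lemmatizer: strips a trailing 's'."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


class LemmaCountVectorizer(CountVectorizer):
    """CountVectorizer whose analyzer also lemmatizes (illustrative name)."""

    # Callable mapping a token to its lemma. Assumption: replace this with
    # nltk.stem.WordNetLemmatizer().lemmatize to use NLTK instead.
    lemmatize = staticmethod(toy_lemmatize)

    def build_analyzer(self):
        # Reuse the parent's lowercasing, tokenizing and stop-word handling,
        # then lemmatize each item it emits (single words by default).
        base_analyzer = super().build_analyzer()
        return lambda doc: [self.lemmatize(tok) for tok in base_analyzer(doc)]


# Made-up toy corpus with three placeholder "authors" (not the Kaggle data).
texts = [
    "the ravens circle the tower", "a raven taps at the door",
    "ravens watch the tower door", "the ships sail the cold sea",
    "a ship drifts on the sea", "cold waves rock the ships",
    "the gardens bloom in spring", "a garden sleeps in winter",
    "spring rains wake the gardens",
]
labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3

# Tokenizing, lemmatizing and counting happen in one fit_transform call.
vectorizer = LemmaCountVectorizer()
counts = vectorizer.fit_transform(texts)

# Three example classifiers (an assumption; the notebook's models may differ).
models = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svc": LinearSVC(),
}

# Mean 3-fold cross-validated accuracy per model, each in its own pipeline.
scores = {
    name: cross_val_score(
        make_pipeline(LemmaCountVectorizer(), model), texts, labels, cv=3
    ).mean()
    for name, model in models.items()
}
print(scores)
```

Overriding `build_analyzer` keeps the parent's tokenizing and lowercasing intact, so the same vectorizer can be placed in any sklearn pipeline. Scores on a toy corpus this small carry no meaning; the point is only the shape of the comparison.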