James Sorrentino and Jessica Zhou
The DELIVERABLES folder contains all of the Jupyter notebooks used to preprocess the data and to train Random Forest classifiers and LSTMs that classify the sincerity of Quora questions. This project is based on the corresponding Kaggle competition.
FilterQuestions.ipynb contains code for filtering questions for biomedical content and visualizing the distribution of sincere/insincere questions in the prefiltered and filtered datasets.
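As a rough illustration of the filtering step, the sketch below keeps questions that mention any term from a small keyword list, then counts sincere and insincere labels before and after filtering. The keyword list, the sample rows, and the `is_biomedical` and `label_distribution` helpers are hypothetical and are not taken from the notebook, which may filter differently. The `question_text` and `target` field names follow the Kaggle training data, where a `target` of 1 marks an insincere question.

```python
import re
from collections import Counter

# Hypothetical keyword list; the notebook's actual filter may differ.
BIOMEDICAL_TERMS = ["cancer", "vaccine", "diabetes", "protein", "surgery"]
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, BIOMEDICAL_TERMS)) + r")\b",
    re.IGNORECASE,
)


def is_biomedical(question_text):
    """Return True if the question mentions any keyword as a whole word."""
    return PATTERN.search(question_text) is not None


def label_distribution(rows):
    """Count questions per target label (0 = sincere, 1 = insincere)."""
    return Counter(row["target"] for row in rows)


# Made-up sample rows, for illustration only.
rows = [
    {"question_text": "How does a vaccine train the immune system?", "target": 0},
    {"question_text": "What is the best way to learn Python?", "target": 0},
    {"question_text": "Why do doctors hide the cure for cancer from us?", "target": 1},
    {"question_text": "Why do all politicians lie?", "target": 1},
]

filtered = [row for row in rows if is_biomedical(row["question_text"])]
print("unfiltered:", dict(label_distribution(rows)))
print("filtered:", dict(label_distribution(filtered)))
```

The same counts could feed a bar chart to compare the class balance of the two datasets.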
RandomForest.ipynb contains code for preprocessing the training data and training classifiers for both unfiltered and filtered (biomedical) questions.
LSTM.ipynb contains code for preprocessing the training data and constructing the RNN used to train LSTMs on both unfiltered and filtered questions.
Run these notebooks by stepping through the cells. More detailed documentation can be found within the notebooks.
To demo the performance of our classifiers (trained on the filtered datasets), open LiveDemoClassifiers.ipynb and step through the cells, which load pickled files containing the preprocessed data for each classifier, as well as the pretrained classifiers themselves.
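Loading a pickled object in a notebook cell generally follows the pattern sketched below. This is a minimal round trip with Python's standard `pickle` module, not the demo notebook's actual code: the file name `demo_preprocessed.pkl`, the `save_pickle` and `load_pickle` helpers, and the stand-in dictionary are all hypothetical.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Serialize obj to path in binary mode."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Deserialize and return the object stored at path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for preprocessed data; the real pickled files hold the
# notebook's own preprocessed inputs and pretrained classifiers.
preprocessed = {"questions": ["how do vaccines work"], "labels": [0]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_preprocessed.pkl")  # hypothetical file name
    save_pickle(preprocessed, path)
    restored = load_pickle(path)

print(restored == preprocessed)
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.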
Required packages:
- pandas
- matplotlib
- nltk
- re
- string
- numpy
- math
- scikit-learn
- keras
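
Before running the notebooks, it may help to confirm that these packages can be found in the current environment. The sketch below is one way to check, using only the standard library. The `missing_modules` helper and the `REQUIRED_MODULES` list are illustrative names; note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util

# Import names for the packages listed above.
# scikit-learn is imported as "sklearn".
REQUIRED_MODULES = [
    "pandas", "matplotlib", "nltk", "re", "string",
    "numpy", "math", "sklearn", "keras",
]


def missing_modules(module_names):
    """Return the names that cannot be found in the current environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

The `re`, `string`, and `math` modules ship with Python's standard library, so only the remaining packages need to be installed.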