MED277

Introduction to Biomedical Natural Language Processing Final Project

James Sorrentino and Jessica Zhou

The DELIVERABLES folder contains all of the Jupyter notebooks used to preprocess the data and to train Random Forest classifiers and LSTMs that classify Quora questions as sincere or insincere. This project is based on the corresponding Kaggle competition.

FilterQuestions.ipynb contains code for filtering questions for biomedical content and for visualizing the distribution of sincere/insincere questions in the unfiltered and filtered datasets.
RandomForest.ipynb contains code for preprocessing the training data and training Random Forest classifiers on both the unfiltered and the filtered (biomedical) questions.
LSTM.ipynb contains code for preprocessing the training data, constructing the recurrent network, and training LSTMs on both the unfiltered and the filtered questions.
Run these notebooks by stepping through the cells. More detailed documentation can be found within the notebooks.
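
As a rough illustration of the filtering step described above, the sketch below keeps only the questions that mention a biomedical keyword and compares label counts before and after filtering. The keyword set, the helper names, and the toy rows are all hypothetical; the actual criteria used in FilterQuestions.ipynb may differ.

```python
import re
from collections import Counter

# Hypothetical keyword set; the notebook's real filtering criteria may differ.
BIOMEDICAL_TERMS = {"cancer", "vaccine", "diabetes", "symptom", "symptoms", "medication"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_biomedical(question, terms=BIOMEDICAL_TERMS):
    """Return True if any token of the question appears in the keyword set."""
    return any(token in terms for token in tokenize(question))


def label_counts(rows):
    """Count how many (question, label) rows carry each label."""
    return Counter(label for _, label in rows)


# Toy rows of the form (question_text, target); here 1 marks an insincere question.
rows = [
    ("What are the early symptoms of diabetes?", 0),
    ("Why is everyone who disagrees with me so stupid?", 1),
    ("Is the flu vaccine safe during pregnancy?", 0),
    ("How do I learn to play the guitar?", 0),
]

filtered = [(question, label) for question, label in rows if is_biomedical(question)]

print("unfiltered:", dict(label_counts(rows)))
print("filtered:  ", dict(label_counts(filtered)))
```

The two sets of counts are the kind of numbers one would plot to compare the sincere/insincere balance of the full and the biomedical subsets.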

To demo the performance of our classifiers (trained on the filtered datasets), open LiveDemoClassifiers.ipynb and step through the cells. They load pickled files containing the preprocessed data for each classifier, as well as the pretrained classifiers themselves.
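
For readers unfamiliar with pickled files, here is a minimal sketch of how such a file is typically read back with Python's standard pickle module. The file name and the dictionary contents are placeholders invented for this example, not the project's actual artifacts.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Deserialize a single object from a pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Write a stand-in artifact to a temporary directory, then load it again.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "example_artifact.pkl"  # hypothetical file name
    with open(demo_path, "wb") as handle:
        pickle.dump({"questions": ["Is aspirin safe?"], "labels": [0]}, handle)
    restored = load_pickle(demo_path)

print(restored["labels"])
```

Unpickling can execute arbitrary code, so only load pickle files from a source you trust.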

Required packages:

pandas
matplotlib
nltk
re
string
numpy
math
scikit-learn
keras
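
Before opening the notebooks, it can help to confirm that everything in the list above is importable. The following is a small sketch using only the standard library; the helper name missing_modules is made up for this example. Note that re, string, and math ship with Python, and that scikit-learn is imported under the name sklearn.

```python
import importlib.util

# Import names for the packages listed above (scikit-learn imports as "sklearn").
REQUIRED_MODULES = [
    "pandas", "matplotlib", "nltk", "re", "string",
    "numpy", "math", "sklearn", "keras",
]


def missing_modules(names):
    """Return the subset of module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

Any module reported as missing needs to be installed before the notebooks will run.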
