Two-class classification, or binary classification, may be the most widely applied kind of machine-learning problem. In this example, you’ll learn to classify movie reviews as positive or negative, based on the text content of the reviews.
References
[1] The IMDB raw dataset: http://mng.bz/0tIo
[2] https://github.com/pytorch/examples/blob/master/mnist/main.py
[4] https://towardsdatascience.com/unit-testing-and-logging-for-data-science-d7fb8fd5d217
[5] https://github.com/CoreyMSchafer/code_snippets/blob/master/Decorators/decorators.py
[7] https://nextjournal.com/gkoehler/pytorch-mnist
IMDB dataset: a set of 50,000 highly polarized reviews from the Internet Movie Database. They’re split into 25,000 reviews for training and 25,000 reviews for testing, each set consisting of 50% negative and 50% positive reviews.
- Traditional Techniques: Logistic Regression, Multilayer Perceptron
  In this section, we use unittest to verify the accuracy and the confusion matrix of the logistic regression model.
- Recurrent Neural Network: LSTM
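The unittest check described for logistic regression could look roughly like the sketch below. It is not the project's actual code: the two-feature toy data (counts of positive and negative words per review), the pure-Python gradient-descent trainer, the 0.75 accuracy threshold, and all function names are assumptions chosen so that the example runs without any external library.

```python
import math
import unittest


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit weights and bias with plain batch gradient descent (toy trainer)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(X, w, b):
    """Return 1 (positive) when the linear score is non-negative, else 0."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else 0 for xi in X]


def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: [positive-word count, negative-word count] per review.
X_TRAIN = [[2, 0], [1, 0], [3, 1], [0, 2], [0, 1], [1, 3]]
Y_TRAIN = [1, 1, 1, 0, 0, 0]
X_TEST = [[2, 1], [1, 0], [0, 3], [1, 2]]
Y_TEST = [1, 1, 0, 0]


class TestLogisticRegression(unittest.TestCase):
    def setUp(self):
        w, b = train_logistic_regression(X_TRAIN, Y_TRAIN)
        self.y_pred = predict(X_TEST, w, b)

    def test_accuracy_above_threshold(self):
        # 0.75 is an assumed threshold, not a value from the project.
        self.assertGreaterEqual(accuracy(Y_TEST, self.y_pred), 0.75)

    def test_confusion_matrix_counts_every_sample(self):
        cm = confusion_matrix(Y_TEST, self.y_pred)
        self.assertEqual(sum(map(sum, cm)), len(Y_TEST))
        self.assertEqual(cm, [[2, 0], [0, 2]])


# Run the test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestLogisticRegression)
lr_result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real test module you would normally drop the last two lines and let `python -m unittest` collect the test case instead.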
This project covers:
- How to explore extensions to a baseline model to improve training and predictive capacity
- How to use unit tests for the API and the model
- How to use unit tests for logging
- How to run all of the unit tests with a single script and confirm that they all pass
- How to monitor performance
- How to compare multiple models
- How to use visualizations for exploratory data analysis (EDA)
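As an illustration of the logging and performance-monitoring points, here is a minimal sketch of a timing decorator whose log output is checked with `unittest`. The logger name `sentiment`, the `log_runtime` decorator, and the keyword-based `predict` stand-in are all hypothetical and only serve the example; `assertLogs` is the standard `unittest` facility for capturing log records.

```python
import functools
import logging
import time
import unittest

logger = logging.getLogger("sentiment")  # hypothetical logger name


def log_runtime(func):
    """Log how long each call to `func` takes (hypothetical helper)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        logger.info("%s finished in %.4f s", func.__name__, elapsed)
        return result
    return wrapper


@log_runtime
def predict(review):
    """Stand-in for a real model call: a trivial keyword rule."""
    return "positive" if "good" in review.lower() else "negative"


class TestLogging(unittest.TestCase):
    def test_prediction_is_logged(self):
        # assertLogs fails the test if no INFO record reaches the logger.
        with self.assertLogs("sentiment", level="INFO") as captured:
            label = predict("A good movie")
        self.assertEqual(label, "positive")
        self.assertEqual(len(captured.records), 1)
        self.assertIn("predict finished in", captured.output[0])

    def test_wrapped_name_is_preserved(self):
        # functools.wraps keeps the original function name for the log line.
        self.assertEqual(predict.__name__, "predict")


# Run the test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestLogging)
log_result = unittest.TextTestRunner(verbosity=0).run(suite)
```

To run every test module with a single command, the standard library's discovery mode (`python -m unittest discover`) collects all files matching `test*.py` under the current directory. A non-zero exit code means at least one test failed.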