- nltk
- numpy
- scipy
- sklearn (scikit-learn)
- xgboost
- gensim
- matplotlib
- pandas
- util.py: utility functions
- Amazon_Review_Helpfulness_Prediction.ipynb: a Jupyter notebook that integrates all the functions for feature extraction, feature selection, model training, hyperparameter optimization, and model evaluation. To run it, put the data set files (reviews_Office_Products.json and meta_Office_Products.json) in the same folder as the notebook.
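Because the notebook expects both data files next to it, the following is a minimal sketch of how they might be loaded and joined with pandas. It assumes each file stores one record per line (either strict JSON or a Python dict literal, which some releases of this data set use for metadata) and that both files share an `asin` product-ID field. The function names `read_json_lines` and `load_office_products` are hypothetical and are not taken from util.py.

```python
import ast
import json
from pathlib import Path

import pandas as pd


def read_json_lines(path):
    """Read a file with one record per line into a DataFrame.

    Assumption: each line is strict JSON or a Python dict literal.
    """
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                # Fall back to a Python literal such as {'asin': 'X'}
                records.append(ast.literal_eval(line))
    return pd.DataFrame.from_records(records)


def load_office_products(folder="."):
    """Load reviews and metadata from `folder` and join them on 'asin'."""
    folder = Path(folder)
    reviews = read_json_lines(folder / "reviews_Office_Products.json")
    meta = read_json_lines(folder / "meta_Office_Products.json")
    # Left join keeps every review, even if its product has no metadata.
    return reviews.merge(meta, on="asin", how="left", suffixes=("", "_meta"))
```

Calling `load_office_products(".")` from the notebook's folder would return one row per review with the product metadata columns appended. A left join is used so that reviews without a matching metadata record are kept rather than silently dropped.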
We use the Amazon product data from Julian McAuley at UCSD, obtained by sending him a request (the data set description page is http://jmcauley.ucsd.edu/data/amazon/). This data set contains product reviews and metadata from Amazon, including 143.7 million reviews of 1.2 million products spanning May 1996 to July 2014.
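To check figures like these on the subset you actually download, a small summary helper can count reviews and distinct products and report the date span. This is a sketch under the assumption that the reviews DataFrame has an `asin` column (product ID) and an `unixReviewTime` column (seconds since the Unix epoch); the helper name `summarize_reviews` is hypothetical.

```python
import pandas as pd


def summarize_reviews(reviews: pd.DataFrame) -> dict:
    """Count reviews and distinct products, and report the review date span.

    Assumed columns: 'asin' (product ID) and 'unixReviewTime' (epoch seconds).
    """
    times = pd.to_datetime(reviews["unixReviewTime"], unit="s")
    return {
        "n_reviews": len(reviews),
        "n_products": reviews["asin"].nunique(),
        "first_review": times.min().strftime("%Y-%m"),
        "last_review": times.max().strftime("%Y-%m"),
    }
```

Applied to the Office Products files alone, this reports only that category's counts, which will be far smaller than the totals quoted for the full data set.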