amazon_electronic_product_reviews_analysis's Introduction

Big Data Analytics Project

Amazon electronic products review analysis

In this project, we build a system on the Amazon Electronics dataset, which contains information on more than 60,000 electronic products. For each review, the dataset includes the product ID, the review time, and the review text.
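The sketch below shows one way to load those three fields. It is not the project's own loader. It assumes the data is stored as JSON lines using the field names of the public Amazon review dumps (`asin`, `reviewTime`, `reviewText`, `overall`), with `reviewTime` formatted as "MM DD, YYYY". The `Review` type and both helper functions are hypothetical.

```python
import json
from datetime import date
from typing import Iterable, Iterator, NamedTuple


class Review(NamedTuple):
    # Hypothetical record type; the field names below are assumptions.
    product_id: str
    review_date: date
    text: str
    score: float


def parse_review_time(raw: str) -> date:
    # Assumed format "MM DD, YYYY", e.g. "09 13, 2009".
    month_day, year = raw.split(",")
    month, day = month_day.split()
    return date(int(year), int(month), int(day))


def load_reviews(lines: Iterable[str]) -> Iterator[Review]:
    """Parse one JSON object per line, skipping blank or unusable records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
            yield Review(
                product_id=record["asin"],
                review_date=parse_review_time(record["reviewTime"]),
                text=record["reviewText"],
                score=float(record["overall"]),
            )
        except (KeyError, ValueError):
            # Malformed JSON (JSONDecodeError is a ValueError), a missing
            # field, or an unparseable date/score: skip the record.
            continue
```

In the actual project this step would run in PySpark over the full dataset. The pure-Python version only illustrates which fields are needed and how the review date can be parsed.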

The system is written in PySpark and runs in a Hive environment. Its user interface is a Jupyter notebook connected to PySpark. Given a product ID, the system outputs several positive and negative features of the product, along with the relevance of these features. It also plots how the product's scores change over time and as the number of reviews grows. Finally, it shows LDA (Latent Dirichlet Allocation) topic-modeling results for the positive and negative reviews.
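The score-over-time view can be sketched as a grouping by calendar month. This is a minimal illustration, not the project's plot.py. It assumes each review is available as a hypothetical `(product_id, review_date, score)` tuple. For each month it reports the mean score that month, the cumulative number of reviews, and the cumulative mean score. In PySpark the same aggregation would typically be a `groupBy` on product and month with an average.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (product_id, review_date, score).
ReviewRow = Tuple[str, date, float]


def score_timeline(
    rows: Iterable[ReviewRow], product_id: str
) -> List[Tuple[str, float, int, float]]:
    """Return (month, monthly mean, cumulative review count, cumulative mean)."""
    by_month: Dict[str, List[float]] = defaultdict(list)
    for pid, review_date, score in rows:
        if pid == product_id:
            by_month[review_date.strftime("%Y-%m")].append(score)

    timeline = []
    total_count = 0
    total_score = 0.0
    for month in sorted(by_month):  # "YYYY-MM" strings sort chronologically
        scores = by_month[month]
        total_count += len(scores)
        total_score += sum(scores)
        timeline.append(
            (month, sum(scores) / len(scores), total_count, total_score / total_count)
        )
    return timeline
```

Plotting the second column against the month shows how the score moves over time. Plotting the fourth column against the third shows how the average score settles as more reviews arrive.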

This folder is organized as follows.

proj/
├── algorithm analysis/
├── data/
├── scraper/
└── final script/

To run the final version of the script locally, download all scripts and data into the bin folder of your PySpark installation, then run productanalysis.ipynb in a Jupyter notebook connected to PySpark.

algorithm analysis

feature_extraction_comparison.ipynb Compares different feature extraction methods
feature_selection_model_analysis.ipynb Compares feature selection methods and models
plot.ipynb Basic statistical analysis
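For context on the kind of comparison feature_extraction_comparison.ipynb makes, the sketch below contrasts two common text representations for reviews: raw term counts (bag of words) and TF-IDF. It only illustrates the two techniques. The notebook's actual methods are not listed here, and the tokenizer and function names are hypothetical. The IDF term is the unsmoothed textbook form log(N / df). Library implementations, such as Spark MLlib's `IDF`, use a smoothed variant.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical tokenizer: lowercase words, apostrophes kept.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def term_counts(docs: List[str]) -> List[Dict[str, int]]:
    """Bag-of-words features: raw term frequency per review."""
    return [dict(Counter(tokenize(d))) for d in docs]


def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    """TF-IDF features: term frequency scaled by log(N / document frequency)."""
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # count each term once per review
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in c.items()}
        for c in counts
    ]
```

With raw counts, a word like "the" gets as much weight as "died". TF-IDF gives zero weight to a term that appears in every review and the largest weights to rare terms, which tend to be more informative product features.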

data

whitelist
productinfo (generated by Hive)
productname

final script

productanalysis.ipynb (Final version)
plot.py

scraper

Extractinfo.ipynb (Final version)

Repository: mmyyl / amazon_electronic_product_reviews_analysis