spark_for_yelpdata_analysis's Introduction

Analysis of Yelp Reviews

Data Source

Three datasets (user, review, and business) were used for this analysis. All of them can be found on Kaggle. The datasets were uploaded to an AWS S3 bucket; the business dataset, for example, is at s3://yelpdata-ak/yelp/yelp_academic_dataset_business.json
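
As a minimal sketch of how one of these files might be read into Spark, the snippet below loads the business file from the S3 path given above. The function names `build_session` and `load_business` and the application name are illustrative assumptions, not taken from the project. It also assumes the file holds one JSON object per line, which is what `spark.read.json` expects by default.

```python
# Path of the business dataset, as given in the Data Source section.
BUSINESS_PATH = "s3://yelpdata-ak/yelp/yelp_academic_dataset_business.json"


def build_session(app_name="yelp-analysis"):
    """Create (or reuse) a SparkSession; the app name is an arbitrary choice."""
    # Imported inside the function so the definitions load without PySpark.
    from pyspark.sql import SparkSession

    return SparkSession.builder.appName(app_name).getOrCreate()


def load_business(spark, path=BUSINESS_PATH):
    """Read the business file into a DataFrame, inferring the schema."""
    # Assumption: the file is line-delimited JSON (one record per line).
    return spark.read.json(path)
```

On an EMR cluster, something like `load_business(build_session()).printSchema()` would then show the inferred columns. In an EMR notebook or the PySpark shell a session is typically already available as `spark`, so `build_session` may not be needed there.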

Objectives

The objective is to use Apache Spark on AWS EMR, reading the datasets from AWS S3, to analyze a few million data points.

(1) Business dataset is used to find the most abundant business types (categories).

(2) Review and Business datasets are both used to see if reviewers who leave textual reviews rate businesses more or less highly than the overall business rating.

(3) User and Business datasets are used to see if elite reviewers rate businesses any differently than regular reviewers.
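
The objectives above could be approached along the following lines in PySpark. This is a hedged sketch, not the project's actual code. The function names are hypothetical, and the column names (`categories`, `business_id`, `stars`, `text`, `elite`) are assumptions about the Yelp schema. It further assumes that `categories` is a comma-separated string and that `elite` is a string of years that is empty or "None" for regular users. For objective (3), only the step of flagging elite users is shown.

```python
def top_categories(business_df, n=10):
    """Objective (1): count businesses per category, most common first."""
    # Imported inside the function so the definitions load without PySpark.
    from pyspark.sql import functions as F

    # Assumption: `categories` is a string such as "Bars, Food".
    category = F.explode(F.split(F.col("categories"), r",\s*")).alias("category")
    return (
        business_df.where(F.col("categories").isNotNull())
        .select(category)
        .groupBy("category")
        .count()
        .orderBy(F.desc("count"), "category")
        .limit(n)
    )


def review_vs_business_rating(review_df, business_df):
    """Objective (2): mean stars of written reviews vs. mean business stars."""
    from pyspark.sql import functions as F

    # Keep only reviews that contain some text (null text is dropped too).
    written = review_df.where(F.length(F.trim(F.col("text"))) > 0)
    # Rename the business-level rating so it does not clash with review stars.
    business = business_df.select(
        "business_id", F.col("stars").alias("business_stars")
    )
    joined = written.join(business, on="business_id", how="inner")
    return joined.agg(
        F.avg("stars").alias("avg_review_stars"),
        F.avg("business_stars").alias("avg_business_stars"),
    )


def with_elite_flag(user_df):
    """Objective (3), first step: mark users who have any elite year."""
    from pyspark.sql import functions as F

    # Assumption: `elite` is empty, null, or "None" for regular reviewers.
    elite = F.trim(F.coalesce(F.col("elite").cast("string"), F.lit("")))
    return user_df.withColumn("is_elite", ~elite.isin("", "None"))
```

Each function takes DataFrames that were already loaded from S3 and returns a new DataFrame, so nothing is computed until an action such as `show()` or `collect()` is called on the result.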

Yelp datasets on AWS S3
