Analysis and Recommendation on YELP dataset

Objective:

To provide useful insights for businesses from the YELP dataset through big data analytics, identifying strengths and weaknesses so that existing and future business owners can make decisions about new businesses or business expansion. The project also provides recommendations to both business owners and users through extensive analysis of the data.

Project Overview:

The project involves analysis of the dataset, visualizations based on that analysis, and recommendations. The major modules of the project are:

  1. Validation of reviews on businesses based on user information.
  2. Classification of positive and negative reviews using Machine Learning techniques.
  3. Recommending location-based “buzzwords” to future business owners by analyzing positive and negative reviews for businesses in a state.
  4. User-specific recommendations using user’s history of availed services. Recommendations are provided based on categories of the services, location of the business, user reviews and user ratings.
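
The module list does not say which Machine Learning technique is used for module 2. As a minimal sketch only, assuming a bag-of-words multinomial Naive Bayes classifier (an illustrative choice, not necessarily what review_classification.py does) and a handful of made-up reviews, positive/negative classification could look like this:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_reviews):
    """Count words per class for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of reviews
    for text, label in labeled_reviews:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior (Laplace smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data; the real project trains on Yelp reviews.
SAMPLE_REVIEWS = [
    ("great food and friendly staff", "positive"),
    ("amazing service, loved the pizza", "positive"),
    ("terrible service and cold food", "negative"),
    ("rude staff, long wait, awful experience", "negative"),
]

model = train(SAMPLE_REVIEWS)
print(classify("friendly staff and great pizza", model))
print(classify("awful wait and rude service", model))
```

On the full dataset the same idea would normally be expressed with a distributed library rather than plain Python; the sketch is only meant to show the shape of the task.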

Analysis was done on the dataset to understand the correlation between different metrics, such as the location of a business and its success. Business trends were analyzed by location, ratings, category and business attributes. Trends among closed businesses were observed using user reviews and ratings.
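
To illustrate the kind of aggregation this analysis involves, the following plain-Python sketch computes the average stars and average review count per state and category. The sample records and the field names (state, categories, stars, review_count) are assumptions made for illustration; the project's own scripts run on Spark over the full dataset.

```python
from collections import defaultdict


def average_by_state_and_category(businesses):
    """Return {(state, category): (average stars, average review count)}."""
    # key -> [sum of stars, sum of review counts, number of businesses]
    totals = defaultdict(lambda: [0.0, 0, 0])
    for business in businesses:
        for category in business["categories"]:
            entry = totals[(business["state"], category)]
            entry[0] += business["stars"]
            entry[1] += business["review_count"]
            entry[2] += 1
    return {
        key: (stars_sum / count, reviews_sum / count)
        for key, (stars_sum, reviews_sum, count) in totals.items()
    }


# Hypothetical sample records, not taken from the dataset.
SAMPLE_BUSINESSES = [
    {"state": "AZ", "categories": ["Restaurants", "Pizza"], "stars": 4.0, "review_count": 100},
    {"state": "AZ", "categories": ["Restaurants"], "stars": 3.0, "review_count": 50},
    {"state": "ON", "categories": ["Restaurants"], "stars": 5.0, "review_count": 10},
]

averages = average_by_state_and_category(SAMPLE_BUSINESSES)
for (state, category), (avg_stars, avg_reviews) in sorted(averages.items()):
    print(state, category, avg_stars, avg_reviews)
```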

Visualizations for the project were created using Python libraries and are stored in the visualization folder.

Project presentation can be found at Prezi WIN ARYD.exe executed on a windows OS fr

Steps for execution:

The dataset for the project should be downloaded from the Yelp Dataset Challenge and stored in the yelp-dataset folder. The scripts should then be executed in the following order:

${SPARK_HOME}/bin/spark-submit business_etl.py
${SPARK_HOME}/bin/spark-submit user_etl.py
${SPARK_HOME}/bin/spark-submit review_classification.py
${SPARK_HOME}/bin/spark-submit review_etl.py
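
The four ordered steps above can also be driven from a small Python wrapper. This is a sketch under stated assumptions: the helper names (build_commands, run_pipeline) and the /opt/spark fallback path are hypothetical, and the script only prints the commands unless run_pipeline is called explicitly.

```python
import os
import subprocess

# Order taken from the steps listed above.
ETL_SCRIPTS = [
    "business_etl.py",
    "user_etl.py",
    "review_classification.py",
    "review_etl.py",
]


def build_commands(spark_home):
    """Build one spark-submit command per script, preserving the order."""
    spark_submit = os.path.join(spark_home, "bin", "spark-submit")
    return [[spark_submit, script] for script in ETL_SCRIPTS]


def run_pipeline(spark_home):
    """Run each step in order, stopping at the first failure."""
    for command in build_commands(spark_home):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)


# Dry run: print the commands instead of executing them.
# "/opt/spark" is a hypothetical fallback when SPARK_HOME is not set.
for command in build_commands(os.environ.get("SPARK_HOME", "/opt/spark")):
    print(" ".join(command))
```

Calling run_pipeline(os.environ["SPARK_HOME"]) would execute the steps for real; stopping at the first failure avoids running later steps after an earlier one has failed.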

The following files can be executed in any order:

${SPARK_HOME}/bin/spark-submit user_recomm.py "'CxDOIDnH8gp9KXzpBHJYXw'"
# the user ID argument can be changed to obtain recommendations for a different user
${SPARK_HOME}/bin/spark-submit user_analysis.py
${SPARK_HOME}/bin/spark-submit top_reviews.py
${SPARK_HOME}/bin/spark-submit business_analysis.py
${SPARK_HOME}/bin/spark-submit restaurant_analysis.py
${SPARK_HOME}/bin/spark-submit topic_mod_pos.py
${SPARK_HOME}/bin/spark-submit topic_mod_neg.py
${SPARK_HOME}/bin/spark-submit topics.py
${SPARK_HOME}/bin/spark-submit word_cloud.py
${SPARK_HOME}/bin/spark-submit ngram_word_cloud.py

Optional execution for converting data to JSON format for visualization:

${SPARK_HOME}/bin/spark-submit converttojson.py

Files:

-- business location: outliers removed using Euclidean distance from the average location of businesses in the state (data cleaning)

-- user's location
-- user validation score

-- classification of reviews (Machine Learning)

-- joined classes to reviews and dropped less useful columns

-- location-based recommendations
-- category-based recommendations
-- overall recommendations

-- most availed category of business by a user
-- average stars given by a user for each category
-- number of positive and negative reviews given by a user

-- chose top 10 positive and top 10 negative reviews based on validation score for business with maximum reviews

-- average review count and stars by city and category
-- average review count and stars by state and category
-- business attribute based analysis
-- average stars for open and closed businesses
-- top 15 business categories
-- top 15 business categories, city-wise
-- cities with most businesses
-- businesses with more 5-star ratings

-- top 20 restaurants on Yelp (viz)
-- restaurants with most funny, cool, useful reviews (viz)

-- topic modeling using positive reviews for businesses in Pennsylvania

-- topic modeling using negative reviews for businesses in Ontario

-- extracted terms and topics from the model saved from topic modeling

-- most frequent words from tips and reviews for Earl (viz)
-- most frequent words from tips and reviews for Ontario (viz)
-- most frequent words from tips and reviews for top 20 restaurants (viz)
-- most frequent words from tips and reviews for bottom 20 restaurants (viz)

-- wordcloud NGrams from tips review
-- wordcloud NGrams from tips review for Arizona

-- converting parquet ETLed files to JSON format for visualization purposes
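
The first entry in the list above describes removing business-location outliers by their Euclidean distance from the average location of businesses in the same state. A minimal plain-Python sketch of that idea follows. The field names (state, latitude, longitude), the sample coordinates and the max_distance threshold are assumptions for illustration, and the project's actual Spark implementation may differ.

```python
import math
from collections import defaultdict


def remove_location_outliers(businesses, max_distance):
    """Drop businesses farther than max_distance (in degrees) from the
    mean latitude/longitude of all businesses in the same state."""
    by_state = defaultdict(list)
    for business in businesses:
        by_state[business["state"]].append(business)

    kept = []
    for group in by_state.values():
        mean_lat = sum(b["latitude"] for b in group) / len(group)
        mean_lon = sum(b["longitude"] for b in group) / len(group)
        for b in group:
            # Euclidean distance in raw degree space, as a rough measure.
            distance = math.hypot(b["latitude"] - mean_lat, b["longitude"] - mean_lon)
            if distance <= max_distance:
                kept.append(b)
    return kept


# Hypothetical records: business "d" is labeled AZ but lies far away.
SAMPLE_BUSINESSES = [
    {"id": "a", "state": "AZ", "latitude": 33.4, "longitude": -112.0},
    {"id": "b", "state": "AZ", "latitude": 33.5, "longitude": -112.1},
    {"id": "c", "state": "AZ", "latitude": 33.6, "longitude": -111.9},
    {"id": "d", "state": "AZ", "latitude": 43.6, "longitude": -79.4},
]

cleaned = remove_location_outliers(SAMPLE_BUSINESSES, max_distance=10.0)
print([b["id"] for b in cleaned])
```

Distance in degrees is only a crude proxy for geographic distance, and the mean itself is pulled toward any outliers, so the threshold has to be chosen with that in mind.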

Folders:

-- outputs after classification of reviews and ETL steps on datasets will be stored here

-- outputs of all the visualizations will be stored here

-- all results of topic modelling will be saved here

-- all results of analysis will be stored here
