wow3_recommender's Introduction

Wow3 recommender

This is the demo website for our project: http://104.154.117.206:8080/index.html (the site is only up for short periods).

Contents

  • Data Processing : ./data
  • Modeling : ./model
  • Web development : ./web

Data Processing

Overview

  • The data source is the "Yelp Dataset Challenge"
  • https://www.yelp.com/dataset_challenge
  • We decided to use the following data sets:
  • Review data: yelp_academic_dataset_review.csv
  • Business data: yelp_academic_dataset_business.csv
  • User data: yelp_academic_dataset_user.csv

Steps

  • Steps for loading and cleaning the data
  • ./data/mysql_data_preparation.txt
  • We used MySQL and Python to manipulate the data
  • MySQL uses the InnoDB storage engine. We added indexes on the columns used for joins so that join and select operations run fast.
  • We output the cleaned data as CSV files
  • Script for creating the user graph information for System G
  • ./data/create_user_graph.py
  • We cleaned the data so that it can be loaded into System G
  • Script for creating the data that the web system uses, based on the ML/NLP models
  • ./model/Prediction_func.ipynb
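
The user-graph step above could look roughly like the following. This is a minimal sketch, not the contents of `./data/create_user_graph.py`: it assumes the user data has a `user_id` column and a `friends` column holding comma-separated user IDs, and the `source,target` output layout is hypothetical.

```python
import csv
import io


def build_friend_edges(user_rows):
    """Return a sorted list of unique, undirected (user_a, user_b) edges."""
    edges = set()
    for row in user_rows:
        user = row["user_id"]
        # Assumed format: friend IDs separated by commas in a single column.
        for friend in row["friends"].split(","):
            friend = friend.strip()
            if friend and friend != user:
                # Sort each pair so that u1-u2 and u2-u1 collapse into one edge.
                edges.add(tuple(sorted((user, friend))))
    return sorted(edges)


def edges_to_csv(edges):
    """Serialize edges as CSV text with a hypothetical source,target header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(edges)
    return buf.getvalue()


# Made-up sample rows for illustration only.
sample = [
    {"user_id": "u1", "friends": "u2, u3"},
    {"user_id": "u2", "friends": "u1"},
    {"user_id": "u3", "friends": ""},
]
edges = build_friend_edges(sample)
print(edges_to_csv(edges))
```

Deduplicating the pairs keeps the edge file small when both users list each other as friends.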

Modeling

  • Nearest Neighbors Recommendation: ./model/Recommend1.ipynb

  • We used a nearest neighbors algorithm to help a user find the top 20 restaurants based on his or her current geolocation. Using the features “stars_business”, “stars_review”, “average_stars”, “latitude”, and “longitude”, we built the nearest neighbors model and tuned a coefficient for each feature. The query point combines the highest “stars_business”, highest “stars_review”, and highest “average_stars” with the user's current “latitude” and “longitude”, and the model returns its 20 nearest neighbors.
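
The query construction described above can be sketched in plain Python as follows. This is an illustration under assumptions, not the code in `./model/Recommend1.ipynb`: the weight values, the weighted Euclidean distance, and the function names are hypothetical, since the README only says that a coefficient was tuned for each feature.

```python
import math

# Hypothetical per-feature coefficients; the tuned values are not given in the README.
WEIGHTS = {
    "stars_business": 1.0,
    "stars_review": 1.0,
    "average_stars": 1.0,
    "latitude": 50.0,
    "longitude": 50.0,
}


def weighted_distance(a, b, weights=WEIGHTS):
    """Weighted Euclidean distance over the five features."""
    return math.sqrt(sum(w * (a[f] - b[f]) ** 2 for f, w in weights.items()))


def recommend(rows, latitude, longitude, k=20):
    """Return the k rows closest to an ideal query point at the user's location."""
    query = {
        # Best observed ratings, as described in the text.
        "stars_business": max(r["stars_business"] for r in rows),
        "stars_review": max(r["stars_review"] for r in rows),
        "average_stars": max(r["average_stars"] for r in rows),
        # The user's current position.
        "latitude": latitude,
        "longitude": longitude,
    }
    return sorted(rows, key=lambda r: weighted_distance(r, query))[:k]


# Made-up sample rows for illustration only.
rows = [
    {"name": "A", "stars_business": 4.5, "stars_review": 5, "average_stars": 4.0,
     "latitude": 36.10, "longitude": -115.17},
    {"name": "B", "stars_business": 3.0, "stars_review": 3, "average_stars": 3.5,
     "latitude": 36.11, "longitude": -115.17},
    {"name": "C", "stars_business": 5.0, "stars_review": 5, "average_stars": 4.5,
     "latitude": 40.00, "longitude": -80.00},
]
nearest = [r["name"] for r in recommend(rows, 36.10, -115.17, k=2)]
print(nearest)
```

Larger weights on latitude and longitude make distance from the user dominate, which is why the highly rated but far-away row C is not returned here.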

  • Topic Model and Sentiment Analysis with Key Word: ./model/Topic Model2-Key Word Sentiment Analysis.ipynb

  • To build the topic model and conduct sentiment analysis, we first grouped all review text by business and preprocessed it by removing stopwords, removing punctuation, tokenizing, and applying stemming and lemmatization. Then, given the input keyword, we chose a fixed number (the top 3 in our model) of the most related businesses. Next we applied Latent Dirichlet Allocation to find the ten most popular topics for each selected business. For these popular topics, we used a classification algorithm to produce a sentiment analysis of each topic. In this way, we can provide the user not only with the most popular information related to the keyword, but also with a sentiment reference to help them make their choice. The UI returns word clouds describing these popular topics, which give the user a clear understanding of the business.
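
The preprocessing and keyword-selection steps above can be sketched with the standard library alone. This is a simplified illustration, not the notebook's code: the stopword list is a tiny stand-in, stemming, lemmatization, and LDA are left out because they need external libraries, and scoring relevance by relative keyword frequency is an assumption (the README does not say how "most related" is measured).

```python
import string
from collections import Counter, defaultdict

# Tiny stand-in stopword list; a real run would use a full list from an NLP library.
STOPWORDS = {"the", "a", "an", "and", "is", "was", "were", "of", "to", "in", "it", "for"}


def tokenize(text):
    """Lowercase, strip punctuation, split on whitespace, and drop stopwords."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def group_reviews(reviews):
    """Collect the tokens of all reviews under their business ID."""
    tokens_by_business = defaultdict(list)
    for business_id, text in reviews:
        tokens_by_business[business_id].extend(tokenize(text))
    return tokens_by_business


def top_businesses(tokens_by_business, keyword, n=3):
    """Rank businesses by relative keyword frequency (an assumed relevance score)."""
    keyword = keyword.lower()
    scores = {}
    for business_id, tokens in tokens_by_business.items():
        if tokens:
            scores[business_id] = Counter(tokens)[keyword] / len(tokens)
    ranked = sorted(scores, key=lambda b: (-scores[b], b))
    return [b for b in ranked if scores[b] > 0][:n]


# Made-up sample reviews for illustration only.
reviews = [
    ("b1", "The pizza was great, great pizza!"),
    ("b1", "Pizza and pasta."),
    ("b2", "Sushi is fresh."),
    ("b3", "Good pizza, slow service."),
]
grouped = group_reviews(reviews)
print(top_businesses(grouped, "pizza"))
```

The token lists produced by `group_reviews` are the kind of input a topic model would then consume, one document per business.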

  • Sentiment Analysis Classification about Review: ./model/Sentiment Analysis3-Review Classification.ipynb

  • Some reviews are ambiguous about the business they describe. To make reviews clearer, we also ran sentiment analysis on the review data. We first grouped all review text by business and preprocessed it by removing stopwords, removing punctuation, tokenizing, and applying stemming and lemmatization. We then treated reviews with five stars as positive and reviews with no more than two stars as negative, and used them as the training set for the sentiment classification model. For ambiguous reviews, we can use this model to extract the underlying sentiment and give our users more direct information.
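
The labeling rule above, together with one possible classifier, can be sketched as follows. The README does not name the classification algorithm, so the multinomial Naive Bayes with add-one smoothing used here is a stand-in chosen for illustration, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def label_from_stars(stars):
    """Five stars is positive, two or fewer is negative, anything else is unlabeled."""
    if stars == 5:
        return "positive"
    if stars <= 2:
        return "negative"
    return None


def train(reviews):
    """Count words per label, using only reviews that receive a label."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for stars, text in reviews:
        label = label_from_stars(stars)
        if label is None:
            continue  # three- and four-star reviews are not used for training
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up sample reviews for illustration only.
training = [
    (5, "great food friendly staff"),
    (5, "great service"),
    (1, "terrible food rude staff"),
    (2, "slow service terrible"),
    (3, "food was okay"),
]
model = train(training)
print(classify(model, "great staff"), classify(model, "terrible slow food"))
```

Training only on the clearly positive and clearly negative reviews gives clean labels, and the model is then applied to the reviews in between.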

  • User Rate Prediction by Time Series Model: ./model/Time Series4.ipynb

  • The average review star rating on Yelp does not always reflect recent feedback about food or service quality. To compensate for that, we built a time series model that shows the monthly trend of review stars. We first grouped the review stars by month and took their mean values. Then we computed the moving average of the time series and performed the Dickey-Fuller test to check for stationarity. We also tried different mathematical transformations of the monthly data. After removing the trend and seasonality, we plotted the autocorrelation function and the partial autocorrelation function, and used them to choose the parameters for the autoregressive (AR) model, the moving average (MA) model, and the autoregressive moving average (ARMA) model. Then we applied these models to our data and transformed the results back to the original scale. Finally, we used several statistics to evaluate the models and make our final decision.
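
The first two steps above (monthly means and a moving average) can be sketched with the standard library. This is an illustration only: it assumes review dates are strings in `YYYY-MM-DD` form, the window size is arbitrary, and the Dickey-Fuller test and AR/MA/ARMA fitting are not shown because they need a statistics library.

```python
from collections import defaultdict


def monthly_means(reviews):
    """Group (date, stars) pairs by month and return sorted (month, mean) pairs."""
    buckets = defaultdict(list)
    for date, stars in reviews:
        buckets[date[:7]].append(stars)  # "YYYY-MM-DD" -> "YYYY-MM" (assumed format)
    return [(month, sum(vals) / len(vals)) for month, vals in sorted(buckets.items())]


def moving_average(values, window):
    """Trailing moving average; the first window-1 points have no value."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up sample reviews for illustration only.
reviews = [
    ("2016-01-05", 4), ("2016-01-20", 2),
    ("2016-02-11", 5),
    ("2016-03-02", 3), ("2016-03-15", 4),
    ("2016-04-01", 1),
]
series = monthly_means(reviews)
smoothed = moving_average([mean for _, mean in series], window=2)
print(series)
print(smoothed)
```

Comparing the raw monthly series with the smoothed one is a quick visual check for trend before running a formal stationarity test.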

System G, Graph analysis

  • Script for inputting graph data
  • ./data/systemg.txt
  • We used Gremlin to input data, extract information, and visualize the user graph

Web development

Overview

  • The code for the web system is stored in ./web/
  • We used Python, HTML, CSS (Bootstrap), and JavaScript (jQuery, CanvasJS)

How to set up web system

  • Start the Python web server on port 8080
  • $ twistd -no web --path=. &
  • Start the Python web server on port 7777 (REST API)
  • $ python server.py
  • This server provides the REST API, which accepts GET requests and returns JSON
  • Modify the URL for your own environment in ./web/js/map.js
  • var server = 'http://[your_server_address]:7777/get/';
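
For readers who want to see the shape of such an API, here is a minimal sketch of a GET endpoint under `/get/` that returns JSON, written with Python's standard `http.server`. It is not the project's `server.py`: the `recommend` resource name, the echo-style payload, and the function names are hypothetical placeholders.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def handle_get(raw_path):
    """Map a request path to (status_code, payload). The payload is a placeholder."""
    parsed = urlparse(raw_path)
    if not parsed.path.startswith("/get/"):
        return 404, {"error": "not found"}
    resource = parsed.path[len("/get/"):]
    # Keep the first value of each query parameter.
    params = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    return 200, {"resource": resource, "params": params}


class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, payload = handle_get(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        # The page is served from port 8080, so the browser treats this API
        # on port 7777 as a different origin.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=7777):
    """Serve the API until interrupted."""
    HTTPServer(("", port), ApiHandler).serve_forever()


# Exercise the routing logic without opening a socket.
print(handle_get("/get/recommend?lat=36.1&lon=-115.2"))
```

Calling `run()` would start the server on port 7777. Keeping the routing in `handle_get` lets it be checked without a network connection.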

Others

Notification

  • The following files and directories are not included in the repository because the data is large
  • ./data/source/wow3_all_mysql.csv
  • ./data/source/wow3_business_mysql.csv
  • ./data/source/wow3_review_mysql.csv
  • ./data/source/wow3_user_mysql.csv.csv
  • ./web/data/business_LDA.csv
  • ./web/data/rate
  • If you run the web server, please prepare these data files first

Data download

wow3_recommender's People

Contributors

hs2865
