dataengtechtest's Introduction

Mango Data Engineering Technical Problem

The winemag-data-130k-v2-formatted.json file contains a list of wine reviews from various users.

We would like you to demonstrate Python, database and API knowledge to provide some insight into the wine reviews.

Insight Problem

Schema Creation

Create a database schema called 'vino', and a user called 'vino' with the password 'vino'.
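A minimal sketch of the statements this step could run, assuming MySQL 5.7.6 or later (for `CREATE USER IF NOT EXISTS`). The helper name `schema_statements`, the `'localhost'` host part of the account and the `utf8mb4` character set are assumptions; the brief does not specify them. An administrative connection would execute the statements one at a time.

```python
def schema_statements(database: str = "vino", user: str = "vino",
                      password: str = "vino", host: str = "localhost") -> list:
    """Build the SQL that creates the schema and a user with rights on it.

    Hypothetical helper: the host part and utf8mb4 charset are assumptions.
    Values are interpolated directly, so only pass trusted, fixed strings.
    """
    return [
        f"CREATE DATABASE IF NOT EXISTS `{database}` CHARACTER SET utf8mb4",
        f"CREATE USER IF NOT EXISTS '{user}'@'{host}' IDENTIFIED BY '{password}'",
        f"GRANT ALL PRIVILEGES ON `{database}`.* TO '{user}'@'{host}'",
    ]


if __name__ == "__main__":
    # Print the statements; an admin connection would execute them in order.
    for statement in schema_statements():
        print(statement + ";")
```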

Table Population

Create two tables: one called 'reviews' that matches the structure of the JSON records, and another called 'userinfo' that contains the following fields for a given Twitter user:

  • id - autogenerated primary key
  • name - name of the user
  • description - a description of the user
  • profile_image_url - Profile image URL
  • followers_count - Count of their followers
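One possible shape for the two tables, written as MySQL DDL held in Python constants. The `reviews` columns are assumed from the public winemag-data-130k-v2 dataset (taster_name, taster_twitter_handle, points, price and so on) and should be checked against the keys actually present in the formatted JSON file; the column types are guesses. The `twitter_handle` column in `userinfo` is an assumed addition beyond the five listed fields, included only so userinfo rows can later be joined back to reviews.

```python
# MySQL DDL for the two tables. Review columns and types are assumptions based
# on the public winemag-data-130k-v2 dataset; verify against the real JSON keys.
REVIEWS_DDL = """
CREATE TABLE IF NOT EXISTS reviews (
    id INT AUTO_INCREMENT PRIMARY KEY,
    points INT,
    title VARCHAR(255),
    description TEXT,
    taster_name VARCHAR(255),
    taster_twitter_handle VARCHAR(64),
    price DECIMAL(10, 2),
    designation VARCHAR(255),
    variety VARCHAR(255),
    region_1 VARCHAR(255),
    region_2 VARCHAR(255),
    province VARCHAR(255),
    country VARCHAR(255),
    winery VARCHAR(255)
) CHARACTER SET utf8mb4
"""

# The five fields from the brief, plus twitter_handle: an assumed extra column
# that lets userinfo rows be joined back to reviews.
USERINFO_DDL = """
CREATE TABLE IF NOT EXISTS userinfo (
    id INT AUTO_INCREMENT PRIMARY KEY,
    twitter_handle VARCHAR(64) UNIQUE,
    name VARCHAR(255),
    description TEXT,
    profile_image_url VARCHAR(1024),
    followers_count INT UNSIGNED
) CHARACTER SET utf8mb4
"""

# Statements for the "drop" and "list structure" operations of the script.
DROP_STATEMENTS = ["DROP TABLE IF EXISTS userinfo", "DROP TABLE IF EXISTS reviews"]
DESCRIBE_STATEMENTS = ["SHOW TABLES", "DESCRIBE reviews", "DESCRIBE userinfo"]
```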

Write a script that reads and parses the JSON file, then inserts the data into the MySQL database.
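A minimal sketch of the load step, assuming the file is a single top-level JSON array of review objects and that its keys match the column names listed in `COLUMNS` (both assumptions). The insert uses any DB-API connection; the demo uses the standard-library `sqlite3` as a stand-in for MySQL. With `mysql-connector-python` the same function would be called with `placeholder="%s"`, since that driver uses `%s`-style parameters.

```python
import json
import sqlite3
from typing import Iterable

# Column names assumed from the public winemag-data-130k-v2 dataset.
COLUMNS = (
    "points", "title", "description", "taster_name", "taster_twitter_handle",
    "price", "designation", "variety", "region_1", "region_2",
    "province", "country", "winery",
)


def load_reviews(path: str) -> list:
    """Parse the file, assuming one top-level JSON array of review objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def to_rows(records: Iterable[dict]) -> list:
    """Order each record's values by COLUMNS; missing keys become None (NULL)."""
    return [tuple(rec.get(col) for col in COLUMNS) for rec in records]


def insert_reviews(conn, records: Iterable[dict], placeholder: str = "?") -> int:
    """Bulk-insert with executemany and commit; returns the number of rows.

    sqlite3 uses '?' placeholders; mysql-connector-python expects '%s'.
    """
    rows = to_rows(records)
    sql = "INSERT INTO reviews ({}) VALUES ({})".format(
        ", ".join(COLUMNS), ", ".join([placeholder] * len(COLUMNS))
    )
    cur = conn.cursor()
    cur.executemany(sql, rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # In-memory SQLite stands in for the MySQL 'vino' database in this demo.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, {})".format(", ".join(COLUMNS)))
    print(insert_reviews(demo, [{"title": "Example", "points": 87}]))  # prints 1
```

In the real script the records would come from `load_reviews("winemag-data-130k-v2-formatted.json")`; for roughly 130k rows, inserting in chunks of a few thousand keeps each `executemany` call and transaction a manageable size.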

Twitter User Population

Write a script that queries the 'reviews' table in the MySQL database to list all users with a Twitter handle, fetches the data required for the 'userinfo' table from the Twitter API, and inserts that data into the MySQL database.
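A sketch of this flow under several assumptions: handles live in a `taster_twitter_handle` column of `reviews`, `userinfo` has the extra `twitter_handle` column, and user data comes from the Twitter API v2 user-lookup endpoint (`GET /2/users/by/username/:username` with `user.fields=description,profile_image_url,public_metrics`, where `followers_count` sits under `public_metrics`). The fetcher is passed in as a function so the database logic can be tested offline; the demo uses `sqlite3` and a stub instead of MySQL and the live API.

```python
import json
import sqlite3
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Twitter API v2 user lookup; the requested user.fields cover the userinfo columns.
USER_URL = ("https://api.twitter.com/2/users/by/username/{}"
            "?user.fields=description,profile_image_url,public_metrics")


def fetch_twitter_user(handle: str, bearer_token: str) -> dict:
    """GET one user by handle; needs an app bearer token and network access."""
    url = USER_URL.format(urllib.parse.quote(handle.lstrip("@")))
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {bearer_token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def parse_user(payload: dict) -> Optional[tuple]:
    """Pick (name, description, profile_image_url, followers_count) from a response."""
    data = payload.get("data")
    if not data:
        return None  # no "data" key: the API could not return this user
    return (
        data.get("name"),
        data.get("description"),
        data.get("profile_image_url"),
        data.get("public_metrics", {}).get("followers_count"),
    )


def populate_userinfo(conn, fetch: Callable[[str], dict], placeholder: str = "?") -> int:
    """List distinct handles in reviews, fetch each user, insert into userinfo."""
    cur = conn.cursor()
    cur.execute(
        "SELECT DISTINCT taster_twitter_handle FROM reviews "
        "WHERE taster_twitter_handle IS NOT NULL AND taster_twitter_handle <> ''"
    )
    handles = [row[0] for row in cur.fetchall()]
    sql = ("INSERT INTO userinfo (twitter_handle, name, description, "
           "profile_image_url, followers_count) VALUES ({})".format(", ".join([placeholder] * 5)))
    inserted = 0
    for handle in handles:
        fields = parse_user(fetch(handle))
        if fields is not None:
            cur.execute(sql, (handle, *fields))
            inserted += 1
    conn.commit()
    return inserted


if __name__ == "__main__":
    # Offline demo: in-memory SQLite and a stub fetcher instead of MySQL and the live API.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE reviews (taster_twitter_handle TEXT)")
    demo.execute("INSERT INTO reviews VALUES ('@example')")
    demo.execute("CREATE TABLE userinfo (id INTEGER PRIMARY KEY, twitter_handle TEXT, name TEXT, "
                 "description TEXT, profile_image_url TEXT, followers_count INTEGER)")

    def stub(handle: str) -> dict:
        return {"data": {"name": "Example", "public_metrics": {"followers_count": 42}}}

    print(populate_userinfo(demo, stub))  # prints 1
```

Against the live API, the fetcher could be `functools.partial(fetch_twitter_user, bearer_token=token)`, with `placeholder="%s"` for a MySQL connection.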

Unique Reviewers Query

Write a script that counts the number of unique reviewers in the reviews table.
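A minimal sketch, assuming a reviewer is identified by the `taster_name` column (an assumption about the JSON structure). `COUNT(DISTINCT ...)` already ignores NULLs; the `WHERE` clause also drops empty strings. The query is plain SQL that runs unchanged on MySQL; the demo uses `sqlite3` as a stand-in.

```python
import sqlite3

# Distinct, non-empty reviewer names; COUNT(DISTINCT ...) skips NULLs on its own.
UNIQUE_REVIEWERS_SQL = (
    "SELECT COUNT(DISTINCT taster_name) FROM reviews "
    "WHERE taster_name IS NOT NULL AND taster_name <> ''"
)


def count_unique_reviewers(conn) -> int:
    """Return the number of distinct reviewer names in the reviews table."""
    cur = conn.cursor()
    cur.execute(UNIQUE_REVIEWERS_SQL)
    return cur.fetchone()[0]


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")  # stand-in for the MySQL 'vino' database
    demo.execute("CREATE TABLE reviews (taster_name TEXT)")
    demo.executemany("INSERT INTO reviews VALUES (?)", [("Ann",), ("Ann",), ("Bo",), (None,)])
    print(count_unique_reviewers(demo))  # prints 2
```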

n Reviewers Query

Write a script that outputs users with five or more reviews.
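One way to express this is `GROUP BY` with a `HAVING` filter on the per-user count, sketched here assuming users are identified by `taster_name`. The threshold is passed as a query parameter rather than formatted into the SQL; the function name is hypothetical and the demo uses `sqlite3` as a stand-in for MySQL (use `placeholder="%s"` with `mysql-connector-python`).

```python
import sqlite3


def reviewers_with_at_least(conn, min_reviews: int = 5, placeholder: str = "?") -> list:
    """Return (taster_name, n_reviews) for reviewers with >= min_reviews reviews.

    sqlite3 uses '?' placeholders; mysql-connector-python expects '%s'.
    """
    sql = (
        "SELECT taster_name, COUNT(*) AS n_reviews FROM reviews "
        "WHERE taster_name IS NOT NULL AND taster_name <> '' "
        f"GROUP BY taster_name HAVING COUNT(*) >= {placeholder} "
        "ORDER BY n_reviews DESC, taster_name"
    )
    cur = conn.cursor()
    cur.execute(sql, (min_reviews,))
    return cur.fetchall()


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")  # stand-in for the MySQL 'vino' database
    demo.execute("CREATE TABLE reviews (taster_name TEXT)")
    demo.executemany("INSERT INTO reviews VALUES (?)", [("Ann",)] * 5 + [("Bo",)] * 4)
    print(reviewers_with_at_least(demo))  # prints [('Ann', 5)]
```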

Twitter Followers/Reviews

Write a script that looks at the Twitter users and calculates a score for each user: followers_count multiplied by that user's number of reviews.
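The score can be computed in a single query that joins each Twitter user to their reviews, counts them, and multiplies by `followers_count`. This sketch assumes `userinfo` carries a `twitter_handle` column matching `reviews.taster_twitter_handle`; that column is not in the brief's field list and is an assumption. The inner join leaves out users with no reviews (their score would be 0). The demo uses `sqlite3` as a stand-in for MySQL.

```python
import sqlite3

# Assumes userinfo.twitter_handle (not in the brief's field list) matches
# reviews.taster_twitter_handle; the inner join skips users with no reviews.
SCORE_SQL = """
SELECT u.twitter_handle,
       u.followers_count,
       COUNT(*) AS n_reviews,
       u.followers_count * COUNT(*) AS score
FROM userinfo AS u
JOIN reviews AS r ON r.taster_twitter_handle = u.twitter_handle
GROUP BY u.id, u.twitter_handle, u.followers_count
ORDER BY score DESC
"""


def follower_review_scores(conn) -> list:
    """Return (handle, followers_count, n_reviews, score) rows, highest score first."""
    cur = conn.cursor()
    cur.execute(SCORE_SQL)
    return cur.fetchall()


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")  # stand-in for the MySQL 'vino' database
    demo.execute("CREATE TABLE userinfo (id INTEGER PRIMARY KEY, twitter_handle TEXT, followers_count INTEGER)")
    demo.execute("CREATE TABLE reviews (taster_twitter_handle TEXT)")
    demo.execute("INSERT INTO userinfo (twitter_handle, followers_count) VALUES ('@a', 10)")
    demo.executemany("INSERT INTO reviews VALUES (?)", [("@a",)] * 3)
    print(follower_review_scores(demo))  # prints [('@a', 10, 3, 30)]
```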

Submission

The final submission, provided as a compressed zip file, should include:

  • Table Population script (Python) for managing the database (create, drop, list structure)
  • Twitter User Population script (Python) for querying the database and the Twitter API, and inserting data into the userinfo table
  • Output file for the unique reviewers
  • Output file for the Twitter followers/reviews scores
  • Output file for the users with five or more reviews
  • A dumped MySQL database structure
  • Suitable tests for the scripts
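For the dumped database structure, `mysqldump --no-data` writes the `CREATE` statements without any rows. A minimal sketch that builds and runs that command from Python; the function names and the output file name `vino_schema.sql` are hypothetical, and running it requires the MySQL client tools on `PATH`. A bare `--password` makes `mysqldump` prompt for the password instead of exposing it on the command line.

```python
import shlex
import subprocess


def dump_structure_command(database: str = "vino", user: str = "vino",
                           outfile: str = "vino_schema.sql") -> list:
    """Build a mysqldump call that writes table definitions only (--no-data).

    A bare --password makes mysqldump prompt for the password interactively.
    """
    return [
        "mysqldump",
        "--no-data",
        f"--user={user}",
        "--password",
        f"--result-file={outfile}",
        database,
    ]


def dump_structure(**kwargs) -> None:
    """Run the dump; requires the MySQL client tools on PATH."""
    subprocess.run(dump_structure_command(**kwargs), check=True)


if __name__ == "__main__":
    # Show the command without running it.
    print(shlex.join(dump_structure_command()))
```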

A few notes:

  • Please DO NOT put this on GitHub.
  • Please provide this as a Dropbox link and email the link back to [email protected].


Contributors

gingerbenw
