The winemag-data-130k-v2-formatted.json file contains a list of wine reviews from various users.
We would like you to demonstrate Python, database and API knowledge to provide some insight into the wine reviews.
Construct a database schema and a user called 'vino' with password 'vino'.
Create two tables: one called 'reviews', which matches the structure of the JSON records, and another called 'userinfo', which contains the following fields for a given Twitter user:
- id - autogenerated primary key
- name - name of the user
- description - a description of the user
- profile_image_url - Profile image URL
- followers_count - Count of their followers
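As a rough sketch of the setup described above, the statements below create the schema, the user, and the 'userinfo' table. The 'localhost' host, the character set, and every column type are assumptions, not part of the brief; the 'reviews' table is left out because its columns depend on the keys found in the JSON records.

```python
# Sketch only: host, character set and column types are assumptions.
SETUP_STATEMENTS = [
    "CREATE DATABASE IF NOT EXISTS vino CHARACTER SET utf8mb4",
    "CREATE USER IF NOT EXISTS 'vino'@'localhost' IDENTIFIED BY 'vino'",
    "GRANT ALL PRIVILEGES ON vino.* TO 'vino'@'localhost'",
]

# The five columns come from the field list above; the types are guesses.
USERINFO_DDL = """
CREATE TABLE IF NOT EXISTS userinfo (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    description TEXT,
    profile_image_url VARCHAR(512),
    followers_count INT
)
"""


def run_statements(cursor, statements):
    """Execute each SQL statement in order on a DB-API cursor."""
    for statement in statements:
        cursor.execute(statement)
```

The setup statements would need to be run by an account with administrative privileges; the table itself can then be created while connected as 'vino'.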
Write a script that reads and parses the JSON file, then inserts the data into the MySQL database.
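One possible shape for the loading step is sketched below. It assumes the file holds a single JSON array of objects and that the table columns are named after the JSON keys; the helper names (load_reviews, build_insert, rows_for) are hypothetical.

```python
import json


def load_reviews(path):
    """Read the whole JSON file; assumed to contain one list of objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def build_insert(table, columns):
    """Build a parameterised INSERT using %s placeholders."""
    column_list = ", ".join(f"`{name}`" for name in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO `{table}` ({column_list}) VALUES ({placeholders})"


def rows_for(records, columns):
    """Turn each record into a tuple; keys missing from a record become None."""
    return [tuple(record.get(name) for name in columns) for record in records]
```

With a DB-API driver that accepts %s placeholders (mysql-connector-python and PyMySQL both do), the result could be passed to cursor.executemany(build_insert('reviews', columns), rows_for(records, columns)), followed by a commit on the connection.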
Write a script that queries the MySQL database table and lists all users with a Twitter handle, fetches the data required for the 'userinfo' table from the Twitter API, and inserts that data into the MySQL database.
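The sketch below covers only the glue between the handles read from the database and the rows destined for 'userinfo'. The actual API call is deliberately left as an injected callable (fetch_user, a hypothetical name) because the choice of client library and API version is not specified here. It assumes the callable returns a dict keyed by the four field names, or None when the user cannot be found, and that stored handles may carry a leading '@'.

```python
USERINFO_FIELDS = ("name", "description", "profile_image_url", "followers_count")


def normalize_handle(handle):
    """Strip surrounding whitespace and any leading '@' (assumed format)."""
    return handle.strip().lstrip("@")


def to_userinfo_row(payload):
    """Pick the four userinfo fields out of an API payload dict."""
    return tuple(payload.get(field) for field in USERINFO_FIELDS)


def collect_userinfo(handles, fetch_user):
    """Look up each distinct handle once and return rows ready for insertion.

    fetch_user is a caller-supplied function: handle -> dict or None.
    """
    unique = {normalize_handle(h) for h in handles if h and h.strip()}
    rows = []
    for handle in sorted(unique):
        payload = fetch_user(handle)
        if payload:  # skip users the API could not return
            rows.append(to_userinfo_row(payload))
    return rows
```

Keeping the fetch function injectable also makes this step easy to test without network access, since a plain dictionary lookup can stand in for the API.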
Write a script that counts the number of unique reviewers in the reviews table.
Write a script that outputs users with five or more reviews.
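The two reporting queries above might look like the following. The reviewer column name taster_name is an assumption about the JSON structure, and the in-memory SQLite helper is only a stand-in so the queries can be tried without a MySQL server; the two query functions accept any DB-API cursor.

```python
import sqlite3

# 'taster_name' is an assumed column name for the reviewer.
UNIQUE_REVIEWERS_SQL = "SELECT COUNT(DISTINCT taster_name) FROM reviews"

FREQUENT_REVIEWERS_SQL = (
    "SELECT taster_name, COUNT(*) AS review_count "
    "FROM reviews "
    "WHERE taster_name IS NOT NULL "
    "GROUP BY taster_name "
    "HAVING COUNT(*) >= 5 "
    "ORDER BY review_count DESC, taster_name"
)


def count_unique_reviewers(cursor):
    """Number of distinct reviewers; NULL names are ignored by COUNT(DISTINCT)."""
    cursor.execute(UNIQUE_REVIEWERS_SQL)
    return cursor.fetchone()[0]


def frequent_reviewers(cursor):
    """(name, review_count) pairs for reviewers with five or more reviews."""
    cursor.execute(FREQUENT_REVIEWERS_SQL)
    return [tuple(row) for row in cursor.fetchall()]


def make_demo_cursor(names):
    """SQLite stand-in for the MySQL table, for local experiments only."""
    connection = sqlite3.connect(":memory:")
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE reviews (taster_name TEXT)")
    cursor.executemany(
        "INSERT INTO reviews (taster_name) VALUES (?)", [(n,) for n in names]
    )
    return cursor
```

For example, with five reviews by 'Ann', two by 'Bob' and one anonymous review, count_unique_reviewers gives 2 and frequent_reviewers gives a single row for 'Ann' with a count of 5.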
Write a script that looks at the Twitter users and calculates a score for each one, defined as followers_count * number of reviews for that user.
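The score itself is a simple product, as the sketch below shows. How a Twitter user is matched to a reviewer is not specified, so this example assumes both inputs are already keyed by the same identifier (for instance the Twitter handle); the function name is hypothetical.

```python
def follower_scores(review_counts, followers):
    """Return {user: followers_count * number_of_reviews}.

    review_counts: {user: number of reviews}
    followers:     {user: followers_count}
    Users missing from `followers` (no Twitter data) are skipped.
    """
    scores = {}
    for user, n_reviews in review_counts.items():
        if user in followers:
            scores[user] = followers[user] * n_reviews
    return scores
```

A reviewer with 100 followers and 5 reviews would score 500 under this definition, while a reviewer with no followers scores 0 regardless of review count.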
The output you should include in the final submission (as a compressed zip file) should be:
- Table Population Script (Python) for managing the database (create, drop, list structure)
- Twitter User Population script (Python) for querying the database, the API and inserting data into the userinfo table
- Output file for the unique reviewers
- Output file for the Twitter followers/reviews
- Output file for the users with 5 reviews or more
- A dumped MySQL database structure
- Suitable tests for the scripts
A few notes:
- Please DO NOT put this on GitHub.
- Please provide this as a Dropbox link and email the link back to [email protected]