Git Product home page Git Product logo

twitter_postgres's Introduction

Twitter in Postgres

You will repeat the Twitter/MapReduce assignment using Postgres. Because this assignment will involve many new programming concepts, it will be spread out over several assignments.

In this first assignment, we will focus on:

  1. working with postgres from python
  2. inserting data into the database
  3. understanding JSON/denormalized vs normalized schemas (i.e. NoSQL vs SQL)

Tasks

  1. Getting started:

    1. Fork this repo
    2. Enable github actions on your fork
    3. Clone the fork onto the lambda server
    4. Modify the README.md file so that the test case image points to your forked repo
  2. Main tasks:

    1. There are two postgres containers defined in the docker-compose.yml file services: one containes a normalized database schema, and the other is denormalized. You will need to update the ports for each database so that they do not conflict with anyone else.

      NOTE: Recall that in the pagila assignments, there was no need to adjust the ports. This is because the database was not exposed to the lambda server. In this assignment, we must expose the database to the lambda server. The load_tweets.sh and load_tweets.py scripts will be run from the lambda server.

      It would be possible to put these scripts "inside" the database image so that we wouldn't need to expose the ports. But I've put the scripts "outside" the container to give you more practice connecting to the db from a remote system.

    2. Complete the missing sections of the load_tweets.py file. This file is responsible for loading data into the normalized database. The schema for the normalized database is summarized as:

      The arrows represent foreign keys onto the primary key of the target table. The foreign keys are likely to cause you many errors when inserting your data. These errors may be frustrating, but they are actually a GOOD thing (some would even say GREAT), because Postgres is preventing you from accidentally adding corrupted data into the database.

    3. Complete the missing sections of the load_tweets.sh file. This file will both call the load_tweets.py file, and use the SQL COPY command to load data into the denormalized database.

HINT: As you debug your insert code, you may need to delete your database. Calling

$ docker-compose down

is not enough, since the database is persisted to a volume. To delete the database, you'll need to use the

$ docker volume ls
$ docker volume rm VOLUME_ID

commands to list the docker volumes and delete the appropriate volumes. Alternatively, you can use

$ docker volume prune

to delete all volumes. These commands will only work if all of your docker instances are stopped.

Submission

Upload a link to your forked github repo on sakai

GRADING: There are 9 total test cases in the sql folder. If you implement the code above correctly, then the output of the SELECT commands in each test case should be the same for each database. The assignment is worth 16 points (8 points per database). If you pass all test cases for a database, you will get 8 points. If you fail any test cases, you will receive -4 points for the first test case and -1 point for each additional failing test case. After failing 5 test cases, you will get a 0/8 for that databases (you will not go negative).

twitter_postgres's People

Contributors

mikeizbicki avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.