twitter_postgres's Introduction

Twitter in Postgres

You will repeat the Twitter/MapReduce assignment using Postgres. Because this project involves many new programming concepts, it is spread out over several assignments.

In this first assignment, we will focus on:

  1. working with Postgres from Python
  2. inserting data into the database
  3. understanding JSON/denormalized vs normalized schemas (i.e. NoSQL vs SQL)
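
To make the third point concrete, here is a minimal sketch of the difference. The tweet below is made up, and the field and table names (`users`, `tweets`, `tweet_tags`) are illustrative assumptions, not necessarily the schema used in this repo. In the denormalized style the whole JSON document is stored as one value; in the normalized style it is split into rows for several tables that refer to each other by id.

```python
import json

# A made-up tweet; real Twitter JSON has many more fields.
raw = '{"id": 1, "text": "hello #sql", "user": {"id": 7, "screen_name": "alice"}, "hashtags": ["sql"]}'


def denormalized_row(line):
    """Denormalized: keep the whole document as a single JSON value."""
    return {"data": json.loads(line)}


def normalized_rows(line):
    """Normalized: split one document into rows for several (hypothetical) tables."""
    tweet = json.loads(line)
    user = tweet["user"]
    return {
        "users": [(user["id"], user["screen_name"])],
        # The second field refers to a row in users by id, like a foreign key.
        "tweets": [(tweet["id"], user["id"], tweet["text"])],
        "tweet_tags": [(tweet["id"], tag) for tag in tweet["hashtags"]],
    }


rows = normalized_rows(raw)
```

The normalized form stores each user once no matter how many tweets they wrote, at the cost of needing joins to reassemble a tweet.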

Tasks

  1. Getting started:

    1. Fork this repo
    2. Enable GitHub Actions on your fork
    3. Clone the fork onto the lambda server
    4. Modify the README.md file so that the test case image points to your forked repo
  2. Main tasks:

    1. There are two Postgres containers defined in the docker-compose.yml file: one contains a normalized database schema, and the other is denormalized. You will need to update the ports for each database so that they do not conflict with anyone else's.

      NOTE: Recall that in the pagila assignments, there was no need to adjust the ports. This is because the database was not exposed to the lambda server. In this assignment, we must expose the database to the lambda server. The load_tweets.sh and load_tweets.py scripts will be run from the lambda server.

      It would be possible to put these scripts "inside" the database image so that we wouldn't need to expose the ports. But I've put the scripts "outside" the container to give you more practice connecting to the db from a remote system.

    2. Complete the missing sections of the load_tweets.py file. This file is responsible for loading data into the normalized database. The schema for the normalized database is summarized as:

      The arrows represent foreign keys onto the primary key of the target table. The foreign keys are likely to cause you many errors when inserting your data. These errors may be frustrating, but they are actually a GOOD thing (some would even say GREAT), because Postgres is preventing you from accidentally adding corrupted data into the database.

    3. Complete the missing sections of the load_tweets.sh file. This file will both call the load_tweets.py file, and use the SQL COPY command to load data into the denormalized database.

    4. Grading Note: There are 9 total test cases in the sql folder. If you implement the code above correctly, then the output of the SELECT commands in each test case should be the same for each database. Each test case is worth 1 point per database, for 18 points total.

    5. HINT: As you debug your insert code, you may need to delete your database. Calling

      $ docker-compose down
      

      is not enough, since the database is persisted to a volume. To delete the database, you'll need to use the

      $ docker volume ls
      $ docker volume rm VOLUME_ID
      

      commands to list the docker volumes and delete the appropriate volumes. Alternatively, you can use

      $ docker volume prune
      

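      to delete all unused volumes, i.e. those not attached to any container. On Docker 23.0 and later, this removes only anonymous volumes unless you add the --all flag.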

  3. Upload a link to your forked GitHub repo on Sakai
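
The foreign key errors mentioned in the main tasks can be reproduced in miniature. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Postgres so that it runs without a database server; the table and column names are hypothetical, and the exception class differs from the one a Postgres driver would raise. The point it illustrates is insertion order: a row that references another table is rejected until the referenced row exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on; Postgres always does.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE users (id_users INTEGER PRIMARY KEY, screen_name TEXT)")
conn.execute("""
    CREATE TABLE tweets (
        id_tweets INTEGER PRIMARY KEY,
        id_users INTEGER NOT NULL REFERENCES users(id_users),
        text TEXT
    )
""")


def insert_tweet(conn, id_tweets, id_users, text):
    """Try to insert a tweet; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO tweets VALUES (?, ?, ?)",
            (id_tweets, id_users, text),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# User 7 does not exist yet, so the database rejects the tweet.
first_try = insert_tweet(conn, 1, 7, "hello")

# After the user is inserted, the same tweet is accepted.
conn.execute("INSERT INTO users VALUES (?, ?)", (7, "alice"))
second_try = insert_tweet(conn, 1, 7, "hello")
```

When loading real data, the same idea suggests inserting the rows that other tables point to (such as users) before the rows that point to them (such as tweets).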

twitter_postgres's People

Contributors

mikeizbicki, tonnpo
