Git Product home page Git Product logo

helenaschatz / crowdfunding_etl Goto Github PK

View Code? Open in Web Editor NEW
0.0 2.0 1.0 489 KB

Builded an ETL pipeline using Python, Pandas, Python dictionary methods and regular expressions to extract and transform the data. Created four CSV files and use the CSV file data to create an ERD and a table schema. Finally, uploaded the CSV file data into a Postgres database.

Jupyter Notebook 100.00%
erdiagram etl-pipeline pandas postgresql python regex

crowdfunding_etl's Introduction

Crowdfunding ETL

A collaborative project to build an ETL pipeline using Python, Pandas, and PostgreSQL for extracting, transforming, and loading crowdfunding data from Excel files into a relational database.

ETL (Extract, Transform and Load) data processing is an automated procedure that extracts relevant information from raw data, converts it into a format that fulfills business requirements and loads it into a target system.

demo

Instructions

The instructions for this mini project are divided into the following subsections:

  1. Create the Category and Subcategory DataFrames
  2. Create the Campaign DataFrame
  3. Create the Contacts DataFrame
  4. Create the Crowdfunding Database

Features

  • Utilize Python and Pandas in order to:
    • Extract and transform crowdfunding and contact data from Excel files
    • Create and export Category, Subcategory, Campaign, and Contact DataFrames as CSV files
  • Utilize PostgreSQL in order to:
    • Design an ERD and table schema for the database
    • Create and populate PostgreSQL database tables

Data

  1. For our analysis, we have extracted and transformed data from the following datasets available in the Resources folder.

    • contacts.xlsx
    • crowdfunding.xlsx
  2. Then, we loaded our transformed data into crowdfunding_db PostgreSQL database.

    • category.csv
    • subcategory.csv
    • contacts.csv
    • campaign.csv

How to Run

  • Clone the repository.

  • Install PostgresSQL

  • Install required Python packages: pandas, numpy and openpyxl.

  • In order to run our analysis, first select Python Environment (Python 3.9.13) in Jupyter Notebook.

  • Select "Run All" in ETL_Mini_Project.ipynb file, which will Extract, Transform data, and Create CSV files.

    • All CSV files will be located in Resources folder.
  • Set up a PostgreSQL server to create a new database called crowdfunding_db.

  • In Crowdfunding DB folder, use provided crowdfunding_db_schema.sql file to create tables in PostgreSQL crowdfunding_db database.

    • Import each CSV file into its corresponding SQL table in the following order: category, subcategory, contacts, and campaign.
    • Query the database to verify the data has been loaded correctly, by running a SELECT statement for each.

demo

Outcome

In this project, we have demonstrated the ETL pipeline utilizing Python, Pandas, and a combination of both Python dictionary and regular expression techniques for data extraction and transformation. We were able to successfully generate four CSV files, and use the data in these files to design an ERD as well as table schema, then loaded the data into PostgresSQL database from the CSV files via dataframes.

Resources and Libraries

Programming Languages Database Libraries Software Modules
Python demo PostgreSQLdemo Pandas demo QuickDBD demo datetime as dt
numpy demo Jupyter Notebook demo pprint
json

Team Members:

Helena Fedorenko
Jason Barbagallo
Xing Ying Chen
Nancy Santiago
Anthony Parry

crowdfunding_etl's People

Contributors

helenaschatz avatar xc1614 avatar jbarbs44 avatar nancyrsantiago avatar

Watchers

Kostas Georgiou avatar  avatar

Forkers

xc1614

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.