crumbybr / launchpad2dataeng

This project forked from thetechhustle/launchpad2dataeng

Launchpad2DataEng offers a comprehensive introduction to the skills and concepts essential for a successful career in data engineering, all while fostering an open-source community where learners and experts alike can share insights, ask questions, and contribute to the growth of everyone involved.

License: MIT License

Python 100.00%

launchpad2dataeng's Introduction

Added by BobbyD

Project-Based Data Engineering Learning

Welcome to our open-source project aimed at fostering practical data engineering skills! This project is inspired by the approach of breaking into data engineering with zero cost and a focus on hands-on projects. We will guide you through setting up a real-world data pipeline using modern tools and technologies like Python, BigQuery/Snowflake, and Astronomer. Whether you're a beginner looking to dive into data engineering or an experienced professional aiming to brush up on your skills, this project i...

Overview

This project outlines a step-by-step approach to building a data engineering pipeline, from sourcing data to implementing quality checks. We focus on practical, project-based learning to equip you with the skills needed to excel in the field of data engineering.

What You Will Build

  • A Python script to fetch data from a REST API.
  • A process that initially dumps this data into a CSV file.
  • A Snowflake or BigQuery setup to manage your data in the cloud.
  • An automated pipeline using Astronomer to ingest data on a scheduled basis.
  • Data quality checks to ensure the integrity of your data.
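The first two items can be sketched as follows. This is a minimal illustration rather than the script shipped in this repository: `API_URL`, `fetch_records`, and `write_records_to_csv` are hypothetical names, and the code assumes the API returns a JSON array of flat objects. It uses only the Python standard library (Python 3.9 or later).

```python
import csv
import json
import urllib.request
from typing import Any

# Hypothetical endpoint: replace it with the REST API you chose.
API_URL = "https://example.com/api/items"


def fetch_records(url: str, timeout: float = 10.0) -> list[dict[str, Any]]:
    """Fetch a JSON array of objects from a REST endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    # Assumption: the endpoint returns a top-level JSON array.
    if not isinstance(payload, list):
        raise ValueError("expected the API to return a JSON array of objects")
    return payload


def write_records_to_csv(records: list[dict[str, Any]], path: str) -> int:
    """Write records to a CSV file and return the number of rows written."""
    if not records:
        # Nothing to write; no file is created in this case.
        return 0
    # Collect every key seen, in first-seen order, so rows with
    # missing fields still fit under a single header.
    fieldnames = list(dict.fromkeys(key for row in records for key in row))
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
    return len(records)


if __name__ == "__main__":
    rows = fetch_records(API_URL)
    print(f"wrote {write_records_to_csv(rows, 'data.csv')} rows")
```

Keeping the fetch and the write in separate functions makes it easier to later swap the CSV step for a Snowflake or BigQuery load without touching the API code.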

Getting Started

Before you begin, make sure you have the following prerequisites:

  • Python installed on your machine.
  • An account with Snowflake or BigQuery (free tiers are available).
  • An account with Astronomer.

Installation & Setup

  1. Find a Data Source: Choose a data source you are interested in (e.g., stock market, Pokémon, sports data). Make sure it offers a REST API.
  2. Python Script for Data Fetching: Clone this repository and navigate to the script directory. Modify the script to point to your chosen data source.
  3. Snowflake/BigQuery Account: Follow the instructions on their website to set up a free trial account. Modify the script to dump data into your Snowflake/BigQuery instance instead of a CSV.
  4. Astronomer for Automation: Set up an account and follow the instructions to automate your data ingestion.
  5. Data Quality Checks: Implement data quality checks using Great Expectations or your own custom checks.
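For step 5, custom checks can start very small. The sketch below uses plain Python, not Great Expectations, and assumes the data is still in CSV form. `check_rows` and `check_csv` are hypothetical helpers, and the column names you pass in depend on your data source.

```python
import csv
from typing import Iterable


def check_rows(rows: Iterable[dict[str, str]], required: list[str], key: str) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems: list[str] = []
    seen_keys: set[str] = set()
    row_count = 0
    for line_no, row in enumerate(rows, start=1):
        row_count += 1
        # Completeness: every required column must hold a non-blank value.
        for column in required:
            if not (row.get(column) or "").strip():
                problems.append(f"row {line_no}: missing value for '{column}'")
        # Uniqueness: the key column must not repeat across rows.
        value = row.get(key)
        if value in seen_keys:
            problems.append(f"row {line_no}: duplicate {key} '{value}'")
        elif value is not None:
            seen_keys.add(value)
    # Volume: an empty extract usually signals an upstream failure.
    if row_count == 0:
        problems.append("no rows found")
    return problems


def check_csv(path: str, required: list[str], key: str) -> list[str]:
    """Run the same checks against a CSV file on disk."""
    with open(path, newline="", encoding="utf-8") as handle:
        return check_rows(csv.DictReader(handle), required, key)
```

A scheduled pipeline could call `check_csv("data.csv", ["id", "name"], "id")` after each ingestion and fail the run if the returned list is not empty.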

Contributing

We welcome contributions from the community! Whether it's adding new features, improving documentation, or reporting bugs, your contributions are greatly appreciated.

  • Fork the Repository: Start by forking this repository to your GitHub account.
  • Create a Pull Request: After making your changes, create a pull request against our repository. Please provide a clear description of your changes.
  • Code Review: Your pull request will be reviewed by our team. We may suggest some changes or improvements.

License

This project is open-source and available under the MIT License.

Acknowledgments

Contributors

  • bdorlus