aws-snowflake-dp-template

Data Platform

This repo contains the code for a data platform with 2 environments:

  1. dev: A development platform running on docker
  2. prd: A production platform running on aws + k8s

Setup

  1. Create + activate virtual env:
python3 -m venv .venv/dpenv
source .venv/dpenv/bin/activate
  2. Install + init phidata:
pip install phidata
phi init -l

If you encounter errors, try updating pip using python -m pip install --upgrade pip

  3. Set up workspace:
phi ws setup
  4. Copy secrets:
cp -r workspace/example_secrets workspace/secrets
  5. Run dev platform on docker:
phi ws up dev:docker

If something fails, run again with debug logs:

phi ws up -d

Optional: Create .env file:

cp example.env .env
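
Before or after running the setup steps above, it can help to confirm that the shell is in the expected state. The following is a minimal sketch and not part of this repo: the preflight helper is hypothetical, and it only checks that a virtual env is active and that the phi and docker executables can be found on PATH.

```python
import shutil
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the env directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def preflight() -> dict:
    """Collect simple yes/no checks for the setup steps (hypothetical helper)."""
    return {
        "virtualenv_active": in_virtualenv(),
        "phi_on_path": shutil.which("phi") is not None,
        "docker_on_path": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If virtualenv_active is reported as missing, the source .venv/dpenv/bin/activate step was probably skipped in the current shell.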

Using the dev environment

The workspace/dev directory contains the code for the dev environment. Run it using:

phi ws up dev:docker

Run Airflow

  1. Set dev_airflow_enabled=True in workspace/settings.py and run phi ws up dev:docker
  2. Check out the airflow webserver running in the airflow-ws-container:
  • url: http://localhost:8310/
  • user: admin
  • pass: admin

Run Jupyter

  1. Set dev_jupyter_enabled=True in workspace/settings.py and run phi ws up dev:docker
  2. Check out jupyterlab running in the jupyter-container:
  • url: http://localhost:8888/
  • pass: admin
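
The Airflow and Jupyter containers can take a while to start, so the URLs above may refuse connections at first. A minimal sketch of a readiness check follows; the wait_for_url helper and its timeout defaults are assumptions for illustration, not something this repo provides.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_url(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll url until the server answers or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status (for example a login
            # redirect or 401), which still means it is up.
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused or dropped: the container is likely still starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_for_url("http://localhost:8310/") would wait for the Airflow webserver and wait_for_url("http://localhost:8888/") for JupyterLab, each returning False if nothing answers within the timeout.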

Validate workspace

Validate the workspace using: ./scripts/validate.sh

This will:

  1. Format using black
  2. Type check using mypy
  3. Test using pytest
  4. Lint using ruff

./scripts/validate.sh

If you need to install packages, run:

pip install black mypy pytest ruff
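
The four validation steps listed above can be pictured as a sequence of commands that stops at the first failure. The sketch below is an illustration only: the contents of scripts/validate.sh are not shown in this README, so the exact arguments in STEPS are assumptions, and run_steps is a hypothetical helper.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed commands: the real scripts/validate.sh may pass different arguments.
STEPS: List[Tuple[str, List[str]]] = [
    ("format", ["black", "."]),
    ("type check", ["mypy", "."]),
    ("test", ["pytest"]),
    ("lint", ["ruff", "check", "."]),
]


def run_steps(
    steps: Sequence[Tuple[str, List[str]]],
    run: Callable[[List[str]], int] = subprocess.call,
) -> Optional[str]:
    """Run each command in order; return the name of the first failing step."""
    for name, cmd in steps:
        print(f"==> {name}: {' '.join(cmd)}")
        if run(cmd) != 0:
            # Stop early so later steps do not hide the first failure.
            return name
    return None
```

Calling run_steps(STEPS) returns None when every tool exits with status 0, and otherwise the name of the step that failed. The run parameter exists so the sequence can be exercised without the tools installed.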

Install workspace

Install the workspace & python packages in the virtual env using:

./scripts/install.sh

This will:

  1. Install python packages from requirements.txt
  2. Install the workspace in --editable mode

Add python packages

Following PEP 631, dependencies are declared in the pyproject.toml file.

To add a new package:

  1. Add the package to the dependencies in the pyproject.toml file.
  2. Run ./scripts/upgrade.sh to update the requirements.txt file.
  3. Run phi ws up dev:docker -f to recreate images + containers.
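
After the images and containers are recreated, one way to confirm that a newly added package made it into the environment is to query its installed version. This is a minimal sketch using only the standard library; installed_version is a hypothetical helper, and the package name in the usage note is just an example.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Run inside the virtual env or a container, installed_version("requests") would return a version string if that package is present and None otherwise.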

Add airflow providers

Airflow requirements are stored in the workspace/dev/airflow/resources/requirements-airflow.txt file.

To add new airflow providers:

  1. Add the provider package to the workspace/dev/airflow/resources/requirements-airflow.txt file.
  2. Run phi ws up -f --name airflow to recreate images + containers

Stop workspace

phi ws down

Restart workspace

phi ws restart

Add environment/secret variables

Containers read environment variables from the file set by the env_file param and secrets from the file set by the secrets_file param. By default, these params point to files in the workspace/env and workspace/secrets directories.

Airflow

To add env variables to your airflow containers:

  1. Update the workspace/env/dev_airflow_env.yml file.
  2. Restart all airflow containers using: phi ws restart dev:docker:airflow

To add secret variables to your airflow containers:

  1. Update the workspace/secrets/dev_airflow_secrets.yml file.
  2. Restart all airflow containers using: phi ws restart dev:docker:airflow
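
Once the containers have restarted, code running in them can read the new values. The sketch below assumes that entries from the env and secrets files are exposed as ordinary environment variables inside the container; read_env is a hypothetical helper, and EXAMPLE_API_URL and EXAMPLE_API_KEY are placeholder names.

```python
import os
from typing import Optional


def read_env(name: str, default: Optional[str] = None) -> str:
    """Return the environment variable name, or default, or raise if neither is set."""
    value = os.environ.get(name, default)
    if value is None:
        # Failing loudly makes a missing entry in the env or secrets file
        # visible when the task starts rather than later.
        raise KeyError(f"environment variable {name} is not set")
    return value


if __name__ == "__main__":
    # Placeholder names for illustration; use the keys you added to the yml files.
    print(read_env("EXAMPLE_API_URL", default="http://localhost"))
    print("api key set:", "EXAMPLE_API_KEY" in os.environ)
```

Printing only whether a secret is set, rather than its value, keeps it out of task logs.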

Test a DAG

# Open a shell in the airflow-ws or airflow-worker container (run one of these)
docker exec -it airflow-ws-container zsh
docker exec -it airflow-worker-container zsh

# Test run a DAG using its module name
python -m workflow.dir.file

# Test run the DAG file
python /mnt/workspaces/data-platform/workflow/dir/file.py

# List DAGs
airflow dags list

# List tasks in DAG
airflow tasks list \
  -S /mnt/workspaces/data-platform/workflow/dir/file.py \
  -t dag_name

# Test airflow task
airflow tasks test dag_name task_name 2022-07-01
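
The module name and the file path in the commands above refer to the same DAG file: the module name is the path relative to the workspace root with separators replaced by dots. A small sketch of that mapping follows; module_name is a hypothetical helper, and it assumes the workspace is mounted at /mnt/workspaces/data-platform as in the commands above.

```python
from pathlib import PurePosixPath

# Mount point of the workspace inside the containers, taken from the commands above.
WORKSPACE_ROOT = PurePosixPath("/mnt/workspaces/data-platform")


def module_name(dag_file: str, root: PurePosixPath = WORKSPACE_ROOT) -> str:
    """Convert an absolute DAG file path into its dotted module name."""
    # relative_to raises ValueError if dag_file is not under root.
    relative = PurePosixPath(dag_file).relative_to(root)
    return ".".join(relative.with_suffix("").parts)


print(module_name("/mnt/workspaces/data-platform/workflow/dir/file.py"))
# prints: workflow.dir.file
```

The result is the argument to pass to python -m, which presumably needs to be run from the workspace root or with that directory on the Python path.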

Contributors

ashpreetbedi