docker-airflow-spark

Docker with Airflow + Postgres + Spark cluster + JDK (spark-submit support) + Jupyter Notebooks

📦 The Containers

  • airflow-webserver: Airflow webserver and scheduler, with spark-submit support.

  • postgres: Postgres database, used by Airflow.

    • image: postgres:13.6
    • port: 5432
  • spark-master: Spark Master.

    • image: bitnami/spark:3.2.1
    • port: 8081
  • spark-worker[-N]: Spark workers (1 by default). Edit docker-compose.yml to add more.

    • image: bitnami/spark:3.2.1
  • jupyter-spark: Jupyter notebook with pyspark support.

    • image: jupyter/pyspark-notebook:spark-3.2.1
    • port: 8888

🛠 Setup

Clone project

$ git clone https://github.com/pyjaime/docker-airflow-spark

Build the Airflow Docker image

$ cd docker-airflow-spark/airflow/
$ docker build --rm -t docker-airflow2:latest .

Setup the sandbox

The sandbox holds the folders where the containers persist their data, plus some test files. From the directory where you cloned the project, create it by copying the sandbox-test folder:

$ cd docker-airflow-spark/
$ cp -R sandbox-test/. ../sandbox/

Launch containers

$ cd docker-airflow-spark/
$ docker-compose -f docker-compose.yml up -d

Check accesses
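
One way to check that the web UIs are up is to probe the ports listed under The Containers. This is only a sketch: it assumes the ports are published on localhost and that curl is installed, and `HOST` and `check_port` are illustrative names, not part of the project. The Airflow UI is left out because its port is not listed above (look it up in docker-compose.yml), and Postgres is skipped because it does not speak HTTP.

```shell
#!/bin/sh
# Assumption: container ports are published on localhost; override HOST if not.
HOST="${HOST:-localhost}"

# check_port NAME PORT: report whether an HTTP server answers on PORT.
# (Hypothetical helper, not part of the project.)
check_port() {
  if curl -s -o /dev/null --max-time 3 "http://$HOST:$2"; then
    echo "$1: reachable at http://$HOST:$2"
  else
    echo "$1: not reachable on port $2"
  fi
}

check_port spark-master 8081   # Spark Master port from the list above
check_port jupyter-spark 8888  # Jupyter Notebook port from the list above
```

A service that has just started may need a few seconds before it answers, so a first "not reachable" right after launch is not necessarily a failure.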

👣 Additional steps

Create a test user for Airflow

$ docker-compose run airflow-webserver airflow users create --role Admin --username admin \
  --email admin --firstname admin --lastname admin --password admin

Edit connection from Airflow to Spark

  • In the Airflow UI, go to Admin > Connections
  • Edit the spark_default entry:
    • Connection Type: Spark
    • Host: spark://spark
    • Port: 7077
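
The same change can probably be scripted with the Airflow CLI instead of the UI. This is a sketch, not the project's documented method: it assumes the Airflow 2 `airflow connections` commands are available in the image, and it reuses the `docker-compose run airflow-webserver airflow ...` pattern from the user-creation step. `AIRFLOW` and `recreate_spark_conn` are illustrative names.

```shell
#!/bin/sh
# Assumption: run from docker-airflow-spark/ with the containers up.
# AIRFLOW is the prefix used to reach the Airflow CLI inside the container.
AIRFLOW="${AIRFLOW:-docker-compose run --rm airflow-webserver airflow}"

# Hypothetical helper: drop the existing spark_default entry, then recreate it
# with the values from the list above.
recreate_spark_conn() {
  $AIRFLOW connections delete spark_default
  $AIRFLOW connections add spark_default \
    --conn-type spark \
    --conn-host spark://spark \
    --conn-port 7077
}
```

Call `recreate_spark_conn` after sourcing the snippet. The delete step may report an error if spark_default does not exist yet; the add step still runs in that case.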

Test spark-submit from Airflow

Go to the Airflow UI and run the test_spark_submit_operator DAG :)
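
If you prefer the command line, the DAG can likely be started with the Airflow 2 CLI as well. This is a sketch under the same assumptions as the user-creation command above (Airflow CLI reachable through `docker-compose run airflow-webserver`); `AIRFLOW`, `DAG_ID` and `run_test_dag` are illustrative names.

```shell
#!/bin/sh
# Assumption: run from docker-airflow-spark/ with the containers up.
AIRFLOW="${AIRFLOW:-docker-compose run --rm airflow-webserver airflow}"
DAG_ID=test_spark_submit_operator

# Hypothetical helper: unpause the test DAG, then queue one run of it.
run_test_dag() {
  $AIRFLOW dags unpause "$DAG_ID"   # a paused DAG would not be scheduled
  $AIRFLOW dags trigger "$DAG_ID"   # the scheduler picks up the queued run
}
```

After calling `run_test_dag`, the run's progress and task logs are visible in the Airflow UI.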

🎉 A big thank you

THANK YOU Thiago Cordon

Contributors

pyjaime