This R&D / technical watch project compares and benchmarks two workflow management tools: Airflow 2.0 and Prefect. Following the release of Airflow 2.0, this repository serves as a sandbox environment to illustrate the main differences that remain between the two tools when orchestrating machine learning workflows.
You can follow the guidelines below to schedule and orchestrate a dummy data science training workflow based on the iris dataset. This will help you get an idea of the philosophy behind each tool.
Before running the server for the first time, run the following command to configure Prefect for local orchestration:
prefect backend server
Note that the server requires Docker and Docker Compose to be installed, with the Docker daemon running. To start the server, the UI, and all required infrastructure, run:
prefect server start
Once all components are running, you can view the Prefect UI by visiting http://localhost:8080.
Please note that executing flows from the server requires at least one Prefect Agent to be running:
prefect agent local start
We are now ready to use Prefect Core Server. Let's create a new project for the purpose of this demo:
prefect create project airflow_prefect_contest
Finally, to register a flow with the server, call its register method and pass the name of the project created above:
flow.register(project_name="airflow_prefect_contest")
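For context, the `flow` object being registered could be defined along the following lines. This is a minimal sketch, assuming the Prefect 0.14/1.x API (`Flow`, `task`) that goes with the `prefect backend server` commands used here. The function names, the flow name `iris-training`, and the three sample rows standing in for the iris dataset are illustrative and not taken from this repository. Prefect is imported inside `build_flow` so the plain functions can be tried without it installed.

```python
from collections import defaultdict

# Three rows in iris layout (sepal length, sepal width, petal length,
# petal width, species). A stand-in for loading the full dataset.
SAMPLE_ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
]


def load_rows():
    """Return the sample rows (placeholder for a real data-loading step)."""
    return list(SAMPLE_ROWS)


def train_model(rows):
    """Dummy 'training': average petal length per species."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[4]].append(row[2])
    return {species: sum(v) / len(v) for species, v in grouped.items()}


def build_flow():
    """Wire the functions above into a flow (assumes Prefect 0.14/1.x)."""
    from prefect import Flow, task

    with Flow("iris-training") as flow:  # hypothetical flow name
        rows = task(load_rows)()
        task(train_model)(rows)
    return flow


if __name__ == "__main__":
    flow = build_flow()
    flow.run()  # local test run, no server or agent needed
    # With the server and an agent running, register it instead:
    # flow.register(project_name="airflow_prefect_contest")
```

Running the file directly executes the flow in-process, which is a quick way to validate the task wiring before registering it with the server.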
Airflow needs a home directory. ~/airflow is the default, but you can choose another location if you prefer:
export AIRFLOW_HOME=~/airflow
Initialize the database:
airflow db init
Start the web server. The default port is 8080 but we'll use 8081 to avoid a conflict with Prefect's web server:
airflow webserver -p 8081
You can now view the Airflow UI by visiting http://localhost:8081.
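Since both UIs now listen on neighbouring ports, a small script can confirm that each one answers. The sketch below uses only the Python standard library; the helper name `ui_is_reachable` is hypothetical, and the URLs are the ones given in this guide. It bypasses any configured HTTP proxy, on the assumption that both servers run on the local machine.

```python
import urllib.error
import urllib.request

# Opener that ignores proxy settings, since the targets are on localhost.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def ui_is_reachable(url, timeout=3.0):
    """Return True if an HTTP request to `url` gets any HTTP response."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, name resolution failure, etc.
        return False


if __name__ == "__main__":
    for name, url in [
        ("Prefect", "http://localhost:8080"),
        ("Airflow", "http://localhost:8081"),
    ]:
        status = "up" if ui_is_reachable(url) else "not reachable"
        print(f"{name} UI at {url}: {status}")
```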
In a separate terminal, start the scheduler:
airflow scheduler
Create an admin user account for yourself:
airflow users create --username admin \
    --firstname <your-firstname> \
    --lastname <your-lastname> \
    --role Admin \
    --email <your-email>
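If you prefer to script this step, the same command can be assembled as an argument list, which avoids shell line-continuation and quoting issues. This is a sketch only: `build_create_user_cmd` is a hypothetical helper, the name and email in the example are placeholders, and the flags are exactly those shown in the command above. The command is only executed when the script is called with `--run`, in which case Airflow is expected to prompt for a password.

```python
import shlex
import subprocess
import sys


def build_create_user_cmd(username, firstname, lastname, email, role="Admin"):
    """Return the argv list for `airflow users create` with the given details."""
    return [
        "airflow", "users", "create",
        "--username", username,
        "--firstname", firstname,
        "--lastname", lastname,
        "--role", role,
        "--email", email,
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with your own details.
    cmd = build_create_user_cmd("admin", "Ada", "Lovelace", "ada@example.com")
    print(shlex.join(cmd))  # show the equivalent single-line shell command
    if "--run" in sys.argv:
        subprocess.run(cmd, check=True)
```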