airflowctl's People

Contributors

ahnsv, kaxil, pre-commit-ci[bot]

airflowctl's Issues

In background mode, the pid in background_process_ids is off by one

I am trying out airflowctl, and when I set up a new environment and then run it in the background, the PID saved in the background_process_ids file is always off by one: it is actually the PID of the shell that ran the command. At the moment I just edit the file and increment the number, and subsequent commands (`stop`, for example) then work.
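The off-by-one PID is consistent with launching the command through a shell. A minimal sketch of the difference (this is an assumption about the cause, not a reading of airflowctl's actual code):

```python
import subprocess

# With shell=True, Popen.pid is the PID of /bin/sh, not of the process
# the shell launches; the real child is forked by the shell afterwards,
# which matches the "off by one" the reporter sees.
wrapper = subprocess.Popen("sleep 0.1", shell=True)
wrapper.wait()

# Prefixing the command with "exec" makes the shell replace itself with
# the target process, so Popen.pid is the PID worth persisting.
direct = subprocess.Popen("exec sleep 0.1", shell=True)
direct.wait()
```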

airflowctl and GitHub usage

Hello there, I'm not sure if this is the right place to ask, but I was wondering what the recommended way is to use airflowctl, specifically the settings.yaml file, if you want to put an Airflow project into a GitHub repo.

I initially thought that you create the project in your dev environment using airflowctl init <project_name> ... as suggested in the docs. Then you push the project to the GitHub repo, and once it is ready for deployment, you clone the repo into the production environment and simply run airflowctl build, provided settings.yaml already contains all the variables the project needs. However, that would be a bad idea from a security perspective if the file included connections with credentials, which is my case.

So I don't know if I just have to add settings.yaml to .gitignore and create it manually after cloning the repo in production, since I already have to do that with the .env file.

I would appreciate any suggestion or guidance you can provide. Thanks in advance.
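One common pattern (a general suggestion, not official airflowctl guidance) is to gitignore the real settings.yaml alongside .env and commit a credential-free template instead; connections holding secrets can then be injected at deploy time via Airflow's AIRFLOW_CONN_* environment variables rather than stored in the file:

```
# .gitignore
settings.yaml
.env

# committed instead: a template with no credentials, e.g. settings.example.yaml
```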

Error with Airflow Cmd Forwarding

I suspect that this line raises an exception where none should be raised. The error I get is: Key 'venv_path' not found in settings file. But I believe venv_path is optional and should fall back to None if the project hasn't configured one (if I understand the code correctly).
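The fallback the reporter expects could look like this; get_setting and the settings-dict layout are illustrative assumptions, not airflowctl's actual code:

```python
def get_setting(settings: dict, key: str, required: bool = False):
    """Return settings[key]; raise only for required keys, else None."""
    if key in settings:
        return settings[key]
    if required:
        raise KeyError(f"Key '{key}' not found in settings file.")
    # Optional keys such as venv_path fall back to None.
    return None
```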

integration into existing virtual environment

Hi, I'm a new user of this project.
I wonder if this project intends to support the use case where users have an existing virtual environment.
FYI, I am trying to adopt this project for our local development environment. Since we have already installed the airflow package, there is no need to re-install Airflow's dependencies.

Add checks before running "build" command to verify Airflow installation succeeds

Airflow installation can fail because of the following:

  • The constraints file doesn't exist because the Python version wasn't supported when that specific Airflow version was released
  • Verify the passed Python version is in the "official" supported Python versions for Airflow (https://github.com/apache/airflow#requirements), and verify other items in those requirements, such as the DB version, if someone has configured a different metadata DB or an

Running of Airflow can fail because of the following:
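The first check above could be sketched as follows. The URL layout matches the per-release constraints branches apache/airflow publishes, but constraints_exist itself is illustrative, not part of airflowctl:

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

def constraints_url(airflow_version: str, python_version: str) -> str:
    # apache/airflow publishes per-release constraints on branches named
    # "constraints-<airflow_version>", one file per Python minor version.
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )

def constraints_exist(airflow_version: str, python_version: str) -> bool:
    """HEAD-request the constraints file before attempting the build."""
    req = Request(constraints_url(airflow_version, python_version), method="HEAD")
    try:
        with urlopen(req, timeout=10) as resp:
            return resp.status == 200
    except HTTPError:
        return False
```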

"source not found error" when executing "airflowctl start command"

When I run the following command:

❯ airflowctl start

I get this error:
Verifying Airflow installation...
/bin/sh: 1: source: not found
Error starting Airflow: Command 'source /home/peter/Desktop/Code/airflow/airflow_2.6.3/.venv/bin/activate && airflow version' returned non-zero exit status 127.

I installed pyenv and airflowctl on Ubuntu 20.04.6 LTS.

Any clue why the "Verifying Airflow installation..." step uses sh rather than bash?

Thanks in advance
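Likely because subprocess with shell=True runs commands through /bin/sh, which on Ubuntu is dash; source is a bash-ism, and the POSIX spelling is ".". A minimal demonstration (an assumption about the cause, not airflowctl's actual code):

```python
import subprocess

# dash (Ubuntu's /bin/sh) rejects "source" but accepts the POSIX ".".
posix = subprocess.run(". /dev/null && echo ok", shell=True,
                       capture_output=True, text=True)

# Alternatively, force bash explicitly so "source" keeps working.
bashy = subprocess.run(["bash", "-c", "source /dev/null && echo ok"],
                       capture_output=True, text=True)
```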

Add "Next Steps" doc/epilog at the end of running init and build

Similar to the following, there should be a "Next Steps" doc printed after someone runs airflowctl build
to show them the following items and how to use them:

For init:

  • settings.yaml (airflow_version and python_version)
  • requirements.txt
  • dags directory

For build:

  • Running DAGs
  • Activating venv and running airflow commands
  • Changing executor (and configuring postgres) (probably ??)

next_steps += f"""
* You can now run all the "airflow" commands in your terminal. For example:
[bold blue]$ airflow version[/bold blue]
* Run Apache Airflow in standalone mode using the following command:
[bold blue]$ airflow standalone[/bold blue]
* Access the Airflow UI in your web browser at: [bold cyan]http://localhost:8080[/bold cyan]
For more information and guidance, please refer to the Apache Airflow documentation:
[bold cyan]https://airflow.apache.org/docs/apache-airflow/{version}/[/bold cyan]
"""

Add example dags for different personas in airflowctl

Add an option --persona to the airflowctl init command with the following options:

  • Data Engineer
  • Data Scientist
  • Data Analyst
  • DevOps Engineer

Add directories for those personas in

Build new DAGs for each persona and, based on the selected persona, copy those example DAGs over when someone runs `airflowctl init --persona "data-engineer"`.
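The copy step might look like this; PERSONAS and the examples/ layout are illustrative assumptions, not existing airflowctl code:

```python
import shutil
from pathlib import Path

PERSONAS = {"data-engineer", "data-scientist", "data-analyst", "devops-engineer"}

def copy_persona_dags(examples_root: Path, project_dir: Path, persona: str) -> list:
    """Copy the bundled example DAGs for a persona into the project's dags/."""
    if persona not in PERSONAS:
        raise ValueError(f"Unknown persona: {persona}")
    dags_dir = project_dir / "dags"
    dags_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for dag_file in sorted((examples_root / persona).glob("*.py")):
        copied.append(Path(shutil.copy(dag_file, dags_dir)))
    return copied
```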

support different forks of airflow

use case

All the DAGs we write for use on GCP Composer use Composer lineage, so they probably wouldn't render in vanilla Apache Airflow.

description

Google stores their Composer code in this repo: https://github.com/GoogleCloudPlatform/composer-airflow
The branch name designates which version of airflow+composer to install: https://github.com/GoogleCloudPlatform/composer-airflow/branches/all
These are not published to public PyPI.

possible implementation

On init, offer options to specify:

  • airflow fork repo GitHub URL & branch name
  • or maybe a local path to the airflow code?
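A possible shape for this (purely hypothetical; the airflow_source keys below do not exist in airflowctl today and only illustrate the proposal):

```yaml
airflow_version: "2.5.3"
python_version: "3.10"
airflow_source:
  repo: https://github.com/GoogleCloudPlatform/composer-airflow
  branch: composer-X.Y.Z-airflow-2.5.3   # branch name encodes composer+airflow versions
  # or, alternatively, a local checkout:
  # path: /path/to/local/airflow
```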

Remove pip as the primary installation method

airflowctl is affected by the chicken-and-egg problem of the Python version. Not surprisingly, it carries the same burden as other Python environment & project managers (e.g., hatch, pipx, pyenv):

If we start with pip, the version used in airflowctl build must match the version of your pip environment:

  $ python -V
  Python X.Y.Z

  $ python -m pip install airflowctl
  $ airflowctl init --python-version=X.Y.Z .
  $ airflowctl build .

Otherwise pyenv is invoked to get the correct python version (forcing users to use a specific version manager).

The above is the ideal set of instructions if we're going to use pip. With the current instructions, you can run into a lot of issues in any of these situations:

  • --python-version is not the same as the version used in pip install airflowctl
  • --python-version is not specified
  • pyenv is not available
  • python is a different version than the one installed with pip. For example:
    • In a broken environment where python and pip are unrelated
    • The user later upgraded Python and did not reinstall airflowctl.

Here are my suggestions:

  • Installation instructions should remove pip as the primary method.
  • Ideally, airflowctl behaviors should not depend on the Python version it is installed to.
  • airflowctl build should fail if python_version does not exist in $PATH. Error message should ask users to have the correct Python version.
    • Alternatively, users can override which Python path to build the virtual environment with.
  • Provide a Python-agnostic installation method (e.g., brew, apt, yum, winget, choco, etc.)

Alternatively, although a bigger rewrite, switch to using hatch or rye.
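The suggested fail-fast check could be sketched like this; resolve_python is illustrative, not existing airflowctl code:

```python
import shutil

def resolve_python(python_version: str) -> str:
    """Return the path of e.g. python3.10 from $PATH, or fail with guidance."""
    major_minor = ".".join(python_version.split(".")[:2])
    exe = shutil.which(f"python{major_minor}")
    if exe is None:
        raise RuntimeError(
            f"Python {python_version} not found in $PATH. Install it, "
            "or pass an explicit interpreter path to build with."
        )
    return exe
```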

`build` installs packages into the global user space when running from a venv

airflowctl build appears to install packages into the "root" (i.e., non-venv) Python environment, which causes dependency conflicts and ultimately fails to initialize the project properly. The expected behavior is for packages to be installed in the venv newly created by airflowctl. Below is the process I go through to set up an airflowctl project:

  1. Create a dedicated venv for running airflowctl ("actl").
  2. Activate actl and pip install airflowctl.
  3. Use airflowctl init ... to initialize a new project.
  4. Use airflowctl build ... to set the project up.

I'm using python 3.9 on CentOS.
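The fix the reporter expects amounts to always invoking the interpreter inside the newly created venv rather than whatever pip happens to be first on PATH. A sketch (venv_pip_cmd is illustrative, not airflowctl's actual code):

```python
import sys
from pathlib import Path

def venv_pip_cmd(venv_dir: Path, *packages: str) -> list:
    """Build a pip install command bound to the venv's own interpreter."""
    bindir = "Scripts" if sys.platform == "win32" else "bin"
    python = venv_dir / bindir / "python"
    # "<venv>/bin/python -m pip" cannot leak packages into the global
    # environment the way a bare "pip install" on PATH can.
    return [str(python), "-m", "pip", "install", *packages]
```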
