astronomer / astro-cli
CLI that makes it easy to create, test and deploy Airflow DAGs to Astronomer
Home Page: https://www.astronomer.io
License: Other
I just installed the DigitalOcean CLI doctl
and the way they do auth is really clean. You just create an auth token in the DO web app that's only shown once, then run $ doctl auth init
and paste it in.
This would also improve the security of CLI deployments, since a user could provide a key separate from their own password on a shared CLI server.
@andscoop @schnie @cwurtz What do you guys think? Are we cool with doing the same setup?
Edit: I was bored, so I took a quick stab at seeing what I could do in 15 minutes here. I'll probably lean on someone more experienced in Go and the CLI for help, but here's a start:
https://github.com/astronomerio/astro-cli/compare/access-token
Current astro config commands only set configs in the project directory, or "fail" silently if not in a project directory. We need to add a global option to set configs in the user's home directory.
Currently when starting an airflow cluster we prefix container names with a scrubbed version of the project name.
Ex. Before Fix
Project Name/ Dir Name: astro-dags
Docker Container Prefix: astrodags
I propose we don't strip hyphens and other valid characters and instead replace them with underscores. This would apply to any other valid directory characters we capture.
Ex1. After Fix
Project Name/ Dir Name: astro-dags
Docker Container Prefix: astro_dags
Ex2. After Fix
Project Name/ Dir Name: astro dags
Docker Container Prefix: astro_dags
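A minimal Go sketch of the proposed replacement behavior. The helper name containerPrefix is hypothetical and not taken from the CLI source. Lowercasing the name and collapsing a run of invalid characters into a single underscore are assumptions that go beyond what the proposal states.

```go
package main

import (
	"regexp"
	"strings"
)

// invalidChars matches any run of characters other than lowercase ASCII
// letters, digits, or underscores.
var invalidChars = regexp.MustCompile(`[^a-z0-9_]+`)

// containerPrefix is a hypothetical helper: it lowercases the project name
// (an assumption) and replaces each run of invalid characters with one
// underscore instead of stripping it.
func containerPrefix(projectName string) string {
	name := strings.ToLower(strings.TrimSpace(projectName))
	return invalidChars.ReplaceAllString(name, "_")
}
```

With this sketch, both astro-dags and astro dags map to astro_dags, matching the two examples above.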
It's overwhelming to have all the example connections on the Admin > Connections page. It would be nice to clear these out and provide a single connection to the Airflow Postgres instance.
We need to allow users to override configuration settings through the CLI, rather than editing config files by hand. For example, setting the --global
registry URL.
I think the old CLI has this functionality so that can be referenced as we copy it over.
Expected: If I try to log in before I have created a cluster (EE is not yet installed), the CLI will give me an error message indicating what to do.
Actual:
$ astro auth login
panic: exit status 1
goroutine 1 [running]:
github.com/astronomerio/astro-cli/docker.Exec(0xc42022bd10, 0x2, 0x2, 0xb, 0xc42022bd18)
/Users/taylor/go/src/github.com/astronomerio/astro-cli/docker/docker.go:24 +0xeb
github.com/astronomerio/astro-cli/auth.Login()
/Users/taylor/go/src/github.com/astronomerio/astro-cli/auth/auth.go:9 +0x5c
github.com/astronomerio/astro-cli/cmd.authLogin(0x19f0f20, 0x1a1fec8, 0x0, 0x0)
/Users/taylor/go/src/github.com/astronomerio/astro-cli/cmd/auth.go:42 +0x20
github.com/astronomerio/astro-cli/vendor/github.com/spf13/cobra.(*Command).execute(0x19f0f20, 0x1a1fec8, 0x0, 0x0, 0x19f0f20, 0x1a1fec8)
/Users/taylor/go/src/github.com/astronomerio/astro-cli/vendor/github.com/spf13/cobra/command.go:702 +0x2c6
github.com/astronomerio/astro-cli/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0x19ef5a0, 0x1, 0x19f0260, 0xc42022bf60)
/Users/taylor/go/src/github.com/astronomerio/astro-cli/vendor/github.com/spf13/cobra/command.go:783 +0x30e
github.com/astronomerio/astro-cli/vendor/github.com/spf13/cobra.(*Command).Execute(0x19ef5a0, 0xc42022bf70, 0x15240f9)
/Users/taylor/go/src/github.com/astronomerio/astro-cli/vendor/github.com/spf13/cobra/command.go:736 +0x2b
main.main()
/Users/taylor/go/src/github.com/astronomerio/astro-cli/main.go:10 +0x2d
@andscoop What are your thoughts on this?
Pauses containers, keeps data
Implement OAuth login flow similar to what Greg outlined in #67
The flow should be:
User types the auth command, and a link is output for the user to auth with. They copy/paste the link and are directed to a webpage. There they click a button, which opens a popup to auth with Google (or whatever other providers we eventually support). Once auth'd, the window closes and they are redirected to a page with a link for them to copy/paste back into the CLI.
clean up libcompose logging + log out URLs
What needs to be installed as prereqs for our CLI?
A user needs to be aware of what projects they have running in the cloud for supplying arguments to deploy.
Suggested Behavior
When a user runs astro airflow deploy
with no supplied release name, CLI will prompt user with a list of valid release names they have access to.
A new command that returns a list of projects (deploys) a user has access to.
astro airflow ls
is a possibility
SPIKE
How much of this should live in houston? Have we drawn lines around what features the CLI wraps vs what houston will wrap?
As a user, having scheduler logs when debugging a custom plugin or workflow would be really helpful. Oftentimes, the error message exposed in the UI doesn't contain the full traceback.
Something like astro airflow start --logs
would be great.
I know a few customers have brought this up for cloud CLI, so just wanted to make sure that feedback was captured here.
Houston fetch/get queries have been updated to be more gql standard. As such, fetchDeployments
is now just deployments
. The call in the CLI should get updated asap to limit how many users will be on outdated versions of the CLI when the deprecated fetchDeployments
query is deleted.
We need to rip out the clickstream commands.
When installing as a user with go get github.com/astronomerio/astro-cli
, the current binary is named astro-cli
.
$ ls $GOBIN
astro-cli
Instead it should be just astro
consistent with installing as a dev with make build
. However, it looks like it might not be possible to have the binary named differently than the package when installing via go get
.
http://lucasfcosta.com/2017/02/07/Understanding-Go-Dependency-Management.html
@schnie / @cwurtz Do you know if it's possible? If not, should we just have users set alias astro=astro-cli
in their .bash_profile for now (assuming they aren't using the old CLI)?
astro airflow auth
should prompt user with an example cmd output for setting their registry if no registry is specified.
Suggested Output
astro config set docker.registry.authority registry.EXAMPLE_DOMAIN.com
We currently have scaffolded out an airflow create
command which will create deployments on the user's cluster. We also have an airflow kill
command which will destructively stop a local running airflow instance.
This can be confusing for users reviewing the list of commands as they appear to be related. I think we should consider renaming airflow create
to indicate that it will be pushing new containers to your cluster.
airflow provision
seems to better describe what is happening.
We really shouldn't need to have the user/password in the global config file for the registry. Would be great if we can use the creds created after we astro auth login
.
Output a link to their local Airflow cluster after an astro airflow up
to give users easier/quicker access to their running docker containers.
Proposal to rename ./astro/config.json
to ./astro/astro.json
in order to avoid conflicts between configurations in the old CLI and the new CLI. This could make dev and internal work easier while we continue to support two CLIs.
With the CLI it would be cool from a UX perspective if I could do:
$ astro workspace set <workspace-name>
Instead of:
$ astro config set project.workspace <workspace-id>
As a user, it feels like choosing my active workspace naturally falls under the workspace command group.
This was the first pain point I hit having multiple workspaces and doing a deploy.
Note: This would require making workspace names unique as they currently are not.
Also, it would be nice to have the CLI still default to the default workspace even when multiple have been created (vs defaulting to none which I think it is doing currently).
What do others think (especially if you've already tried multiple workspaces today)? @andscoop @schnie @ryw @cwurtz
A customer is running into this when trying to install onto his ec2:
curl -sL https://install.astronomer.io | sudo bash
astronomerio/astro-cli: checking GitHub for latest tag
astronomerio/astro-cli: found version: https://api.github.com/repos/astronomerio/astro-cli/releases/10068145 for https://api.github.com/repos/astronomerio/astro-cli/releases/10068145/linux/amd64
astronomerio/astro-cli: downloading https://github.com/astronomerio/astro-cli/releases/download/https://api.github.com/repos/astronomerio/astro-cli/releases/10068145/astro_https://api.github.com/repos/astronomerio/astro-cli/releases/10068145_linux_amd64.tar.gz
curl: (22) The requested URL returned error: 404 Not Found
I ran the same command on my computer (also running Ubuntu) and it works.
His OS:
OS - Linux 4.9.81-44.57.amzn2.x86_64 #1 SMP Mon Feb 19 17:51:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
docker version - Docker version 17.06.2-ce, build 3dfb8343b139d6342acfd9975d7f1068b5b1c3d3
on bash terminal
I propose that we use two commands for these behaviors
astro airflow stop
A non-destructive wrapper around docker stop which maintains mounted volumes. Similar to how some may view a pause command
astro airflow kill
A destructive tear down of containers. How astro airflow stop
currently works at time of issue creation.
Some users will need the ability to trust custom CAs if they choose to supply their own certs which have been issued by a CA not trusted by the base OS.
The docker-cli
has some code built around this that we should attempt to replicate as closely as possible. This is because we are wrapping docker-cli
calls in some places as well as making http calls against the registry
and houston-api
. We will want consistent behavior, especially when we are making an http call and a docker-cli
command in the same astro-cli
command.
I have some example code proving out the solution, but will need to take time to replicate the docker-cli
way of doing things.
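For the HTTP side of this (calls to the registry and houston-api), one possible shape is a client whose root pool is the system roots plus a user-supplied CA bundle. This is only a sketch under assumptions: the helper name newHTTPClient is hypothetical, it is not the example code mentioned above, and it does not reproduce how docker-cli locates or loads certificates.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"net/http"
)

// newHTTPClient is a hypothetical helper. It returns an *http.Client that
// trusts the system root CAs plus the PEM-encoded certificates in extraCAPEM.
// Where the bundle is read from is left to the caller.
func newHTTPClient(extraCAPEM []byte) (*http.Client, error) {
	pool, err := x509.SystemCertPool()
	if err != nil || pool == nil {
		// Fall back to an empty pool when the system roots are unavailable.
		pool = x509.NewCertPool()
	}
	// AppendCertsFromPEM reports false when no certificate could be parsed.
	if !pool.AppendCertsFromPEM(extraCAPEM) {
		return nil, errors.New("no valid PEM certificates found in custom CA bundle")
	}
	transport := &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}
	return &http.Client{Transport: transport}, nil
}
```

Using one client like this for every HTTP call would keep trust decisions in a single place. Matching the behavior of wrapped docker-cli commands would still need separate work.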
@andscoop are you still working on https://github.com/astronomerio/astro-cli/tree/feature/workspace-and-deployment-update or is it ready to merge in?
Including the package google-cloud-logging
in the requirements.txt
breaks builds.
This is because of a dependency in google-cloud-logging
called grpcio
.
grpcio
relies on python-dev
system package to provide a file Python.h
.
It appears that we also need to include some compiler tools that are not provided with alpine-linux by default.
Creating the following packages.txt
file resolved the pip install issues we were having.
py-pip
python-dev
gcc
musl-dev
make
linux-headers
build-base
There is likely some redundancy in these packages as I did not have time to dig into everything that the build-base package provided.
Command exists and houston-api endpoint exists, just need to wire them up.
Modify astro airflow create [title]
cmd to prevent output of links if deployment fails.
curl -o- https://astro-cli.astronomer.io/install.sh | bash
Cmd Placeholder exists and houston-api call exists, need to wire them up.
Replication:
1.) add an editable package to your requirements.txt file such as
-e git+https://github.com/astronomerio/simple-salesforce@master#egg=simple-salesforce
2.) run astro airflow start
Ensure that docker.registry.authority config is overwritten when cmd astro auth login -d [domain]
is run and docker.registry.authority already exists.
Add a command that allows the user to update or upgrade the CLI from within the CLI itself
astro update
or
astro upgrade
How do I use and hack on our CLI?
As a user, I want to change settings in airflow.cfg for my airflow deployment. I might want to change default views for DAGs, have DAGs be unpaused by default, or change a few other non-infrastructure related settings.
We'd need to override certain settings so that the user can't touch them for safety, but there are definitely some settings that we can give the user control over.
It might also be nice to rope this into the UI at some point.
This issue is a bookmark on the need to implement astro airflow pause
which will maintain mounted volumes instead of doing a full teardown.
Add check to astro airflow deploy
to check for uncommitted code if user is using git. Gracefully do nothing if not in a git project.
If user is using git and they have uncommitted changes, stop deployment and show warning. Warning should specify they can astro airflow deploy -f
to force deployment.
Otherwise, if they are using git and are clean, then allow the deploy to go right through.
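The check described above could be sketched in Go roughly as follows. The function names are hypothetical, and shelling out to git rev-parse and git status --porcelain is an assumption about how the check might be done, not how the CLI does it.

```go
package main

import (
	"os/exec"
	"strings"
)

// hasUncommittedChanges reports whether the output of
// `git status --porcelain` lists any changed or untracked files.
func hasUncommittedChanges(porcelain string) bool {
	return strings.TrimSpace(porcelain) != ""
}

// deployAllowed is a hypothetical helper. It returns true when the deploy
// may proceed: force was requested, dir is not inside a git work tree
// (or git is unavailable), or the work tree is clean.
func deployAllowed(dir string, force bool) (bool, error) {
	if force {
		return true, nil
	}
	// rev-parse exits non-zero outside a git work tree; do nothing then.
	check := exec.Command("git", "rev-parse", "--is-inside-work-tree")
	check.Dir = dir
	if err := check.Run(); err != nil {
		return true, nil
	}
	status := exec.Command("git", "status", "--porcelain")
	status.Dir = dir
	out, err := status.Output()
	if err != nil {
		return false, err
	}
	return !hasUncommittedChanges(string(out)), nil
}
```

When deployAllowed returns false, the caller would print the warning and mention astro airflow deploy -f.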
@vparekh94 and I just had a support call with customer troubleshooting getting started with the CLI.
The CLI failed to install because it couldn't add the PATH command to the bash profile. The command failed because the user did not have a ~/.bash_profile
created yet.
@andscoop @cwurtz @schnie What do we think about auto creating an empty ~/.bash_profile
when it doesn't exist?
A dev user will likely want to see the status of their deployments from the CLI. MVP of this is likely a single command that provides summary stats of a deployment. A command like this is necessary because deployments and pushes do not happen instantaneously and strange behavior can occur until all resources have been re-provisioned with the latest image.
Suggested Behavior
astro airflow status [deployment_name]
which will return the state of all pods in that deployment. This behavior can later be extended to meet other needs.
SPIKE
astro airflow deploy
so that they are aware of when a deployment is fully complete?
Description
As a user of Airflow on Astronomer, I would like to be able to programmatically pause/unpause dags so I can more easily affect dag behavior (and possibly tie in with my own internal tooling) without having to log directly into Airflow.
Example behavior:
astro pause {dag_id}
astro unpause {dag_id}
Relevant Links:
https://github.com/astronomerio/incubator-airflow/blob/master/airflow/bin/cli.py#L306
https://github.com/teamclairvoyant/airflow-rest-api-plugin/blob/master/plugins/rest_api_plugin.py#L20
source: https://github.com/astronomerio/astronomer-cloud/issues/164
@cwurtz commented on Wed Jun 06 2018
Update astro auth login
so it outputs a URL that directs the user to an orbit page to start the OAuth process. After printing that URL, it should await input of a token that the user will eventually receive after authenticating.
This token should then be saved to the global config.
Exact details still TBD, but currently it is expected that all registry auth functionality should be stripped out. Depending on what options we have with docker auth, we'll either hardcode the username to something, and use the token as the password. Alternatively if we can set or pass the encoded user/pass, we can just pass the token.
A user will need the ability to re-build their images after modifying packages.txt
and/or requirements.txt
without destroying volumes.
What do we want this user experience to be?
Some options to consider
astro airflow refresh
astro airflow start -r
to rebuild/refresh images
astro airflow stop
so that next call to astro airflow start
rebuilds images and keeps volumes.
There seem to be a lot of ways to go about this, not all of them in line with how docker works behind the scenes.
I looked into docker-compose down without passing the -v
arg but this will result in an error Volume "airflow_logs" needs to be recreated - driver has changed
. I believe this is because we are changing the underlying airflow container. In reality we only need the postgres container volumes to persist.
It looks like docker-compose build could be leveraged to solve this issue.
When a plugin is broken, due to a bad import or otherwise, any DAG that imports from it fails, because the plugin's components are never loaded.
This DAG failure is shown on the webserver home page - but it provides no indication that it is the plugin that failed. These logs are on the Airflow Webserver at boot.
I propose that in the CLI we detect
1.) plugin import errors
2.) print a friendly error message
3.) print command or path to relevant logs
This isn't an issue for me currently because I found a workaround but wanted to capture this for posterity and a better long term solution.
I was attempting to start our dockerized Airflow with mounted volumes on the following server:
$ cat /etc/*-release
CentOS Linux release 7.3.1611 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
CentOS Linux release 7.3.1611 (Core)
CentOS Linux release 7.3.1611 (Core)
I ran into an issue where the default user we create in the containers (astro
) is not present on the home server and changing the user in the Dockerfile to my home user throws the following error:
ERROR: for airflowenterprise_flower_1 Cannot start service flower: linux spec user: unable to find user ben.gregory: no matching entries in passwd file
Creating airflowenterprise_scheduler_1 ...
ERROR: for airflowenterprise_scheduler_1 Cannot start service scheduler: linux spec user: unable to find user ben.gregory: no matching entries in passwd file
ERROR: for flower Cannot start service flower: linux spec user: unable to find user ben.gregory: no matching entries in passwd file
ERROR: for scheduler Cannot start service scheduler: linux spec user: unable to find user ben.gregory no matching entries in passwd file
ERROR: Encountered errors while bringing up the project.
To get around this, I attempted to add RUN adduser -D -u 123456789876 -G astro ben.gregory
to the Dockerfile but ran into another issue.
$ docker build .
Sending build context to Docker daemon 3.072kB
Step 1/12 : FROM alpine:3.7
---> 3fd9065eaf02
Step 2/12 : RUN adduser -D -u 123456789876 -G astro ben.gregory
---> Running in 78bb924159c9
adduser: number 123456789876 is not in 0..256000 range
Some googling led me to this being an Alpine issue, specifically BusyBox which throws an error if the UID is greater than 256000 (https://stackoverflow.com/questions/41807026/cant-add-a-user-with-a-high-uid-in-docker-alpine?rq=1). It turns out Alpine doesn't actually care about the UID so I added the suggested workaround in the SO question.
# Create user
echo "ben-gregory:x:$UID_TO_SET:$UID_TO_SET::/home/ben-gregory:" >> /etc/passwd
## thanks for http://stackoverflow.com/a/1094354/535203 to compute the creation date
echo "ben-gregory:!:$(($(date +%s) / 60 / 60 / 24)):0:99999:7:::" >> /etc/shadow
echo "ben-gregory:x:$UID_TO_SET:" >> /etc/group
mkdir /home/ben-gregory && chown user: /home/ben-gregory
FWIW, BusyBox also won't let you add a user with a period so ben.gregory had to become ben-gregory, but it could as easily have been left as user
.
Finally I had to change the Dockerfile for airflow to chown
the directory to my new user from astro
.
Goreleaser isn't building the binary with the version tags on the binary. Needs to mimic the make build
command.
Each image right now is named for the repository it is in. If a user has a project structure like this:
p1 -> airflow -> prod -> astro_directory
p2 -> airflow -> prod -> astro_directory
Running astro airflow start
in p1's project, astro airflow stop
and then running astro airflow start
in p2's astro directory would spin up the same image as what's in p1.
This could cause issues for users with multiple airflow deployments in multiple projects
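One way to avoid the collision, sketched in Go under assumptions: derive the name from the directory's base name plus a short hash of its absolute path. The helper name projectName and the 6-character suffix are hypothetical choices, not something the CLI is known to do.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"path/filepath"
)

// projectName is a hypothetical helper. It returns the directory's base name
// followed by the first 6 hex characters of the SHA-256 of its absolute
// path, so two directories with the same base name get different names.
func projectName(dir string) (string, error) {
	abs, err := filepath.Abs(dir)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256([]byte(abs))
	return filepath.Base(abs) + "_" + hex.EncodeToString(sum[:])[:6], nil
}
```

With this, the astro_directory folders under p1 and p2 would map to different names. The trade-off is that moving a project directory changes its name.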