Comments (11)

mistercrunch commented on May 21, 2024

Only the Run feature in the UI isn't working, is it? The problem is that I don't want to run an executor and a task within the scope of a web request. I need to run that task asynchronously, and without a remote service that's just impossible.

You can use airflow run from the CLI until you move to CeleryExecutor. BTW, CeleryExecutor is super easy to set up and can run on the same box. You can use SQLAlchemy as a broker and see how much mileage you get.

from airflow.

martingrayson commented on May 21, 2024

I don't suppose it would be possible to add a quick-start (just a few steps) to the documentation for getting started with Celery? I'm having trouble convincing my colleagues that maintaining Celery too wouldn't be a massive overhead.

mistercrunch commented on May 21, 2024

Well, Celery is integrated with Airflow; it's just a Python library that ships with Airflow. The Celery broker (most likely RabbitMQ or Redis) is a required piece of infrastructure that someone needs to keep up and running. Redis is fairly common nowadays and a breeze to set up. At Airbnb we already had both systems running in production, along with in-house knowledge about them.

But note that Celery supports using a database (through SQLAlchemy) as a broker, and you should already have a database set up. So using your same SQLAlchemy connection as a broker seems pretty reasonable to me, even though Celery's support for it is "experimental".
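
As a rough illustration of reusing that connection string: to my understanding of Celery's SQLAlchemy broker docs, the broker URL is the ordinary SQLAlchemy URL with a `sqla+` prefix. The helper name `to_celery_broker_url` and the example credentials below are hypothetical, so treat this as a sketch rather than anything Airflow or Celery ships.

```python
def to_celery_broker_url(sqlalchemy_url: str) -> str:
    """Prefix a SQLAlchemy connection URL so Celery can use it as a broker URL."""
    prefix = "sqla+"
    if "://" not in sqlalchemy_url:
        raise ValueError(f"not a connection URL: {sqlalchemy_url!r}")
    # Leave URLs that already carry the prefix untouched.
    if sqlalchemy_url.startswith(prefix):
        return sqlalchemy_url
    return prefix + sqlalchemy_url


# Hypothetical connection string; substitute the one Airflow already uses.
metadata_db = "postgresql://airflow:secret@localhost:5432/airflow"
print(to_celery_broker_url(metadata_db))
```

The resulting string would then go wherever your setup expects the Celery broker URL.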

The thing is, Celery is an async framework that can operate at web scale (a common use case is to process thumbnails for uploaded images outside the scope of a web request), and it is set up to handle dozens, if not thousands, of messages per second. A database might have some trouble with that many messages, plus the workers constantly poking at it. But with Airflow, the number of messages you'd send is probably in the few hundreds, or thousands, a day, so using a database as a broker might be very reasonable, especially in a pre-production-type setup.
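
To put the two scales side by side, here is a quick back-of-the-envelope conversion from messages per day to messages per second. The daily volumes are illustrative guesses in the range mentioned above, not measurements from any real Airflow deployment.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def messages_per_second(messages_per_day: float) -> float:
    """Average message rate per second for a given daily volume."""
    return messages_per_day / SECONDS_PER_DAY


# Illustrative daily volumes only.
for daily in (500, 5_000, 50_000):
    print(f"{daily:>6} msgs/day = {messages_per_second(daily):.4f} msgs/s")
```

Even 50,000 messages a day averages out to less than one message per second, far below the per-second rates Celery brokers are built for.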

As far as getting a proper Celery setup going, people should refer to the Celery docs. I just added a reference in the docs here:
65c5f0a

Hey, it'd be nice to have the best of both worlds in terms of "get going quickly" and "scale to infinity", but the latter has to require some infrastructure.

For the record, when RabbitMQ was having some problems (unrelated to Airflow), I set up a survival Redis box and migrated in about 20 minutes. Of course productionizing Redis, setting up a slave and monitoring it is more work, but you can do all of this once Airflow becomes an important part of your ecosystem. This should be somewhat trivial for ops folks or data-infra people: it should be part of their job description to provide the services you need to do your work. I respect trying to keep the ecosystem simple though!

r39132 commented on May 21, 2024

I'm leaning towards using our Postgres DB as the broker, as it is the quickest route to adoption within the company and will be fine for a while. When we reach scale, I'd lean towards SQS over anything else because it's infrastructure that I don't need to maintain, ansibilize, monitor, etc., and because it scales to hundreds of millions of messages per day.

mistercrunch commented on May 21, 2024

Postgres/SQLAlchemy should work just fine as a Celery broker; please let us know how much mileage you get out of it. I'd bet twelve bucks that it would just never become the bottleneck.

r39132 commented on May 21, 2024

I've opened #63

The experimental status and list of limitations of the SQLAlchemy broker are a real turn-off: http://celery.readthedocs.org/en/latest/getting-started/brokers/sqlalchemy.html#broker-sqlalchemy

I'm looking for a workflow engine that can be lightweight during the adoption phase at our company and fault-tolerant down the line. I've started playing with Celery a bit, but I don't want to stand up RabbitMQ/Redis or any other backend right now, even in production, because there is a cost to launching infrastructure in production: I need to ansibilize it, set up logging and alerting, set up monit, etc., all before anyone is using it in production. SQS and Postgres both have limitations as brokers and known bugs.

I liked the support for the LocalExecutor and SequentialExecutor because they were lightweight. If and when adoption grows here, we will consider Celery and setting up Redis/RabbitMQ, but for now, we won't. In addition to supporting the broker infrastructure for Celery, I also need to run a separate "airflow worker" and make sure it is fault-tolerant (e.g. monit, etc.). It would have been nice if a worker started in the main "airflow webserver", but I don't see any queue consumers running when I run "airflow webserver".

Finally, I'm not clear why running the LocalExecutor (if I don't have more than 3 flows running) is a bad idea. But, I would like to have the UI features work and I would like to have the dags imported into the DB, not just showing up on the UI.

mistercrunch commented on May 21, 2024

You're talking about 1 UI feature (TaskDialog->Run) that we lived without for months. It's pretty minimal.

I'm not sure if you have tried it, but airflow scheduler does start a working LocalExecutor in the background if it is set up that way.

As for keeping two commands up and running, that should be pretty easy to do. I haven't seen airflow webserver and airflow scheduler go down in a long time. nohup or screen should give you mileage beyond POC. Though clearly in a production setup they should be kept up and monitored.
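
For the "kept up" part, a bare-bones restart loop might look like the sketch below. To be clear, this is not what nohup or screen do (they only detach a process from the terminal), and it is no substitute for a real supervisor such as monit; the function name `keep_running` and its parameters are hypothetical.

```python
import subprocess
import sys
import time


def keep_running(cmd, max_restarts=3, pause_seconds=1.0):
    """Run cmd and restart it whenever it exits, up to max_restarts times.

    Returns the exit code of every run, in order.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # subprocess.run blocks until the child process exits.
        exit_codes.append(subprocess.run(cmd).returncode)
        if attempt < max_restarts:
            time.sleep(pause_seconds)  # brief back-off before restarting
    return exit_codes


if __name__ == "__main__":
    # Stand-in child that exits immediately; in practice the command might be
    # ["airflow", "scheduler"] or ["airflow", "webserver"].
    demo = [sys.executable, "-c", "raise SystemExit(1)"]
    print(keep_running(demo, max_restarts=2, pause_seconds=0.1))
```

A long-running command only comes back from `subprocess.run` when it dies, so the loop sits idle for as long as the process stays healthy.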

I feel like we have it pretty good on offering variety on the spectrum of ramping up to production. Maybe it could be better, but it's pretty decent as is. I don't see us spending cycles there for a moment.

r39132 commented on May 21, 2024

Sorry, I thought airflow scheduler was related to Celery execution... My colleagues and I somehow missed that after going through both the quick start and tutorial. There is a reference to "master scheduler" in the tutorial, which led to some conflation of the airflow (local) scheduler and celery scheduler. Makes more sense now, so we will launch with the scheduler and local executor.

mistercrunch commented on May 21, 2024

Hopefully this clarifies things a bit
b235411

Dowwie commented on May 21, 2024

What are your thoughts on using Disque rather than Redis as a broker?

mistercrunch commented on May 21, 2024

If you mean as a broker for Celery, Disque doesn't seem to be documented here:
http://celery.readthedocs.org/en/latest/getting-started/brokers/
