
airflow's Introduction

Research Kernel's WorkFlow Management

Why WorkFlow Management?

At Research Kernel, we need to update our Elasticsearch database every day as arxiv.org publishes new research papers. We also have to find papers in our database that are similar to each new incoming paper by passing the new papers to our recommendation system.

We use compute-heavy AWS EC2 spot instances for the machine learning workload, shut them down as soon as our ML computation is finished, and save the output into our knowledge graph. We have to run this process every day. Because it involves many task dependencies, scheduling, and sanity checks, it can't be done with a simple cron job.
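To make the cron limitation concrete, here is a minimal sketch in plain Python (not Airflow code) of the kind of dependency ordering a workflow manager resolves. The task names and the exact order of the last two steps are assumptions chosen to mirror the process described above, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "fetch_new_arxiv_papers": set(),
    "update_elasticsearch": {"fetch_new_arxiv_papers"},
    "launch_spot_instance": {"update_elasticsearch"},
    "run_recommendations": {"launch_spot_instance"},
    "save_to_knowledge_graph": {"run_recommendations"},
    "shutdown_spot_instance": {"save_to_knowledge_graph"},
}


def execution_order(dependencies):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, task in enumerate(execution_order(DEPENDENCIES), start=1):
        print(step, task)
```

A cron entry can only say when a command starts. It cannot express that a task should run only after the tasks it depends on have finished, which is the part this ordering captures.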

Airflow

We use Airflow to solve this problem. It is a platform to programmatically author, schedule, and monitor workflows. There are alternatives, but we understood Airflow better than the others because it uses Python, which is easy to learn.

What do we do with Airflow?

We use Airflow to control and automate our AWS components (auto-bidding for EC2 spot instances, then launching, mounting, and shutting down the instance), as well as to schedule the whole extract, transform, load (ETL) and ML workflows.
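As an illustration of the auto-bid step, the sketch below picks a bid and builds the keyword arguments that boto3's EC2 client accepts in `request_spot_instances`. It does not call AWS. The bidding rule (highest recent price plus a margin, capped at the on-demand price), the function names, and the sample values are all assumptions for illustration and are not taken from the awsbot code.

```python
def choose_bid(recent_prices, on_demand_price, margin=1.1):
    """Bid slightly above the highest recent spot price, capped at on-demand.

    This rule is a hypothetical strategy, not the project's actual one.
    """
    if not recent_prices:
        raise ValueError("need at least one recent spot price")
    return min(max(recent_prices) * margin, on_demand_price)


def build_spot_request(bid, image_id, instance_type):
    """Build keyword arguments for boto3's EC2 client request_spot_instances."""
    return {
        "SpotPrice": f"{bid:.4f}",  # the API expects the price as a string
        "InstanceCount": 1,
        "Type": "one-time",
        "LaunchSpecification": {
            "ImageId": image_id,
            "InstanceType": instance_type,
        },
    }


if __name__ == "__main__":
    # Sample prices, AMI ID, and instance type are placeholders.
    bid = choose_bid([0.31, 0.29, 0.33], on_demand_price=0.90)
    print(build_spot_request(bid, image_id="ami-00000000", instance_type="p2.xlarge"))
```

With real credentials, the resulting dictionary could be passed as `ec2_client.request_spot_instances(**request)`. Keeping the request construction separate from the AWS call makes the bidding logic easy to check without launching anything.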

We use puckel's Docker build of Airflow, and the whole ML ecosystem is controlled and automated by Airflow.

WorkFlow Diagram

We will share the airflow task graph soon.

Project Structure

We have one Airflow server. It updates our databases, provisions a spot instance with a mounted EBS volume, and triggers a second DAG, which runs the ML workload on the provisioned spot instance. As soon as all the tasks in the second DAG are finished, it stops the instance.

The repository has two folders: awsbot (for automating AWS) and ml-workflow (the recommendation system). We are also looking for contributors who can help us improve and review our AWS bots and Airflow DAGs.
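The control flow described above can be sketched in plain Python as follows. This is not the project's DAG code: the function and parameter names are hypothetical, and stopping the instance even when an ML task fails is an assumption added here so that a failed run does not leave a spot instance running.

```python
def run_pipeline(update_databases, provision_instance, ml_tasks, stop_instance):
    """Run the first-stage steps, then the ML tasks, and always stop the instance."""
    update_databases()
    instance_id = provision_instance()
    try:
        # Second stage: every ML task runs on the provisioned instance.
        for task in ml_tasks:
            task(instance_id)
    finally:
        # Runs whether the ML tasks succeed or fail.
        stop_instance(instance_id)
    return instance_id


if __name__ == "__main__":
    log = []
    run_pipeline(
        update_databases=lambda: log.append("update databases"),
        provision_instance=lambda: log.append("provision instance") or "i-hypothetical",
        ml_tasks=[lambda iid: log.append(f"ml task on {iid}")],
        stop_instance=lambda iid: log.append(f"stop {iid}"),
    )
    print(log)
```

In Airflow itself, each callable would be a task and the hand-off between stages would be a DAG trigger rather than a function call. The sketch only shows the ordering and the cleanup guarantee.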

airflow's People

Contributors

prakritidev
