
Comments (2)

mistercrunch commented on May 21, 2024

The requests lib is already a dependency of Airflow; you can see it in both requirements.txt and setup.py. It's a great library.

Airflow tasks are expected to be synchronous, or to be made synchronous by writing some sort of sleep/check routine in your operator. A task is also expected to raise an exception as its way of communicating an error.
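To make the sleep/check idea concrete, here is a minimal sketch in plain Python, not actual Airflow code. The names `wait_until_done`, `check`, `poke_interval` and `timeout` are made up for illustration; `check` stands for whatever call asks the external system whether the job has finished.

```python
import time


def wait_until_done(check, poke_interval=5.0, timeout=60.0):
    """Block until check() returns True, so an async job looks synchronous.

    Raises RuntimeError on timeout, so the failure surfaces as an exception.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return
        if time.monotonic() >= deadline:
            raise RuntimeError("job did not finish within %s seconds" % timeout)
        time.sleep(poke_interval)
```

An operator's execute method could call something like this after submitting the job, so the task only returns once the remote work is finished or has clearly failed.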

It may be tricky to generalize an HttpOperator, since every system expects different endpoints and payloads and returns different results. Using a PythonOperator that calls the requests lib is the quick way to do this. Maybe an HttpSensor would be generic enough: it would receive an endpoint, a payload and a regex to match against the response.
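As a rough sketch of the endpoint + payload + regex idea, the check could look like the function below. `response_matches` and its parameters are hypothetical names; `get` is assumed to be any callable shaped like `requests.get`, passed in so the function does not hard-wire the HTTP client.

```python
import re


def response_matches(get, endpoint, payload, pattern, timeout=10):
    """Return True when the body of a GET response matches the regex.

    `get` is any callable shaped like requests.get(url, params=..., timeout=...).
    """
    response = get(endpoint, params=payload, timeout=timeout)
    response.raise_for_status()  # HTTP error codes become exceptions
    return re.search(pattern, response.text) is not None
```

With the requests lib installed, this could be called as `response_matches(requests.get, url, params, r"done")`, either from a PythonOperator callable or from the poke method of a sensor.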

I don't know much about remap, but maybe a RemapOperator would make more sense. In general that's probably the better approach: instead of receiving a generic endpoint and payload, it can receive something more meaningful like job_name, parameters_dict or whatever makes sense for that specific external system.

A side note about hooks: they use the Connection model to store connection information, as opposed to hard-coding it in scripts. It may be nice to have a thin HttpHook that retrieves that info from the DB and acts as a thin wrapper around the requests lib. I'm not 100% sure about this though; it may not add a whole lot of value (vs. confusion)...
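To get a feel for how thin such a hook could be, here is a sketch under a few assumptions. `StoredConnection` is a stand-in for a row of the Connection model with only a handful of fields, `lookup` stands for whatever fetches that row from the DB, and `session_factory` is expected to be something like `requests.Session`. None of this is existing Airflow code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredConnection:
    """Stand-in for a stored connection record (hypothetical subset of fields)."""
    host: str
    port: Optional[int] = None
    login: Optional[str] = None
    password: Optional[str] = None


class ThinHttpHook:
    """Looks up connection info by id and delegates the HTTP work to a session."""

    def __init__(self, conn_id, lookup, session_factory):
        self.conn_id = conn_id
        self._lookup = lookup                    # conn_id -> StoredConnection
        self._session_factory = session_factory  # e.g. requests.Session

    def base_url(self):
        conn = self._lookup(self.conn_id)
        host = conn.host if "://" in conn.host else "http://" + conn.host
        return host + (":%d" % conn.port if conn.port else "")

    def get_session(self):
        conn = self._lookup(self.conn_id)
        session = self._session_factory()
        if conn.login:
            session.auth = (conn.login, conn.password)  # basic auth
        return session

    def run(self, endpoint, params=None, timeout=10):
        url = self.base_url().rstrip("/") + "/" + endpoint.lstrip("/")
        return self.get_session().get(url, params=params, timeout=timeout)
```

The hook only assembles the URL and credentials and hands back the response; what counts as success is left to the caller.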


gtoonstra commented on May 21, 2024

So I put together a preliminary hook and sensor to get an idea of the complexity. You can see this development branch at:

master...gtoonstra:http_protocol_sensor

I agree: the flavours of what gets done over HTTP are too rich to create anything that is generic enough, and attempts at such generic approaches usually start to pollute other areas with logic, for example here by pushing logic into the DAGs.

In the branch, the hook raises exceptions, but I think most of those should be moved to the operator instead. This is based on the assumption that the operator class, not the hook, decides on success or failure based on the responses it gets from the hook. In a case like a database hook, where a db was expected and didn't exist, the hook is still allowed to raise exceptions.

Then all we probably need for now is a SimpleHttpOperator, which is limited to the following:

  • it fails or succeeds on the return code only, not on the response content.
  • it only supports a 'one-shot' call, no cookies, sessions or conversations, so no calls in a sequence.
  • maybe add a 'regex' on the "GET" call to check the state of the object beyond the return code (existence).
  • it has a timeout and never uses a persistent connection.
  • it only supports basic auth with login and password.
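The limits above could be sketched roughly as follows. This is plain Python rather than the operator from the branch: `simple_http_call`, `HttpCallFailed` and `response_check` are hypothetical names, and `request` is assumed to be a callable shaped like `requests.request`.

```python
import re


class HttpCallFailed(Exception):
    """Raised by the caller side (the operator), not by the hook."""


def simple_http_call(request, method, url, login=None, password=None,
                     data=None, timeout=10, response_check=None):
    """One-shot HTTP call that succeeds or fails on the return code.

    `request` is any callable shaped like requests.request(method, url, ...).
    """
    auth = (login, password) if login else None  # basic auth only
    response = request(method, url, data=data, auth=auth, timeout=timeout)
    if not 200 <= response.status_code < 300:
        raise HttpCallFailed("%s %s returned %s" % (method, url, response.status_code))
    # Optional regex on GET, to check state beyond the return code.
    if response_check is not None and method.upper() == "GET":
        if re.search(response_check, response.text) is None:
            raise HttpCallFailed("response did not match %r" % response_check)
    return response
```

Passing `requests.request` as `request` keeps every call independent of the previous one, with no cookies or session carried over between calls.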

Anything beyond this simple use requires a specific operator:

  • file upload/download over http, special auth schemes, http conversations and sessions, special mime-types, etc.

So the remap operator probably falls into the latter category.

