Comments (2)
The requests lib is already a dependency for Airflow; you can see it in both the requirements.txt and the setup.py. It's a great library.
Airflow tasks are expected to be synchronous, or made so by writing some sort of sleep/check routine in your operator. A task is also expected to raise an exception as a way to communicate an error.
It may be tricky to generalize an HttpOperator, since every system expects different endpoints and payloads and returns different results. Using a PythonOperator that uses the requests lib is the quick way to do this. Maybe an HttpSensor would be generic enough, receiving an endpoint, a payload and a regex to match in the response.
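As a rough illustration of the sleep/check routine and the regex match mentioned above, here is a minimal sketch. The function name `wait_for_match` and its parameters are hypothetical, not part of Airflow; `fetch` stands for any zero-argument callable that returns the response body, for instance a small lambda around `requests.get(...).text`.

```python
import re
import time


def wait_for_match(fetch, pattern, poke_interval=5.0, timeout=60.0, sleep=time.sleep):
    """Poll fetch() until its text matches pattern, or raise on timeout.

    fetch is an assumed zero-argument callable returning the response body
    as a string, e.g. lambda: requests.get(url, timeout=10).text
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    while True:
        body = fetch()
        if regex.search(body):
            # The external system reports the state we were waiting for.
            return body
        if time.monotonic() + poke_interval > deadline:
            # Raising an exception is how the task communicates an error.
            raise RuntimeError(
                "no match for %r within %s seconds" % (pattern, timeout)
            )
        sleep(poke_interval)
```

A function like this could be handed to a PythonOperator as its `python_callable`, which keeps the task synchronous from Airflow's point of view.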
I don't know much about remap, but maybe a RemapOperator would make more sense. In general, it's probably a better approach: instead of receiving a generic endpoint and payload, it can receive something more meaningful like job_name, parameters_dict or whatever makes sense for that specific external system.
A side note about hooks: they use the Connection model to store connection information, as opposed to hard-coding it in the script. It may be nice to have a thin HttpHook that retrieves that info from the DB and acts as a thin wrapper around the requests lib. I'm not 100% sure on this though; it may not add a whole lot of value (vs. confusion)...
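To make the thin-hook idea concrete, here is a sketch under stated assumptions. `HttpConnectionInfo` is a hypothetical stand-in for a row of the Connection model (the real lookup from the DB is left out), and `ThinHttpHook` is an illustrative name, not an existing Airflow class. The only library calls used are `requests.Session()`, its `auth` attribute and `Session.request`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HttpConnectionInfo:
    """Hypothetical stand-in for the fields stored in a Connection row."""
    host: str
    login: Optional[str] = None
    password: Optional[str] = None
    schema: str = "http"
    port: Optional[int] = None


class ThinHttpHook:
    """Illustrative wrapper: connection details in, requests call out."""

    def __init__(self, conn, session=None):
        self.conn = conn
        self._session = session  # injectable, mainly for testing

    def base_url(self):
        netloc = self.conn.host
        if self.conn.port is not None:
            netloc = "%s:%d" % (self.conn.host, self.conn.port)
        return "%s://%s" % (self.conn.schema, netloc)

    def get_session(self):
        if self._session is None:
            import requests  # imported lazily; only needed for real calls
            self._session = requests.Session()
        if self.conn.login:
            # Basic auth taken from the stored connection, not from the DAG.
            self._session.auth = (self.conn.login, self.conn.password)
        return self._session

    def run(self, endpoint, method="GET", data=None, timeout=30):
        url = self.base_url() + "/" + endpoint.lstrip("/")
        # The hook only returns the response; judging it is left to the caller.
        return self.get_session().request(method, url, data=data, timeout=timeout)
```

The point of the design is that a DAG would only name a connection, while host and credentials stay out of the script.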
So I put together a preliminary hook and sensor to get an idea of the complexity. You can see this development branch at:
master...gtoonstra:http_protocol_sensor
I agree, the variety of what's built on HTTP is too rich to create anything that is generic enough, and attempts at such generic approaches usually start to pollute other areas with logic, for example, here, logic proliferating into the DAGs.
In the branch, the hook raises exceptions, but I think most of those should be moved to the operator instead. This is based on the assumption that the operator class, not the hook, decides on success or failure based on the hook's responses. A hook may still raise exceptions in cases like a database hook where a db was expected but didn't exist.
Then all we probably need for now is a SimpleHttpOperator, limited to the following:
- it fails or succeeds on the return code only, not on the response content.
- it only supports a 'one-shot' call, no cookies, sessions or conversations, so no calls in a sequence.
- maybe add a 'regex' on the "GET" call to check the state of the object beyond the return code (existence).
- it has a timeout and never uses a persistent connection.
- only supports basic auth of login and password.
Anything beyond this simple use requires a specific operator:
- file upload/download over http, special auth schemes, http conversations and sessions, special mime-types, etc.
So the remap operator probably falls into the latter category.
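The restrictions listed above for a SimpleHttpOperator could look roughly like this. It is a sketch, not the code in the branch: `simple_http_call` and `SimpleHttpError` are hypothetical names, and the `request` parameter is an assumed injection point that defaults to `requests.request`, which opens and closes its own session for each call, so no connection is reused.

```python
import re


class SimpleHttpError(Exception):
    """Raised when the one-shot call is judged a failure."""


def simple_http_call(url, method="GET", data=None, login=None, password=None,
                     timeout=30, response_regex=None, request=None):
    """One-shot HTTP call: no cookies, no session reuse, basic auth only."""
    if request is None:
        import requests  # imported lazily; only needed for real calls
        request = requests.request
    auth = (login, password) if login is not None else None
    response = request(method, url, data=data, auth=auth, timeout=timeout)

    # Success or failure is decided here, by the caller of the HTTP layer,
    # and primarily on the return code.
    if not 200 <= response.status_code < 300:
        raise SimpleHttpError("%s %s returned %d" % (method, url, response.status_code))

    # Optional extra check on GET: the body must match a regex.
    if response_regex is not None and method.upper() == "GET":
        if not re.search(response_regex, response.text):
            raise SimpleHttpError("response did not match %r" % response_regex)
    return response
```

An operator's execute method could delegate to something like this, while file uploads, sessions or special auth schemes would go to a dedicated operator as described above.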