
astronomer-providers's Introduction

Astronomer Providers

PyPI Version · PyPI - Python Version · PyPI - License · Code style: black · CodeCov · Documentation Status · Security: bandit

Warning

The majority of operators and sensors within this repository have been deprecated and will not receive further updates. Read more about the deprecation in the Deprecation Notice section below.

Deprecation Notice

With release 1.19.0 of the astronomer-providers package, most of the operators and sensors are deprecated and will no longer receive updates. We recommend migrating to the official Apache Airflow Providers for the latest features and support. For the operators and sensors deprecated in this repository, migrating to the official Apache Airflow Providers is as simple as changing the import path from

from astronomer.providers.*.*.operator_module import SomeOperatorAsync

to

from airflow.providers.*.*.operator_module import SomeOperator

and setting the deferrable argument to True when using the operator or sensor in your DAG. Setting deferrable=True ensures that the operator or sensor uses the async (deferrable) code path from the official Apache Airflow Providers.

For example, to migrate from astronomer.providers.amazon.aws.operators.batch.BatchOperatorAsync to airflow.providers.amazon.aws.operators.batch.BatchOperator, simply change the import path and pass the deferrable argument:

BatchOperator(
    task_id="copy_object",
    your_arguments,
    your_keyword_arguments,
    deferrable=True,
)

For more information on using deferrable operators and sensors from the official Apache Airflow Providers, refer to the official Apache Airflow Providers documentation.

Note

Although the default value for the deferrable argument is False, you can configure the default across your deployment by setting the default_deferrable flag in the operators section of your Airflow configuration. Once you set default_deferrable to True, you can remove the deferrable argument from your operators and sensors, and they will use the async version from the official Apache Airflow Providers if one exists.

See more at: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#default-deferrable
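For example (a sketch; the operators section and the default_deferrable key are documented in the Airflow configuration reference linked above), the flag can be set either in airflow.cfg or via the equivalent environment variable:

[operators]
default_deferrable = True

# or, equivalently:
export AIRFLOW__OPERATORS__DEFAULT_DEFERRABLE=True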

If you run into issues during the migration, we suggest opening a GitHub discussion.

Installation

Install and update using pip:

pip install astronomer-providers

This only installs the dependencies for the core provider. To install all dependencies, run:

pip install 'astronomer-providers[all]'

To install the dependencies for a specific provider only, specify the integration name as an extra. For example, to install the Kubernetes provider dependencies, run:

pip install 'astronomer-providers[cncf.kubernetes]'

Extras

Extra Name          Installation Command                                    Dependencies
all                 pip install 'astronomer-providers[all]'                 All
amazon              pip install 'astronomer-providers[amazon]'              Amazon
apache.hive         pip install 'astronomer-providers[apache.hive]'         Apache Hive
apache.livy         pip install 'astronomer-providers[apache.livy]'         Apache Livy
cncf.kubernetes     pip install 'astronomer-providers[cncf.kubernetes]'     CNCF Kubernetes
databricks          pip install 'astronomer-providers[databricks]'          Databricks
dbt.cloud           pip install 'astronomer-providers[dbt.cloud]'           dbt Cloud
google              pip install 'astronomer-providers[google]'              Google
http                pip install 'astronomer-providers[http]'                HTTP
microsoft.azure     pip install 'astronomer-providers[microsoft.azure]'     Microsoft Azure
openlineage         pip install 'astronomer-providers[openlineage]'         OpenLineage
sftp                pip install 'astronomer-providers[sftp]'                SFTP
snowflake           pip install 'astronomer-providers[snowflake]'           Snowflake

Example Usage

This repo is structured the same way as Apache Airflow's source code, so, for example, if you want to import async operators, you can do it as follows:

from astronomer.providers.amazon.aws.sensors.s3 import S3KeySensorAsync as S3KeySensor

waiting_for_s3_key = S3KeySensor(
    task_id="waiting_for_s3_key",
    bucket_key="sample_key.txt",
    wildcard_match=False,
    bucket_name="sample-bucket",
)
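
Given the deprecation notice above, the equivalent using the official Amazon provider would look like this (a sketch, assuming an apache-airflow-providers-amazon release in which S3KeySensor supports the deferrable argument):

from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

waiting_for_s3_key = S3KeySensor(
    task_id="waiting_for_s3_key",
    bucket_key="sample_key.txt",
    wildcard_match=False,
    bucket_name="sample-bucket",
    deferrable=True,
)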

Example DAGs for each provider are within the respective provider's folder. For example, the Kubernetes provider's DAGs are within the astronomer/providers/cncf/kubernetes/example_dags folder.

Principle

We will only create async operators for the "sync-version" of operators that do some level of polling (i.e. take more than a few seconds to complete).

For example, we won't create an async operator for BigQueryCreateEmptyTableOperator, but we will create one for BigQueryInsertJobOperator, which actually runs queries and can, in the worst case, take hours to complete.

To create async operators, we need to inherit from the corresponding Airflow sync operators. If a sync version isn't available, inherit from the Airflow BaseOperator.

To create async sensors, we need to inherit from the corresponding sync sensors. If a sync version isn't available, inherit from the Airflow BaseSensorOperator.
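
As a rough illustration of that pattern (a minimal sketch with hypothetical names, not code from this repository): the async class overrides execute to defer the wait to the triggerer, and finishes in a callback method once the trigger fires.

from datetime import timedelta

from airflow.sensors.base import BaseSensorOperator
from airflow.triggers.temporal import TimeDeltaTrigger


class MySensorAsync(BaseSensorOperator):
    def execute(self, context):
        # Instead of poking from a worker slot, hand the wait over to the triggerer.
        self.defer(
            trigger=TimeDeltaTrigger(timedelta(seconds=30)),
            method_name="execute_complete",
        )

    def execute_complete(self, context, event=None):
        # Invoked once the trigger fires; complete the task here.
        self.log.info("Trigger fired, sensor complete")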

Changelog

We follow Semantic Versioning for releases. Check CHANGELOG.rst for the latest changes.

Contributing Guide

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

A detailed overview on how to contribute can be found in the Contributing Guide.

As contributors and maintainers to this project, you are expected to abide by the Contributor Code of Conduct.

Goals for the project

  • Our focus at this stage of the project is speed of iteration and development, so we want to be able to iterate quickly with our community members and customers and cut releases as necessary.
  • Airflow Providers are separate packages from the core apache-airflow package, and we would like to avoid further bloating the Airflow repo.
  • We want users and the community to be able to easily track features and the roadmap for the individual providers we develop.
  • We would love to see Airflow community members create, maintain, and share their own providers to build an ecosystem of providers.

Limitations

  • In Airflow, sensors have a mode parameter that can be set to poke or reschedule. In async sensors, this parameter has no effect, since the task gets deferred to the triggerer.

License

Apache License 2.0

astronomer-providers's People

Contributors

abhishekbhakat, andrewgodwin, basph, bharanidharan14, dstandish, erdos2n, feluelle, github-actions[bot], jarfgit, josh-fell, kaxil, lee-w, pankajastro, pankajkoti, park-peter, phanikumv, pre-commit-ci[bot], rajaths010494, raphaelauv, rnhttr, shr3kst3r, sunank200, tanelk, thecodyrich, tseruga, vatsrahul1001


astronomer-providers's Issues

Split the dependencies

Is your feature request related to a problem? Please describe.
I would like to use only a part of this provider; in my case I only need the CNCF Kubernetes one (so I do not want all the AWS, GCP, etc. dependencies).

Describe the solution you'd like
If I could do a pip install like:

astronomer-providers[cncf]==1.0.0

Describe alternatives you've considered
Copy paste the CNCF folder of this repo inside my private dags repo

Additional context
I guess having all these great operators and hooks outside of the official Airflow project is on purpose, so I guess there is no plan to merge these operators and hooks into the existing Airflow providers?

Thanks for all your work and open sourcing this great code 👍

Async KubernetesPodOperator

We want to write a version of the KubernetesPodOperator where the core "waiting for the pod to finish" part of the operator is offloaded to a trigger.

Implement Async `RedshiftOperator`

Based on the research in #32 - Async RedshiftOperator.

Acceptance Criteria:

Automate naming conventions check

Research and integrate a pre-commit hook that checks files in this repository for bad naming conventions.
An example of a bad naming convention would be naming a file AsyncBigQueryInsertJobOperator rather than BigQueryInsertJobOperatorAsync.

Create `SnowflakeOperatorAsync`

Develop SnowflakeOperatorAsync

The Snowflake Connector for Python supports asynchronous queries:
https://docs.snowflake.com/en/user-guide/python-connector-example.html#performing-an-asynchronous-query

So we should be able to make the SnowflakeOperator async by creating a run_async method in the SnowflakeHook, which will be used by SnowflakeOperatorAsync.

Take the DatabricksSubmitRunOperatorAsync as an example.

The SnowflakeOperator in open-source Airflow (code) executes a query, then waits and blocks an Airflow worker slot until the query is completed. Our job is to replace the blocking call with execute_async and a time-based poll using asyncio.sleep() (similar to the DatabricksTrigger).
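
A rough sketch of what that trigger-side polling loop could look like (illustrative only; the hook method and the serialized classpath are assumptions, not the code that was eventually merged):

import asyncio

from airflow.triggers.base import BaseTrigger, TriggerEvent


class SnowflakeTrigger(BaseTrigger):
    def __init__(self, query_ids, poll_interval=5):
        super().__init__()
        self.query_ids = query_ids
        self.poll_interval = poll_interval

    def serialize(self):
        # The classpath must point at wherever this trigger actually lives.
        return (
            "astronomer.providers.snowflake.triggers.snowflake_trigger.SnowflakeTrigger",
            {"query_ids": self.query_ids, "poll_interval": self.poll_interval},
        )

    async def run(self):
        # SnowflakeHookAsync is the hook proposed in the task list below;
        # get_query_status is a hypothetical method name.
        hook = SnowflakeHookAsync()
        while True:
            statuses = await hook.get_query_status(self.query_ids)
            if all(status == "success" for status in statuses):
                yield TriggerEvent({"status": "success"})
                return
            await asyncio.sleep(self.poll_interval)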

Tasks

  • Create a SnowflakeHookAsync
  • Create a SnowflakeTrigger
  • Create a SnowflakeOperatorAsync that uses SnowflakeHookAsync and SnowflakeTrigger

Testing

Production-readiness

  • Our data team uses SnowflakeOperator so let's create a wheel package of this repo and give it to them to get feedback. If it works in their production environment, then we should be good to go. If not, fix the issues from their feedback and rinse and repeat.
  • Documentation: Add proper docstrings within the code

HTTPSensorAsync not working

The HTTPSensorAsync does not defer the task to the triggerer. It's essentially calling the HTTPSensor run method. Pasting the log of the example DAG for reference.

[2022-02-28, 13:43:41 UTC] {warnings.py:109} WARNING - /usr/local/lib/python3.9/site-packages/***/utils/context.py:152: AirflowContextDeprecationWarning: Accessing 'yesterday_ds_nodash' from the template is deprecated and will be removed in a future version.
  warnings.warn(_create_deprecation_warning(key, self._deprecation_replacements[key]))

[2022-02-28, 13:43:46 UTC] {http.py:101} INFO - Poking: 
[2022-02-28, 13:43:46 UTC] {base.py:70} INFO - Using connection to: id: http_default. Host: randomuser.me, Port: None, Schema: , Login: , Password: None, extra: {}
[2022-02-28, 13:43:46 UTC] {http.py:140} INFO - Sending 'GET' to url: http://randomuser.me
[2022-02-28, 13:43:52 UTC] {http.py:101} INFO - Poking: 
[2022-02-28, 13:43:52 UTC] {base.py:70} INFO - Using connection to: id: http_default. Host: randomuser.me, Port: None, Schema: , Login: , Password: None, extra: {}
[2022-02-28, 13:43:52 UTC] {http.py:140} INFO - Sending 'GET' to url: http://randomuser.me
[2022-02-28, 13:43:57 UTC] {http.py:101} INFO - Poking: 
[2022-02-28, 13:43:57 UTC] {base.py:70} INFO - Using connection to: id: http_default. Host: randomuser.me, Port: None, Schema: , Login: , Password: None, extra: {}
[2022-02-28, 13:43:58 UTC] {http.py:140} INFO - Sending 'GET' to url: http://randomuser.me
[2022-02-28, 13:44:03 UTC] {http.py:101} INFO - Poking: 
[2022-02-28, 13:44:03 UTC] {base.py:70} INFO - Using connection to: id: http_default. Host: randomuser.me, Port: None, Schema: , Login: , Password: None, extra: {}
[2022-02-28, 13:44:03 UTC] {http.py:140} INFO - Sending 'GET' to url: http://randomuser.me

Implement Async `S3ToRedshiftOperator`

Follow up of #34 to implement async version of S3ToRedshiftOperator.

Acceptance Criteria:

[Spike] Async `GCSObjectExistenceSensor`

Async version of https://github.com/apache/airflow/blob/1008d8bf8acf459dbc692691a589c27fa4567123/airflow/providers/google/cloud/sensors/gcs.py#L30 using one of the following libraries:

Official Python client does not support it yet: googleapis/google-cloud-python#3103

Acceptance Criteria:

  • Document possible options and selection reason for a particular library in this GitHub issue via a Summary comment

Create Async version of DatabricksSqlOperator

Implement async version of DatabricksSqlOperator

Acceptance Criteria:

  • Unit test coverage in the PR (90% code coverage -- we will need to add CodeCov separately to measure it), with all tests passing
  • Example DAG using the async operator that can be used to run integration tests parametrized via environment variables. Example - https://github.com/apache/airflow/blob/8a03a505e1df0f9de276038c5509135ac569a667/airflow/providers/google/cloud/example_dags/example_bigquery_to_gcs.py#L33-L35
  • Add proper docstrings for each of the methods and functions, including an example DAG on how it should be used (populate
  • Exception handling in case of errors
  • Improve the OSS docs to make sure they cover the following:
    • An example DAG for the sync version
    • How to add a connection via environment variable & explain each of the fields. Example - https://airflow.apache.org/docs/apache-airflow-providers-postgres/stable/connections/postgres.html
    • How-to guide for the operator - example: https://airflow.apache.org/docs/apache-airflow-providers-postgres/stable/operators/postgres_operator_howto_guide.html

Enhance example DAG for S3 sensors

The example DAG for S3 sensors is still not self-sufficient. For example, when the task is deferred, it keeps waiting for the expected file to arrive in S3 and currently does not complete until someone manually uploads the file. We should create the file automatically in the expected location through a parallel task that runs, say, a minute after the deferral, so that the deferred task moves to completion on its own (see the sketch below).
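
One possible shape for such a self-sufficient example (a sketch; the bucket, key, and one-minute delay are illustrative, and S3CreateObjectOperator is used here as one way to upload the object):

import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.amazon.aws.operators.s3 import S3CreateObjectOperator
from astronomer.providers.amazon.aws.sensors.s3 import S3KeySensorAsync

with DAG(
    dag_id="example_s3_key_sensor_self_sufficient",
    start_date=pendulum.datetime(2022, 1, 1, tz="UTC"),
    schedule_interval=None,
    catchup=False,
):
    # The deferred sensor waits for the key in the triggerer...
    wait_for_key = S3KeySensorAsync(
        task_id="wait_for_key",
        bucket_name="sample-bucket",
        bucket_key="sample_key.txt",
    )
    # ...while a parallel branch waits a minute and then uploads the key,
    # letting the deferred task complete without manual intervention.
    delay = BashOperator(task_id="delay", bash_command="sleep 60")
    upload_key = S3CreateObjectOperator(
        task_id="upload_key",
        s3_bucket="sample-bucket",
        s3_key="sample_key.txt",
        data="sample data",
    )
    delay >> upload_key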

Use Sphinx to autogenerate user-facing docs

Use Sphinx and Sphinx auto-api extension to autogenerate docs to achieve the following:

  • Better discoverability of the available Async Operators, Sensors and Hooks
  • Allows us to document the limitations of certain async operators
  • Add a better compatibility matrix of Airflow and the astronomer-providers package
  • Better documentation versioning if we host it on Read the Docs

To better control what we expose in the docs, we could take the Docker project for inspiration: its docs and their source code.

Create `PostgresOperatorAsync`

Develop PostgresOperatorAsync

  • Research and find the library to use that supports async with Postgres
    Check if psycopg (https://www.psycopg.org/docs/), the library used by the Airflow Postgres provider, supports Python async (most likely not).

If not, there are potentially other libraries we can use:
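
For illustration, this is the kind of non-blocking call such a library makes possible, here using asyncpg as one example of an async Postgres driver (our choice for the sketch; the issue's original library links were not preserved here):

import asyncio

import asyncpg


async def run_query():
    # Connection parameters are placeholders.
    conn = await asyncpg.connect(
        host="localhost", port=5432, user="postgres", password="postgres", database="postgres"
    )
    try:
        # The query runs without blocking the event loop, unlike a sync psycopg2 call.
        rows = await conn.fetch("SELECT 1 AS answer")
        print(rows)
    finally:
        await conn.close()


asyncio.run(run_query())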

So we should be able to make the PostgresOperator async by creating a run_async method in the PostgresHook, which will be used by PostgresOperatorAsync.

Take the DatabricksSubmitRunOperatorAsync as an example.

The PostgresOperator in open-source Airflow (code) executes a query, then waits and blocks an Airflow worker slot until the query is completed. Our job is to replace the blocking call with an async call and a time-based poll using asyncio.sleep() (similar to the DatabricksTrigger).

Tasks

  • Create a PostgresHookAsync
  • Create a PostgresTrigger
  • Create a PostgresOperatorAsync that uses PostgresHookAsync and PostgresTrigger

Testing

Production-readiness

An integration test with an example DAG should cover all the actual testing.

  • Documentation: Add proper docstrings within the code

DB-based Operators (First Batch)

Providers:

  • Snowflake
  • Postgres
  • KubernetesPodOperator
  • Databricks (done)

Testing & Production Readiness
- Set up dev environments
- Write and run actual DAGs as tests, apart from the unit tests in the repo

Implement async versions of the remaining GCS sensors

Follow up of #35 to Implement async versions of the remaining GCS sensors:


Acceptance Criteria:

Implement Async `S3KeySensor`

Based on the research in #32 - Async RedshiftOperator

Acceptance Criteria:

Reorganize S3KeySensor

Move the trigger's common methods to hooks, as they can be reused by other sensors and triggers.

Acceptance Criteria:

Async Databricks Operator

We'd like a pair of async operators that mirror the functionality of Airflow's Databricks operators.

Change BigQueryInsertJobOperatorAsync to use OSS BigQueryHook

This story is the technical debt story for the pending work to use the OSS BigQueryHook within the BigQueryInsertJobOperatorAsync.

The code change needs to

  • Remove _BigQueryHook usage in google/hooks/bigquery_async.py
  • Import the OSS BigQueryHook within google/operators/bigquery_async.py
  • Change the tests accordingly for the google async hooks and operators

Note
Ensure that the PR apache/airflow#21385 is released before this story is implemented.

Implement Async `BigqueryOperator`

Build async version of https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/operators/bigquery.py

Acceptance Criteria:

Solution to detect classpath issues in serialize method of Async Trigger

We need a solution that highlights to the developer whenever an incorrect trigger classpath is used in the serialize method.

We ended up spending an unreasonable amount of time detecting this last time, as the task just shows the exception as "Trigger failure" with no other information.

Either write a unit test to detect this, or come up with an alternative approach; one possible unit-test shape is sketched below.
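
A sketch of the unit-test route (the helper name is ours): serialize() returns a (classpath, kwargs) tuple, and the classpath should resolve back to the trigger's own class.

from airflow.utils.module_loading import import_string


def assert_trigger_serializes_own_classpath(trigger):
    # If the classpath does not import back to the trigger's class, the triggerer
    # fails at runtime with an opaque "Trigger failure" and no further detail.
    classpath, _kwargs = trigger.serialize()
    assert import_string(classpath) is type(trigger)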

Auto-terminate unused cloud (AWS & Google) resources

Create a utility script to auto-terminate cloud resources that have not been used within the last hour. This will improve our resource-usage efficiency and reduce costs.

For instance, we can create a Lambda function on AWS that checks whether a particular resource (say, an EMR cluster) has been unused for the last hour and then terminates it automatically.

Open source astronomer-operators repo

  • Use implicit namespaces - astronomer.providers.XYZ (#57)
  • Rename repo name? Do we need to move the repo to a different organization like https://github.com/astro-projects/ ?
  • Package Rename for publishing to PyPI
  • Update description and other metadata in setup.cfg
  • Add Apache2 LICENSE (#56)
  • SECURITY.md for reporting CVEs / security issues in the repo (#56)
  • Add Repo description and labels for better discovery on GitHub
  • Release it to open source PyPI repository.
  • Add contributing guide (@phanikumv )
  • Add a principle on when we should create an Async version of an Airflow Operator/Sensor (49d5b10)
  • Code standardisation (@phanikumv )
  • Add Compatibility matrix with Airflow versions - covered in #75

Commercial Operators documentation

We're going to run this primarily through the beta docs UI. Jake W will handle the final formatting and markup; we just need to get him a reasonable draft in Notion.

Implement Async `AzureDataFactoryPipelineRunStatusSensor`

Async version of AzureDataFactoryPipelineRunStatusSensor: https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/_api/airflow/providers/microsoft/azure/sensors/data_factory/index.html#airflow.providers.microsoft.azure.sensors.data_factory.AzureDataFactoryPipelineRunStatusSensor

Acceptance Criteria:

Implement Async Azure Operators

Please look at the docs to see if there are other operators we should make async for the Azure provider (principle: only create async operators for the "sync-version" of operators that do some level of polling, i.e. take more than a few seconds to complete).

  • ADLSListOperator
  • AzureDataLakeStorageDeleteOperator
  • AzureDataLakeStorageListOperator
  • AzureDataExplorerQueryOperator
  • AzureBatchOperator
  • AzureContainerInstancesOperator
  • AzureCosmosInsertDocumentOperator
  • WasbDeleteBlobOperator

Acceptance Criteria:

OSS provider category sizing

Get the numbers for a rough sizing (how many operators) in each category from the OSS providers:

  • Database (Postgres, Snowflake, …)
  • File (Local, GCS, S3, …)
  • Job submit (Spark, Databricks?, …)
  • Notification / Alerting (Slack, …)
  • REST / JSON API (…) — not sure how many of these exist in Airflow, but there is a big market for these.
  • Monitoring (Datadog, …)
  • Data Quality (Great Expectations, …)
  • Observability / Lineage (Datakin, …)

Databricks Operator Coding

Get the Databricks operator coded up and working in manual testing, but without a full unit test suite

Implement async for remaining Bigquery sensors

Follow up of #31

Implement async versions for the following sensors:

Acceptance Criteria:

Implement Async `AzureCosmosDocumentSensor`

AzureCosmosDocumentSensor. This task can be done after the research task in #185.

Acceptance Criteria:

BigQueryGetDataOperator fails when the table has a Date field

Apache Airflow Provider(s)

google

Versions of Apache Airflow Providers

apache-airflow-providers-google==6.3.0

Apache Airflow version

2.2.3

Operating System

Any

Deployment

Docker-Compose

Deployment details

No response

What happened

The operator airflow.providers.google.cloud.operators.bigquery.BigQueryGetDataOperator fails with the following error when the table for which data needs to be fetched has a date field.

2022-02-24, 06:16:45 UTC] {warnings.py:109} WARNING - /usr/local/lib/python3.9/site-packages/***/providers/google/cloud/operators/bigquery.py:475: DeprecationWarning: The bigquery_conn_id parameter has been deprecated. You should pass the gcp_conn_id parameter.
  hook = BigQueryHook(

[2022-02-24, 06:16:47 UTC] {bigquery.py:489} INFO - Total extracted rows: 10
[2022-02-24, 06:16:47 UTC] {xcom.py:333} ERROR - Could not serialize the XCom value into JSON. If you are using pickle instead of JSON for XCom, then you need to enable pickle support for XCom in your *** config.
[2022-02-24, 06:16:47 UTC] {taskinstance.py:1700} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1329, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1455, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
    self.xcom_push(key=XCOM_RETURN_KEY, value=result)
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 2135, in xcom_push
    XCom.set(
  File "/usr/local/lib/python3.9/site-packages/airflow/utils/session.py", line 67, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/xcom.py", line 100, in set
    value = XCom.serialize_value(value)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/xcom.py", line 331, in serialize_value
    return json.dumps(value).encode('UTF-8')
  File "/usr/local/lib/python3.9/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/local/lib/python3.9/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/lib/python3.9/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/local/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type date is not JSON serializable
[2022-02-24, 06:16:47 UTC] {taskinstance.py:1267} INFO - Marking task as FAILED. dag_id=example_async_bigquery_queries, task_id=get_data, execution_date=20220224T061606, start_date=20220224T061644, end_date=20220224T061647
[2022-02-24, 06:16:47 UTC] {standard_task_runner.py:89} ERROR - Failed to execute job 86 for task get_data

The data has been fetched, but pushing it to XCom fails with "Could not serialize the XCom value into JSON. If you are using pickle instead of JSON for XCom, then you need to enable pickle support for XCom in your config."

What you expected to happen

Expected it to return all the records properly.

How to reproduce

Create a table with a date column and try to fetch the records using BigQueryGetDataOperator.
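
The failure boils down to the standard library's json module not handling datetime.date objects, which is what BigQuery DATE values come back as. A minimal reproduction, independent of Airflow:

import json
from datetime import date

# Rows containing a DATE column include datetime.date objects.
json.dumps([["2022-02-24", date(2022, 2, 24)]])
# TypeError: Object of type date is not JSON serializable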

Parameterize example DAGs for integration tests

We want to run all the example DAGs in this repo on at least a weekly basis to start with (we can change the frequency later on).

There is also a recent related AIP worth reading and adding your thoughts to: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests

Acceptance Criteria:

  • All the example DAGs in this repo should be updated so the parameter / environment-related values come from environment variables, as the Google provider examples do but the Snowflake provider's does not (the Snowflake example needs to be corrected).
  • Have one of the following push all the example DAGs to a Gen2 cloud deployment, create the necessary connections, run all the DAGs, and send a Slack message with a summary of the run:
    • (a) Scheduled CI job
    • (b) A master DAG that runs on a schedule in the same deployment - #124 addresses this point

Note: The DAG should contain cleanup tasks at the end to destroy all the resources, for example nuking the BigQuery table as the last step.

Incompatibility with Airflow 2.2.4

Describe the bug
If you try to install airflow==2.2.4 with astronomer-providers==1.0.0, it will run into a dependency version conflict.

To Reproduce
Steps to reproduce the behavior:

  1. Create a fresh virtualenv with Python 3.9: pyenv virtualenv 3.9.7 airflow-test
  2. Activate the virtualenv: pyenv activate airflow-test
  3. Install Airflow 2.2.4 with S3 extras and astronomer-providers 1.0.0: pip install "apache-airflow[s3]==2.2.4" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.4/constraints-3.9.txt" "astronomer-providers==1.0.0"
  4. You get an error halfway through:
ERROR: Cannot install astronomer-providers because these package versions have conflicting dependencies.

The conflict is caused by:
    aiobotocore 2.1.1 depends on botocore<1.23.25 and >=1.23.24
    The user requested (constraint) botocore==1.24.2

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

Expected behavior
It should install cleanly.

Additional context
Airflow 2.2.4 constraints: botocore==1.24.2
astronomer-providers 1.0.0 requirements: aiobotocore>=2.1.1
aiobotocore 2.1.1 requirements: botocore>=1.23.24,<1.23.25

There might be similar issues with the other boto dependencies... I have tried with 2.2.3 and 2.2.2, but there are similar issues with different packages:
e.g. 2.2.3:

The conflict is caused by:
    astronomer-providers 1.0.0 depends on apache-airflow-providers-cncf-kubernetes>=3
    The user requested (constraint) apache-airflow-providers-cncf-kubernetes==2.2.0

e.g. 2.2.2:

The conflict is caused by:
    apache-airflow[s3] 2.2.2 depends on apache-airflow-providers-amazon; extra == "s3"
    astronomer-providers 1.0.0 depends on apache-airflow-providers-amazon>=3.0.0
    The user requested (constraint) apache-airflow-providers-amazon==2.4.0

Implement remaining S3 Sensors

As a follow-up to #14, we should complete the remaining operators/sensors in the s3 module for the Amazon provider. Please look at the docs to see if there are other operators we should make async (principle: only create async operators for the "sync-version" of operators that do some level of polling, i.e. take more than a few seconds to complete).

Acceptance Criteria:

Implement Async `GCSObjectExistenceSensor`

Based on the research in #13 - Async GCSObjectExistenceSensor.

Async version of https://github.com/apache/airflow/blob/1008d8bf8acf459dbc692691a589c27fa4567123/airflow/providers/google/cloud/sensors/gcs.py#L30 using one of the following libraries:

https://github.com/talkiq/gcloud-aio/blob/master/storage/README.rst
https://github.com/omarryhan/aiogoogle
Official Python client does not support it yet: googleapis/google-cloud-python#3103

Acceptance Criteria:

Add guidelines for example DAG in CONTRIBUTING.rst

Some of the rules that can be added are:

  • Always include a long-running query in the example DAG.
  • Include a cleanup step at the start of the example DAG so that there won't be failures if the resources already exist.
  • Run all the steps in the example DAG even if a particular task fails (see the sketch below).
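
One way to achieve the last rule (a sketch; the task name and command are illustrative) is to give cleanup or follow-up tasks a trigger rule that fires regardless of upstream failures:

from airflow.operators.bash import BashOperator
from airflow.utils.trigger_rule import TriggerRule

cleanup = BashOperator(
    task_id="cleanup_resources",
    bash_command="echo 'tear down temporary resources here'",
    trigger_rule=TriggerRule.ALL_DONE,  # runs even if earlier tasks failed
)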
