Comments (10)

nathadfield commented on June 15, 2024

@dondaum Looks reasonable from what I can tell. I'd suggest trying to get some eyes on it from the committers.

from airflow.

collinmcnulty commented on June 15, 2024

As a workaround, setting force_rerun=False and setting retries should allow another retry to pick up and monitor the original job instead of starting a new one.
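A configuration sketch of that workaround, assuming a deferrable BigQueryInsertJobOperator (the task id, query, and location below are placeholders, not from the original report):

```python
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Placeholder values; the key settings are force_rerun and retries.
insert_job = BigQueryInsertJobOperator(
    task_id="insert_job",
    configuration={"query": {"query": "SELECT 1", "useLegacySql": False}},
    location="EU",
    deferrable=True,
    force_rerun=False,  # on retry, reattach to the original job_id if it still exists
    retries=3,          # let a task retry pick the job up instead of failing outright
)
```

With force_rerun=True (the default) each retry submits a brand-new job, which is why this only makes sense as a targeted workaround.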

shahar1 commented on June 15, 2024

What location do you provide as an input? (checking if it might be related to #37282)

from airflow.

nathadfield commented on June 15, 2024

> As a workaround, setting force_rerun=False and setting retries should allow another retry to pick up and monitor the original job instead of starting a new one.

@collinmcnulty Thanks, but this doesn't occur often enough to warrant that change, and it would likely have its own side effects, since forcing a rerun is what we want in most scenarios.

nathadfield commented on June 15, 2024

> What location do you provide as an input? (checking if it might be related to #37282)

It's EU, so I don't think it would be.

dondaum commented on June 15, 2024

I may have some time to look into it.

A few questions:

  • What authentication method do you use: Application Default Credentials, a service account, or a credential configuration file?
  • Are you running other GCP-related asynchronous tasks on the triggerer when you see these exceptions (e.g. multiple BigQuery tasks at the same time)?
  • It seems that the exception is thrown after about 30 minutes. Do these exceptions only occur after some time, or are they random and can they also occur after a few minutes?

nathadfield commented on June 15, 2024

> Maybe I have some time to look at it.
>
> A few questions:
>
>   • What authentication method do you use? Application Default Credentials, service account or a credential configuration file ?
>   • Are you running other GCP-related asynchronous tasks on the triggerer when you see these exceptions (e.g. multiple BigQuery tasks at the same time)?
>   • It seems that the exception is thrown after about 30 minutes. So do these exceptions only occur after some time or is it random and they also occur after a few minutes?

@dondaum This will be using ADC for authentication. Yes, there could be multiple BigQuery tasks running at the same time. I don't think time is a factor; the 30 mins in the log I shared might just be how long that query ran, as I have examples that occur within a couple of minutes.

dondaum commented on June 15, 2024

I tried to reproduce the exact error, but without success, using the following DAG:

import datetime
import os

from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow import DAG



WAIT_QUERY = """
DECLARE retry_count INT64;
DECLARE success BOOL;
DECLARE size_bytes INT64;
DECLARE row_count INT64;
DECLARE DELAY_TIME DATETIME;
DECLARE WAIT STRING;
SET retry_count = 2;
SET success = FALSE;


WHILE retry_count <= 3 AND success = FALSE DO
BEGIN
  SET row_count = (with a as (SELECT 1 as b) SELECT * FROM a WHERE 1 = 2);
  IF row_count > 0  THEN
    SELECT 'Table Exists!' as message, retry_count as retries;
    SET success = TRUE;
  ELSE
    SELECT 'Table does not exist' as message, retry_count as retries, row_count;
    SET retry_count = retry_count + 1;
--      WAITFOR DELAY '00:00:10';
    SET WAIT = 'TRUE';
    SET DELAY_TIME = DATETIME_ADD(CURRENT_DATETIME,INTERVAL 90 SECOND);
    WHILE WAIT = 'TRUE' DO
      IF (DELAY_TIME < CURRENT_DATETIME) THEN
         SET WAIT = 'FALSE';
      END IF;
    END WHILE;
  END IF;
END;
END WHILE;
"""


with DAG(
    dag_id=os.path.splitext(os.path.basename(__file__))[0],
    schedule=None,
    start_date=datetime.datetime(2024, 1, 1),
    catchup=False,
    tags=["testing"],
) as dag:
    
    for i in range(10):
        bq_task = BigQueryInsertJobOperator(
            task_id=f"debug_query_{i}",
            configuration={
                "query": {
                    "query": WAIT_QUERY,
                    "useLegacySql": False,
                    "priority": "BATCH",
                }
            },
            location="europe-west3",
            deferrable=True,
        )

Also, I set the retry option in the GCP connection to 0 so as not to implicitly retry on failure.

Could you perhaps create a DAG that reproduces the error? And could you also check which apache-airflow-providers-google version you are using?

My setup:

Apache Airflow
version                | 2.7.3                                                 
executor               | LocalExecutor                                         
task_logging_handler   | airflow.utils.log.file_task_handler.FileTaskHandler   
sql_alchemy_conn       | postgresql+psycopg2://airflow:airflow@postgres/airflow
dags_folder            | /opt/airflow/dags                                     
plugins_folder         | /opt/airflow/plugins                                  
base_log_folder        | /opt/airflow/logs                                     
remote_base_log_folder |                                                       
                                                                               

System info
OS              | Linux                                                                                                                                                         
architecture    | x86_64                                                                                                                                                        
uname           | uname_result(system='Linux', node='42d8cf034cfc', release='5.10.16.3-microsoft-standard-WSL2', version='#1 SMP Fri Apr 2 22:23:49 UTC 2021', machine='x86_64')
locale          | ('en_US', 'UTF-8')                                                                                                                                            
python_version  | 3.11.6 (main, Nov  1 2023, 14:02:22) [GCC 10.2.1 20210110]                                                                                                    
python_location | /usr/local/bin/python                                                                                                                                         
                                                                                                                                                                                

Tools info
git             | NOT AVAILABLE                                                                              
ssh             | OpenSSH_8.4p1 Debian-5+deb11u2, OpenSSL 1.1.1w  11 Sep 2023                                
kubectl         | NOT AVAILABLE                                                                              
gcloud          | NOT AVAILABLE                                                                              
cloud_sql_proxy | NOT AVAILABLE                                                                              
mysql           | mysql  Ver 8.0.35 for Linux on x86_64 (MySQL Community Server - GPL)                       
sqlite3         | 3.34.1 2021-01-20 14:10:07 10e20c0b43500cfb9bbc0eaa061c57514f715d87238f4d835880cd846b9ealt1
psql            | psql (PostgreSQL) 16.0 (Debian 16.0-1.pgdg110+1)                                           
                                                                                                             

Paths info
airflow_home    | /opt/airflow                                                                                                                                                            
system_path     | /root/bin:/home/airflow/.local/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin                                                          
python_path     | /home/airflow/.local/bin:/usr/local/lib/python311.zip:/usr/local/lib/python3.11:/usr/local/lib/python3.11/lib-dynload:/home/airflow/.local/lib/python3.11/site-packages:
                | /usr/local/lib/python3.11/site-packages:/opt/airflow/dags:/opt/airflow/config:/opt/airflow/plugins                                                                      
airflow_on_path | True                                                                                                                                                                    
                                                                                                                                                                                          

Providers info
apache-airflow-providers-amazon          | 8.10.0 
apache-airflow-providers-apache-beam     | 5.6.2  
apache-airflow-providers-celery          | 3.6.0  
apache-airflow-providers-cncf-kubernetes | 8.0.1  
apache-airflow-providers-common-sql      | 1.11.1 
apache-airflow-providers-daskexecutor    | 1.1.0  
apache-airflow-providers-dbt-cloud       | 3.7.0  
apache-airflow-providers-docker          | 3.8.0  
apache-airflow-providers-elasticsearch   | 5.1.0  
apache-airflow-providers-ftp             | 3.7.0  
apache-airflow-providers-google          | 10.16.0
apache-airflow-providers-grpc            | 3.3.0  
apache-airflow-providers-hashicorp       | 3.6.4  
apache-airflow-providers-http            | 4.10.0 
apache-airflow-providers-imap            | 3.5.0  
apache-airflow-providers-microsoft-azure | 8.1.0  
apache-airflow-providers-mysql           | 5.5.4  
apache-airflow-providers-odbc            | 4.1.0  
apache-airflow-providers-openlineage     | 1.2.0  
apache-airflow-providers-postgres        | 5.10.2 
apache-airflow-providers-redis           | 3.4.0  
apache-airflow-providers-sendgrid        | 3.4.0  
apache-airflow-providers-sftp            | 4.9.0  
apache-airflow-providers-slack           | 8.3.0  
apache-airflow-providers-snowflake       | 5.1.0  
apache-airflow-providers-sqlite          | 3.7.1  
apache-airflow-providers-ssh             | 3.10.1 

nathadfield commented on June 15, 2024

@dondaum Thanks but, as I mentioned, it isn't possible to replicate this consistently. The error returned is a 502 HTTP error from Google, which means the problem was ultimately on their side when the trigger was trying to obtain impersonated credentials in order to check the status of a BigQuery job. It doesn't have anything to do with query times or whether the table exists.

Perhaps it is possible to simulate the exception that is received by the trigger though?

 ('Unable to acquire impersonated credentials', '<!DOCTYPE html>\n<html lang=en>\n  <meta charset=utf-8>\n  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">\n  <title>Error 502 (Server Error)!!1</title>\n  <style>\n    {margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px} > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_colo
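One way to simulate it, as suggested, is to raise an exception with the same shape in a test and assert that it is classified as transient. A minimal, hypothetical sketch (the helper `is_retryable_server_error` is illustrative and not part of the provider's API):

```python
def is_retryable_server_error(message: str) -> bool:
    """Heuristically detect transient Google frontend errors (HTTP 5xx),
    such as the 502 HTML error page quoted above."""
    transient_markers = ("Error 502", "Error 503", "Server Error")
    return any(marker in message for marker in transient_markers)

# The exception payload resembles ('Unable to acquire impersonated
# credentials', '<!DOCTYPE html>...Error 502 (Server Error)!!1...').
sample_html = "<title>Error 502 (Server Error)!!1</title>"
assert is_retryable_server_error(sample_html)
assert not is_retryable_server_error("Syntax error at [1:2]")
```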

Oh, and for clarity, here is the Google provider version:

$ pip freeze | grep apache-airflow-providers-google
apache-airflow-providers-google==10.16.0

dondaum commented on June 15, 2024

@nathadfield Thanks. I think I got it now.

I worked on a change that adds a retry in such cases. Could you perhaps take a look and check?
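The general shape of such a retry, sketched with only the standard library (the function names and backoff values below are illustrative, not the actual change):

```python
import time

def call_with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying on any exception with exponential backoff.

    Re-raises the last exception once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))

# Usage: a flaky call that fails twice with a 502-like error, then succeeds.
calls = {"n": 0}

def flaky_credentials_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Unable to acquire impersonated credentials (502)")
    return "token"

assert call_with_retries(flaky_credentials_fetch, sleep=lambda s: None) == "token"
assert calls["n"] == 3
```

In the real fix the retried call would be the credential acquisition inside the trigger, and the retry should only fire on transient 5xx errors rather than on every exception.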
