
Comments (9)

chasinandrew avatar chasinandrew commented on June 11, 2024 2

Based on the screenshot, I'm assuming this is a GCP environment, correct? Does this happen on every job or only intermittently?

Jobs can get stuck with the "RECEIVED" status when the instances within the managed instance group (MIG) are not running or have crashed. If service account onboarding has not been completed, the MIG could be in an unhealthy state. Can you confirm that the service account has been onboarded?

To check the MIG's health in the cloud console:

  1. Navigate to Compute Engine > Instance Groups
  2. Select your instance group, then select the Errors tab

To check the MIG's health using gcloud CLI:
gcloud compute instance-groups managed list-errors <MIG_NAME> --region=<REGION>
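
If you'd rather script this check, here is a minimal Python sketch that wraps the same list-errors command. It assumes the gcloud CLI is installed and authenticated. The helper names (list_errors_cmd, fetch_mig_errors) are made up for illustration, and --format=json is gcloud's general output flag rather than anything specific to the aggregation service.

```python
import json
import shutil
import subprocess


def list_errors_cmd(mig_name: str, region: str) -> list[str]:
    """Build the list-errors command from this comment, asking for JSON output."""
    return [
        "gcloud", "compute", "instance-groups", "managed", "list-errors",
        mig_name, f"--region={region}", "--format=json",
    ]


def fetch_mig_errors(mig_name: str, region: str) -> list[dict]:
    """Run the command and return the parsed error entries (assumed helper)."""
    if shutil.which("gcloud") is None:
        raise RuntimeError("gcloud CLI not found on PATH")
    result = subprocess.run(
        list_errors_cmd(mig_name, region),
        capture_output=True, text=True, check=True,
    )
    # Treat empty output as "no errors reported".
    return json.loads(result.stdout or "[]")
```

Calling fetch_mig_errors("<MIG_NAME>", "<REGION>") with your own values should return a list you can inspect or log. An empty list means the MIG reported no errors.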

from trusted-execution-aggregation-service.

yanghuang1028 avatar yanghuang1028 commented on June 11, 2024

Hi @chasinandrew,

Yes, it is a GCP environment, and this issue happens on every job.

I checked the MIG's health and there are no errors, only one warning.
image
image

BTW, we have 4 worker VM instances, and I found the 403 error in three of them. Some of them seem unstable and keep restarting. Could this be the cause?
image
image

Thank you for the quick reply, it really helps!

chasinandrew avatar chasinandrew commented on June 11, 2024

No problem! This could be happening because of the unstable VMs. To help us reproduce this, could you provide the following info:

  1. Which aggregation service version do you have deployed?
  2. Can you please provide the Terraform deployment parameters if they're available?
  3. Can you send the JSON request and response, in plaintext or as a file?
  4. If available, can you send the Avro report and output_domain.avro?

yanghuang1028 avatar yanghuang1028 commented on June 11, 2024

Hi @chasinandrew,

1. We used the latest repo (https://github.com/privacysandbox/aggregation-service) to deploy. So is the version v2.4.2?
image
2. The deployment parameters: dev.auto.tfvars.txt
3. Request & response: request&response.txt
4. Since comments don't support attaching Avro files, I uploaded them to my GitHub repo:
avro report
output_domain.avro

Our Google Cloud link is https://console.cloud.google.com/home/dashboard?project=ecs-1709881683838, but I don't know if you have permission to access it.

Thank you for helping dig into the issue!

chasinandrew avatar chasinandrew commented on June 11, 2024

Thanks @yanghuang1028! This 403 error can happen when onboarding is incomplete. Can you please fill out this onboarding form to register your domain and service account?

yanghuang1028 avatar yanghuang1028 commented on June 11, 2024

@chasinandrew We filled out the form a few weeks ago, and your team sent us an email.

image

Oh, I see. We used a different service account for this deployment. Could you help us update the worker service account?
Our new worker service account is sa-worker-aggregation-service@ecs-1709881683838.iam.gserviceaccount.com

BTW, we only registered the domain for the production environment. If we do not register the staging domain, can the aggregation service correctly handle reports from the staging environment (we can manually change Chrome's settings to receive reports from the staging environment now)?
Our staging reporting site is https://adservice-1.stratus.qa.ebay.com/

Thanks again!

hostirosti avatar hostirosti commented on June 11, 2024

Hi @yanghuang1028, I recommend communicating this information through our support email alias. I'll be hiding your previous comment to keep that information out of public view.

@chasinandrew please move support conversations around onboarding to email.

Re your question on prod vs. staging: your service account is connected to the site that is onboarded, so if the same service account (in the same GCP project) is used to process your reports, you'll be able to process them in both staging and prod. If a different account is used, a separate onboarding request will be required.

yanghuang1028 avatar yanghuang1028 commented on June 11, 2024

Hi @hostirosti @chasinandrew

Thanks for protecting our private information!

The separate onboarding request is completed, and the job can be processed now. However, the job threw a TRANSACTION_MANAGER_RETRIES_EXCEEDED error when processing.

{
    "job_status": "FINISHED",
    "request_received_at": "2024-05-16T01:19:59.234435Z",
    "request_updated_at": "2024-05-16T01:29:35.184066241Z",
    "job_request_id": "test05",
    "input_data_blob_prefix": "output/output_regular_reports_2024-04-24T02:38:04-07:00.avro",
    "input_data_bucket_name": "tracking_tf_state_bucket",
    "output_data_blob_prefix": "output/summary_report.avro",
    "output_data_bucket_name": "tracking_tf_state_bucket",
    "postback_url": "",
    "result_info": {
        "return_code": "PRIVACY_BUDGET_ERROR",
        "return_message": "com.google.aggregate.adtech.worker.exceptions.AggregationJobProcessException: Exception while consuming privacy budget. Exception message: TRANSACTION_MANAGER_RETRIES_EXCEEDED \n com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.consumePrivacyBudgetUnits(ConcurrentAggregationProcessor.java:466) \n com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.process(ConcurrentAggregationProcessor.java:329) \n com.google.aggregate.adtech.worker.WorkerPullWorkService.run(WorkerPullWorkService.java:142)\nThe root cause is: com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngine$TransactionEngineException: TRANSACTION_MANAGER_RETRIES_EXCEEDED \n com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngineImpl.proceedToNextPhase(TransactionEngineImpl.java:100) \n com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngineImpl.executeDistributedPhase(TransactionEngineImpl.java:196) \n com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngineImpl.executeCurrentPhase(TransactionEngineImpl.java:138)",
        "error_summary": {
            "error_counts": [],
            "error_messages": []
        },
        "finished_at": "2024-05-16T01:29:35.113618072Z"
    },
    "job_parameters": {
        "output_domain_blob_prefix": "domain/output_local_domain.avro",
        "output_domain_bucket_name": "tracking_tf_state_bucket",
        "attribution_report_to": "https://adservice-1.stratus.qa.ebay.com"
    },
    "request_processing_started_at": "2024-05-16T01:20:00.743721759Z"
}
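
For anyone triaging a similar response, here is a minimal Python sketch that pulls out the fields that matter. The point it illustrates is that job_status can be "FINISHED" while return_code still reports a failure, as in the response above. The function name summarize_job is hypothetical, and the "The root cause is:" marker is assumed only from the message format shown in this response.

```python
import json


def summarize_job(response_text: str) -> dict:
    """Extract the triage-relevant fields from a job status response."""
    job = json.loads(response_text)
    result = job.get("result_info", {})
    message = result.get("return_message", "")

    # Assumption: the root cause follows this marker, as in the response above.
    marker = "The root cause is:"
    root_cause = None
    if marker in message:
        # Keep only the first line after the marker (exception type and code).
        root_cause = message.split(marker, 1)[1].strip().splitlines()[0].strip()

    return {
        "job_request_id": job.get("job_request_id"),
        "job_status": job.get("job_status"),
        "return_code": result.get("return_code"),
        "root_cause": root_cause,
    }
```

Running this on the response above would surface PRIVACY_BUDGET_ERROR as the return code and the TransactionEngineException line as the root cause, without having to read the whole stack trace.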

The reports and domain.avro files are as follows:
avro report
output_domain.avro

BTW, where can I see the detailed logs for each job in the Google Cloud console? I can't find them anywhere. Thanks a lot!

yanghuang1028 avatar yanghuang1028 commented on June 11, 2024

The job can be processed now, thanks a lot!
