privacysandbox / aggregation-service
This repository contains instructions and scripts to set up and test the Privacy Sandbox Aggregation Service.
License: Apache License 2.0
When running the Terraform code at the step described in https://github.com/privacysandbox/aggregation-service/blob/main/build-scripts/aws/README.md#configure-codebuild-setup
I got the following error:
│ Error: error creating S3 bucket ACL for aggregation-service-artifacts: AccessControlListNotSupported: The bucket does not allow ACLs
To resolve this error I had to add the following resource to build-scripts/aws/terraform/codebuild.tf:
resource "aws_s3_bucket_ownership_controls" "artifacts_output_ownership_controls" {
  bucket = aws_s3_bucket.artifacts_output.id

  rule {
    object_ownership = "BucketOwnerEnforced"
  }
}
I have the aggregation service set up, but our system to produce encrypted reports is not ready to go yet. This repo's sampledata directory has a sample report, but it is unencrypted and so works only with the local testing tool, not with AWS Nitro Enclaves.
Could you provide, either in the repo or in a zip file in this thread, an encrypted sample output.avro and an accompanying domain.avro that we can use to test our AWS aggregation service and make sure everything is running properly?
Hi team,
I'm trying to set up our deployment environment, but I encountered this error. Could you please help take a look? Thanks a lot!
These are the roles of our service accounts. Do I need to add some additional role permissions?
our projectId: ecs-1709881683838
Error: Error creating function: googleapi: Error 403: Could not create Cloud Run service dev-us-west2-worker-scale-in. Permission 'iam.serviceaccounts.actAs' denied on service account worker-sa-aggregation-service@microsites-sa.iam.gserviceaccount.com (or it may not exist).
│
│ with module.job_service.module.autoscaling.google_cloudfunctions2_function.worker_scale_in_cloudfunction,
│ on ../../coordinator-services-and-shared-libraries/operator/terraform/gcp/modules/autoscaling/workerscalein.tf line 35, in resource "google_cloudfunctions2_function" "worker_scale_in_cloudfunction":
│ 35: resource "google_cloudfunctions2_function" "worker_scale_in_cloudfunction" {
│
╵
╷
│ Error: Error creating function: googleapi: Error 403: Could not create Cloud Run service dev-us-west2-frontend-service. Permission 'iam.serviceaccounts.actAs' denied on service account [email protected] (or it may not exist).
│
│ with module.job_service.module.frontend.google_cloudfunctions2_function.frontend_service_cloudfunction,
│ on ../../coordinator-services-and-shared-libraries/operator/terraform/gcp/modules/frontend/main.tf line 43, in resource "google_cloudfunctions2_function" "frontend_service_cloudfunction":
│ 43: resource "google_cloudfunctions2_function" "frontend_service_cloudfunction" {
│
╵
╷
│ Error: Error creating instance template: googleapi: Error 409: The resource 'projects/ecs-1709881683838/global/instanceTemplates/dev-collector' already exists, alreadyExists
│
│ with module.job_service.module.worker.google_compute_instance_template.collector,
│ on ../../coordinator-services-and-shared-libraries/operator/terraform/gcp/modules/worker/collector.tf line 49, in resource "google_compute_instance_template" "collector":
│ 49: resource "google_compute_instance_template" "collector" {
Hi aggregation-service team,
I'm really confused about the file "output_domain.avro" used to produce a summary report locally. In your Node.js example code, how can I generate an "output_domain.avro" for the aggregation report?
Here is your sample doc: https://github.com/privacysandbox/aggregation-service/blob/main/docs/collecting.md#collecting-and-batching-aggregatable-reports
{
"bucket": "\u0005Y"
}
Will this "output_domain.avro" work with your Node.js example?
If convenient, could you explain what this domain file is generated from? Thanks a lot!
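For what it's worth, here is a minimal Python sketch (my own assumption, not from the docs) of how that sample "bucket" string maps to an integer key: the bucket field is a byte string holding a big-endian unsigned integer, so "\u0005Y" is the two bytes 0x05 0x59, i.e. 1369. Writing the actual output_domain.avro would additionally need an Avro library, which is outside this sketch.

```python
# Sketch (assumption): the "bucket" field in the domain file is a byte
# string holding a big-endian unsigned integer.
sample = "\u0005Y".encode("latin-1")        # the two bytes b"\x05Y"
bucket_int = int.from_bytes(sample, "big")  # 0x0559
print(bucket_int)                           # 1369

# Going the other way: bucket keys are 128-bit integers, so a domain
# entry can be produced as the 16-byte big-endian encoding of the key.
def bucket_bytes(key: int, width: int = 16) -> bytes:
    return key.to_bytes(width, "big")
```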
The sample provided here uses an out-of-date shared_info that also doesn't contain a version.
Better to use the one from the sampledata dir; here is the plaintext:
"{\"api\":\"attribution-reporting\",\"version\":\"0.1\",\"scheduled_report_time\":1698872400.000000000,\"reporting_origin\":\"http://adtech.localhost:3000\",\"source_registration_time\":1698796800.000000000,\"attribution_destination\":\"dest.com\",\"debug_mode\":\"enabled\",\"report_id\":\"b360383a-108d-4ae3-96bd-aecde1c3c30b\"}"
Which has an allowed version, an actual 'api' key, and attribution_destination moved inside shared_info.
Hi aggregation service team, we (Adform) are facing a Privacy Budget Exhausted issue due to duplicate reports. We are following the batching criteria mentioned at
and
Based on the above rules, we tried to reverse engineer the batch data to check whether we have any duplicate reports across all our batch data, but we couldn't find any.
We also looked at #35 and cross-verified our assumption with the code as well.
Is there any other way to get more debug information about which batches contain these duplicate reports with the same key?
Can you please provide any information on how to proceed with debugging this issue?
The way the browser and the adtech's servers interact over the network makes it inherently unavoidable that some reports will be received by the adtech but not recorded as delivered by the browser (e.g. when a timeout happens), and hence retried and received several times by the adtech. As is mentioned in your documentation:
The browser is free to utilize techniques like retries to minimize data loss.
Sometimes, these duplicate reports reach upwards of hundreds of reports each day, for several days (sometimes several months) in a row, all having the same report_id.
The aggregation service enforces the no-duplicates rule based on a combination of information:
Instead, each aggregatable report will be assigned a shared ID. This ID is generated from the combined data points: API version, reporting origin, destination site, source registration time and scheduled report time. These data points come from the report's shared_info field.
The aggregation service will enforce that all aggregatable reports with the same ID must be included in the same batch. Conversely, if more than one batch is submitted with the same ID, only one batch will be accepted for aggregation and the others will be rejected.
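Based on the two quoted paragraphs, a pre-batch dedup key can be derived on the adtech side from those same shared_info fields. This is a hypothetical sketch: the field list follows the quote above, but the hashing the service uses internally is not specified here, so this key only serves to group reports consistently before batching.

```python
import hashlib
import json

# Fields the docs say feed into the shared ID (API version, reporting
# origin, destination site, source registration time, scheduled report
# time). The hash below is our own stable stand-in, not the service's.
ID_FIELDS = ("version", "reporting_origin", "attribution_destination",
             "source_registration_time", "scheduled_report_time")

def shared_group_key(shared_info: dict) -> str:
    material = json.dumps([shared_info.get(f) for f in ID_FIELDS])
    return hashlib.sha256(material.encode()).hexdigest()
```

Grouping all reports by shared_group_key() before submitting batches makes it easy to verify on your side that reports sharing an ID always land in the same batch.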
As an adtech company, when trying to provide timely reporting to clients, it is paramount to try and use all of the available information (in this case, reports) in order to have our reporting be as precise as possible.
In this scenario, however, if we try to batch together all of our reports for a chosen client on a chosen day, even after deduplicating all of the chosen day's reports by the report_id (or the overall shared_info) field, we may have a batch accepted on day 1, and then all subsequent batches for the next month be rejected because they all contain that same shared_info-based ID.
This means that we have to check further back in the data for possible duplicate reports. To be able to implement this check in an efficient manner we would benefit from a more precise description of the retry policy, namely for how long the retries can happen.
I guess the questions this issue raises are as follows:
In the instructions for building the AMI (Building aggregation service artifacts), one of the steps is to put a github_personal_access_token in codebuild.auto.tfvars.
Can you provide more information on this token?
Hello,
While executing a /createJob request with the following payload (see the example below):
{
  "job_request_id": "Job-1010",
  "input_data_blob_prefix": "reports/inputs/input.avro",
  "input_data_bucket_name": "test-android-sandbox",
  "output_data_blob_prefix": "reports/output/result_1.avro",
  "output_data_bucket_name": "test-android-sandbox",
  "job_parameters": {
    "output_domain_blob_prefix": "reports/domains/domain.avro",
    "output_domain_bucket_name": "test-android-sandbox",
    "debug_privacy_epsilon": 30
  }
}
the response of this request is 202.
When executing /getJob?job_request_id=Job-1010
{
  "job_status": "IN_PROGRESS",
  "request_received_at": "2023-06-12T15:14:17.891601Z",
  "request_updated_at": "2023-06-12T15:14:23.222830Z",
  "job_request_id": "Job-1010",
  "input_data_blob_prefix": "reports/inputs/input.avro",
  "input_data_bucket_name": "test-android-sandbox",
  "output_data_blob_prefix": "reports/output/result_1.avro",
  "output_data_bucket_name": "test-android-sandbox",
  "postback_url": "",
  "result_info": {
    "return_code": "",
    "return_message": "",
    "error_summary": {
      "error_counts": [],
      "error_messages": [
        "Missing required properties: jobKey"
      ]
    },
    "finished_at": "1970-01-01T00:00:00Z"
  },
  "job_parameters": {
    "debug_privacy_epsilon": "30",
    "output_domain_bucket_name": "test-android-sandbox",
    "output_domain_blob_prefix": "reports/domains/domain.avro"
  },
  "request_processing_started_at": "2023-06-12T15:14:23.133071Z"
}
The error is Missing required properties: jobKey
The job stays in status IN_PROGRESS
When running the same /createJob request without the job_request_id property, the response from /createJob is:
{ "code": 3, "message": "Missing required properties: jobRequestId\r\n in: {\n \"input_data_blob_prefix\": \"reports/inputs/input.avro\",\n \"input_data_bucket_name\": \"test-android-sandbox\",\n \"output_data_blob_prefix\": \"reports/output/result_1.avro\",\n \"output_data_bucket_name\": \"test-android-sandbox\",\n \"job_parameters\": {\n \"output_domain_blob_prefix\": \"reports/domains/domain.avro\",\n \"output_domain_bucket_name\": \"test-android-sandbox\"\n }\n}", "details": [ { "reason": "JSON_ERROR", "domain": "", "metadata": {} } ] }
Hello,
One interesting evolution of the aggregation service would be to enable querying aggregates of keys. I think this was mentioned in the aggregate attribution API at a time when the aggregation was supposed to be performed by MPC rather than TEEs.
In other words, I would love to be able to query a bit mask (e.g. for an 8-bit key, 01100*01 would cover 01100101 and 01100001).
This would enable greater flexibility for decoding (i.e. choosing which encoded variables to get depending on the number of reports), and negate the need to adapt the encoding depending on the expected traffic to the destination website.
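To make the idea concrete, here is a small Python sketch (illustrative only) that expands such a bit mask into the concrete keys it would cover:

```python
from itertools import product

def expand_mask(mask: str) -> list[str]:
    """Expand a bit mask such as '01100*01' into all matching keys."""
    star_positions = [i for i, c in enumerate(mask) if c == "*"]
    keys = []
    for bits in product("01", repeat=len(star_positions)):
        chars = list(mask)
        for pos, bit in zip(star_positions, bits):
            chars[pos] = bit
        keys.append("".join(chars))
    return keys

print(expand_mask("01100*01"))  # ['01100001', '01100101']
```

A mask with k wildcards covers 2^k keys, which is also why the service would need to check that queried masks partition the key space without overlap.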
Thanks!
P.S. I can cross-post on https://github.com/WICG/attribution-reporting-api if needed.
For Aggregation Service releases (e.g. Aggregation Service v2.0.0), can a more complete set of binaries be published? The use case is to enable adtechs to more easily customize and build Aggregation Service AMI images to meet their deployment requirements.
For Aggregation Service v2.0.0 this set would include:
After deploying the aggregation service using Terraform, I got an API Gateway error in the AWS console: The API with ID my-api-id doesn't include a route with path /* having an integration arn:aws:lambda:us-east-1:my-aws-account-id:function:stg-create-job.
I changed the Source ARN of the Lambda's permission from arn:aws:execute-api:us-east-1:my-aws-account-id:my-api-id/*/**
to arn:aws:execute-api:us-east-1:my-aws-account-id:my-api-id/*/*/v1alpha/getJob, and it solved the error.
https://github.com/privacysandbox/control-plane-shared-libraries/blob/9efe5591acc18e46263399d9785432a146d9675c/operator/terraform/aws/modules/frontend/api_gateway.tf#L62
Hi,
The Aggregation service team is looking for your feedback to improve debugging support in the service.
Adtechs can already get metrics for their jobs (status, errors, execution time, etc.) from the Cloud metadata (DynamoDB on AWS and Spanner on GCP).
We are exploring other metrics, traces and logs that can provide a better understanding of the job processing within the Trusted Execution Environment without impacting privacy. We are considering providing CPU and memory metrics and total execution time traces for the adtech deployment, and would benefit from your feedback on other metrics that adtechs may find useful.
We are also considering adding useful logs which can give information about job processing for debugging purposes, such as 'Job at data reading stage'. This is subject to review and approval considering user privacy.
Your inputs will be reviewed by the Privacy Sandbox team. We welcome any feedback on debugging Aggregation Service jobs.
Thank you!
We are working on adding the possibility to generate debug summary reports from encrypted aggregatable reports with the AWS-based aggregation service. This capability will be time-limited and phased out at a later time.
We would like to hear from you what capabilities you'd like to see in these debug summary reports.
Some ideas we are considering:
- epsilon
- output domain, with an annotation hinting at the omission
Questions:
Hi,
I have managed to get the full flow running to aggregate debug reports in the browser and process them locally with the provided tool.
The final file output I have is:
[{"bucket": "d0ZHnRzgTJMAAAAAAAAAAA==", "metric": 195000}]
Which looks correct in that there should be a single key and the metric value is correct.
The issue I have now is decoding this bucket to get my original input data. I assumed the steps would be:
But this causes the following error:
_cbor2.CBORDecodeEOF: premature end of stream (expected to read 23 bytes, got 15 instead)
Would really appreciate any help on how to get the input data back out of this bucket.
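In case it helps others who hit the same CBOR error: my understanding (an assumption, not confirmed by the docs) is that the local tool's "bucket" value is not CBOR at all, but the base64 encoding of the raw 16-byte big-endian 128-bit bucket key, which is why a CBOR decoder runs out of bytes partway through. A Python sketch:

```python
import base64

# Assumption: the "bucket" string is base64 of the raw 16-byte
# big-endian bucket key, not a CBOR payload.
raw = base64.b64decode("d0ZHnRzgTJMAAAAAAAAAAA==")
assert len(raw) == 16                   # a 128-bit key
bucket_key = int.from_bytes(raw, "big")
print(hex(bucket_key))
```

The trailing zero bytes suggest the original key material sits in the high 64 bits, consistent with a key that was left-shifted at encoding time.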
Best,
D
Hello!
I am following the guide outlined here: https://github.com/privacysandbox/aggregation-service/blob/main/docs/gcp-aggregation-service.md#adtech-setup-terraform
And I am now at the stage where I am trying to deploy the individual environments:
GOOGLE_IMPERSONATE_SERVICE_ACCOUNT="aggregation-service-deploy-sa@ag-edgekit-prod.iam.gserviceaccount.com" terraform plan
However I am faced with this error:
╷
│ Error: invalid value for member (IAM members must have one of the values outlined here: https://cloud.google.com/billing/docs/reference/rest/v1/Policy#Binding)
│
│ with module.job_service.module.autoscaling.google_cloud_run_service_iam_member.worker_scale_in_sched_iam,
│ on ../../coordinator-services-and-shared-libraries/operator/terraform/gcp/modules/autoscaling/workerscalein.tf line 104, in resource "google_cloud_run_service_iam_member" "worker_scale_in_sched_iam":
│ 104: member = "serviceAccount:${var.worker_service_account}"
│
╵
╷
│ Error: invalid value for member (IAM members must have one of the values outlined here: https://cloud.google.com/billing/docs/reference/rest/v1/Policy#Binding)
│
│ with module.job_service.module.worker.google_spanner_database_iam_member.worker_jobmetadatadb_iam,
│ on ../../coordinator-services-and-shared-libraries/operator/terraform/gcp/modules/worker/main.tf line 98, in resource "google_spanner_database_iam_member" "worker_jobmetadatadb_iam":
│ 98: member = "serviceAccount:${local.worker_service_account_email}"
│
╵
╷
│ Error: invalid value for member (IAM members must have one of the values outlined here: https://cloud.google.com/billing/docs/reference/rest/v1/Policy#Binding)
│
│ with module.job_service.module.worker.google_pubsub_subscription_iam_member.worker_jobqueue_iam,
│ on ../../coordinator-services-and-shared-libraries/operator/terraform/gcp/modules/worker/main.tf line 104, in resource "google_pubsub_subscription_iam_member" "worker_jobqueue_iam":
│ 104: member = "serviceAccount:${local.worker_service_account_email}"
│
╵
I am new to Terraform and have not been able to find a way to log the values of serviceAccount:${var.worker_service_account} and serviceAccount:${local.worker_service_account_email}.
Any help here would be greatly appreciated!
EDIT: The below seems to show that the TF state does correctly store the two service accounts created in the adtech_setup step.
terraform state show 'module.adtech_setup.google_service_account.deploy_service_account[0]'
# module.adtech_setup.google_service_account.deploy_service_account[0]:
resource "google_service_account" "deploy_service_account" {
account_id = "aggregation-service-deploy-sa"
disabled = false
display_name = "Deploy Service Account"
email = "aggregation-service-deploy-sa@ag-edgekit-prod.iam.gserviceaccount.com"
id = "projects/ag-edgekit-prod/serviceAccounts/aggregation-service-deploy-sa@ag-edgekit-prod.iam.gserviceaccount.com"
member = "serviceAccount:aggregation-service-deploy-sa@ag-edgekit-prod.iam.gserviceaccount.com"
name = "projects/ag-edgekit-prod/serviceAccounts/aggregation-service-deploy-sa@ag-edgekit-prod.iam.gserviceaccount.com"
project = "ag-edgekit-prod"
unique_id = "106307936135287037408"
}
Hi guys, does anyone here have a strategy for generating the output_domain.avro when the values that make up the bucket are dynamic (example: a creative ID)? We are implementing attribution-reporting in our company and our report key is not fixed, since we use the IDs of the creatives; this means we cannot map the keys by hand (the volume of creatives we have would be very large).
Example of the code that defines the value of the keys (bucket):
const registerSource = (req, res) => {
  if (req.headers['attribution-reporting-eligible']) {
    let SOURCE_PARAMS = {
      source_event_id: Date.now().toString(),
      destination: req.query.destination,
      expiry: 2592000,
      event_report_window: 3600,
      priority: "0",
      aggregation_keys: { // Defining the value of the keys (bucket)
        creativeId: Utils.toHex(req.query.creativeId),
        lineItemId: Utils.toHex(req.query.lineItemId),
        pixelId: Utils.toHex(0),
      },
      aggregatable_report_window: "86400",
      filter_data: {
        creativeId: [`${req.query.creativeId}`],
        lineItemId: [`${req.query.lineItemId}`]
      },
      debug_key: "260893",
    }
    res.set('Attribution-Reporting-Register-Source', JSON.stringify(SOURCE_PARAMS));
    res.status(200).send('OK');
  } else {
    res.statusCode = 400;
    res.end('Invalid request');
  }
}
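One approach (a sketch under assumptions, not an official recommendation): since the bucket values are derived from your own creative IDs, the domain does not need a fixed mapping; it can be regenerated from your creatives table just before each batch, emitting one bucket per ID currently in play. Writing the result into output_domain.avro would need an Avro library; the helper below only produces the 16-byte bucket values, mirroring the hex encoding done by Utils.toHex() on the source side.

```python
# Sketch: regenerate the domain from the current set of creative IDs.
# Assumes the bucket keys are the creative IDs themselves, encoded as
# 128-bit big-endian integers (mirroring Utils.toHex in the snippet).
def domain_buckets(creative_ids) -> list[bytes]:
    return [int(cid).to_bytes(16, "big") for cid in sorted(set(creative_ids))]

buckets = domain_buckets([42, 7, 42])
print(len(buckets))  # 2 -- duplicates collapse
```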
Hello,
Currently, the aggregation service sums the values over the set of keys declared in the output domain files. This explicit declaration of keys means that the encoding must be done well at report creation time (e.g. on the source and trigger side for ARA, or in Shared Storage for the Private Aggregation API). This is quite inflexible in its use.
To bring in some flexibility, I propose to add a system to the aggregation service where a predeclared set of keys would be summed by the aggregation service. This set of keys would constitute a partition of the key space for the service not to violate the DP limit. A simple check done by the aggregation service could reject the query if a key is in two sets.
Here is what the output domain file would look like. I am not sure "super bucket" is a great name, but it is the only one I could think of right now.
| Super bucket | Bucket |
|---|---|
| 0x123 | 0x456 |
| 0x123 | 0x789 |
| 0x124 | 0xaef |
| 0x125 | 0x12e |
The aggregation service would provide the output only on the "super buckets".
The operational benefits of this added flexibility would be huge. Currently, one has to decide on an encoding before knowing what one can measure. For ARA or PAA for Fledge, this means having a very good idea beforehand of the size and the performance of the campaign. When the campaign is running, adjustments have to be made if the volume estimate was not good (or if the settings of the campaign are changed). Encoding changes can be difficult to track, especially in ARA, where sources and triggers both contribute to the keys, but at different points in time. This proposal allows a fixed encoding, with the encoding actually used adjusted after the fact (using the volume of reports as a proxy).
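To illustrate the proposal, here is a small Python sketch (illustrative only) of the intended behaviour: given the super-bucket partition from the table above, the service would output one summed value per super bucket, after checking that no bucket appears under two super buckets.

```python
from collections import defaultdict

# Partition from the example table: bucket -> super bucket. Because
# this is a dict, each bucket can belong to at most one super bucket.
PARTITION = {0x456: 0x123, 0x789: 0x123, 0xAEF: 0x124, 0x12E: 0x125}

def rollup(histogram: dict) -> dict:
    """Sum per-bucket values up to their super bucket."""
    out = defaultdict(int)
    for bucket, value in histogram.items():
        out[PARTITION[bucket]] += value
    return dict(out)

# 0x456 and 0x789 both roll up into super bucket 0x123.
result = rollup({0x456: 5, 0x789: 7, 0xAEF: 1})
```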
When an aggregatable report is created by sendHistogramReport() (i.e. called inside the reportWin function), it contains shared_info without attribution_destination or source_registration_time. This seems logical, as these keys are strictly related to attribution logic. Example:
"shared_info": "{\"api\":\"fledge\",\"debug_mode\":\"enabled\",\"report_id\":\"9ae1a0d0-8cf5-4951-b752-e932bf0f7705\",\"reporting_origin\":\"https://fledge-eu.creativecdn.com\",\"scheduled_report_time\":\"1668771714\",\"version\":\"0.1\"}"
More readable form:
{
"api": "fledge",
"debug_mode": "enabled",
"report_id": "9ae1a0d0-8cf5-4951-b752-e932bf0f7705",
"reporting_origin": "https://fledge-eu.creativecdn.com",
"scheduled_report_time": "1668771714",
"version": "0.1"
}
(note: version 0.1; values for privacy_budget_key, attribution_destination, and source_registration_time are missing)
At the same time, the Aggregation Service expects both attribution_destination and source_registration_time for shared_info.version == 0.1 (since aggregation service version 0.4):
see SharedInfo.getPrivacyBudgetKey()
Tested on Chrome:
The following exception was printed:
CustomMetric{nameSpace=scp/worker, name=WorkerJobError, value=1.0, unit=Count, labels={Type=JobHandlingError}}
2022-11-22 09:10:54:120 +0100 [WorkerPullWorkService] ERROR com.google.aggregate.adtech.worker.WorkerPullWorkService - Exception occurred in worker
com.google.aggregate.adtech.worker.JobProcessor$AggregationJobProcessException: java.util.concurrent.ExecutionException: java.util.NoSuchElementException: No value present
at com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.process(ConcurrentAggregationProcessor.java:400)
at com.google.aggregate.adtech.worker.WorkerPullWorkService.run(WorkerPullWorkService.java:145)
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:67)
at com.google.common.util.concurrent.Callables.lambda$threadRenaming$3(Callables.java:103)
at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: java.util.concurrent.ExecutionException: java.util.NoSuchElementException: No value present
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:588)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:567)
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:113)
at com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.process(ConcurrentAggregationProcessor.java:295)
... 4 more
Caused by: java.util.NoSuchElementException: No value present
at java.base/java.util.Optional.get(Optional.java:143)
at com.google.aggregate.adtech.worker.model.SharedInfo.getPrivacyBudgetKey(SharedInfo.java:161)
at com.google.aggregate.adtech.worker.aggregation.engine.AggregationEngine.accept(AggregationEngine.java:88)
at com.google.aggregate.adtech.worker.aggregation.engine.AggregationEngine.accept(AggregationEngine.java:49)
When specifying "enable_user_provided_vpc = true", creation of the environment following the instructions at https://github.com/privacysandbox/aggregation-service/tree/main#set-up-your-deployment-environment
fails with the error:
Out of index vpc[0], 182: dynamodb_vpc_endpoint_id = module.vpc[0].dynamodb_vpc_endpoint_id
in file terraform/aws/applications/operator-service/main.tf.
Lines 182 & 183 refer to module.vpc[0], while module.vpc is not created when "enable_user_provided_vpc = true":
module "vpc" {
  count = var.enable_user_provided_vpc ? 0 : 1
Hi team,
We enrolled https://ebayadservices.com/ as our production environment a few weeks ago, and confirmed with your team that it was completed.
We noticed this sentence in your documentation: Your staging, beta, QA and test environments will be automatically enrolled if they use the same site as your production environment.
However, when we use https://staging.ebayadservices.com/ to do the tests, the job fails to pass authorization. Could you please help investigate this issue?
BTW, because our company's testing environment does not allow external sites, we use an internal proxy to access https://staging.ebayadservices.com/. I don't know if this is the root cause of this issue.
The avro files:
report avro
domain avro
The API response is as follows.
{
"job_status": "FINISHED",
"request_received_at": "2024-05-22T02:15:53.301731Z",
"request_updated_at": "2024-05-22T02:16:02.790976116Z",
"job_request_id": "test11",
"input_data_blob_prefix": "output/output_regular_reports_2024-05-21T19:12:54-07:00.avro",
"input_data_bucket_name": "tracking_tf_state_bucket",
"output_data_blob_prefix": "output/summary_report.avro",
"output_data_bucket_name": "tracking_tf_state_bucket",
"postback_url": "",
"result_info": {
"return_code": "PRIVACY_BUDGET_AUTHORIZATION_ERROR",
"return_message": "com.google.aggregate.adtech.worker.exceptions.AggregationJobProcessException: Aggregation service is not authorized to call privacy budget service. This could happen if the createJob API job_paramaters.attribution_report_to does not match the one registered at enrollment. Please verify and contact support if needed. \n com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.consumePrivacyBudgetUnits(ConcurrentAggregationProcessor.java:451) \n com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.process(ConcurrentAggregationProcessor.java:329) \n com.google.aggregate.adtech.worker.WorkerPullWorkService.run(WorkerPullWorkService.java:142)\nThe root cause is: com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngine$TransactionEngineException: PRIVACY_BUDGET_CLIENT_UNAUTHORIZED \n com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngineImpl.createTransactionEngineException(TransactionEngineImpl.java:203) \n com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngineImpl.proceedToNextPhase(TransactionEngineImpl.java:67) \n com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngineImpl.executeDistributedPhase(TransactionEngineImpl.java:196)",
"error_summary": {
"error_counts": [],
"error_messages": []
},
"finished_at": "2024-05-22T02:16:02.778068915Z"
},
"job_parameters": {
"output_domain_blob_prefix": "domain/output_local_domain.avro",
"output_domain_bucket_name": "tracking_tf_state_bucket",
"attribution_report_to": "https://staging.ebayadservices.com"
},
"request_processing_started_at": "2024-05-22T02:15:55.674601807Z"
}
I am able to trigger the aggregation job with the /createJob endpoint deployed via Terraform on AWS. While running /getJob with the request ID, I am getting the below error:
"result_info": { "return_code": "REPORTS_WITH_ERRORS_EXCEEDED_THRESHOLD", "return_message": "Aggregation job failed early because the number of reports excluded from aggregation exceeded threshold.", "error_summary": { "error_counts": [ { "category": "DECRYPTION_KEY_NOT_FOUND", "count": 1, "description": "Could not find decryption key on private key endpoint." }, { "category": "NUM_REPORTS_WITH_ERRORS", "count": 1, "description": "Total number of reports that had an error. These reports were not considered in aggregation. See additional error messages for details on specific reasons." } ], "error_messages": [] }, "finished_at": "2024-05-0
I could see @ydennisy also had a similar issue but could not find the solution for it.
Hello All!
Having spent the past few days trying to get the AS live, I have been jotting down various questions, suggestions & bugs which I think could be a great addition to the documentation and workflow.
Maybe for those who use Terraform in their projects this is not required, but we do not use Terraform and essentially followed the instructions to get all the resources built. I have since had to traverse the GCP console to try and understand what the scripts created. A high-level overview diagram with the main data flows, table names, etc. would be extremely useful.
Similar to the point above, the terraform scripts are spread over many files so it is not clear exactly what will be created. I think it would be great to have a single-file config showing all the names of the resources, as they are very obscure in the context of our overall infra; for example prod-jobmd is the name of a newly created Cloud Spanner instance, which is a pretty unhelpful name. At the very least everything should be prefixed with aggregation-service, or even better, allow users to transparently set this as a first step.
It would be good to have an understanding of the cost of the full setup at idle, and maybe have some suggestions for development and staging setups which can minimise costs, by using more serverless infra for example.
I would suggest dropping the use of Cloud Functions and migrating fully to Cloud Run; the docs seem to use these interchangeably, and although they sort of are (gen2 functions are powered by Cloud Run), I think this can cause extra confusion. There is also a small typo in the endpoint:
This is the value in the docs
https://<environment>-<region>-frontend-service-<cloud-funtion-id>-uc.a.run.app/v1alpha/createJob
But -uc. was -ew. in my case, so this does not seem to be a value which can be hardcoded in the docs in this manner.
Running the jobs stores a nice error in the DB, which is awesome! But even with this nice error it would be great to have a document to show common errors and their solutions. For example my latest error is:
{"errorSummary":{"errorCounts":[{"category":"DECRYPTION_KEY_NOT_FOUND","count":"445","description":"Could not find decryption key on private key endpoint."},{"category":"NUM_REPORTS_WITH_ERRORS","count":"445","description":"Total number of reports that had an error. These reports were not considered in aggregation. See additional error messages for details on specific reasons."}]},"finishedAt":"2024-04-30T13:17:24.233681575Z","returnCode":"REPORTS_WITH_ERRORS_EXCEEDED_THRESHOLD","returnMessage":"Aggregation job failed early because the number of reports excluded from aggregation exceeded threshold."}
Which is very clear - but still does not leave me any paths open to try and rectify the issue apart from troubling people over email or in this repo :)
This was addressed in #48 but needs to be added to the repo.
There are quite a few flows in which data must be converted from one format to another, for example a hashed string into a byte array. Whilst it is possible to figure this out from the disparate pieces of information available in the repository, it would be very useful to have a few examples for various platforms, e.g.:
-- Convert hashes to domain avro for processing.
CAST(FROM_HEX(SUBSTR(reports.hashed_key, 3)) AS BYTES) AS bucket
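The same conversion in Python, for anyone not working in SQL (a small sketch; the "0x"-prefixed hashed-key format follows the query above):

```python
# Strip the "0x" prefix from a hashed key string and decode the hex
# into the raw bytes used as the domain bucket.
def hashed_key_to_bucket(hashed_key: str) -> bytes:
    return bytes.fromhex(hashed_key[2:])

print(hashed_key_to_bucket("0x0559"))  # b'\x05Y'
```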
I hope you do not mind if I keep updating this issue as I hopefully near completion of getting the service up!
All the best!
D
I am trying to process a job without an output domain. I found the domain_optional flag in the AggregationWorkerArgs class (link). I can't set this flag as a JobParameter. Can you guide me on how to set the flag?
Hi team,
Our aggregation service is running successfully now, and we plan to use it daily when our app releases.
But before the release, we have to do some stress tests on it. To simulate real business scenarios, we need to mock 400k aggregatable reports for the aggregation service to decrypt. Is there any convenient way for us to create so many reports for testing?
Currently, I have to manually register source & trigger events to send an aggregatable report to GCS, which is really inefficient...
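One possible shortcut (a sketch under assumptions, not an official tool): generate the plaintext reports synthetically instead of registering real sources and triggers, then feed them to whatever encryption tooling you already use for test reports. Field names below follow the shared_info examples earlier in this thread; the reporting origin is a placeholder.

```python
import json
import random
import uuid

def mock_report(i: int) -> dict:
    """One synthetic plaintext aggregatable report (pre-encryption)."""
    return {
        "shared_info": json.dumps({
            "api": "attribution-reporting",
            "version": "0.1",
            "report_id": str(uuid.uuid4()),
            "reporting_origin": "https://adtech.example",  # placeholder
            "scheduled_report_time": 1700000000 + i,
            "source_registration_time": 1699900000,
        }),
        # A random 128-bit bucket key with a unit contribution.
        "payload": {"bucket": random.randrange(2 ** 128), "value": 1},
    }

reports = [mock_report(i) for i in range(1000)]  # scale up to 400k
```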
Thanks,
Yang
We have established a fully automated workflow that collects ARA production reports, processes them, and forwards them to the Aggregation Service. Additionally, we are considering utilizing PAA to process bid request data to enhance our bidding models by collecting data about lost auctions. However, this approach would substantially increase the workload on the Aggregation Service (AS), as the volume of bid requests far exceeds the data used for attribution. Specifically, we exceed 400 million rows with 200 million domain buckets as indicated in the aggregation service sizing guide.
We used one day of our production bidding data as input and modified the aggregation service tool to generate valid PAA reports with debug mode enabled. In the end, we had approximately 2.23 billion reports with an associated domain file of 685 million distinct buckets.
We cannot batch PAA reports the same way we do Attribution Reporting summary reports. For ARA summary reports we group individual reports by shared_info.attribution_destination to create report batches; for each group we create the associated domain by listing all the possible buckets of the advertiser identified by the shared_info.attribution_destination field.
There is no such field in PAA, so the only remaining batching option we have is to batch by reception time, either daily or hourly. By design, the more data we aggregate the less noise we get, so it is always better to launch a daily aggregation. To stress test the AS, we first split our daily data into 100 batches and ran 100 different aggregations; we then tried to scale up and run a daily aggregation using the domain file for the whole day. The number of 100 was picked arbitrarily.
Daily data was split into 100 batches, resulting in approx. 22 million reports and 6.8 million domain buckets per batch; this falls within the instance recommendation matrix.
The configuration used m5.12xlarge EC2 instances with a maximum auto-scaling capacity of 15 instances. A custom script triggered all 100 aggregation jobs simultaneously, and they were executed with debug mode enabled.
Aggregation jobs were launched sequentially. All the executions completed with the status DEBUG_SUCCESS_WITH_PRIVACY_BUDGET_EXHAUSTED except for one batch, which finished with a SUCCESS status.
Each execution took about 30-35 minutes to finish; the whole aggregation took approximately 4 hours to execute.
The graph represents the number of jobs that remain to be processed on AWS. The first query was received at approx 11:50 and the last job finished at 15:48.
Note:
Almost all of our executions completed with DEBUG_SUCCESS_WITH_PRIVACY_BUDGET_EXHAUSTED; the batching strategy that we used for the load test is not viable in production.
As we wanted to run the aggregation on the entire domain, we only batched the reports and kept the domain as is. This resulted in 100 aggregation jobs with 22 million input reports and 684 million input domain buckets each.
Job executions resulted in an INPUT_DATA_READ_FAILED error. The AS logs aren't very explicit, but this seems to be related to the domain being too large. We tried m5.12xlarge and then m5.24xlarge instances; the results were the same. Below is the job stack trace from the AS.
com.google.aggregate.adtech.worker.exceptions.AggregationJobProcessException: Exception while reading domain input data.
com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.process(ConcurrentAggregationProcessor.java:305)
com.google.aggregate.adtech.worker.WorkerPullWorkService.run(WorkerPullWorkService.java:142)
com.google.common.util.concurrent.AbstractExecutionThreadService$1.lambda$doStart$1(AbstractExecutionThreadService.java:57)
The root cause is: software.amazon.awssdk.services.s3.model.S3Exception: The requested range is not satisfiable (Service: S3, Status Code: 416, Request ID: XXXXXXXXXXXXXXXX, Extended Request ID: XXXXXXXXXXXXXXXX)
software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125)
software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82)
software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60)
When input report/domain volumes fall within the instance recommendation matrix, aggregation finishes successfully with reasonable time frames, not far from those shared here.
When the input report/domain volume is above the sizing guidance, the aggregation service encounters failures even with the largest instances. This poses a problem for Criteo, as it puts our target use case for the Private Aggregation API at risk.
The sizing shared in the doc is the true maximum of the AS; any use case requiring more domain buckets should be revisited.
As we have seen before when investigating ARA issues, but even more pronounced here (because of new errors that we had not encountered so far), the errors of the AS are hard to understand and do not tell us why the jobs are failing.
To address these issues and improve the feasibility of our use case, the following actions are recommended:
Hello aggregation service team,
We (Criteo) would like to seek clarification on a couple of points to ensure we have a comprehensive understanding of certain features.
Your insights will greatly assist us in optimizing our utilization of the platform:
Batch Size Limit (30k reports):
Could you kindly provide more details about the batch size limit of 30,000?
We are a little unsure how this limit behaves: it is our understanding that the aggregation service expects loads of up to tens (even hundreds) of thousands of reports. However, when we provide it with batches of 50k+ reports, our aggregations fail.
Is the 30k limit enforced per avro file within the batch, or per batch overall?
If it is per overall batch, do you have any suggestions for aggregating batches of more than 30k reports?
If we need to split these larger aggregations over several smaller requests, that will greatly increase the noise levels we see in our final results, and would work against the idea of the aggregation service, which encourages adtechs to aggregate as many reports as possible to increase privacy.
Understanding the specifics of this limit should greatly help us in tailoring our processes more effectively.
Debug Information on Privacy Budget Exhaustion:
We've been considering ways to enhance our debugging capabilities, especially in situations where the privacy budget is exhausted. Would it be possible to obtain more detailed debug information in such cases, specifically regarding the occurrence of duplicates? We believe that having for instance the report_ids of the duplicates wouldn't compromise privacy, and would significantly contribute to our troubleshooting efforts.
Hi
I have enrolled and managed to deploy the aggregation service, and it looks OK (I see metrics, logs, and everything).
I do however have some questions:
I have some reports (with and without the cleartext option) and a domain file. I tried both of them in the local testing tool and everything looked OK: I got a good output using the non-encrypted reports.
Then I took the reports and the domain file and used them with the deployed aggregation service (encrypted this time, of course, since the local tool doesn't accept encrypted files).
I got the following error (in this example I sent only 1 report, but when I sent 200 I got the same error with 200 as the count):
"result_info": {
  "return_code": "SUCCESS_WITH_ERRORS",
  "return_message": "Aggregation job successfully processed but some reports have errors.",
  "error_summary": {
    "error_counts": [
      {
        "category": "SERVICE_ERROR",
        "count": 1,
        "description": "Internal error occurred during operation."
      },
      {
        "category": "NUM_REPORTS_WITH_ERRORS",
        "count": 1,
        "description": "Total number of reports that had an error. These reports were not considered in aggregation. See additional error messages for details on specific reasons."
      }
    ],
    "error_messages": []
  },
The output report contains all the keys from the domain file, but all of the metrics are just pure noise. I also don't understand this error message; it would be great to get a more elaborate error message (since error_messages is empty and doesn't give any more info).
I'm using the latest 2.4.2 version.
I'm really stuck right now, can you help?
Looking at the auto scaling group for the Aggregation Service, I saw that no auto scaling policy was created after deploy (I'm looking in AWS -> Auto Scaling groups -> my service -> Automatic scaling / Instance management). However, when sending some jobs I did see the instance number go up and then down. What am I missing?
Do I need to create those policies myself? Do you have recommendations on which metrics to use for auto scaling, including thresholds?
Thanks!!
I am trying to follow the instructions in Testing locally using Local Testing Tool but when I run the following command with the sampledata:
java -jar LocalTestingTool_2.0.0.jar \
--input_data_avro_file sampledata/output_debug_reports.avro \
--domain_avro_file sampledata/output_domain.avro \
--output_directory .
I get the error below:
2023-10-31 12:21:57:506 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.WorkerPullWorkService - Aggregation worker started
2023-10-31 12:21:57:545 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.WorkerPullWorkService - Item pulled
2023-10-31 12:21:57:555 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Reports shards detected by blob storage client: [output_debug_reports.avro]
2023-10-31 12:21:57:566 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Reports shards to be used: [DataLocation{blobStoreDataLocation=BlobStoreDataLocation{bucket=/Users/jonaquino/projects/aggregation-service/sampledata, key=output_debug_reports.avro}}]
2023-10-31 12:21:57:566 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.domain.OutputDomainProcessor - Output domain shards detected by blob storage client: [output_domain.avro]
2023-10-31 12:21:57:567 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.domain.OutputDomainProcessor - Output domain shards to be used: [DataLocation{blobStoreDataLocation=BlobStoreDataLocation{bucket=/Users/jonaquino/projects/aggregation-service/sampledata, key=output_domain.avro}}]
2023-10-31 12:21:57:575 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Job parameters didn't have a report error threshold configured. Taking the default percentage value 10.000000
return_code: "REPORTS_WITH_ERRORS_EXCEEDED_THRESHOLD"
return_message: "Aggregation job failed early because the number of reports excluded from aggregation exceeded threshold."
error_summary {
error_counts {
category: "REQUIRED_SHAREDINFO_FIELD_INVALID"
count: 1
description: "One or more required SharedInfo fields are empty or invalid."
}
error_counts {
category: "NUM_REPORTS_WITH_ERRORS"
count: 1
description: "Total number of reports that had an error. These reports were not considered in aggregation. See additional error messages for details on specific reasons."
}
}
finished_at {
seconds: 1698780117
nanos: 679576000
}
CustomMetric{nameSpace=scp/worker, name=WorkerJobCompletion, value=1.0, unit=Count, labels={Type=Success}}
2023-10-31 12:21:57:732 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.WorkerPullWorkService - No job pulled.
Running the step "Building artifacts" from https://github.com/privacysandbox/aggregation-service/blob/main/build-scripts/aws/README.md#building-artifacts
While building the artifacts in region eu-west-1, the CodeBuild failed with the below error:
amazon-ebs.sample-ami: Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
754 | ==> amazon-ebs.sample-ami: Existing lock /var/run/yum.pid: another copy is running as pid 3465.
755 | ==> amazon-ebs.sample-ami: Another app is currently holding the yum lock; waiting for it to exit...
Hi all!
The Aggregation service team is currently exploring options for adtechs who may want to migrate from one cloud provider to another. This gives adtechs flexibility in using a cloud provider of their choice to optimize for cost or other business needs. Our proposed migration solution would enable adtechs to re-encrypt their reports from a source cloud provider (let’s call this Cloud A) to a destination cloud provider (let’s call this Cloud B) and enable them to use Cloud B to process reports originally encrypted for Cloud A as part of the migration. After migration is completed, use of Cloud A for processing reports will be disabled and the adtech will only be able to use Cloud B to process their reports.
In the short-term, this solution will support migration of aggregation service jobs from AWS to GCP and vice versa. As we support more cloud options in the future, this solution would be extensible to moving from any supported cloud provider to another.
Depiction of the re-encryption flow:
For any adtechs considering a migration, we encourage completing this migration before third-party cookie deprecation to take advantage of feature benefits such as:
After third-party cookie deprecation, we plan to continue to support cloud migration with the re-encryption feature, but may not be able to give the additional benefits outlined above to preserve privacy.
We welcome any feedback on this proposal.
Thank you!
Hello,
I'm trying to build and deploy images based on the steps here:
https://github.com/privacysandbox/aggregation-service/blob/2b3d5c450d0be4e2ce0f4cb49444f3f049508917/build-scripts/gcp/cloudbuild.yaml
This uploads the compiled JAR files to the bucket; however, I cannot use these directly in Cloud Functions and have to download them, zip them, and re-upload them (Terraform does this automatically for users). Ideally I'd like to skip this step, and was hoping to be able to upload those JAR files zipped directly.
We are seeking feedback on consolidating coordinator services for attribution reporting and other workloads. Please review and comment on the main issue posted on the WICG/protected-auction-services-discussion#69 repository.
Hello everyone, I'm currently trying to create a version of attribution-reporting in Node.js. So far so good: I managed to complete the entire journey (trigger interactions with creatives, conversion on the final website, generate event and aggregatable reports).
But I got to the part where I must store the aggregatable reports before sending them to the aggregation service, and I wanted to know if anyone else has done this step of collecting the reports in Node.js.
Below is the code responsible for collecting and storing the reports (I took the documentation code written in Go as a reference).
*Spoiler: each report record I receive generates an .avro file.
const avro = require('avsc');
const REPORTS_AVRO_SCHEMA = {
"name": "AvroAggregatableReport",
"type": "record",
"fields": [
{ "name": "payload", "type": "bytes" },
{ "name": "key_id", "type": "string" },
{ "name": "shared_info", "type": "string" }
]
};
const RECORD_SCHEMA = avro.Type.forSchema(REPORTS_AVRO_SCHEMA);
const registerAggregateReport = (req, res) => {
try {
// const report = req.body;
// Example to illustrate what the request body would be
const report = {
"aggregation_coordinator_origin": "https://publickeyservice.msmt.aws.privacysandboxservices.com",
"aggregation_service_payloads": [
{
"key_id": "bbe6351f-5619-4c98-84b2-4a74fa1ae254",
"payload": "7K9SQLdROKqITmnrkgIDulfEXDAR76XUP4vc6uzxPwDycQql3AhR3dxeXdEw2gbUaIAldnu33RSN4SAFcFFKgDQkvnhFzPoxJjO2Yfw4osJ1S0Odp0smu0rC5k5GuG4oIu9YQofCPNmSD7KRVJ9Y6Lucz3BXoI3RQhpQkO31RDyxVJdBbJ8JiS2KBtu8naUf5Z+/mNNKp39ObsNbo7kQKI0TwyRJDSJKqv42Yi3ctoAhOT0eaaUtMfho67i9XaEtVnh8wB4Mi+nzlAfVsGIavP6aXWDe44IgKZvTS/zEKjI68+nzWkyfdRNOf7jtb2XnoB7k5iM+Yu9Ayk5ic/aT1eA1iPEzLvW/tNLcohne3UL2DefZoTLb5l9aludA7Qlf0g+kW9nuvUSmHBuTjE/fTY5s9uRExHH+b2Hjm2sL9DyrFZUFqcl/KLS+McgOT8I0ZTpPRmr+njW8+4b01Hsc2MpY3KKAn1jUDUE45pGbhj/Gqlb1ikJO9nNKS/nnWJgR7+3P8JEpHC2fkfEase4+vrNxZujWolYfTUxswJpiEZs1+fCOroEyyEY6Zjvx5qLbk+7wMNqCeCltDPA6c8WtAPtMreIUvKbco6XUUzaGSnvWLz6/WJqCxG4hjPOfcYAWXIwSboqvNyBHrRr4H5V7C0unSkIjd0j/GeB3ywgnKEqiihuvZ5PPw+O5aYqJdaR3QEFZtpLj+3Uv4OGn2+CvU1thV0A0H1XViP846Tfmb0jVejN1+ih+VO5cf/7T2TPz6oGO9sa6qitWtll5vhwxVyG3vniCo3xghGnUcHSP5ogfp6qgDGSgsGFqSvdiuOpQU+MG/HrCDUjvce0GoXJP6674UcurGxR9UKAnVwZyKRIj/q9qzUgxhWEFC3ssADMmxhZBs3X+rrAxKfhXD12MfuUluRTCzpCKZ9/YapnJQYjngGx7GIkfW6tw8eSCC8yO41vWyHGRz4nKlgNeQkwYafGPzXqUXjyEyiupMUlmSsU/zT52wdCQYLJbQg7xhNuLebb8qh9LW07jMho4Vo9DBP9l463uqA8hcZnJ"
}
],
"shared_info": "{\"api\":\"attribution-reporting\",\"attribution_destination\":\"https://cliente.com\",\"report_id\":\"4d82121f-7d62-4fa4-bda4-a70c9e850089\",\"reporting_origin\":\"https://attribution.ads.uol.com.br\",\"scheduled_report_time\":\"1714764978\",\"source_registration_time\":\"0\",\"version\":\"0.1\"}"
}
report.aggregation_service_payloads.forEach((payload, i) => {
  const payloadBytes = Buffer.from(payload.payload, 'base64');
  const record = {
    payload: payloadBytes,
    key_id: payload.key_id,
    shared_info: report.shared_info,
  };
  // Include the payload index in the filename: Date.now() alone can collide
  // when a report carries multiple payloads, silently overwriting files.
  const outputFilename = `./reports/output_reports_${Date.now()}_${i}.avro`;
  const encoder = avro.createFileEncoder(outputFilename, RECORD_SCHEMA);
  encoder.write(record);
  encoder.end();
});
res.status(200).send('Report received successfully.');
} catch (e) {
console.error('Error processing report:', e);
res.status(400).send('Failed to process report.');
}
};
module.exports = {
registerAggregateReport
}
*English is not my native language, so take it easy.
As the death of third-party cookies is something that will affect everyone, it would be nice to have references in more commonly used languages such as Node.js, Java, etc. I hope this post can contribute in some way to that.
Hi all!
We are currently exploring migration from origin enrollment to site enrollment for the Aggregation Service (current form using origin here) for the following reasons:
As a follow up to this proposal, we would like to support multiple origins in a batch of aggregatable reports. Do adtechs have a preference or blocking concern with either specifying a list of origins or the site in the createJob request?
Running the step "Building artifacts" from https://github.com/privacysandbox/aggregation-service/blob/main/build-scripts/aws/README.md#building-artifacts
While building the artifacts in region eu-west-1, the CodeBuild failed with the below error:
836 | --> amazon-ebs.sample-ami: AMIs were created:
837 | us-east-1: ami-069b14bccedc04571
....
[Container] 2023/05/09 15:34:31 Running command bash build-scripts/aws/set_ami_to_public.sh set_ami_to_public_by_prefix aggregation-service-enclave_$(cat VERSION) $AWS_DEFAULT_REGION $AWS_ACCOUNT_ID
841 |
842 | An error occurred (InvalidAMIID.Malformed) when calling the ModifyImageAttribute operation: Invalid id: "" (expecting "ami-...")
843 |
844 | An error occurred (MissingParameter) when calling the ModifySnapshotAttribute operation: Value () for parameter snapshotId is invalid. Parameter may not be null or empty.
845
The reason is that it created the AMI in us-east-1 instead of eu-west-1.
Hi all!
We recently published a proposal for the aggregation service release and end-of-support plan. This plan outlines a standardized cadence for feature releases, in addition to a strategy for patches:
Aggregation service release and end-of-support plan
We're opening this issue to solicit general feedback on the proposal.
cc @hostirosti
In the AWS instructions, there are two options for using the AMI in a region other than us-east-1:
If you like to deploy the aggregation service in a different region you need to copy the released AMI to your account or build it using our provided scripts.
I have been having a lot of trouble building the AMI using the provided scripts, so I would like to try simply copying the AMI (the first option), but I don't see instructions for this. What is the AMI name and where do I get it from? Do I need to change any parameters to point to the new region? What step should I move on to after copying the AMI?
Could you add instructions for copying the AMI and subsequent steps?
Hi Aggregation Service testers,
We have discovered an issue that broke the AWS worker build, caused by an incompatible Docker engine version upgrade. We are planning to release a new patch next week. Meanwhile, if you encounter issues building AWS worker, you can use the following workaround:
Create the file <repo_root>/build_defs/shared_libraries/pin_pkr_docker.patch with the following content:
diff --git a/operator/worker/aws/setup_enclave.sh b/operator/worker/aws/setup_enclave.sh
index e4bd30371..8bf2e0fb1 100644
--- a/operator/worker/aws/setup_enclave.sh
+++ b/operator/worker/aws/setup_enclave.sh
@@ -19,7 +19,7 @@ sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/late
#
# Builds enclave image inside the /home/ec2-user directory as part of automatic
# AMI generation.
-sudo yum install docker -y
+sudo yum install docker-24.0.5-1.amzn2023.0.3 -y
sudo systemctl enable docker
sudo systemctl start docker
Then add the patch to the list of patches under the shared_libraries rule in the WORKSPACE file. The shared_libraries rule should now become:
git_repository(
name = "shared_libraries",
patch_args = [
"-p1",
],
remote = "https://github.com/privacysandbox/coordinator-services-and-shared-libraries",
patches = [
"//build_defs/shared_libraries:coordinator.patch",
"//build_defs/shared_libraries:gcs_storage_client.patch",
"//build_defs/shared_libraries:dependency_update.patch",
"//build_defs/shared_libraries:key_cache_ttl.patch",
"//build_defs/shared_libraries:pin_pkr_docker.patch",
],
tag = COORDINATOR_VERSION,
workspace_file = "@shared_libraries_workspace//file",
)
Thank you!
The documentation states that:
Note: The prebuilt Amazon Machine Image (AMI) for the aggregation service is only available in the us-east-1 region. If you like to deploy the aggregation service in a different region you need to copy the released AMI to your account or build it using our provided scripts.
When I try to copy the AMI to my account I'm getting the following error:
Failed to copy ami-036942f537f7a7c2b
You do not have permission to access the storage of this ami
Can you give me some guidance or tell me if it's a configuration error?
Context:
Hello,
I'm currently experimenting with the Private Aggregation API and I'm struggling to validate that my final output is correct
From my worklet, I perform the following histogram contribution:
privateAggregation.contributeToHistogram({ bucket: BigInt(1369), value: 128 });
Which is correctly triggering a POST request with the following body:
{
aggregation_service_payloads: [
{
debug_cleartext_payload: 'omRkYXRhgaJldmFsdWVEAAAAgGZidWNrZXRQAAAAAAAAAAAAAAAAAAAFWWlvcGVyYXRpb25paGlzdG9ncmFt',
key_id: 'bca09245-2ef0-4fdf-a4fa-226306fc2a09',
payload: 'RVd7QRTTUmPp0i1zBev+4W8lJK8gLIIod6LUjPkfbxCOHsQLBW/jRn642YZ2HYpYkiMK9+PprU5CUi9W7TwJToQ4UXiUbJUgYwliqBFC+aAcwsKJ3Hg46joHZXV5E0ZheeFTqqvLtiJxlVpzFcWd'
}
],
debug_key: '777',
shared_info: '{"api":"shared-storage","debug_mode":"enabled","report_id":"aaa889f1-2adc-4796-9e46-c652a08e18ca","reporting_origin":"http://adtech.localhost:3000","scheduled_report_time":"1698074105","version":"0.1"}'
}
I've set up a small Node.js server handling requests on /.well-known/private-aggregation/debug/report-shared-storage, basically doing this:
const encoder = avro.createFileEncoder(
`${REPORT_UPLOAD_PATH}/debug/aggregation_report_${Date.now()}.avro`,
reportType
);
reportContent.aggregation_service_payloads.forEach((payload) => {
console.log(
"Decoded data from debug_cleartext_payload:",
readDataFromCleartextPayload(payload.debug_cleartext_payload)
);
encoder.write({
payload: convertPayloadToBytes(payload.debug_cleartext_payload),
key_id: payload.key_id,
shared_info: reportContent.shared_info,
});
});
encoder.end();
As you can see, at this point I'm printing the decoded data to the console and I can see, as expected:
Decoded data from debug_cleartext_payload: { value: 128, bucket: 1369 }
However, now I'm trying to generate a summary report with the local test tool by running the following command:
java -jar LocalTestingTool_2.0.0.jar --input_data_avro_file aggregation_report_1698071597075.avro --domain_avro_file output_domain.avro --no_noising --json_output --output_directory ./results
No matter what value I pass as the payload of the contributeToHistogram method, I always get 0 in the metric field:
[ {
"bucket" : "MTM2OQ==", // 1369 base64 encoded
"metric" : 0
} ]
Am I doing something wrong?
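One thing that may be worth double-checking (this is an assumption on my part, not a confirmed diagnosis): the report payload encodes buckets as 16-byte big-endian integers, while "MTM2OQ==" in your summary output base64-decodes to the ASCII text "1369". If the bucket in output_domain.avro contains the string's bytes rather than the binary encoding, it will never match the payload's bucket, and the metric stays at 0. A quick Node.js check of the two encodings:

```javascript
// Sketch: compare the 16-byte big-endian encoding of bucket 1369 with the
// bytes of the ASCII string "1369". Only the former matches what the
// aggregatable-report payload contains.
function bucketToBytes(bucket) {
  const b = BigInt(bucket);
  const buf = Buffer.alloc(16); // 128-bit, zero-filled
  buf.writeBigUInt64BE(b >> 64n, 0); // high 64 bits
  buf.writeBigUInt64BE(b & 0xffffffffffffffffn, 8); // low 64 bits
  return buf;
}

const binary = bucketToBytes(1369); // ends in 0x05 0x59 (1369 = 0x559)
const ascii = Buffer.from('1369', 'utf8'); // 0x31 0x33 0x36 0x39
console.log(binary.equals(ascii)); // false
console.log(ascii.toString('base64')); // "MTM2OQ=="
```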
Apart from this issue, I wonder how it would work in a real-life application. Currently this example handles one report at a time, sent instantly because of debug_mode, but in a real situation, how are we supposed to process a large amount of reports at once? Can we pass a list of files to --input_data_avro_file? Should we batch the reports prior to converting them to avro, based on the shared_info data? If so, based on which field?
Thank you in advance!
The various links to get the local testing tool do not work (see for instance here https://github.com/privacysandbox/aggregation-service/blob/main/COLLECTING.md#produce-a-summary-report-locally).
Even replacing the {VERSION} by 0.4.0 in the link does not solve the issue.
Thanks a lot!
P.S. I could get the previous release (ie 0.3.0) using the link available before the 0.4.0 release. See the associated diff of the release.
Tried to kick off a build of the build container using the git hash for v2.4.2 and got the error below.
I believe it's due to a missing "-y" on the apt-get install here:
https://github.com/privacysandbox/aggregation-service/blame/22c2a42ea98b88e5dd3451446db2b7a152760274/build-scripts/gcp/build-container/Dockerfile#L63
Google Ldap: evgenyy@ if you want to reach out internally
Step 9/12 : RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.asc] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | tee /usr/share/keyrings/cloud.google.asc && apt-get update && apt-get install google-cloud-cli && apt-get -y autoclean && apt-get -y autoremove
---> Running in e691327d6e48
deb [signed-by=/usr/share/keyrings/cloud.google.asc] https://packages.cloud.google.com/apt cloud-sdk main
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 2659 100 2659 0 0 42686 0 --:--:-- --:--:-- --:--:-- 42887
-----BEGIN PGP PUBLIC KEY BLOCK-----
...
-----END PGP PUBLIC KEY BLOCK-----
Hit:1 https://download.docker.com/linux/debian bookworm InRelease
Hit:2 http://deb.debian.org/debian bookworm InRelease
Hit:3 http://deb.debian.org/debian bookworm-updates InRelease
Get:4 https://packages.cloud.google.com/apt cloud-sdk InRelease [6361 B]
Hit:5 http://deb.debian.org/debian-security bookworm-security InRelease
Get:6 https://packages.cloud.google.com/apt cloud-sdk/main amd64 Packages [629 kB]
Fetched 636 kB in 1s (1239 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
google-cloud-cli-anthoscli
Suggested packages:
google-cloud-cli-app-engine-java google-cloud-cli-app-engine-python
google-cloud-cli-pubsub-emulator google-cloud-cli-bigtable-emulator
google-cloud-cli-datastore-emulator kubectl
The following NEW packages will be installed:
google-cloud-cli google-cloud-cli-anthoscli
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 106 MB of archives.
After this operation, 609 MB of additional disk space will be used.
Do you want to continue? [Y/n] Abort.
The command '/bin/sh -c echo "deb [signed-by=/usr/share/keyrings/cloud.google.asc] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | tee /usr/share/keyrings/cloud.google.asc && apt-get update && apt-get install google-cloud-cli && apt-get -y autoclean && apt-get -y autoremove' returned a non-zero code: 1
ERROR
ERROR: build step 0 "gcr.io/cloud-builders/docker" failed: step exited with non-zero status: 1
Hello team, how are you?
Guys, after uploading the Aggregation Service to the AWS environment, I was running some tests to generate the summary report. With these tests I noticed that the noise was greatly impacting the metric values, so I implemented scaling of the values and set the epsilon. But after adding the epsilon definition to /createJob, the Aggregation Service always returns the PRIVACY_BUDGET_EXHAUSTED error, regardless of whether they are new reports (new .avro files).
I wanted to see if anyone has any tips on how to identify the source of this error and, consequently, how I could get around it.
createJob request body
{
"input_data_blob_prefix": "reports_17",
"input_data_bucket_name": "uolcsm-uolads-aggregate-reports",
"output_data_blob_prefix": "output/summary_report_25.avro",
"output_data_bucket_name": "uolcsm-uolads-aggregate-reports",
"job_parameters": {
"debug_privacy_epsilon": "10",
"attribution_report_to": "https://attribution.ads.uol.com.br",
"output_domain_blob_prefix": "output_domain.avro",
"output_domain_bucket_name": "uolcsm-uolads-aggregate-reports"
},
"job_request_id": "test80"
}
AggregationService getJob response
{
"job_status": "FINISHED",
"request_received_at": "2024-06-19T20:20:56.246430Z",
"request_updated_at": "2024-06-19T20:21:13.512574536Z",
"job_request_id": "test80",
"input_data_blob_prefix": "reports_17",
"input_data_bucket_name": "uolcsm-uolads-aggregate-reports",
"output_data_blob_prefix": "output/summary_report_25.avro",
"output_data_bucket_name": "uolcsm-uolads-aggregate-reports",
"postback_url": "",
"result_info": {
"return_code": "PRIVACY_BUDGET_EXHAUSTED",
"return_message": "com.google.aggregate.adtech.worker.exceptions.AggregationJobProcessException: Insufficient privacy budget for one or more aggregatable reports. No aggregatable report can appear in more than one aggregation job. \n com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.consumePrivacyBudgetUnits(ConcurrentAggregationProcessor.java:472) \n com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.process(ConcurrentAggregationProcessor.java:329) \n com.google.aggregate.adtech.worker.WorkerPullWorkService.run(WorkerPullWorkService.java:142)",
"error_summary": {
"error_counts": [],
"error_messages": []
},
"finished_at": "2024-06-19T20:21:13.503639102Z"
},
"job_parameters": {
"debug_privacy_epsilon": "10",
"output_domain_bucket_name": "uolcsm-uolads-aggregate-reports",
"output_domain_blob_prefix": "output_domain.avro",
"attribution_report_to": "https://attribution.ads.uol.com.br"
},
"request_processing_started_at": "2024-06-19T20:21:02.916500310Z"
}
*I was taking a look at NoiseLab, and the solution to mitigate the impact of noise would be to scale my values and use an epsilon greater than 0.
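Not a confirmed diagnosis, but the error message says no aggregatable report can appear in more than one aggregation job, and budget is tracked per report rather than per .avro file. So one thing worth ruling out is a previously processed report (same shared_info) sneaking into a new batch. A small Node.js sketch of a pre-submission duplicate check (the function name and sample data are illustrative):

```javascript
// Sketch: scan shared_info strings for duplicate report_ids before batching.
// A report that already consumed its budget in an earlier job can trigger
// PRIVACY_BUDGET_EXHAUSTED even if it is re-packaged into a new .avro file.
function findDuplicateReportIds(sharedInfos) {
  const seen = new Set();
  const duplicates = [];
  for (const si of sharedInfos) {
    const id = JSON.parse(si).report_id;
    if (seen.has(id)) duplicates.push(id);
    else seen.add(id);
  }
  return duplicates;
}

const infos = [
  '{"report_id":"a1","api":"attribution-reporting"}',
  '{"report_id":"b2","api":"attribution-reporting"}',
  '{"report_id":"a1","api":"attribution-reporting"}', // repeat
];
console.log(findDuplicateReportIds(infos)); // [ 'a1' ]
```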
Reports used during testing:
//Histogram 1
[
{
"key": "0x29a",
"value": 10054
}
]
//Report 1
{
"aggregation_coordinator_origin": "https://publickeyservice.msmt.aws.privacysandboxservices.com",
"aggregation_service_payloads": [ {
"debug_cleartext_payload": "omRkYXRhlKJldmFsdWVEAAAnRmZidWNrZXRQAAAAAAAAAAAAAAAAAAACmqJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAGlvcGVyYXRpb25paGlzdG9ncmFt",
"key_id": "b16a91ad-6d09-491f-af3e-31faef3d116f",
"payload": "tQpGPd6LRE1I16SoUk9ifwdyvc9zZAG1Qj1zAQSLhktOi8k1kLcBgYpTwq+OfpLIBpePCxHC1LeXiCdPAgEPWQW6gM6wumBTgSF611YGN+i52IZPYyR1k4F9yGocRg5AE6wjMDxLVr71QyJ7At17Ulje6Jj5rIlByPTAnNUWwDB0T1XuMx9gWn3mhS3M1fTvo1Z77VhnysZJuUdu5Q1BeLX1EUkl1o08tw8bRFPjIinBgVEX1iJRymxeCmfiqV/oZOUf7yU1XUWOEHiVgWQFc9ZLgbja3xLzOL+WlOEXD6Wn1Nu6Gq1LD0U7G450jP+x2wa8fMmHUS9LPmyM/uP8c7dkEHHRV02TuhtR7sepGmnaMfGHJNT/poiEMX1XRF6/3iqhn9o6kyrFuMZ0VPEdrCtN4RIpReQUJXD318jTYPCpxtVUQsZojdU05+IavAUYtVYMepUXV87VBjRVEzCagLYekw9AOGlPOtfVSfGl7DZech+pUHwRA2GPT/W6mnNfWMxq76XsEgO+Xc1Ap1FKNmUZhEgzdP9PFwrYbn9GEwjhvyUxuPk85lF10yqRbrkOSiYH3RHMy80L1uGJIeiAFPvGtHwWNmrfLMKqxNQLlYnJPUjnKgM7jgO0VNDrCJxUuSo8u0WHlGkDK8F7tAlD3W2k45ZkGFPKlzXeaP1mqbpP7YSolyMFHvzwipq8ztGqzGN19BKxQmhlJQLW80B/kjzqK1GV4xzPHWtqq5yctV3OScnfws9RRJVUsoNIXakX2EwXLuapd2AhgXuRp3Ojcg/l9JjxKZliFYA2aMjT1yYfD1JCcB3l7cBYvPb4QKPgYXOXaR6lCSX6ZMVgFnQm7Lk7CYhZTaSuaXDj5j3irIWLnN7aXOMsZ/SDamRdQ30Hm5eFzaANpaENWNO3oil+fRrlIe8bCj5TPPIwZcNdTbJ6564cT2MQBMUiZeqFq7+z61H2E8MJbSuXuXRjMFoc095nM2sIkci9Fyb80VwOPlXHdZN4Azf85taPhGjUOrXrs7X8HGIAouqhEiHjBa0Z6IDBdYNu3f6hSwwn"
} ],
"shared_info": "{\"api\":\"attribution-reporting\",\"attribution_destination\":\"https://cliente.com\",\"debug_mode\":\"enabled\",\"report_id\":\"f23783ec-b335-41ff-8885-dc529ccf800f\",\"reporting_origin\":\"https://attribution.ads.uol.com.br\",\"scheduled_report_time\":\"1718829420\",\"source_registration_time\":\"0\",\"version\":\"0.1\"}",
"source_debug_key": "647775351539539",
"trigger_debug_key": "647775351539539"
}
//Histogram 2
[
{
"key": "0x29a",
"value": 10074
}
]
//Report 2
{
"aggregation_coordinator_origin": "https://publickeyservice.msmt.aws.privacysandboxservices.com",
"aggregation_service_payloads": [ {
"debug_cleartext_payload": "omRkYXRhlKJldmFsdWVEAAAnWmZidWNrZXRQAAAAAAAAAAAAAAAAAAACmqJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAGlvcGVyYXRpb25paGlzdG9ncmFt",
"key_id": "3e9d0be8-6cc6-4e80-862f-b5c1ba34b03d",
"payload": "3fuou19wP7qL616ylGdTVp26HiFLpylrHIXf5sHF/xs+3prz3S3cQ2J16ojKQkyD9AWOKU+tBYMZJXCT6/jDD8Tv0kP6TjPGAiyOyWQ5ATdw8xsPqCxZuvtVeywxHZEZ65NBgOWm/txLBMMtQLAFHc0s+gdq4pHPem9Q1PHrAtCIO3dUO0xlvSMSIZJzhjPrNzB5XBzfLHXs0ACl533RLbv7SRuxb9njkVE0WIx80OY6jeBiUFT5NHaP610YU+tULCaA6mg7XtQUWh1voWDKplJwYlfPOWoPItMX0v1OS69+Zmc669CUN/hV+PI/meyX9PagbwfXs4Rt3IrQAN5MzaXHSPq5Z/eMsbiOfhCO3lGYxBKo/9KpWG8Qt9BWAALntHqg+y7BItiCC8NyY9pJZ3n9ghul2QdJy1QkDOuFOwquiKQsPmFMY5kQUaN4d0E37PwKT7fmZiDyblIUCBHjaS+jM7eUH3wYCoylm4HeqR9gzA2BdN+Vm+zctUfvYmT2QOLEVKwBMp+UOZ67P5ABOMLV2jUIkx6uoqY9eDngJYk9bRwQbNmqTp54hbdOnjg8eYt6MocXjWUttfg6NKFQTEHkQ125SjDBG+T4angbM91rzZA2gUWqH32yszTAsCVsgRUu00iQTpnjtVdCxIyKKPZShGZiVloJHv7bG2sbvgp8aVQ0Y3xjL38Sea9xUvQDdsgwnVSeYeJUNakmi0Ni0bVdbplVugMuE1SoZ/Xf6Y7klg1peEygBoVC1W7i2Z/VPoaoQ7ctwyqmMXoMXlVcLsdB8Tp1ph6+aJTfZcF8ZwpU0vfq+MRjEO9dp6kqu3WBD5Q+jaAOykFW3wXBN+CQx1jk07dGvSf2JZcwjG62f/JUVPvdpyJEmDaaIfupoMap23BdD82F/9y43prFJwIGDOzLolDae7bJh1q8A2BTDldKg5z4x7Hc85MkpmiAuY7kZ1fyChxsu6COMfucqmbKrzDzZGwpLnnKl/95kk7eTlbr4o7VTCs+cPVHV0GtMTNydUGUgbrvPiuh9AmehHvGavWVx2++oQSpyEvv"
} ],
"shared_info": "{\"api\":\"attribution-reporting\",\"attribution_destination\":\"https://cliente.com\",\"debug_mode\":\"enabled\",\"report_id\":\"15cfee94-a102-4215-9367-1e17862d7769\",\"reporting_origin\":\"https://attribution.ads.uol.com.br\",\"scheduled_report_time\":\"1718829459\",\"source_registration_time\":\"0\",\"version\":\"0.1\"}",
"source_debug_key": "647775351539539",
"trigger_debug_key": "647775351539539"
}
//Histogram 3
[
{
"key": "0x29a",
"value": 10029
}
]
//Report 3
{
"aggregation_coordinator_origin": "https://publickeyservice.msmt.aws.privacysandboxservices.com",
"aggregation_service_payloads": [ {
"debug_cleartext_payload": "omRkYXRhlKJldmFsdWVEAAAnLWZidWNrZXRQAAAAAAAAAAAAAAAAAAACmqJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAGlvcGVyYXRpb25paGlzdG9ncmFt",
"key_id": "c29773d9-a72b-4147-bd60-3777a3844054",
"payload": "e3oi8KV4uK8aYUgZIlYFFrZsENSY0gsgcdzYtKECqnFLzGF+VwHsLvbzg5QAAfBhqM9y/6bgCUssbEBfOYwNcSGNP6KTxuCteocfbL7FcpijnsRQ8001oc8Hhcx1efr/h9Kh/14w3HDd74rRMMnNS2L7RIEnGqkcZM6Y4/HGDIK0nd9CptuxZjjxuLs37fnIj2ZRgTrKvVEk8EoE0ro+54cBbfYLMC+SMLJ5NefRmNA29gUnXuzvkOCBc5zp9XvtedyVjZzcatFJooPdY5UJhCYso121vngmk5O4U2KAyf7mOL6vPkhOAf4irC5NUymmSqjHrfCwMZ5aGGJcfSYXwd5xp/MRBXlP9/hgiyg7vLUucul9jvNMZ9RDXlE/pZGpq2u9pOsmlSl2wKh4I7xhv0PQgOwjj5N43YnGvarFC6Js7mVyVyDccnga6u6RdJsfH2i0ObhF2vJenkexycLLykCPS5TZrUkIBolglkq/Z5tiXIbnugh8jrmy4IAd8z/XBK9CgPq+sGmdLG8ZKVKiguylwXTL/Zl0IxdNhl6GIIRkfABaDUSUr7tgCiLWHVAKLMbAyDalluGdfkRwhf7fMnI9fN6YIpq6VJ+H7JXNbdnsP2maL4z+8eGs+Q2eqa8eMkB2qYinPCSQIHLR47Xk9xn63H2EQhko1LcNq9lF0Id3xfvZI5WU/i+lUv4ZpF1XTV/3aciHD8C5gKzgyceFmv65EjiHhOQ19bu9tULuxaROAozrJ9wL5SieFgVb8QqpvshrhhQGu4E0FtnNwMOUrGm1R8so5J0n4vqUM8t2f7GAdqq7LIMXfEatjENV4bdWyJev8IRnQtZPstjLKHFJM+v2QtsLIx2z2nGG0NqgvGlpT1jBOvVYg0hiosbtSAF8zb8fwTPhg7mgG/sNR2d2uVfQ/Ou8APip+BBTv871r2OMk9d6x0Zey61T5E7kP4di3GCXKT2qkUmr4yw1r9RvVkPZi37oWruN2y82DzGsB3XXbFtZ8J3gV0oO8xDx7+vHCQk1AarV88D85s1aV4wb808p06NRtXGE/R5D"
} ],
"shared_info": "{\"api\":\"attribution-reporting\",\"attribution_destination\":\"https://cliente.com\",\"debug_mode\":\"enabled\",\"report_id\":\"63e0ef38-d97e-431c-9b79-2abc6ee92793\",\"reporting_origin\":\"https://attribution.ads.uol.com.br\",\"scheduled_report_time\":\"1718829513\",\"source_registration_time\":\"0\",\"version\":\"0.1\"}",
"source_debug_key": "647775351539539",
"trigger_debug_key": "647775351539539"
}
//Histogram 4
[
{
"key": "0x29a",
"value": 10096
}
]
//Report 4
{
"aggregation_coordinator_origin": "https://publickeyservice.msmt.aws.privacysandboxservices.com",
"aggregation_service_payloads": [ {
"debug_cleartext_payload": "omRkYXRhlKJldmFsdWVEAAAncGZidWNrZXRQAAAAAAAAAAAAAAAAAAACmqJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAGlvcGVyYXRpb25paGlzdG9ncmFt",
"key_id": "71009a63-aa17-45b9-a5a9-49d78067210f",
"payload": "GvnO9L57AFyTdlejpGgXRmC8AfQG318hEl6K0gOoVWB4GHHNg0h0Gs/LwmhVCzUMq89rB6LuVD8SZ9XXpMFVWbP2XvJMU5gokgOo1TTtAWwvEBFxfcvJZ5hqfLMsQGRb91UrwRhUI+6GyhrDXV76sKS4j3D5OevAfamcdd9fp8SZhfB7umPL08ESXekkEDPHmmblBjrKLLyw0w4/23MMyCGRwnYtdc8uZYR1p0uyE3VoPYYnaRmHUBddsgRsvg+u2cwRho0PxIReHHRgHS50hELkSf0DXJjJHRd5ukOIu15LVBFBQ0VIoMB9RB61PkynjyhuLlqZk0d1Pif+ALQ7jyE4xiYKCgmkq/SowT6EW2v1UmX54KvcDXZx/eWh2rz373ZlGitYa/TtBPNDngM9PYVZrrRANWBjya4nBB6ElPAsVjgjF6+Zo3rkSj3tF0+WxdBOM7jbYbIjPOVoWViKUq6b2b5FMNR4FFYIDsVwNaLLrLBoDot09DFa+1gSHBNcYOyFdsy2sx9Mg/NOa3VeA8W6nTKLZd/KzDq70H7jRcj4wwZCL+gy4s45XBBHRtcaCfcokvEpCr3USvDP7miICzqD9+XL73qiyLpGeTpEZwAVGomX1kLSpTxsjaNcpoO0eNZMl7DVmsd37vjq6z43DqRuhSpigK3nbLK3lMtOVNUbCzItVqyYcSNpIdDiZGGtODPwppGeE1Xp8T0AxXp76PMR/pAUTaglxjmRVAhUZBIPuUKAXrDHCoQyBktqQHWEIBt0QomqcggQPgOTCP4nQ5nTE42FdhevV5DZoZik7UFLEx7WV09/96GBogV91dbkI9pG4717jgn8pebVDxP7xJ7WZRDbOmI4ESCcIXz5YqjKqoL1UFAfeOfLDVALh99LZJayy1igxQFdzeteKmzYmUPUaosPCXdJd552SynGow7mjZbWidLn8p51tdUvPyEtaTq9P4D360zhrld3ZkYYbdclBl7ywgxWFauczrh+oJOgWdm6C2F9rAKXwz43fiI86+bCZEnhoUFnJZihSoGNTrP519rDx0Bf/qw+"
} ],
"shared_info": "{\"api\":\"attribution-reporting\",\"attribution_destination\":\"https://cliente.com\",\"debug_mode\":\"enabled\",\"report_id\":\"e09684d5-4634-4ca7-af62-18bb03090f35\",\"reporting_origin\":\"https://attribution.ads.uol.com.br\",\"scheduled_report_time\":\"1718829605\",\"source_registration_time\":\"0\",\"version\":\"0.1\"}",
"source_debug_key": "647775351539539",
"trigger_debug_key": "647775351539539"
}
//Histogram 5
[
{
"key": "0x29a",
"value": 10172
}
]
//Report 5
{
"aggregation_coordinator_origin": "https://publickeyservice.msmt.aws.privacysandboxservices.com",
"aggregation_service_payloads": [ {
"debug_cleartext_payload": "omRkYXRhlKJldmFsdWVEAAAnvGZidWNrZXRQAAAAAAAAAAAAAAAAAAACmqJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAGlvcGVyYXRpb25paGlzdG9ncmFt",
"key_id": "3e9d0be8-6cc6-4e80-862f-b5c1ba34b03d",
"payload": "BtI4rdy6Lult3ikxY/QN3CTgPbhWyo1rzKwFn8yTZydUWSs5Kg1k+GW1kzfVQriFM1GWifKPfjyrFsJcEtS00JTpMKDWYuvquywq30KUgZrLowaEW5ajbDMS9+mDsJgEiIVy/Y7l4Zx7j14sdklOF7IJqXj9Di3sSJOFlB5IWuVLim03IB4cvOKu+KZKqkncrXnIliKrhZW3PuUmV3Beb5cHgR6gC1YOo4xHdr5fhM8IMSA4YBRKkp02Gxc1bv/B8PGDBkPFKwH+xtFWBw9myRja/ExvgNet7QTYReiOiKXsJom3iT8f2bObAQ2Hi4EGdrXSxxYUVsLPsFElRkGIy//mJyCDTqC/ItM3EpdHtJySADaeBp3viDilUyAWbDceNMtQeKyeQe9IWEBYClHAgFS8lqsBp/Lm/d2i9OieDTyUb+tIzIXkNpdEB7Zftfkr2ovo66rhFI/JMheoU29t/6mQHVUMJb+unaUzdgjT3E6CWggqsVBiICueW48N2sUQkX9VSSJmlYTlltI/I+I1AOTPgMI9zlaB9+L5qUzTYREanw+DuGdbM/2eqvQ8mxHEKvOJJ2XRSG7nQVLcWsMSh366Rdl830sv6NGyTCrZBaELV2mxIp7kcr2xx3xxSVJRIEG9L5wAgbc33HQmC+8x7i3I/+PvKHLx9RDaMw9yaCmEXH3B57no87kvOqIXLHULBt/mZ9346TaYChs3smxYzjErtCJI/CKFeWYIFpD3Ix0p/M96+qVpIp2xmPghXjhM+OGFd0ieQDGofgvN1S1qAccw5tXyyAc+3S0EaNkXeEE/Y4WN4lgxpbI06Qbk6X/ltXsm7xHILLmGOv4ic/O3/tCNWRQey8DawjmCCNINiHWo+QHiry3fjMkCKpiHKnxq4wJxGjSHh445prf7KZSVGCX2JSnkhXpZlJWsyN2zF3Tv3LRU/eOSqfiE9xuIccQsSJaK97wapgkvEmer+UecD7aWaPKDid4KnUiPIyTS9lWr3YzjOIhSlp4tYUWvdjsTkKKivQRFWFcQsRRMidINa0f7YeJCk6sc+7T2"
} ],
"shared_info": "{\"api\":\"attribution-reporting\",\"attribution_destination\":\"https://cliente.com\",\"debug_mode\":\"enabled\",\"report_id\":\"4b597fd1-b455-4b2f-8d13-591aa894efa8\",\"reporting_origin\":\"https://attribution.ads.uol.com.br\",\"scheduled_report_time\":\"1718829826\",\"source_registration_time\":\"0\",\"version\":\"0.1\"}",
"source_debug_key": "647775351539539",
"trigger_debug_key": "647775351539539"
}
Thanks in advance
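For reference, each `debug_cleartext_payload` above decodes (base64, then CBOR) to a map with an `operation` of `"histogram"` and a `data` array of `{bucket, value}` byte strings, where the bucket is the 128-bit big-endian form of the hex key. The same conversion applies when building the `output_domain.avro` bucket entries that a job like this needs. A minimal sketch of just the key conversion:

```python
def bucket_key_to_bytes(hex_key: str) -> bytes:
    """Convert a hex histogram key (e.g. "0x29a" from the histograms
    above) to the 128-bit big-endian bytes used for bucket fields."""
    return int(hex_key, 16).to_bytes(16, byteorder="big")

key_bytes = bucket_key_to_bytes("0x29a")
assert len(key_bytes) == 16
assert key_bytes == b"\x00" * 14 + b"\x02\x9a"
```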
The response I get from the getJob API doesn't include debug_privacy_epsilon as a double, but as a string.
e.g.
{
...
"job_parameters": {
"debug_privacy_epsilon": "64.0",
...
}
...
}
The API specification in https://github.com/privacysandbox/aggregation-service/blob/main/docs/api.md states that we should expect a double value. It would be helpful if either the specification or the API response were changed to match the other.
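Until the spec and the response agree, a client can accept either representation, since `float()` handles both. A small sketch (field names taken from the response above):

```python
def get_epsilon(job_parameters: dict) -> float:
    """Accept debug_privacy_epsilon whether the API returns it as a
    JSON number (per the spec) or as a string (as observed)."""
    value = job_parameters.get("debug_privacy_epsilon")
    if value is None:
        raise KeyError("debug_privacy_epsilon missing from job_parameters")
    return float(value)  # handles both "64.0" and 64.0

assert get_epsilon({"debug_privacy_epsilon": "64.0"}) == 64.0
assert get_epsilon({"debug_privacy_epsilon": 64.0}) == 64.0
```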
Hi, I was testing the Aggregation Service deployed on AWS (version 2.5.0) and ran into this error:
"result_info": {
"return_code": "REPORTS_WITH_ERRORS_EXCEEDED_THRESHOLD",
"return_message": "Aggregation job failed early because the number of reports excluded from aggregation exceeded threshold.",
"error_summary": {
"error_counts": [
{
"category": "SERVICE_ERROR",
"count": 1,
"description": "Internal error occurred during operation."
},
{
"category": "NUM_REPORTS_WITH_ERRORS",
"count": 1,
"description": "Total number of reports that had an error. These reports were not considered in aggregation. See additional error messages for details on specific reasons."
}
],
"error_messages": []
},
"finished_at": "2024-07-01T22:02:32.089351102Z"
},
I was testing with only 1 report in the batch, and I tried local testing with the cleartext version, which works fine.
Without additional information, I wasn't able to identify the problem.
By any chance, has anyone else run into this? What else should I check? Thanks
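One thing worth checking is the error threshold: the service fails early once the share of bad reports crosses the `report_error_threshold_percentage` job parameter, so with a 1-report batch a single bad report trips the threshold immediately and hides the per-report error messages. Raising it lets the job run to completion and surface the real failure category in `error_counts`. A sketch of the relevant createJob body (IDs, bucket names, and prefixes are placeholders):

```python
# Hypothetical createJob request body; all resource names are placeholders.
job_request = {
    "job_request_id": "debug-run-001",
    "input_data_blob_prefix": "reports/report.avro",
    "input_data_bucket_name": "my-report-bucket",
    "output_data_blob_prefix": "output/summary.avro",
    "output_data_bucket_name": "my-report-bucket",
    "job_parameters": {
        "attribution_report_to": "https://attribution.ads.uol.com.br",
        "output_domain_blob_prefix": "output_domain.avro",
        "output_domain_bucket_name": "my-report-bucket",
        # Let the job finish even if every report errors, so
        # error_counts reports the underlying failure category.
        "report_error_threshold_percentage": "100",
    },
}
```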
Hello team, how are you?
We managed to deploy the Aggregation Service on AWS through Terraform. I made my job request, but it has been stuck in the RECEIVED status for more than a day. I wanted to see if you had any tips to help understand what is happening, and why it doesn't at least reach an ERROR status.
*I don't know if it's related, but I was only able to hit the API after removing authentication from the paths (that was the workaround our infrastructure team found)
*I saw a similar issue in the GCP environment, but initially I couldn't connect it to what we have in AWS
#53
Thanks a lot
Hi, I work in the Google Ad Traffic Quality Team. I am using the local aggregation service tool to simulate noise on locally generated aggregatable reports. However, due to the contribution budget limits, I am unable to create multiple aggregatable reports that correctly represent my data. What is the best way for me to test this locally? Can I manually create an aggregatable report with very high values (corresponding to a raw summary report) for testing?
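For experiments like this, it helps to check candidate histograms against the per-report L1 budget: the values in one aggregatable report must sum to at most 2^16 = 65536, so a report with "very high values" exceeds what a real browser-generated report may contain, even if a local tool accepts it. A quick validity check (a sketch, not the tool's actual enforcement logic):

```python
L1_BUDGET = 1 << 16  # max summed contribution per aggregatable report

def within_budget(contributions) -> bool:
    """contributions: iterable of {"key": ..., "value": int} dicts,
    shaped like the histograms in this thread."""
    return sum(c["value"] for c in contributions) <= L1_BUDGET

assert within_budget([{"key": "0x29a", "value": 10054}])
assert not within_budget([{"key": "0x29a", "value": 100000}])
```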
I am trying to follow the instructions to build the AMI because I want it in a different region than us-east-1.
But when I run
aws codebuild start-build --project-name aggregation-service-artifacts-build --region us-west-2
I get this error:
Build 'amazon-ebs.sample-ami' errored after 936 milliseconds 511 microseconds: VPCIdNotSpecified: No default VPC for this user
status code: 400, request id: fffa8013-121f-4855-a665-70e36030a4e7x