privacysandbox / aggregation-service
This repository contains instructions and scripts to set up and test the Privacy Sandbox Aggregation Service.
License: Apache License 2.0
When running the Terraform code at the step described in https://github.com/privacysandbox/aggregation-service/blob/main/build-scripts/aws/README.md#configure-codebuild-setup
I got the following error:
│ Error: error creating S3 bucket ACL for aggregation-service-artifacts: AccessControlListNotSupported: The bucket does not allow ACLs
To resolve this error I had to add the following resource to build-scripts/aws/terraform/codebuild.tf:
resource "aws_s3_bucket_ownership_controls" "artifacts_output_ownership_controls" {
  bucket = aws_s3_bucket.artifacts_output.id

  rule {
    object_ownership = "BucketOwnerEnforced"
  }
}
I have the aggregation service set up, but our system to produce encrypted reports is not ready to go yet. This repo's sampledata directory has a sample report, but it is unencrypted and so works only with the local testing tool, not with AWS Nitro Enclaves.
Could you provide, either in the repo or in a zip file in this thread, an encrypted sample output.avro and an accompanying domain.avro that we can use to test our AWS aggregation service and make sure everything is running properly?
Hi team,
I'm trying to set up our deployment environment, but I encountered this error. Could you please help take a look? Thanks a lot!
These are the roles of our service accounts. Do I need to add some additional role permissions?
our projectId: ecs-1709881683838
Error: Error creating function: googleapi: Error 403: Could not create Cloud Run service dev-us-west2-worker-scale-in. Permission 'iam.serviceaccounts.actAs' denied on service account worker-sa-aggregation-service@microsites-sa.iam.gserviceaccount.com (or it may not exist).
│
│ with module.job_service.module.autoscaling.google_cloudfunctions2_function.worker_scale_in_cloudfunction,
│ on ../../coordinator-services-and-shared-libraries/operator/terraform/gcp/modules/autoscaling/workerscalein.tf line 35, in resource "google_cloudfunctions2_function" "worker_scale_in_cloudfunction":
│ 35: resource "google_cloudfunctions2_function" "worker_scale_in_cloudfunction" {
│
╵
╷
│ Error: Error creating function: googleapi: Error 403: Could not create Cloud Run service dev-us-west2-frontend-service. Permission 'iam.serviceaccounts.actAs' denied on service account [email protected] (or it may not exist).
│
│ with module.job_service.module.frontend.google_cloudfunctions2_function.frontend_service_cloudfunction,
│ on ../../coordinator-services-and-shared-libraries/operator/terraform/gcp/modules/frontend/main.tf line 43, in resource "google_cloudfunctions2_function" "frontend_service_cloudfunction":
│ 43: resource "google_cloudfunctions2_function" "frontend_service_cloudfunction" {
│
╵
╷
│ Error: Error creating instance template: googleapi: Error 409: The resource 'projects/ecs-1709881683838/global/instanceTemplates/dev-collector' already exists, alreadyExists
│
│ with module.job_service.module.worker.google_compute_instance_template.collector,
│ on ../../coordinator-services-and-shared-libraries/operator/terraform/gcp/modules/worker/collector.tf line 49, in resource "google_compute_instance_template" "collector":
│ 49: resource "google_compute_instance_template" "collector" {
Hi aggregation-service team,
I'm really confused about the file "output_domain.avro" used to produce a summary report locally. In your Node.js example code, how can I generate an "output_domain.avro" for the aggregation report?
Here is your sample doc: https://github.com/privacysandbox/aggregation-service/blob/main/docs/collecting.md#collecting-and-batching-aggregatable-reports
{
"bucket": "\u0005Y"
}
Will this "output_domain.avro" work with your Node.js example?
If convenient, could you explain what this domain file is generated from? Thanks a lot!
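For what it's worth, here is a minimal Python sketch (my own assumption, not from the docs) of how that sample "bucket" string maps to an integer key: the bucket field is a byte string holding a big-endian unsigned integer, so "\u0005Y" is the two bytes 0x05 0x59, i.e. 1369. Writing the actual output_domain.avro would additionally need an Avro library, which is outside this sketch.

```python
# Sketch (assumption): the "bucket" field in the domain file is a byte
# string holding a big-endian unsigned integer.
sample = "\u0005Y".encode("latin-1")        # the two bytes b"\x05Y"
bucket_int = int.from_bytes(sample, "big")  # 0x0559
print(bucket_int)                           # 1369

# Going the other way: bucket keys are 128-bit integers, so a domain
# entry can be produced as the 16-byte big-endian encoding of the key.
def bucket_bytes(key: int, width: int = 16) -> bytes:
    return key.to_bytes(width, "big")
```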
The sample provided here uses an out-of-date shared_info that also doesn't contain a version.
Better to use the one from the sampledata dir; here is the plaintext:
"{\"api\":\"attribution-reporting\",\"version\":\"0.1\",\"scheduled_report_time\":1698872400.000000000,\"reporting_origin\":\"http://adtech.localhost:3000\",\"source_registration_time\":1698796800.000000000,\"attribution_destination\":\"dest.com\",\"debug_mode\":\"enabled\",\"report_id\":\"b360383a-108d-4ae3-96bd-aecde1c3c30b\"}"
Which has an allowed version, an actual 'api' key, and attribution_destination moved inside shared_info.
Hi aggregation service team, we (Adform) are facing a Privacy Budget Exhausted issue due to duplicate reports. We are following the batching criteria mentioned at
and
Based on the above rules, we tried to reverse engineer the batch data to check whether we have any duplicate reports across all our batch data, but we couldn't find any.
We also looked at #35 and cross-verified our assumption with the code as well.
Is there any other way to get more debug information about which batches contain these duplicate reports with the same key?
Can you please provide any information on how to proceed with debugging this issue?
The way the browser and the adtech's servers interact over the network makes it inherently unavoidable that some reports will be received by the adtech but not recorded as delivered by the browser (e.g. when a timeout happens), and hence retried and received several times by the adtech. As is mentioned in your documentation:
The browser is free to utilize techniques like retries to minimize data loss.
Sometimes, these duplicate reports reach upwards of hundreds of reports each day, for several days (sometimes several months) in a row, all having the same report_id.
The aggregation service enforces the no-duplicates rule based on a combination of information:
Instead, each aggregatable report will be assigned a shared ID. This ID is generated from the combined data points: API version, reporting origin, destination site, source registration time and scheduled report time. These data points come from the report's shared_info field.
The aggregation service will enforce that all aggregatable reports with the same ID must be included in the same batch. Conversely, if more than one batch is submitted with the same ID, only one batch will be accepted for aggregation and the others will be rejected.
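Based on the two quoted paragraphs, a pre-batch dedup key can be derived on the adtech side from those same shared_info fields. This is a hypothetical sketch: the field list follows the quote above, but the hashing the service uses internally is not specified here, so this key only serves to group reports consistently before batching.

```python
import hashlib
import json

# Fields the docs say feed into the shared ID (API version, reporting
# origin, destination site, source registration time, scheduled report
# time). The hash below is our own stable stand-in, not the service's.
ID_FIELDS = ("version", "reporting_origin", "attribution_destination",
             "source_registration_time", "scheduled_report_time")

def shared_group_key(shared_info: dict) -> str:
    material = json.dumps([shared_info.get(f) for f in ID_FIELDS])
    return hashlib.sha256(material.encode()).hexdigest()
```

Grouping all reports by shared_group_key() before submitting batches makes it easy to verify on your side that reports sharing an ID always land in the same batch.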
As an adtech company, when trying to provide timely reporting to clients, it is paramount to try and use all of the available information (in this case, reports) in order to have our reporting be as precise as possible.
In this scenario, however, if we try to batch together all of our reports for a chosen client on a chosen day, even after deduplicating all of the chosen day's reports by the report_id (or the overall shared_info) field, we may have a batch accepted on day 1, and then all subsequent batches for the next month be rejected because they all contain that same shared_info-based ID.
This means that we have to check further back in the data for possible duplicate reports. To be able to implement this check in an efficient manner we would benefit from a more precise description of the retry policy, namely for how long the retries can happen.
I guess the questions this issue raises are as follows:
In the instructions for building the AMI (Building aggregation service artifacts), one of the steps is to put a github_personal_access_token in codebuild.auto.tfvars.
Can you provide more information on this token?
Hello,
While executing a /createJob request with the following payload (see the example below):
{
  "job_request_id": "Job-1010",
  "input_data_blob_prefix": "reports/inputs/input.avro",
  "input_data_bucket_name": "test-android-sandbox",
  "output_data_blob_prefix": "reports/output/result_1.avro",
  "output_data_bucket_name": "test-android-sandbox",
  "job_parameters": {
    "output_domain_blob_prefix": "reports/domains/domain.avro",
    "output_domain_bucket_name": "test-android-sandbox",
    "debug_privacy_epsilon": 30
  }
}
the response of this request is 202.
When executing /getJob?job_request_id=Job-1010
{
  "job_status": "IN_PROGRESS",
  "request_received_at": "2023-06-12T15:14:17.891601Z",
  "request_updated_at": "2023-06-12T15:14:23.222830Z",
  "job_request_id": "Job-1010",
  "input_data_blob_prefix": "reports/inputs/input.avro",
  "input_data_bucket_name": "test-android-sandbox",
  "output_data_blob_prefix": "reports/output/result_1.avro",
  "output_data_bucket_name": "test-android-sandbox",
  "postback_url": "",
  "result_info": {
    "return_code": "",
    "return_message": "",
    "error_summary": {
      "error_counts": [],
      "error_messages": [
        "Missing required properties: jobKey"
      ]
    },
    "finished_at": "1970-01-01T00:00:00Z"
  },
  "job_parameters": {
    "debug_privacy_epsilon": "30",
    "output_domain_bucket_name": "test-android-sandbox",
    "output_domain_blob_prefix": "reports/domains/domain.avro"
  },
  "request_processing_started_at": "2023-06-12T15:14:23.133071Z"
}
The error is Missing required properties: jobKey
The job stays in status IN_PROGRESS
When running the same /createJob request without the job_request_id property, the response from /createJob is:
{ "code": 3, "message": "Missing required properties: jobRequestId\r\n in: {\n \"input_data_blob_prefix\": \"reports/inputs/input.avro\",\n \"input_data_bucket_name\": \"test-android-sandbox\",\n \"output_data_blob_prefix\": \"reports/output/result_1.avro\",\n \"output_data_bucket_name\": \"test-android-sandbox\",\n \"job_parameters\": {\n \"output_domain_blob_prefix\": \"reports/domains/domain.avro\",\n \"output_domain_bucket_name\": \"test-android-sandbox\"\n }\n}", "details": [ { "reason": "JSON_ERROR", "domain": "", "metadata": {} } ] }
Hello,
One interesting evolution of the aggregation service would be to enable querying aggregates of keys. I think this was mentioned in the aggregate attribution API at a time when the aggregation was supposed to be performed by MPC rather than TEEs.
In other words, I would love to be able to query a bit mask (e.g. for an 8-bit key, 01100*01 would cover 01100101 and 01100001).
This would enable greater flexibility for decoding (i.e. choosing which encoded variables to get depending on the number of reports), and negate the need to adapt the encoding depending on the expected traffic to the destination website.
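To make the idea concrete, here is a small Python sketch (illustrative only) that expands such a bit mask into the concrete keys it would cover:

```python
from itertools import product

def expand_mask(mask: str) -> list[str]:
    """Expand a bit mask such as '01100*01' into all matching keys."""
    star_positions = [i for i, c in enumerate(mask) if c == "*"]
    keys = []
    for bits in product("01", repeat=len(star_positions)):
        chars = list(mask)
        for pos, bit in zip(star_positions, bits):
            chars[pos] = bit
        keys.append("".join(chars))
    return keys

print(expand_mask("01100*01"))  # ['01100001', '01100101']
```

A mask with k wildcards covers 2^k keys, which is also why the service would need to check that queried masks partition the key space without overlap.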
Thanks!
P.S. I can cross-post on https://github.com/WICG/attribution-reporting-api if needed.
For Aggregation Service releases (e.g. Aggregation Service v2.0.0), can a more complete set of binaries be published? The use case is to enable adtechs to more easily customize and build Aggregation Service AMI images to meet their deployment requirements.
For Aggregation Service v2.0.0 this set would include:
After deploying the aggregation service using Terraform, I got an API Gateway error in the AWS console: The API with ID my-api-id doesn't include a route with path /* having an integration arn:aws:lambda:us-east-1:my-aws-account-id:function:stg-create-job.
I changed the Source ARN of the Lambda's permission from arn:aws:execute-api:us-east-1:my-aws-account-id:my-api-id/*/**
to arn:aws:execute-api:us-east-1:my-aws-account-id:my-api-id/*/*/v1alpha/getJob, and it solved the error.
https://github.com/privacysandbox/control-plane-shared-libraries/blob/9efe5591acc18e46263399d9785432a146d9675c/operator/terraform/aws/modules/frontend/api_gateway.tf#L62
Hi,
The Aggregation service team is looking for your feedback to improve debugging support in the service.
Adtechs can already get metrics for their jobs (status, errors, execution time, etc.) from the Cloud metadata (DynamoDB on AWS and Spanner on GCP).
We are exploring other metrics, traces and logs that can provide a better understanding of the job processing within the Trusted Execution Environment without impacting privacy. We are considering providing CPU and memory metrics and total execution time traces for the adtech deployment, and would benefit from your feedback on other metrics that adtechs may find useful.
We are also considering adding useful logs which can give information about job processing for debugging purposes, such as 'Job at data reading stage'. This is subject to review and approval considering user privacy.
Your inputs will be reviewed by the Privacy Sandbox team. We welcome any feedback on debugging Aggregation Service jobs.
Thank you!
We are working on adding the possibility to generate debug summary reports from encrypted aggregatable reports with the AWS-based aggregation service. This capability will be time-limited and phased out at a later time.
We would like to hear from you what capabilities you'd like to see in these debug summary reports.
Some ideas we are considering:
- epsilon
- output domain, with an annotation hinting at the omission
Questions:
Hi,
I have managed to get the full flow running to aggregate debug reports in the browser and process them locally with the provided tool.
The final file output I have is:
[{"bucket": "d0ZHnRzgTJMAAAAAAAAAAA==", "metric": 195000}]
Which looks correct in that there should be a single key and the metric value is correct.
The issue I have now is decoding this bucket to get my original input data. I assumed the steps would be:
But this causes the following error:
_cbor2.CBORDecodeEOF: premature end of stream (expected to read 23 bytes, got 15 instead)
Would really appreciate any help on how to get the input data back out of this bucket.
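In case it helps others who hit the same CBOR error: my understanding (an assumption, not confirmed by the docs) is that the local tool's "bucket" value is not CBOR at all, but the base64 encoding of the raw 16-byte big-endian 128-bit bucket key, which is why a CBOR decoder runs out of bytes partway through. A Python sketch:

```python
import base64

# Assumption: the "bucket" string is base64 of the raw 16-byte
# big-endian bucket key, not a CBOR payload.
raw = base64.b64decode("d0ZHnRzgTJMAAAAAAAAAAA==")
assert len(raw) == 16                   # a 128-bit key
bucket_key = int.from_bytes(raw, "big")
print(hex(bucket_key))
```

The trailing zero bytes suggest the original key material sits in the high 64 bits, consistent with a key that was left-shifted at encoding time.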
Best,
D
Hello!
I am following the guide outlined here: https://github.com/privacysandbox/aggregation-service/blob/main/docs/gcp-aggregation-service.md#adtech-setup-terraform
And I am now at the stage where I am trying to deploy the individual environments:
GOOGLE_IMPERSONATE_SERVICE_ACCOUNT="aggregation-service-deploy-sa@ag-edgekit-prod.iam.gserviceaccount.com" terraform plan
However I am faced with this error:
╷
│ Error: invalid value for member (IAM members must have one of the values outlined here: https://cloud.google.com/billing/docs/reference/rest/v1/Policy#Binding)
│
│ with module.job_service.module.autoscaling.google_cloud_run_service_iam_member.worker_scale_in_sched_iam,
│ on ../../coordinator-services-and-shared-libraries/operator/terraform/gcp/modules/autoscaling/workerscalein.tf line 104, in resource "google_cloud_run_service_iam_member" "worker_scale_in_sched_iam":
│ 104: member = "serviceAccount:${var.worker_service_account}"
│
╵
╷
│ Error: invalid value for member (IAM members must have one of the values outlined here: https://cloud.google.com/billing/docs/reference/rest/v1/Policy#Binding)
│
│ with module.job_service.module.worker.google_spanner_database_iam_member.worker_jobmetadatadb_iam,
│ on ../../coordinator-services-and-shared-libraries/operator/terraform/gcp/modules/worker/main.tf line 98, in resource "google_spanner_database_iam_member" "worker_jobmetadatadb_iam":
│ 98: member = "serviceAccount:${local.worker_service_account_email}"
│
╵
╷
│ Error: invalid value for member (IAM members must have one of the values outlined here: https://cloud.google.com/billing/docs/reference/rest/v1/Policy#Binding)
│
│ with module.job_service.module.worker.google_pubsub_subscription_iam_member.worker_jobqueue_iam,
│ on ../../coordinator-services-and-shared-libraries/operator/terraform/gcp/modules/worker/main.tf line 104, in resource "google_pubsub_subscription_iam_member" "worker_jobqueue_iam":
│ 104: member = "serviceAccount:${local.worker_service_account_email}"
│
╵
I am new to Terraform and have not been able to find a way to log the values of serviceAccount:${var.worker_service_account} and serviceAccount:${local.worker_service_account_email}.
Any help here would be greatly appreciated!
EDIT: The below seems to show that the TF state does correctly store the two service accounts created in the adtech_setup step.
terraform state show 'module.adtech_setup.google_service_account.deploy_service_account[0]'
# module.adtech_setup.google_service_account.deploy_service_account[0]:
resource "google_service_account" "deploy_service_account" {
account_id = "aggregation-service-deploy-sa"
disabled = false
display_name = "Deploy Service Account"
email = "aggregation-service-deploy-sa@ag-edgekit-prod.iam.gserviceaccount.com"
id = "projects/ag-edgekit-prod/serviceAccounts/aggregation-service-deploy-sa@ag-edgekit-prod.iam.gserviceaccount.com"
member = "serviceAccount:aggregation-service-deploy-sa@ag-edgekit-prod.iam.gserviceaccount.com"
name = "projects/ag-edgekit-prod/serviceAccounts/aggregation-service-deploy-sa@ag-edgekit-prod.iam.gserviceaccount.com"
project = "ag-edgekit-prod"
unique_id = "106307936135287037408"
}
Hi guys, does anyone here have a strategy for generating the output_domain.avro when the values that make up the bucket are dynamic (example: a creative ID)? We are implementing attribution-reporting in our company and our report key is not fixed, since we use the IDs of the creatives; this means we cannot map the keys by hand (the volume of creatives we have would be very large).
Example of the code that defines the value of the keys (bucket):
const registerSource = (req, res) => {
  if (req.headers['attribution-reporting-eligible']) {
    let SOURCE_PARAMS = {
      source_event_id: Date.now().toString(),
      destination: req.query.destination,
      expiry: 2592000,
      event_report_window: 3600,
      priority: "0",
      aggregation_keys: { // Defining the value of the keys (bucket)
        creativeId: Utils.toHex(req.query.creativeId),
        lineItemId: Utils.toHex(req.query.lineItemId),
        pixelId: Utils.toHex(0),
      },
      aggregatable_report_window: "86400",
      filter_data: {
        creativeId: [`${req.query.creativeId}`],
        lineItemId: [`${req.query.lineItemId}`]
      },
      debug_key: "260893",
    }
    res.set('Attribution-Reporting-Register-Source', JSON.stringify(SOURCE_PARAMS));
    res.status(200).send('OK');
  } else {
    res.statusCode = 400;
    res.end('Invalid request');
  }
}
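One approach (a sketch under assumptions, not an official recommendation): since the bucket values are derived from your own creative IDs, the domain does not need a fixed mapping; it can be regenerated from your creatives table just before each batch, emitting one bucket per ID currently in play. Writing the result into output_domain.avro would need an Avro library; the helper below only produces the 16-byte bucket values, mirroring the hex encoding done by Utils.toHex() on the source side.

```python
# Sketch: regenerate the domain from the current set of creative IDs.
# Assumes the bucket keys are the creative IDs themselves, encoded as
# 128-bit big-endian integers (mirroring Utils.toHex in the snippet).
def domain_buckets(creative_ids) -> list[bytes]:
    return [int(cid).to_bytes(16, "big") for cid in sorted(set(creative_ids))]

buckets = domain_buckets([42, 7, 42])
print(len(buckets))  # 2 -- duplicates collapse
```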
Hello,
Currently, the aggregation service sums the values over the set of keys declared in the output domain files. This explicit declaration of keys means that the encoding must be done well at report creation time (e.g. on the source and trigger side for ARA, or in Shared Storage for the Private Aggregation API). This is quite inflexible in its use.
To bring in some flexibility, I propose to add a system to the aggregation service where a predeclared set of keys would be summed by the aggregation service. This set of keys would constitute a partition of the key space for the service not to violate the DP limit. A simple check done by the aggregation service could reject the query if a key is in two sets.
Here is what the output domain file would look like. I am not sure "super bucket" is a great name, but it is the only one I could think of right now.
| Super bucket | Bucket |
|---|---|
| 0x123 | 0x456 |
| 0x123 | 0x789 |
| 0x124 | 0xaef |
| 0x125 | 0x12e |
The aggregation service would provide the output only on the "super buckets".
The operational benefits of this added flexibility would be huge. Currently, one has to decide on an encoding before knowing what one can measure. For ARA or PAA for Fledge, this means having a very good idea beforehand of the size and the performance of the campaign. When the campaign is running, adjustments have to be made if the volume estimate was not good (or if the settings of the campaign are changed). Encoding changes can be difficult to track, especially in ARA, where sources and triggers both contribute to the keys, but at different points in time. This proposal allows a fixed encoding, with the encoding actually used adjusted after the fact (using the volume of reports as a proxy).
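To illustrate the proposal, here is a small Python sketch (illustrative only) of the intended behaviour: given the super-bucket partition from the table above, the service would output one summed value per super bucket, after checking that no bucket appears under two super buckets.

```python
from collections import defaultdict

# Partition from the example table: bucket -> super bucket. Because
# this is a dict, each bucket can belong to at most one super bucket.
PARTITION = {0x456: 0x123, 0x789: 0x123, 0xAEF: 0x124, 0x12E: 0x125}

def rollup(histogram: dict) -> dict:
    """Sum per-bucket values up to their super bucket."""
    out = defaultdict(int)
    for bucket, value in histogram.items():
        out[PARTITION[bucket]] += value
    return dict(out)

# 0x456 and 0x789 both roll up into super bucket 0x123.
result = rollup({0x456: 5, 0x789: 7, 0xAEF: 1})
```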
When an aggregatable report is created by sendHistogramReport() (i.e. called inside the reportWin function), it contains shared_info without attribution_destination or source_registration_time. This seems logical, as these keys are strictly related to attribution logic. Example:
"shared_info": "{\"api\":\"fledge\",\"debug_mode\":\"enabled\",\"report_id\":\"9ae1a0d0-8cf5-4951-b752-e932bf0f7705\",\"reporting_origin\":\"https://fledge-eu.creativecdn.com\",\"scheduled_report_time\":\"1668771714\",\"version\":\"0.1\"}"
More readable form:
{
"api": "fledge",
"debug_mode": "enabled",
"report_id": "9ae1a0d0-8cf5-4951-b752-e932bf0f7705",
"reporting_origin": "https://fledge-eu.creativecdn.com",
"scheduled_report_time": "1668771714",
"version": "0.1"
}
(note: version 0.1; values for privacy_budget_key, attribution_destination, and source_registration_time are missing)
At the same time, the Aggregation Service expects both attribution_destination and source_registration_time for shared_info.version == 0.1 (since aggregation service version 0.4):
see SharedInfo.getPrivacyBudgetKey()
Tested on Chrome:
The following exception was printed:
CustomMetric{nameSpace=scp/worker, name=WorkerJobError, value=1.0, unit=Count, labels={Type=JobHandlingError}}
2022-11-22 09:10:54:120 +0100 [WorkerPullWorkService] ERROR com.google.aggregate.adtech.worker.WorkerPullWorkService - Exception occurred in worker
com.google.aggregate.adtech.worker.JobProcessor$AggregationJobProcessException: java.util.concurrent.ExecutionException: java.util.NoSuchElementException: No value present
at com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.process(ConcurrentAggregationProcessor.java:400)
at com.google.aggregate.adtech.worker.WorkerPullWorkService.run(WorkerPullWorkService.java:145)
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:67)
at com.google.common.util.concurrent.Callables.lambda$threadRenaming$3(Callables.java:103)
at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: java.util.concurrent.ExecutionException: java.util.NoSuchElementException: No value present
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:588)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:567)
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:113)
at com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.process(ConcurrentAggregationProcessor.java:295)
... 4 more
Caused by: java.util.NoSuchElementException: No value present
at java.base/java.util.Optional.get(Optional.java:143)
at com.google.aggregate.adtech.worker.model.SharedInfo.getPrivacyBudgetKey(SharedInfo.java:161)
at com.google.aggregate.adtech.worker.aggregation.engine.AggregationEngine.accept(AggregationEngine.java:88)
at com.google.aggregate.adtech.worker.aggregation.engine.AggregationEngine.accept(AggregationEngine.java:49)
When specifying "enable_user_provided_vpc = true", creation of the environment following the instructions at https://github.com/privacysandbox/aggregation-service/tree/main#set-up-your-deployment-environment
fails with the error:
Out of index vpc[0], 182: dynamodb_vpc_endpoint_id = module.vpc[0].dynamodb_vpc_endpoint_id
in file terraform/aws/applications/operator-service/main.tf.
Lines 182 & 183 refer to module.vpc[0], while module.vpc is not created when "enable_user_provided_vpc = true":
module "vpc" {
  count = var.enable_user_provided_vpc ? 0 : 1
Hi team,
We enrolled https://ebayadservices.com/ as our production environment a few weeks ago, and confirmed with your team that it was completed.
We noticed this sentence in your documentation: Your staging, beta, QA and test environments will be automatically enrolled if they use the same site as your production environment.
However, when we use https://staging.ebayadservices.com/ to do the tests, the job fails to pass authorization. Could you please help investigate this issue?
BTW, because our company's testing environment does not allow external sites, we use an internal proxy to access https://staging.ebayadservices.com/. I don't know if this is the root cause of this issue.
The avro files:
report avro
domain avro
The API response is as follows.
{
"job_status": "FINISHED",
"request_received_at": "2024-05-22T02:15:53.301731Z",
"request_updated_at": "2024-05-22T02:16:02.790976116Z",
"job_request_id": "test11",
"input_data_blob_prefix": "output/output_regular_reports_2024-05-21T19:12:54-07:00.avro",
"input_data_bucket_name": "tracking_tf_state_bucket",
"output_data_blob_prefix": "output/summary_report.avro",
"output_data_bucket_name": "tracking_tf_state_bucket",
"postback_url": "",
"result_info": {
"return_code": "PRIVACY_BUDGET_AUTHORIZATION_ERROR",
"return_message": "com.google.aggregate.adtech.worker.exceptions.AggregationJobProcessException: Aggregation service is not authorized to call privacy budget service. This could happen if the createJob API job_paramaters.attribution_report_to does not match the one registered at enrollment. Please verify and contact support if needed. \n com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.consumePrivacyBudgetUnits(ConcurrentAggregationProcessor.java:451) \n com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.process(ConcurrentAggregationProcessor.java:329) \n com.google.aggregate.adtech.worker.WorkerPullWorkService.run(WorkerPullWorkService.java:142)\nThe root cause is: com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngine$TransactionEngineException: PRIVACY_BUDGET_CLIENT_UNAUTHORIZED \n com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngineImpl.createTransactionEngineException(TransactionEngineImpl.java:203) \n com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngineImpl.proceedToNextPhase(TransactionEngineImpl.java:67) \n com.google.scp.operator.cpio.distributedprivacybudgetclient.TransactionEngineImpl.executeDistributedPhase(TransactionEngineImpl.java:196)",
"error_summary": {
"error_counts": [],
"error_messages": []
},
"finished_at": "2024-05-22T02:16:02.778068915Z"
},
"job_parameters": {
"output_domain_blob_prefix": "domain/output_local_domain.avro",
"output_domain_bucket_name": "tracking_tf_state_bucket",
"attribution_report_to": "https://staging.ebayadservices.com"
},
"request_processing_started_at": "2024-05-22T02:15:55.674601807Z"
}
I am able to trigger the aggregation job with the /createJob endpoint deployed via Terraform on AWS. While running /getJob with the request ID, I am getting the below error:
"result_info": { "return_code": "REPORTS_WITH_ERRORS_EXCEEDED_THRESHOLD", "return_message": "Aggregation job failed early because the number of reports excluded from aggregation exceeded threshold.", "error_summary": { "error_counts": [ { "category": "DECRYPTION_KEY_NOT_FOUND", "count": 1, "description": "Could not find decryption key on private key endpoint." }, { "category": "NUM_REPORTS_WITH_ERRORS", "count": 1, "description": "Total number of reports that had an error. These reports were not considered in aggregation. See additional error messages for details on specific reasons." } ], "error_messages": [] }, "finished_at": "2024-05-0
I could see @ydennisy also had a similar issue but could not find the solution for it.
Hello All!
Having spent the past few days trying to get the AS live, I have been jotting down various questions, suggestions & bugs which I think could be a great addition to the documentation and workflow.
Maybe for those who use Terraform in their projects this is not required, but we do not use Terraform and essentially followed the instructions to get all the resources built. I have since had to traverse the GCP console to try and understand what the scripts created. A high-level overview diagram with the main data flows, table names, etc. would be extremely useful.
Similar to the point above, the terraform scripts are spread over many files so it is not clear exactly what will be created. I think it would be great to have a single-file config showing all the names of the resources, as they are very obscure in the context of our overall infra; for example prod-jobmd is the name of a newly created Cloud Spanner instance, which is a pretty unhelpful name. At the very least everything should be prefixed with aggregation-service, or even better, allow users to transparently set this as a first step.
It would be good to have an understanding of the cost of the full setup at idle, and maybe have some suggestions for development and staging setups which can minimise costs, by using more serverless infra for example.
I would suggest dropping the use of Cloud Functions and migrating fully to Cloud Run; the docs seem to use these interchangeably, and although they sort of are (gen2 functions are powered by Cloud Run), I think this can cause extra confusion. There is also a small typo in the endpoint:
This is the value in the docs
https://<environment>-<region>-frontend-service-<cloud-funtion-id>-uc.a.run.app/v1alpha/createJob
But -uc. was -ew. in my case, so this does not seem to be a value which can be hardcoded in the docs in this manner.
Running the jobs stores a nice error in the DB, which is awesome! But even with this nice error it would be great to have a document to show common errors and their solutions. For example my latest error is:
{"errorSummary":{"errorCounts":[{"category":"DECRYPTION_KEY_NOT_FOUND","count":"445","description":"Could not find decryption key on private key endpoint."},{"category":"NUM_REPORTS_WITH_ERRORS","count":"445","description":"Total number of reports that had an error. These reports were not considered in aggregation. See additional error messages for details on specific reasons."}]},"finishedAt":"2024-04-30T13:17:24.233681575Z","returnCode":"REPORTS_WITH_ERRORS_EXCEEDED_THRESHOLD","returnMessage":"Aggregation job failed early because the number of reports excluded from aggregation exceeded threshold."}
Which is very clear - but still does not leave me any paths open to try and rectify the issue apart from troubling people over email or in this repo :)
This was addressed in #48 but needs to be added to the repo.
There are quite a few flows in which data must be converted from one format to another, for example a hashed string into a byte array. Whilst it is possible to figure this out from the disparate pieces of information available in the repository, it would be very useful to have a few examples for various platforms, e.g.:
-- Convert hashes to domain avro for processing.
CAST(FROM_HEX(SUBSTR(reports.hashed_key, 3)) AS BYTES) AS bucket
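The same conversion in Python, for anyone not working in SQL (a small sketch; the "0x"-prefixed hashed-key format follows the query above):

```python
# Strip the "0x" prefix from a hashed key string and decode the hex
# into the raw bytes used as the domain bucket.
def hashed_key_to_bucket(hashed_key: str) -> bytes:
    return bytes.fromhex(hashed_key[2:])

print(hashed_key_to_bucket("0x0559"))  # b'\x05Y'
```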
I hope you do not mind if I keep updating this issue as I hopefully near completion of getting the service up!
All the best!
D
I am trying to process a job without an output domain. I found the domain_optional flag in the AggregationWorkerArgs class (link). I can't set this flag as a JobParameter. Can you guide me on how to set the flag?
Hi team,
Our aggregation service is running successfully now, and we plan to use it daily when our app releases.
But before the release, we have to do some stress tests on it. To simulate real business scenarios, we need to mock 400k aggregatable reports for the aggregation service to decrypt. Is there any convenient way for us to create so many reports for testing?
Currently, I have to manually register source & trigger events to send an aggregatable report to GCS, which is really inefficient...
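One possible shortcut (a sketch under assumptions, not an official tool): generate the plaintext reports synthetically instead of registering real sources and triggers, then feed them to whatever encryption tooling you already use for test reports. Field names below follow the shared_info examples earlier in this thread; the reporting origin is a placeholder.

```python
import json
import random
import uuid

def mock_report(i: int) -> dict:
    """One synthetic plaintext aggregatable report (pre-encryption)."""
    return {
        "shared_info": json.dumps({
            "api": "attribution-reporting",
            "version": "0.1",
            "report_id": str(uuid.uuid4()),
            "reporting_origin": "https://adtech.example",  # placeholder
            "scheduled_report_time": 1700000000 + i,
            "source_registration_time": 1699900000,
        }),
        # A random 128-bit bucket key with a unit contribution.
        "payload": {"bucket": random.randrange(2 ** 128), "value": 1},
    }

reports = [mock_report(i) for i in range(1000)]  # scale up to 400k
```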
Thanks,
Yang
We have established a fully automated workflow that collects ARA production reports, processes them, and forwards them to the Aggregation Service. Additionally, we are considering utilizing PAA to process bid request data to enhance our bidding models by collecting data about lost auctions. However, this approach would substantially increase the workload on the Aggregation Service (AS), as the volume of bid requests far exceeds the data used for attribution. Specifically, we exceed 400 million rows with 200 million domain buckets as indicated in the aggregation service sizing guide.
We used one day of our production bidding data as input and modified the aggregation service tool to generate valid PAA reports with debug mode enabled. In the end, we had approximately 2.23 billion reports with an associated domain file of 685 million distinct buckets.
We cannot batch PAA reports the same way we do Attribution Reporting summary reports. For ARA summary reports we group individual reports by shared_info.attribution_destination to create report batches; for each group we create the associated domain by listing all the possible buckets of the advertiser identified by the shared_info.attribution_destination field.
There is no such field in PAA, so the only remaining batching option we have is to batch by reception time, either daily or hourly. By design, the more data we aggregate the less noise we get, so it is always better to launch a daily aggregation. To stress test the AS, we first split our daily data into 100 batches and ran 100 different aggregations; we then tried to scale up and run a daily aggregation using the domain file for the whole day. The number of 100 was picked arbitrarily.
Daily data was split into 100 batches, resulting in approx. 22 million reports and 6.8 million domain buckets per batch; this falls within the instance recommendation matrix.
The configuration used m5.12xlarge EC2 instances with a maximum auto-scaling capacity of 15 instances. A custom script triggered all 100 aggregation jobs simultaneously, and they were executed with debug mode enabled.
Aggregation jobs were launched sequentially. All the executions completed with the status DEBUG_SUCCESS_WITH_PRIVACY_BUDGET_EXHAUSTED except for one batch, which finished with a SUCCESS status.
Each execution took about 30-35 minutes to finish; the whole aggregation took approximately 4 hours to execute.
The graph represents the number of jobs that remain to be processed on AWS. The first query was received at approx 11:50 and the last job finished at 15:48.
Note:
Almost all of our executions completed with DEBUG_SUCCESS_WITH_PRIVACY_BUDGET_EXHAUSTED; the batching strategy that we used for the load test is not viable in production.
As we wanted to run the aggregation on the entire domain, we only batched the reports and kept the domain as is. This resulted in 100 aggregation jobs with 22 million input reports and 684 million input domain buckets each.
Job executions resulted in an INPUT_DATA_READ_FAILED error. The AS logs aren't very explicit, but this seems to be related to the domain being too large. We tried m5.12xlarge and then m5.24xlarge instances; the results were the same. Below is the job stack trace from the AS.
com.google.aggregate.adtech.worker.exceptions.AggregationJobProcessException: Exception while reading domain input data.
com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.process(ConcurrentAggregationProcessor.java:305)
com.google.aggregate.adtech.worker.WorkerPullWorkService.run(WorkerPullWorkService.java:142)
com.google.common.util.concurrent.AbstractExecutionThreadService$1.lambda$doStart$1(AbstractExecutionThreadService.java:57)
The root cause is: software.amazon.awssdk.services.s3.model.S3Exception: The requested range is not satisfiable (Service: S3, Status Code: 416, Request ID: XXXXXXXXXXXXXXXX, Extended Request ID: XXXXXXXXXXXXXXXX)
software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125)
software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82)
software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60)
When input report/domain volumes fall within the instance recommendation matrix, aggregation finishes successfully with reasonable time frames, not far from those shared here.
When the input report/domain volume is above the sizing guidance, the aggregation service encounters failures even with the largest instances. This poses a problem for Criteo, as it puts our target use case for the Private Aggregation API at risk.
The sizing shared in the doc is the true maximum of the AS; any use case requiring more domain buckets should be revisited.
As we have seen before when investigating ARA issues, but even more pronounced here (because of new errors that we had not encountered so far), the errors of the AS are hard to understand and do not tell us why the jobs are failing.
To address these issues and improve the feasibility of our use case, the following actions are recommended:
Hello aggregation service team,
We (Criteo) would like to seek clarification on a couple of points to ensure we have a comprehensive understanding of certain features.
Your insights will greatly assist us in optimizing our utilization of the platform:
Batch Size Limit (30k reports):
Could you kindly provide more details about the batch size limit of 30,000?
We are a little unsure how this limit behaves: it is our understanding that the aggregation service expects loads of up to tens (even hundreds) of thousands of reports. However, when we provide it with batches of 50k+ reports, our aggregations fail.
Is the 30k limit enforced per avro file within the batch, or per batch overall?
If it is per overall batch, do you have any suggestions for aggregating batches of more than 30k reports?
If we need to split these larger aggregations over several smaller requests, that will greatly increase the noise levels we see in our final results, and would work against the idea of the aggregation service, which encourages adtechs to aggregate as many reports as possible to increase privacy.
Understanding the specifics of this limit should greatly help us in tailoring our processes more effectively.
Debug Information on Privacy Budget Exhaustion:
We've been considering ways to enhance our debugging capabilities, especially in situations where the privacy budget is exhausted. Would it be possible to obtain more detailed debug information in such cases, specifically regarding the occurrence of duplicates? We believe that having for instance the report_ids of the duplicates wouldn't compromise privacy, and would significantly contribute to our troubleshooting efforts.
Hi
I have enrolled and managed to deploy the aggregation service, and it looks OK (I see metrics, logs, and everything).
I do however have some questions:
I have some reports (with and without the cleartext option) and a domain file. I tried both of them in the local testing tool and everything looked OK: I got a good output using the non-encrypted reports.
Then I took the reports and the domain file and used them with the deployed aggregation service (encrypted this time, of course, since the local tool doesn't accept encrypted files).
I got the following error (in this example I sent only 1 report, but when I sent 200 I got the same error with 200 as the count):
"result_info": {
  "return_code": "SUCCESS_WITH_ERRORS",
  "return_message": "Aggregation job successfully processed but some reports have errors.",
  "error_summary": {
    "error_counts": [
      {
        "category": "SERVICE_ERROR",
        "count": 1,
        "description": "Internal error occurred during operation."
      },
      {
        "category": "NUM_REPORTS_WITH_ERRORS",
        "count": 1,
        "description": "Total number of reports that had an error. These reports were not considered in aggregation. See additional error messages for details on specific reasons."
      }
    ],
    "error_messages": []
  },
The output report contains all the keys from the domain file, but all of the metrics are just pure noise. I also don't understand this error message; it would be great to get a more elaborate error message (since error_messages is empty and doesn't give any more info).
I'm using the latest 2.4.2 version.
I'm really stuck right now, can you help?
Looking at the auto scaling group for the Aggregation Service, I saw that no auto scaling policy was created after deploy (I'm looking in AWS -> Auto Scaling groups -> my service -> Automatic scaling / Instance management). However, when sending some jobs I did see the instance number go up and then down. What am I missing?
Do I need to create those policies myself? Do you have recommendations on which metrics to use for auto scaling, including thresholds?
Thanks!!
I am trying to follow the instructions in Testing locally using Local Testing Tool but when I run the following command with the sampledata:
java -jar LocalTestingTool_2.0.0.jar \
--input_data_avro_file sampledata/output_debug_reports.avro \
--domain_avro_file sampledata/output_domain.avro \
--output_directory .
I get the error below:
2023-10-31 12:21:57:506 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.WorkerPullWorkService - Aggregation worker started
2023-10-31 12:21:57:545 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.WorkerPullWorkService - Item pulled
2023-10-31 12:21:57:555 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Reports shards detected by blob storage client: [output_debug_reports.avro]
2023-10-31 12:21:57:566 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Reports shards to be used: [DataLocation{blobStoreDataLocation=BlobStoreDataLocation{bucket=/Users/jonaquino/projects/aggregation-service/sampledata, key=output_debug_reports.avro}}]
2023-10-31 12:21:57:566 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.domain.OutputDomainProcessor - Output domain shards detected by blob storage client: [output_domain.avro]
2023-10-31 12:21:57:567 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.domain.OutputDomainProcessor - Output domain shards to be used: [DataLocation{blobStoreDataLocation=BlobStoreDataLocation{bucket=/Users/jonaquino/projects/aggregation-service/sampledata, key=output_domain.avro}}]
2023-10-31 12:21:57:575 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Job parameters didn't have a report error threshold configured. Taking the default percentage value 10.000000
return_code: "REPORTS_WITH_ERRORS_EXCEEDED_THRESHOLD"
return_message: "Aggregation job failed early because the number of reports excluded from aggregation exceeded threshold."
error_summary {
error_counts {
category: "REQUIRED_SHAREDINFO_FIELD_INVALID"
count: 1
description: "One or more required SharedInfo fields are empty or invalid."
}
error_counts {
category: "NUM_REPORTS_WITH_ERRORS"
count: 1
description: "Total number of reports that had an error. These reports were not considered in aggregation. See additional error messages for details on specific reasons."
}
}
finished_at {
seconds: 1698780117
nanos: 679576000
}
CustomMetric{nameSpace=scp/worker, name=WorkerJobCompletion, value=1.0, unit=Count, labels={Type=Success}}
2023-10-31 12:21:57:732 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.WorkerPullWorkService - No job pulled.
Running the step "Building artifacts" from https://github.com/privacysandbox/aggregation-service/blob/main/build-scripts/aws/README.md#building-artifacts
While building the artifacts in region eu-west-1, the CodeBuild failed with the below error:
amazon-ebs.sample-ami: Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
754 | ==> amazon-ebs.sample-ami: Existing lock /var/run/yum.pid: another copy is running as pid 3465.
755 | ==> amazon-ebs.sample-ami: Another app is currently holding the yum lock; waiting for it to exit...
Hi all!
The Aggregation service team is currently exploring options for adtechs who may want to migrate from one cloud provider to another. This gives adtechs flexibility in using a cloud provider of their choice to optimize for cost or other business needs. Our proposed migration solution would enable adtechs to re-encrypt their reports from a source cloud provider (let’s call this Cloud A) to a destination cloud provider (let’s call this Cloud B) and enable them to use Cloud B to process reports originally encrypted for Cloud A as part of the migration. After migration is completed, use of Cloud A for processing reports will be disabled and the adtech will only be able to use Cloud B to process their reports.
In the short-term, this solution will support migration of aggregation service jobs from AWS to GCP and vice versa. As we support more cloud options in the future, this solution would be extensible to moving from any supported cloud provider to another.
Depiction of the re-encryption flow:
For any adtechs considering a migration, we encourage completing this migration before third-party cookie deprecation to take advantage of feature benefits such as:
After third-party cookie deprecation, we plan to continue to support cloud migration with the re-encryption feature, but may not be able to give the additional benefits outlined above to preserve privacy.
We welcome any feedback on this proposal.
Thank you!
Hello,
I'm trying to build and deploy images based on the steps here:
https://github.com/privacysandbox/aggregation-service/blob/2b3d5c450d0be4e2ce0f4cb49444f3f049508917/build-scripts/gcp/cloudbuild.yaml
This uploads the compiled JAR files to the bucket; however, I cannot use these directly in Cloud Functions and have to download them, zip them, and re-upload them (Terraform does this automatically for users). Ideally I'd like to skip this step, and was hoping to be able to upload those JAR files zipped directly.
We are seeking feedback on consolidating coordinator services for attribution reporting and other workloads. Please review and comment on the main issue posted on the WICG/protected-auction-services-discussion#69 repository.
Hello everyone, I'm currently trying to create a version of attribution-reporting in Node.js. So far so good: I managed to complete the entire journey (trigger interactions with creatives, conversion on the final website, generate event and aggregatable reports).
But I got to the part where I must store the aggregatable reports before sending them to the aggregation service, and I wanted to know if anyone else has done this step of collecting the reports in Node.js.
Below is the code responsible for collecting and storing the reports (I took the documentation code written in Go as a reference).
*Spoiler: each report record I receive generates an .avro file.
const avro = require('avsc');
const REPORTS_AVRO_SCHEMA = {
"name": "AvroAggregatableReport",
"type": "record",
"fields": [
{ "name": "payload", "type": "bytes" },
{ "name": "key_id", "type": "string" },
{ "name": "shared_info", "type": "string" }
]
};
const RECORD_SCHEMA = avro.Type.forSchema(REPORTS_AVRO_SCHEMA);
const registerAggregateReport = (req, res) => {
try {
// const report = req.body;
// Example to illustrate what the request body would be
const report = {
"aggregation_coordinator_origin": "https://publickeyservice.msmt.aws.privacysandboxservices.com",
"aggregation_service_payloads": [
{
"key_id": "bbe6351f-5619-4c98-84b2-4a74fa1ae254",
"payload": "7K9SQLdROKqITmnrkgIDulfEXDAR76XUP4vc6uzxPwDycQql3AhR3dxeXdEw2gbUaIAldnu33RSN4SAFcFFKgDQkvnhFzPoxJjO2Yfw4osJ1S0Odp0smu0rC5k5GuG4oIu9YQofCPNmSD7KRVJ9Y6Lucz3BXoI3RQhpQkO31RDyxVJdBbJ8JiS2KBtu8naUf5Z+/mNNKp39ObsNbo7kQKI0TwyRJDSJKqv42Yi3ctoAhOT0eaaUtMfho67i9XaEtVnh8wB4Mi+nzlAfVsGIavP6aXWDe44IgKZvTS/zEKjI68+nzWkyfdRNOf7jtb2XnoB7k5iM+Yu9Ayk5ic/aT1eA1iPEzLvW/tNLcohne3UL2DefZoTLb5l9aludA7Qlf0g+kW9nuvUSmHBuTjE/fTY5s9uRExHH+b2Hjm2sL9DyrFZUFqcl/KLS+McgOT8I0ZTpPRmr+njW8+4b01Hsc2MpY3KKAn1jUDUE45pGbhj/Gqlb1ikJO9nNKS/nnWJgR7+3P8JEpHC2fkfEase4+vrNxZujWolYfTUxswJpiEZs1+fCOroEyyEY6Zjvx5qLbk+7wMNqCeCltDPA6c8WtAPtMreIUvKbco6XUUzaGSnvWLz6/WJqCxG4hjPOfcYAWXIwSboqvNyBHrRr4H5V7C0unSkIjd0j/GeB3ywgnKEqiihuvZ5PPw+O5aYqJdaR3QEFZtpLj+3Uv4OGn2+CvU1thV0A0H1XViP846Tfmb0jVejN1+ih+VO5cf/7T2TPz6oGO9sa6qitWtll5vhwxVyG3vniCo3xghGnUcHSP5ogfp6qgDGSgsGFqSvdiuOpQU+MG/HrCDUjvce0GoXJP6674UcurGxR9UKAnVwZyKRIj/q9qzUgxhWEFC3ssADMmxhZBs3X+rrAxKfhXD12MfuUluRTCzpCKZ9/YapnJQYjngGx7GIkfW6tw8eSCC8yO41vWyHGRz4nKlgNeQkwYafGPzXqUXjyEyiupMUlmSsU/zT52wdCQYLJbQg7xhNuLebb8qh9LW07jMho4Vo9DBP9l463uqA8hcZnJ"
}
],
"shared_info": "{\"api\":\"attribution-reporting\",\"attribution_destination\":\"https://cliente.com\",\"report_id\":\"4d82121f-7d62-4fa4-bda4-a70c9e850089\",\"reporting_origin\":\"https://attribution.ads.uol.com.br\",\"scheduled_report_time\":\"1714764978\",\"source_registration_time\":\"0\",\"version\":\"0.1\"}"
}
report.aggregation_service_payloads.forEach((payload, i) => {
  const payloadBytes = Buffer.from(payload.payload, 'base64');
  const record = {
    payload: payloadBytes,
    key_id: payload.key_id,
    shared_info: report.shared_info,
  };
  // Include the payload index in the filename: Date.now() alone can collide
  // when a report carries multiple payloads, silently overwriting files.
  const outputFilename = `./reports/output_reports_${Date.now()}_${i}.avro`;
  const encoder = avro.createFileEncoder(outputFilename, RECORD_SCHEMA);
  encoder.write(record);
  encoder.end();
});
res.status(200).send('Report received successfully.');
} catch (e) {
console.error('Error processing report:', e);
res.status(400).send('Failed to process report.');
}
};
module.exports = {
registerAggregateReport
}
*English is not my native language, so take it easy.
As the death of third-party cookies is something that will affect everyone, it would be nice to have references in more commonly used languages such as Node.js, Java, etc. I hope this post can contribute in some way to that.
Hi all!
We are currently exploring migration from origin enrollment to site enrollment for the Aggregation Service (current form using origin here) for the following reasons:
As a follow up to this proposal, we would like to support multiple origins in a batch of aggregatable reports. Do adtechs have a preference or blocking concern with either specifying a list of origins or the site in the createJob request?
Running the step "Building artifacts" from https://github.com/privacysandbox/aggregation-service/blob/main/build-scripts/aws/README.md#building-artifacts
While building the artifacts in region eu-west-1, the CodeBuild failed with the below error:
836 | --> amazon-ebs.sample-ami: AMIs were created:
837 | us-east-1: ami-069b14bccedc04571
....
[Container] 2023/05/09 15:34:31 Running command bash build-scripts/aws/set_ami_to_public.sh set_ami_to_public_by_prefix aggregation-service-enclave_$(cat VERSION) $AWS_DEFAULT_REGION $AWS_ACCOUNT_ID
841 |
842 | An error occurred (InvalidAMIID.Malformed) when calling the ModifyImageAttribute operation: Invalid id: "" (expecting "ami-...")
843 |
844 | An error occurred (MissingParameter) when calling the ModifySnapshotAttribute operation: Value () for parameter snapshotId is invalid. Parameter may not be null or empty.
845
The reason is that it created the AMI in us-east-1 instead of eu-west-1.
Hi all!
We recently published a proposal for the aggregation service release and end-of-support plan. This plan outlines a standardized cadence for feature releases, in addition to a strategy for patches:
Aggregation service release and end-of-support plan
We're opening this issue to solicit general feedback on the proposal.
cc @hostirosti
In the AWS instructions, there are two options for using the AMI in a region other than us-east-1:
If you like to deploy the aggregation service in a different region you need to copy the released AMI to your account or build it using our provided scripts.
I have been having a lot of trouble building the AMI using the provided scripts, so I would like to try simply copying the AMI (the first option), but I don't see instructions for this. What is the AMI name and where do I get it from? Do I need to change any parameters to point to the new region? What step should I move on to after copying the AMI?
Could you add instructions for copying the AMI and subsequent steps?
Hi Aggregation Service testers,
We have discovered an issue that broke the AWS worker build, caused by an incompatible Docker engine version upgrade. We are planning to release a new patch next week. Meanwhile, if you encounter issues building AWS worker, you can use the following workaround:
Create the file <repo_root>/build_defs/shared_libraries/pin_pkr_docker.patch with the following content:
diff --git a/operator/worker/aws/setup_enclave.sh b/operator/worker/aws/setup_enclave.sh
index e4bd30371..8bf2e0fb1 100644
--- a/operator/worker/aws/setup_enclave.sh
+++ b/operator/worker/aws/setup_enclave.sh
@@ -19,7 +19,7 @@ sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/late
#
# Builds enclave image inside the /home/ec2-user directory as part of automatic
# AMI generation.
-sudo yum install docker -y
+sudo yum install docker-24.0.5-1.amzn2023.0.3 -y
sudo systemctl enable docker
sudo systemctl start docker
Then add the patch to the list of patches under the shared_libraries rule in the WORKSPACE file. The shared_libraries rule should now become:
git_repository(
name = "shared_libraries",
patch_args = [
"-p1",
],
remote = "https://github.com/privacysandbox/coordinator-services-and-shared-libraries",
patches = [
"//build_defs/shared_libraries:coordinator.patch",
"//build_defs/shared_libraries:gcs_storage_client.patch",
"//build_defs/shared_libraries:dependency_update.patch",
"//build_defs/shared_libraries:key_cache_ttl.patch",
"//build_defs/shared_libraries:pin_pkr_docker.patch",
],
tag = COORDINATOR_VERSION,
workspace_file = "@shared_libraries_workspace//file",
)
Thank you!
The documentation states that:
Note: The prebuilt Amazon Machine Image (AMI) for the aggregation service is only available in the us-east-1 region. If you like to deploy the aggregation service in a different region you need to copy the released AMI to your account or build it using our provided scripts.
When I try to copy the AMI to my account I'm getting the following error:
Failed to copy ami-036942f537f7a7c2b
You do not have permission to access the storage of this ami
Can you give me some guidance or tell me if it's a configuration error?
Context:
Hello,
I'm currently experimenting with the Private Aggregation API and I'm struggling to validate that my final output is correct
From my worklet, I perform the following histogram contribution:
privateAggregation.contributeToHistogram({ bucket: BigInt(1369), value: 128 });
Which is correctly triggering a POST request with the following body:
{
aggregation_service_payloads: [
{
debug_cleartext_payload: 'omRkYXRhgaJldmFsdWVEAAAAgGZidWNrZXRQAAAAAAAAAAAAAAAAAAAFWWlvcGVyYXRpb25paGlzdG9ncmFt',
key_id: 'bca09245-2ef0-4fdf-a4fa-226306fc2a09',
payload: 'RVd7QRTTUmPp0i1zBev+4W8lJK8gLIIod6LUjPkfbxCOHsQLBW/jRn642YZ2HYpYkiMK9+PprU5CUi9W7TwJToQ4UXiUbJUgYwliqBFC+aAcwsKJ3Hg46joHZXV5E0ZheeFTqqvLtiJxlVpzFcWd'
}
],
debug_key: '777',
shared_info: '{"api":"shared-storage","debug_mode":"enabled","report_id":"aaa889f1-2adc-4796-9e46-c652a08e18ca","reporting_origin":"http://adtech.localhost:3000","scheduled_report_time":"1698074105","version":"0.1"}'
}
I've set up a small Node.js server handling requests on /.well-known/private-aggregation/debug/report-shared-storage, basically doing this:
const encoder = avro.createFileEncoder(
`${REPORT_UPLOAD_PATH}/debug/aggregation_report_${Date.now()}.avro`,
reportType
);
reportContent.aggregation_service_payloads.forEach((payload) => {
console.log(
"Decoded data from debug_cleartext_payload:",
readDataFromCleartextPayload(payload.debug_cleartext_payload)
);
encoder.write({
payload: convertPayloadToBytes(payload.debug_cleartext_payload),
key_id: payload.key_id,
shared_info: reportContent.shared_info,
});
});
encoder.end();
As you can see, at this point I'm printing the decoded data to the console and I can see, as expected:
Decoded data from debug_cleartext_payload: { value: 128, bucket: 1369 }
However, now I'm trying to generate a summary report with the local test tool by running the following command:
java -jar LocalTestingTool_2.0.0.jar --input_data_avro_file aggregation_report_1698071597075.avro --domain_avro_file output_domain.avro --no_noising --json_output --output_directory ./results
No matter what value I pass as the payload of the contributeToHistogram method, I always get 0 in the metric field:
[ {
"bucket" : "MTM2OQ==", // 1369 base64 encoded
"metric" : 0
} ]
Am I doing something wrong?
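One thing that may be worth double-checking (this is an assumption on my part, not a confirmed diagnosis): the report payload encodes buckets as 16-byte big-endian integers, while "MTM2OQ==" in your summary output base64-decodes to the ASCII text "1369". If the bucket in output_domain.avro contains the string's bytes rather than the binary encoding, it will never match the payload's bucket, and the metric stays at 0. A quick Node.js check of the two encodings:

```javascript
// Sketch: compare the 16-byte big-endian encoding of bucket 1369 with the
// bytes of the ASCII string "1369". Only the former matches what the
// aggregatable-report payload contains.
function bucketToBytes(bucket) {
  const b = BigInt(bucket);
  const buf = Buffer.alloc(16); // 128-bit, zero-filled
  buf.writeBigUInt64BE(b >> 64n, 0); // high 64 bits
  buf.writeBigUInt64BE(b & 0xffffffffffffffffn, 8); // low 64 bits
  return buf;
}

const binary = bucketToBytes(1369); // ends in 0x05 0x59 (1369 = 0x559)
const ascii = Buffer.from('1369', 'utf8'); // 0x31 0x33 0x36 0x39
console.log(binary.equals(ascii)); // false
console.log(ascii.toString('base64')); // "MTM2OQ=="
```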
Apart from this issue, I wonder how it would work in a real-life application. Currently this example handles one report at a time, sent instantly because of debug_mode, but in a real situation, how are we supposed to process a large amount of reports at once? Can we pass a list of files to --input_data_avro_file? Should we batch the reports prior to converting them to avro, based on the shared_info data? If so, based on which field?
Thank you in advance!
The various links to get the local testing tool do not work (see for instance here https://github.com/privacysandbox/aggregation-service/blob/main/COLLECTING.md#produce-a-summary-report-locally).
Even replacing the {VERSION} by 0.4.0 in the link does not solve the issue.
Thanks a lot!
P.S. I could get the previous release (ie 0.3.0) using the link available before the 0.4.0 release. See the associated diff of the release.
Tried to kick off a build of the build container using the git hash for v2.4.2 and got the error below.
I believe it's due to a missing "-y" on the apt-get install here:
https://github.com/privacysandbox/aggregation-service/blame/22c2a42ea98b88e5dd3451446db2b7a152760274/build-scripts/gcp/build-container/Dockerfile#L63
Google Ldap: evgenyy@ if you want to reach out internally
Step 9/12 : RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.asc] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | tee /usr/share/keyrings/cloud.google.asc && apt-get update && apt-get install google-cloud-cli && apt-get -y autoclean && apt-get -y autoremove
---> Running in e691327d6e48
deb [signed-by=/usr/share/keyrings/cloud.google.asc] https://packages.cloud.google.com/apt cloud-sdk main
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 2659 100 2659 0 0 42686 0 --:--:-- --:--:-- --:--:-- 42887
-----BEGIN PGP PUBLIC KEY BLOCK-----
...
-----END PGP PUBLIC KEY BLOCK-----
Hit:1 https://download.docker.com/linux/debian bookworm InRelease
Hit:2 http://deb.debian.org/debian bookworm InRelease
Hit:3 http://deb.debian.org/debian bookworm-updates InRelease
Get:4 https://packages.cloud.google.com/apt cloud-sdk InRelease [6361 B]
Hit:5 http://deb.debian.org/debian-security bookworm-security InRelease
Get:6 https://packages.cloud.google.com/apt cloud-sdk/main amd64 Packages [629 kB]
Fetched 636 kB in 1s (1239 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
google-cloud-cli-anthoscli
Suggested packages:
google-cloud-cli-app-engine-java google-cloud-cli-app-engine-python
google-cloud-cli-pubsub-emulator google-cloud-cli-bigtable-emulator
google-cloud-cli-datastore-emulator kubectl
The following NEW packages will be installed:
google-cloud-cli google-cloud-cli-anthoscli
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 106 MB of archives.
After this operation, 609 MB of additional disk space will be used.
Do you want to continue? [Y/n] Abort.
The command '/bin/sh -c echo "deb [signed-by=/usr/share/keyrings/cloud.google.asc] https://packages.cloud.google.com/apt cloud-sdk main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | tee /usr/share/keyrings/cloud.google.asc && apt-get update && apt-get install google-cloud-cli && apt-get -y autoclean && apt-get -y autoremove' returned a non-zero code: 1
ERROR
ERROR: build step 0 "gcr.io/cloud-builders/docker" failed: step exited with non-zero status: 1
Hello team, how are you?
Guys, after uploading the Aggregation Service to the AWS environment, I was running some tests to generate the summary report. With these tests I noticed that the noise was greatly impacting the metric values, so I implemented scaling of the values and set the epsilon. But after adding the epsilon definition to /createJob, the Aggregation Service always returns the PRIVACY_BUDGET_EXHAUSTED error, regardless of whether they are new reports (new .avro files).
I wanted to see if anyone has any tips on how to identify the source of this error and, consequently, how I could get around it.
createJob request body
{
"input_data_blob_prefix": "reports_17",
"input_data_bucket_name": "uolcsm-uolads-aggregate-reports",
"output_data_blob_prefix": "output/summary_report_25.avro",
"output_data_bucket_name": "uolcsm-uolads-aggregate-reports",
"job_parameters": {
"debug_privacy_epsilon": "10",
"attribution_report_to": "https://attribution.ads.uol.com.br",
"output_domain_blob_prefix": "output_domain.avro",
"output_domain_bucket_name": "uolcsm-uolads-aggregate-reports"
},
"job_request_id": "test80"
}
AggregationService getJob response
{
"job_status": "FINISHED",
"request_received_at": "2024-06-19T20:20:56.246430Z",
"request_updated_at": "2024-06-19T20:21:13.512574536Z",
"job_request_id": "test80",
"input_data_blob_prefix": "reports_17",
"input_data_bucket_name": "uolcsm-uolads-aggregate-reports",
"output_data_blob_prefix": "output/summary_report_25.avro",
"output_data_bucket_name": "uolcsm-uolads-aggregate-reports",
"postback_url": "",
"result_info": {
"return_code": "PRIVACY_BUDGET_EXHAUSTED",
"return_message": "com.google.aggregate.adtech.worker.exceptions.AggregationJobProcessException: Insufficient privacy budget for one or more aggregatable reports. No aggregatable report can appear in more than one aggregation job. \n com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.consumePrivacyBudgetUnits(ConcurrentAggregationProcessor.java:472) \n com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.process(ConcurrentAggregationProcessor.java:329) \n com.google.aggregate.adtech.worker.WorkerPullWorkService.run(WorkerPullWorkService.java:142)",
"error_summary": {
"error_counts": [],
"error_messages": []
},
"finished_at": "2024-06-19T20:21:13.503639102Z"
},
"job_parameters": {
"debug_privacy_epsilon": "10",
"output_domain_bucket_name": "uolcsm-uolads-aggregate-reports",
"output_domain_blob_prefix": "output_domain.avro",
"attribution_report_to": "https://attribution.ads.uol.com.br"
},
"request_processing_started_at": "2024-06-19T20:21:02.916500310Z"
}
*I was taking a look at NoiseLab, and the solution to mitigate the impact of noise would be to scale my values and use an epsilon greater than 0.
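Not a confirmed diagnosis, but the error message says no aggregatable report can appear in more than one aggregation job, and budget is tracked per report rather than per .avro file. So one thing worth ruling out is a previously processed report (same shared_info) sneaking into a new batch. A small Node.js sketch of a pre-submission duplicate check (the function name and sample data are illustrative):

```javascript
// Sketch: scan shared_info strings for duplicate report_ids before batching.
// A report that already consumed its budget in an earlier job can trigger
// PRIVACY_BUDGET_EXHAUSTED even if it is re-packaged into a new .avro file.
function findDuplicateReportIds(sharedInfos) {
  const seen = new Set();
  const duplicates = [];
  for (const si of sharedInfos) {
    const id = JSON.parse(si).report_id;
    if (seen.has(id)) duplicates.push(id);
    else seen.add(id);
  }
  return duplicates;
}

const infos = [
  '{"report_id":"a1","api":"attribution-reporting"}',
  '{"report_id":"b2","api":"attribution-reporting"}',
  '{"report_id":"a1","api":"attribution-reporting"}', // repeat
];
console.log(findDuplicateReportIds(infos)); // [ 'a1' ]
```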
Reports used during testing:
//Histogram 1
[
{
"key": "0x29a",
"value": 10054
}
]
//Report 1
{
"aggregation_coordinator_origin": "https://publickeyservice.msmt.aws.privacysandboxservices.com",
"aggregation_service_payloads": [ {
"debug_cleartext_payload": "omRkYXRhlKJldmFsdWVEAAAnRmZidWNrZXRQAAAAAAAAAAAAAAAAAAACmqJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAGlvcGVyYXRpb25paGlzdG9ncmFt",
"key_id": "b16a91ad-6d09-491f-af3e-31faef3d116f",
"payload": "tQpGPd6LRE1I16SoUk9ifwdyvc9zZAG1Qj1zAQSLhktOi8k1kLcBgYpTwq+OfpLIBpePCxHC1LeXiCdPAgEPWQW6gM6wumBTgSF611YGN+i52IZPYyR1k4F9yGocRg5AE6wjMDxLVr71QyJ7At17Ulje6Jj5rIlByPTAnNUWwDB0T1XuMx9gWn3mhS3M1fTvo1Z77VhnysZJuUdu5Q1BeLX1EUkl1o08tw8bRFPjIinBgVEX1iJRymxeCmfiqV/oZOUf7yU1XUWOEHiVgWQFc9ZLgbja3xLzOL+WlOEXD6Wn1Nu6Gq1LD0U7G450jP+x2wa8fMmHUS9LPmyM/uP8c7dkEHHRV02TuhtR7sepGmnaMfGHJNT/poiEMX1XRF6/3iqhn9o6kyrFuMZ0VPEdrCtN4RIpReQUJXD318jTYPCpxtVUQsZojdU05+IavAUYtVYMepUXV87VBjRVEzCagLYekw9AOGlPOtfVSfGl7DZech+pUHwRA2GPT/W6mnNfWMxq76XsEgO+Xc1Ap1FKNmUZhEgzdP9PFwrYbn9GEwjhvyUxuPk85lF10yqRbrkOSiYH3RHMy80L1uGJIeiAFPvGtHwWNmrfLMKqxNQLlYnJPUjnKgM7jgO0VNDrCJxUuSo8u0WHlGkDK8F7tAlD3W2k45ZkGFPKlzXeaP1mqbpP7YSolyMFHvzwipq8ztGqzGN19BKxQmhlJQLW80B/kjzqK1GV4xzPHWtqq5yctV3OScnfws9RRJVUsoNIXakX2EwXLuapd2AhgXuRp3Ojcg/l9JjxKZliFYA2aMjT1yYfD1JCcB3l7cBYvPb4QKPgYXOXaR6lCSX6ZMVgFnQm7Lk7CYhZTaSuaXDj5j3irIWLnN7aXOMsZ/SDamRdQ30Hm5eFzaANpaENWNO3oil+fRrlIe8bCj5TPPIwZcNdTbJ6564cT2MQBMUiZeqFq7+z61H2E8MJbSuXuXRjMFoc095nM2sIkci9Fyb80VwOPlXHdZN4Azf85taPhGjUOrXrs7X8HGIAouqhEiHjBa0Z6IDBdYNu3f6hSwwn"
} ],
"shared_info": "{\"api\":\"attribution-reporting\",\"attribution_destination\":\"https://cliente.com\",\"debug_mode\":\"enabled\",\"report_id\":\"f23783ec-b335-41ff-8885-dc529ccf800f\",\"reporting_origin\":\"https://attribution.ads.uol.com.br\",\"scheduled_report_time\":\"1718829420\",\"source_registration_time\":\"0\",\"version\":\"0.1\"}",
"source_debug_key": "647775351539539",
"trigger_debug_key": "647775351539539"
}
//Histogram 2
[
{
"key": "0x29a",
"value": 10074
}
]
//Report 2
{
"aggregation_coordinator_origin": "https://publickeyservice.msmt.aws.privacysandboxservices.com",
"aggregation_service_payloads": [ {
"debug_cleartext_payload": "omRkYXRhlKJldmFsdWVEAAAnWmZidWNrZXRQAAAAAAAAAAAAAAAAAAACmqJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAGlvcGVyYXRpb25paGlzdG9ncmFt",
"key_id": "3e9d0be8-6cc6-4e80-862f-b5c1ba34b03d",
"payload": "3fuou19wP7qL616ylGdTVp26HiFLpylrHIXf5sHF/xs+3prz3S3cQ2J16ojKQkyD9AWOKU+tBYMZJXCT6/jDD8Tv0kP6TjPGAiyOyWQ5ATdw8xsPqCxZuvtVeywxHZEZ65NBgOWm/txLBMMtQLAFHc0s+gdq4pHPem9Q1PHrAtCIO3dUO0xlvSMSIZJzhjPrNzB5XBzfLHXs0ACl533RLbv7SRuxb9njkVE0WIx80OY6jeBiUFT5NHaP610YU+tULCaA6mg7XtQUWh1voWDKplJwYlfPOWoPItMX0v1OS69+Zmc669CUN/hV+PI/meyX9PagbwfXs4Rt3IrQAN5MzaXHSPq5Z/eMsbiOfhCO3lGYxBKo/9KpWG8Qt9BWAALntHqg+y7BItiCC8NyY9pJZ3n9ghul2QdJy1QkDOuFOwquiKQsPmFMY5kQUaN4d0E37PwKT7fmZiDyblIUCBHjaS+jM7eUH3wYCoylm4HeqR9gzA2BdN+Vm+zctUfvYmT2QOLEVKwBMp+UOZ67P5ABOMLV2jUIkx6uoqY9eDngJYk9bRwQbNmqTp54hbdOnjg8eYt6MocXjWUttfg6NKFQTEHkQ125SjDBG+T4angbM91rzZA2gUWqH32yszTAsCVsgRUu00iQTpnjtVdCxIyKKPZShGZiVloJHv7bG2sbvgp8aVQ0Y3xjL38Sea9xUvQDdsgwnVSeYeJUNakmi0Ni0bVdbplVugMuE1SoZ/Xf6Y7klg1peEygBoVC1W7i2Z/VPoaoQ7ctwyqmMXoMXlVcLsdB8Tp1ph6+aJTfZcF8ZwpU0vfq+MRjEO9dp6kqu3WBD5Q+jaAOykFW3wXBN+CQx1jk07dGvSf2JZcwjG62f/JUVPvdpyJEmDaaIfupoMap23BdD82F/9y43prFJwIGDOzLolDae7bJh1q8A2BTDldKg5z4x7Hc85MkpmiAuY7kZ1fyChxsu6COMfucqmbKrzDzZGwpLnnKl/95kk7eTlbr4o7VTCs+cPVHV0GtMTNydUGUgbrvPiuh9AmehHvGavWVx2++oQSpyEvv"
} ],
"shared_info": "{\"api\":\"attribution-reporting\",\"attribution_destination\":\"https://cliente.com\",\"debug_mode\":\"enabled\",\"report_id\":\"15cfee94-a102-4215-9367-1e17862d7769\",\"reporting_origin\":\"https://attribution.ads.uol.com.br\",\"scheduled_report_time\":\"1718829459\",\"source_registration_time\":\"0\",\"version\":\"0.1\"}",
"source_debug_key": "647775351539539",
"trigger_debug_key": "647775351539539"
}
//Histogram 3
[
{
"key": "0x29a",
"value": 10029
}
]
//Report 3
{
"aggregation_coordinator_origin": "https://publickeyservice.msmt.aws.privacysandboxservices.com",
"aggregation_service_payloads": [ {
"debug_cleartext_payload": "omRkYXRhlKJldmFsdWVEAAAnLWZidWNrZXRQAAAAAAAAAAAAAAAAAAACmqJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAGlvcGVyYXRpb25paGlzdG9ncmFt",
"key_id": "c29773d9-a72b-4147-bd60-3777a3844054",
"payload": "e3oi8KV4uK8aYUgZIlYFFrZsENSY0gsgcdzYtKECqnFLzGF+VwHsLvbzg5QAAfBhqM9y/6bgCUssbEBfOYwNcSGNP6KTxuCteocfbL7FcpijnsRQ8001oc8Hhcx1efr/h9Kh/14w3HDd74rRMMnNS2L7RIEnGqkcZM6Y4/HGDIK0nd9CptuxZjjxuLs37fnIj2ZRgTrKvVEk8EoE0ro+54cBbfYLMC+SMLJ5NefRmNA29gUnXuzvkOCBc5zp9XvtedyVjZzcatFJooPdY5UJhCYso121vngmk5O4U2KAyf7mOL6vPkhOAf4irC5NUymmSqjHrfCwMZ5aGGJcfSYXwd5xp/MRBXlP9/hgiyg7vLUucul9jvNMZ9RDXlE/pZGpq2u9pOsmlSl2wKh4I7xhv0PQgOwjj5N43YnGvarFC6Js7mVyVyDccnga6u6RdJsfH2i0ObhF2vJenkexycLLykCPS5TZrUkIBolglkq/Z5tiXIbnugh8jrmy4IAd8z/XBK9CgPq+sGmdLG8ZKVKiguylwXTL/Zl0IxdNhl6GIIRkfABaDUSUr7tgCiLWHVAKLMbAyDalluGdfkRwhf7fMnI9fN6YIpq6VJ+H7JXNbdnsP2maL4z+8eGs+Q2eqa8eMkB2qYinPCSQIHLR47Xk9xn63H2EQhko1LcNq9lF0Id3xfvZI5WU/i+lUv4ZpF1XTV/3aciHD8C5gKzgyceFmv65EjiHhOQ19bu9tULuxaROAozrJ9wL5SieFgVb8QqpvshrhhQGu4E0FtnNwMOUrGm1R8so5J0n4vqUM8t2f7GAdqq7LIMXfEatjENV4bdWyJev8IRnQtZPstjLKHFJM+v2QtsLIx2z2nGG0NqgvGlpT1jBOvVYg0hiosbtSAF8zb8fwTPhg7mgG/sNR2d2uVfQ/Ou8APip+BBTv871r2OMk9d6x0Zey61T5E7kP4di3GCXKT2qkUmr4yw1r9RvVkPZi37oWruN2y82DzGsB3XXbFtZ8J3gV0oO8xDx7+vHCQk1AarV88D85s1aV4wb808p06NRtXGE/R5D"
} ],
"shared_info": "{\"api\":\"attribution-reporting\",\"attribution_destination\":\"https://cliente.com\",\"debug_mode\":\"enabled\",\"report_id\":\"63e0ef38-d97e-431c-9b79-2abc6ee92793\",\"reporting_origin\":\"https://attribution.ads.uol.com.br\",\"scheduled_report_time\":\"1718829513\",\"source_registration_time\":\"0\",\"version\":\"0.1\"}",
"source_debug_key": "647775351539539",
"trigger_debug_key": "647775351539539"
}
//Histogram 4
[
{
"key": "0x29a",
"value": 10096
}
]
//Report 4
{
"aggregation_coordinator_origin": "https://publickeyservice.msmt.aws.privacysandboxservices.com",
"aggregation_service_payloads": [ {
"debug_cleartext_payload": "omRkYXRhlKJldmFsdWVEAAAncGZidWNrZXRQAAAAAAAAAAAAAAAAAAACmqJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAGlvcGVyYXRpb25paGlzdG9ncmFt",
"key_id": "71009a63-aa17-45b9-a5a9-49d78067210f",
"payload": "GvnO9L57AFyTdlejpGgXRmC8AfQG318hEl6K0gOoVWB4GHHNg0h0Gs/LwmhVCzUMq89rB6LuVD8SZ9XXpMFVWbP2XvJMU5gokgOo1TTtAWwvEBFxfcvJZ5hqfLMsQGRb91UrwRhUI+6GyhrDXV76sKS4j3D5OevAfamcdd9fp8SZhfB7umPL08ESXekkEDPHmmblBjrKLLyw0w4/23MMyCGRwnYtdc8uZYR1p0uyE3VoPYYnaRmHUBddsgRsvg+u2cwRho0PxIReHHRgHS50hELkSf0DXJjJHRd5ukOIu15LVBFBQ0VIoMB9RB61PkynjyhuLlqZk0d1Pif+ALQ7jyE4xiYKCgmkq/SowT6EW2v1UmX54KvcDXZx/eWh2rz373ZlGitYa/TtBPNDngM9PYVZrrRANWBjya4nBB6ElPAsVjgjF6+Zo3rkSj3tF0+WxdBOM7jbYbIjPOVoWViKUq6b2b5FMNR4FFYIDsVwNaLLrLBoDot09DFa+1gSHBNcYOyFdsy2sx9Mg/NOa3VeA8W6nTKLZd/KzDq70H7jRcj4wwZCL+gy4s45XBBHRtcaCfcokvEpCr3USvDP7miICzqD9+XL73qiyLpGeTpEZwAVGomX1kLSpTxsjaNcpoO0eNZMl7DVmsd37vjq6z43DqRuhSpigK3nbLK3lMtOVNUbCzItVqyYcSNpIdDiZGGtODPwppGeE1Xp8T0AxXp76PMR/pAUTaglxjmRVAhUZBIPuUKAXrDHCoQyBktqQHWEIBt0QomqcggQPgOTCP4nQ5nTE42FdhevV5DZoZik7UFLEx7WV09/96GBogV91dbkI9pG4717jgn8pebVDxP7xJ7WZRDbOmI4ESCcIXz5YqjKqoL1UFAfeOfLDVALh99LZJayy1igxQFdzeteKmzYmUPUaosPCXdJd552SynGow7mjZbWidLn8p51tdUvPyEtaTq9P4D360zhrld3ZkYYbdclBl7ywgxWFauczrh+oJOgWdm6C2F9rAKXwz43fiI86+bCZEnhoUFnJZihSoGNTrP519rDx0Bf/qw+"
} ],
"shared_info": "{\"api\":\"attribution-reporting\",\"attribution_destination\":\"https://cliente.com\",\"debug_mode\":\"enabled\",\"report_id\":\"e09684d5-4634-4ca7-af62-18bb03090f35\",\"reporting_origin\":\"https://attribution.ads.uol.com.br\",\"scheduled_report_time\":\"1718829605\",\"source_registration_time\":\"0\",\"version\":\"0.1\"}",
"source_debug_key": "647775351539539",
"trigger_debug_key": "647775351539539"
}
//Histogram 5
[
{
"key": "0x29a",
"value": 10172
}
]
//Report 5
{
"aggregation_coordinator_origin": "https://publickeyservice.msmt.aws.privacysandboxservices.com",
"aggregation_service_payloads": [ {
"debug_cleartext_payload": "omRkYXRhlKJldmFsdWVEAAAnvGZidWNrZXRQAAAAAAAAAAAAAAAAAAACmqJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAKJldmFsdWVEAAAAAGZidWNrZXRQAAAAAAAAAAAAAAAAAAAAAGlvcGVyYXRpb25paGlzdG9ncmFt",
"key_id": "3e9d0be8-6cc6-4e80-862f-b5c1ba34b03d",
"payload": "BtI4rdy6Lult3ikxY/QN3CTgPbhWyo1rzKwFn8yTZydUWSs5Kg1k+GW1kzfVQriFM1GWifKPfjyrFsJcEtS00JTpMKDWYuvquywq30KUgZrLowaEW5ajbDMS9+mDsJgEiIVy/Y7l4Zx7j14sdklOF7IJqXj9Di3sSJOFlB5IWuVLim03IB4cvOKu+KZKqkncrXnIliKrhZW3PuUmV3Beb5cHgR6gC1YOo4xHdr5fhM8IMSA4YBRKkp02Gxc1bv/B8PGDBkPFKwH+xtFWBw9myRja/ExvgNet7QTYReiOiKXsJom3iT8f2bObAQ2Hi4EGdrXSxxYUVsLPsFElRkGIy//mJyCDTqC/ItM3EpdHtJySADaeBp3viDilUyAWbDceNMtQeKyeQe9IWEBYClHAgFS8lqsBp/Lm/d2i9OieDTyUb+tIzIXkNpdEB7Zftfkr2ovo66rhFI/JMheoU29t/6mQHVUMJb+unaUzdgjT3E6CWggqsVBiICueW48N2sUQkX9VSSJmlYTlltI/I+I1AOTPgMI9zlaB9+L5qUzTYREanw+DuGdbM/2eqvQ8mxHEKvOJJ2XRSG7nQVLcWsMSh366Rdl830sv6NGyTCrZBaELV2mxIp7kcr2xx3xxSVJRIEG9L5wAgbc33HQmC+8x7i3I/+PvKHLx9RDaMw9yaCmEXH3B57no87kvOqIXLHULBt/mZ9346TaYChs3smxYzjErtCJI/CKFeWYIFpD3Ix0p/M96+qVpIp2xmPghXjhM+OGFd0ieQDGofgvN1S1qAccw5tXyyAc+3S0EaNkXeEE/Y4WN4lgxpbI06Qbk6X/ltXsm7xHILLmGOv4ic/O3/tCNWRQey8DawjmCCNINiHWo+QHiry3fjMkCKpiHKnxq4wJxGjSHh445prf7KZSVGCX2JSnkhXpZlJWsyN2zF3Tv3LRU/eOSqfiE9xuIccQsSJaK97wapgkvEmer+UecD7aWaPKDid4KnUiPIyTS9lWr3YzjOIhSlp4tYUWvdjsTkKKivQRFWFcQsRRMidINa0f7YeJCk6sc+7T2"
} ],
"shared_info": "{\"api\":\"attribution-reporting\",\"attribution_destination\":\"https://cliente.com\",\"debug_mode\":\"enabled\",\"report_id\":\"4b597fd1-b455-4b2f-8d13-591aa894efa8\",\"reporting_origin\":\"https://attribution.ads.uol.com.br\",\"scheduled_report_time\":\"1718829826\",\"source_registration_time\":\"0\",\"version\":\"0.1\"}",
"source_debug_key": "647775351539539",
"trigger_debug_key": "647775351539539"
}
Thanks in advance
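For reference, each `debug_cleartext_payload` above decodes (base64, then CBOR) to a map with an `operation` of `"histogram"` and a `data` array of `{bucket, value}` byte strings, where the bucket is the 128-bit big-endian form of the hex key. The same conversion applies when building the `output_domain.avro` bucket entries that a job like this needs. A minimal sketch of just the key conversion:

```python
def bucket_key_to_bytes(hex_key: str) -> bytes:
    """Convert a hex histogram key (e.g. "0x29a" from the histograms
    above) to the 128-bit big-endian bytes used for bucket fields."""
    return int(hex_key, 16).to_bytes(16, byteorder="big")

key_bytes = bucket_key_to_bytes("0x29a")
assert len(key_bytes) == 16
assert key_bytes == b"\x00" * 14 + b"\x02\x9a"
```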
The response I get from the getJob API doesn't include debug_privacy_epsilon as a double, but as a string.
e.g.
{
...
"job_parameters": {
"debug_privacy_epsilon": "64.0",
...
}
...
}
The API specification in https://github.com/privacysandbox/aggregation-service/blob/main/docs/api.md states that we should expect a double value. It would be helpful if either the specification or the API response were changed to match the other.
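Until the spec and the response agree, a client can accept either representation, since `float()` handles both. A small sketch (field names taken from the response above):

```python
def get_epsilon(job_parameters: dict) -> float:
    """Accept debug_privacy_epsilon whether the API returns it as a
    JSON number (per the spec) or as a string (as observed)."""
    value = job_parameters.get("debug_privacy_epsilon")
    if value is None:
        raise KeyError("debug_privacy_epsilon missing from job_parameters")
    return float(value)  # handles both "64.0" and 64.0

assert get_epsilon({"debug_privacy_epsilon": "64.0"}) == 64.0
assert get_epsilon({"debug_privacy_epsilon": 64.0}) == 64.0
```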
Hi, I was testing the Aggregation Service deployed on AWS (version 2.5.0) and ran into this error:
"result_info": {
"return_code": "REPORTS_WITH_ERRORS_EXCEEDED_THRESHOLD",
"return_message": "Aggregation job failed early because the number of reports excluded from aggregation exceeded threshold.",
"error_summary": {
"error_counts": [
{
"category": "SERVICE_ERROR",
"count": 1,
"description": "Internal error occurred during operation."
},
{
"category": "NUM_REPORTS_WITH_ERRORS",
"count": 1,
"description": "Total number of reports that had an error. These reports were not considered in aggregation. See additional error messages for details on specific reasons."
}
],
"error_messages": []
},
"finished_at": "2024-07-01T22:02:32.089351102Z"
},
I was testing with only 1 report in the batch, and I tried local testing with the cleartext version, which works fine.
Without additional information, I wasn't able to identify the problem.
By any chance, has anyone else run into this? What else should I check? Thanks
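One thing worth checking is the error threshold: the service fails early once the share of bad reports crosses the `report_error_threshold_percentage` job parameter, so with a 1-report batch a single bad report trips the threshold immediately and hides the per-report error messages. Raising it lets the job run to completion and surface the real failure category in `error_counts`. A sketch of the relevant createJob body (IDs, bucket names, and prefixes are placeholders):

```python
# Hypothetical createJob request body; all resource names are placeholders.
job_request = {
    "job_request_id": "debug-run-001",
    "input_data_blob_prefix": "reports/report.avro",
    "input_data_bucket_name": "my-report-bucket",
    "output_data_blob_prefix": "output/summary.avro",
    "output_data_bucket_name": "my-report-bucket",
    "job_parameters": {
        "attribution_report_to": "https://attribution.ads.uol.com.br",
        "output_domain_blob_prefix": "output_domain.avro",
        "output_domain_bucket_name": "my-report-bucket",
        # Let the job finish even if every report errors, so
        # error_counts reports the underlying failure category.
        "report_error_threshold_percentage": "100",
    },
}
```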
Hello team, how are you?
We managed to deploy the Aggregation Service on AWS through Terraform. I made my job request, but it has been stuck in the RECEIVED status for more than a day. I wanted to see if you had any tips to help understand what is happening, and why it doesn't at least reach an ERROR status.
*I don't know if it's related, but I was only able to hit the API after removing authentication from the paths (that was the workaround our infrastructure team found)
*I saw a similar issue in the GCP environment, but initially I couldn't connect it to what we have in AWS
#53
Thanks a lot
Hi, I work in the Google Ad Traffic Quality Team. I am using the local aggregation service tool to simulate noise on locally generated aggregatable reports. However, due to the contribution budget limits, I am unable to create multiple aggregatable reports that correctly represent my data. What is the best way for me to test this locally? Can I manually create an aggregatable report with very high values (corresponding to a raw summary report) for testing?
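For experiments like this, it helps to check candidate histograms against the per-report L1 budget: the values in one aggregatable report must sum to at most 2^16 = 65536, so a report with "very high values" exceeds what a real browser-generated report may contain, even if a local tool accepts it. A quick validity check (a sketch, not the tool's actual enforcement logic):

```python
L1_BUDGET = 1 << 16  # max summed contribution per aggregatable report

def within_budget(contributions) -> bool:
    """contributions: iterable of {"key": ..., "value": int} dicts,
    shaped like the histograms in this thread."""
    return sum(c["value"] for c in contributions) <= L1_BUDGET

assert within_budget([{"key": "0x29a", "value": 10054}])
assert not within_budget([{"key": "0x29a", "value": 100000}])
```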
I am trying to follow the instructions to build the AMI because I want it in a different region than us-east-1.
But when I run
aws codebuild start-build --project-name aggregation-service-artifacts-build --region us-west-2
I get this error:
Build 'amazon-ebs.sample-ami' errored after 936 milliseconds 511 microseconds: VPCIdNotSpecified: No default VPC for this user
status code: 400, request id: fffa8013-121f-4855-a665-70e36030a4e7x