
trusted-execution-aggregation-service's Introduction

Aggregation Service for Measurement

Aggregation Service provides a server-side mechanism for adtechs to create summary reports. A summary report is the result of noisy aggregation applied to a batch of aggregatable reports. Summary reports allow for greater flexibility and a richer data model than event-level reporting, particularly for use cases like conversion values. Aggregation Service operates on aggregatable reports: encrypted reports, sent from individual user devices, that contain data about individual events in the client software, such as a Chrome browser or an Android device.

Build and deploy

This repository contains instructions and scripts to set up and test the Aggregation Service for Aggregatable Reports locally and on Cloud (AWS and GCP). If you want to learn more about the Privacy Sandbox Aggregation Service for the Attribution Reporting API and other use cases, read the Aggregation Service proposal.

Key documents

Explainers

Operating documentation

Support

You can reach out to us for support by creating issues on this repository or emailing aggregation-service-support<at>google.com. This address is monitored and visible only to selected support staff.

License

Apache 2.0 - See LICENSE for more information.

FAQ

Where should I post feedback/questions, this repo or the API repo?

For feedback/questions encountered while using this Aggregation Service implementation, please use the support channels provided by this repo. For feedback/requests related to the APIs in general, please start a discussion in the respective API repo, e.g. the Attribution Reporting API repo.

trusted-execution-aggregation-service's People

Contributors

erintwalsh, ghanekaromkar, hostirosti, peiwenhu, ruclohani, zyw229


trusted-execution-aggregation-service's Issues

aggregation-service-artifacts-build CodeBuild built the AMI in the wrong region

Running the step "Building artifacts" from https://github.com/privacysandbox/aggregation-service/blob/main/build-scripts/aws/README.md#building-artifacts
The goal was to build the artifacts in region eu-west-1.

CodeBuild failed with the error below:

836 | --> amazon-ebs.sample-ami: AMIs were created:
837 | us-east-1: ami-069b14bccedc04571
....
[Container] 2023/05/09 15:34:31 Running command bash build-scripts/aws/set_ami_to_public.sh set_ami_to_public_by_prefix aggregation-service-enclave_$(cat VERSION) $AWS_DEFAULT_REGION $AWS_ACCOUNT_ID
841 |  
842 | An error occurred (InvalidAMIID.Malformed) when calling the ModifyImageAttribute operation: Invalid id: "" (expecting "ami-...")
843 |  
844 | An error occurred (MissingParameter) when calling the ModifySnapshotAttribute operation: Value () for parameter snapshotId is invalid. Parameter may not be null or empty.
845

The reason is that the AMI was created in us-east-1 instead of eu-west-1.

Aggregation service, ARA browser retries and duplicate reports

The way the browser and an adtech's servers interact over the network makes it inherently unavoidable that some reports will be received by the adtech but not acknowledged as received by the browser (e.g. when a timeout happens), and hence retried and received several times by the adtech, as is mentioned in your documentation:

The browser is free to utilize techniques like retries to minimize data loss.

Sometimes these duplicate reports number upwards of hundreds per day, for several days (sometimes several months) in a row, all having the same report_id.
The aggregation service enforces its no-duplicates rule based on a combination of information:

Instead, each aggregatable report will be assigned a shared ID. This ID is generated from the combined data points: API version, reporting origin, destination site, source registration time and scheduled report time. These data points come from the report's shared_info field.
The aggregation service will enforce that all aggregatable reports with the same ID must be included in the same batch. Conversely, if more than one batch is submitted with the same ID, only one batch will be accepted for aggregation and the others will be rejected.

As an adtech company trying to provide timely reporting to clients, it is paramount to use all of the available information (in this case, reports) so that our reporting is as precise as possible.
In this scenario, however, if we try to batch together all of our reports for a chosen client on a chosen day, even after deduplicating that day's reports by the report_id (or the overall shared_info) field, we may have a batch accepted on day 1 and then all subsequent batches for the next month rejected because they all contain that same shared_info-based ID.
This means that we have to check further back in the data for possible duplicate reports. To implement this check efficiently, we would benefit from a more precise description of the retry policy, namely for how long the retries can happen.
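
To illustrate the bookkeeping this forces on us, here is a minimal Python sketch of report_id-based deduplication over a lookback window (report shapes follow the shared_info excerpts in this thread; the required window length is exactly the unknown this issue asks about):

```python
import json

def dedup_reports(new_reports, seen_report_ids):
    """Drop reports whose report_id was already batched earlier.

    new_reports: aggregatable reports, each carrying a shared_info JSON
    string with a report_id field (as in the report excerpts above).
    seen_report_ids: report_ids already batched, accumulated over a
    lookback window whose required length is unknown -- it depends on
    the undocumented browser retry horizon.
    """
    fresh = []
    for report in new_reports:
        report_id = json.loads(report["shared_info"])["report_id"]
        if report_id not in seen_report_ids:
            seen_report_ids.add(report_id)
            fresh.append(report)
    return fresh
```

Without a documented retry horizon, seen_report_ids effectively has to cover every report ever received, which is what makes this check expensive.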

I guess the questions this issue raises are as follows:

  • In what scenarios does a browser perform the aforementioned retries?
  • Is there a time limit for those retries (i.e. a date after the original report when the browser no longer retries sending a report)?
  • If there is not, could you please advise on a way for adtech companies to efficiently filter out duplicate reports without having to process all of their available reports for duplicate shared info values?
  • Also, the described problem of duplicate retried reports (among others) makes us believe that adtechs would benefit from a change to the way the AS handles duplicates. If the AS gracefully dropped the duplicates from the aggregation instead of failing the batch altogether, we wouldn't necessarily need to filter such reports out of a batch. Could this possibility be considered on your side?

Clarifications on aggregation service batches + enhanced debugging possibilities

Hello aggregation service team,

We (Criteo) would like to seek clarification on a couple of points to ensure we have a comprehensive understanding of certain features.
Your insights will greatly assist us in optimizing our utilization of the platform:

  1. Batch Size Limit (30k reports):
    Could you kindly provide more details about the batch size limit of 30,000?
    We are a little unsure how this limit behaves: it is our understanding that the aggregation service expects loads of up to tens (even hundreds) of thousands of reports. However, when we provide it with batches of 50k+ reports, our aggregations fail.
    Is the 30k limit enforced per Avro file within the batch, or per batch overall?
    If it is per batch overall, is there any suggestion on your side for aggregating batches of more than 30k reports?
    If we need to split these larger aggregations into several smaller requests, that will greatly increase the noise levels we see in our final results and work against the idea of the aggregation service, which encourages adtechs to aggregate as many reports as possible to increase privacy.
    Understanding the specifics of this limit should greatly help us in tailoring our processes more effectively.

  2. Debug Information on Privacy Budget Exhaustion:
    We've been considering ways to enhance our debugging capabilities, especially in situations where the privacy budget is exhausted. Would it be possible to obtain more detailed debug information in such cases, specifically regarding the occurrence of duplicates? We believe that having for instance the report_ids of the duplicates wouldn't compromise privacy, and would significantly contribute to our troubleshooting efforts.

Why must we supply a github_personal_access_token when building the AMI?

In the instructions for building the AMI (Building aggregation service artifacts), part of the instructions is to put a github_personal_access_token in codebuild.auto.tfvars.

Can you provide more information on this token?

  1. What scopes are required? All of them?
  2. Why do we need to supply a GitHub Personal Access Token? Is it to read something from GitHub?
  3. I feel uncomfortable putting this sensitive token in AWS where anyone in my company can access it.

Could you provide encrypted sample report for testing?

I have the aggregation service set up, but our system for producing encrypted reports is not ready yet. This repo's sampledata directory has a sample report, but it is unencrypted and so works only with the local testing tool, not with AWS Nitro Enclaves.

Could you provide, either in the repo or in a zip file in this thread, an encrypted sample output.avro and an accompanying domain.avro that we can use to test our AWS aggregation service and make sure everything is running properly?

A Cloud Migration Tool for Aggregation Service: Feedback Requested

Hi all!

The Aggregation service team is currently exploring options for adtechs who may want to migrate from one cloud provider to another. This gives adtechs flexibility in using a cloud provider of their choice to optimize for cost or other business needs. Our proposed migration solution would enable adtechs to re-encrypt their reports from a source cloud provider (let’s call this Cloud A) to a destination cloud provider (let’s call this Cloud B) and enable them to use Cloud B to process reports originally encrypted for Cloud A as part of the migration. After migration is completed, use of Cloud A for processing reports will be disabled and the adtech will only be able to use Cloud B to process their reports.

In the short-term, this solution will support migration of aggregation service jobs from AWS to GCP and vice versa. As we support more cloud options in the future, this solution would be extensible to moving from any supported cloud provider to another.

Depiction of the re-encryption flow:

[re-encryption flow diagram]

For any adtechs considering a migration, we encourage completing this migration before third-party cookie deprecation to take advantage of feature benefits such as:

  • Apples to apples comparison using additional budget: We will allow adtechs to process the same report on both Cloud A and Cloud B during migration.
  • Flexible migration windows: We will not enforce a timeline by which adtechs need to complete migration.

After third-party cookie deprecation, we plan to continue to support cloud migration with the re-encryption feature, but may not be able to give the additional benefits outlined above to preserve privacy.

We welcome any feedback on this proposal.

Thank you!

Configure CodeBuild Setup - getting error

When running the Terraform code in the step https://github.com/privacysandbox/aggregation-service/blob/main/build-scripts/aws/README.md#configure-codebuild-setup
I got the following error:

│ Error: error creating S3 bucket ACL for aggregation-service-artifacts: AccessControlListNotSupported: The bucket does not allow ACLs

To resolve this error I had to add the following resource to build-scripts/aws/terraform/codebuild.tf:

resource "aws_s3_bucket_ownership_controls" "artifacts_output_ownership_controls" {
  bucket = aws_s3_bucket.artifacts_output.id

  rule {
    object_ownership = "BucketOwnerEnforced"
  }
}

Unable to copy AMI image to my AWS account

The documentation states that:

Note: The prebuilt Amazon Machine Image (AMI) for the aggregation service is only available in the us-east-1 region. If you like to deploy the aggregation service in a different region you need to copy the released AMI to your account or build it using our provided scripts.

Ref. https://github.com/privacysandbox/aggregation-service#download-terraform-scripts-and-prebuilt-dependencies

When I try to copy the AMI to my account I'm getting the following error:

Failed to copy ami-036942f537f7a7c2b
You do not have permission to access the storage of this ami

Can you give me some guidance or tell me if it's a configuration error?

Context:

[screenshot]

Private Aggregation API - no metrics in summarised report

Hello,

I'm currently experimenting with the Private Aggregation API and I'm struggling to validate that my final output is correct.

From my worklet, I perform the following histogram contribution:

privateAggregation.contributeToHistogram({ bucket: BigInt(1369), value: 128 });

Which is correctly triggering a POST request with the following body:

 {
  aggregation_service_payloads: [
    {
      debug_cleartext_payload: 'omRkYXRhgaJldmFsdWVEAAAAgGZidWNrZXRQAAAAAAAAAAAAAAAAAAAFWWlvcGVyYXRpb25paGlzdG9ncmFt',
      key_id: 'bca09245-2ef0-4fdf-a4fa-226306fc2a09',
      payload: 'RVd7QRTTUmPp0i1zBev+4W8lJK8gLIIod6LUjPkfbxCOHsQLBW/jRn642YZ2HYpYkiMK9+PprU5CUi9W7TwJToQ4UXiUbJUgYwliqBFC+aAcwsKJ3Hg46joHZXV5E0ZheeFTqqvLtiJxlVpzFcWd'
    }
  ],
  debug_key: '777',
  shared_info: '{"api":"shared-storage","debug_mode":"enabled","report_id":"aaa889f1-2adc-4796-9e46-c652a08e18ca","reporting_origin":"http://adtech.localhost:3000","scheduled_report_time":"1698074105","version":"0.1"}'
}

I've set up a small Node.js server handling requests on /.well-known/private-aggregation/debug/report-shared-storage that basically does this:

  const encoder = avro.createFileEncoder(
    `${REPORT_UPLOAD_PATH}/debug/aggregation_report_${Date.now()}.avro`,
    reportType
  );

  reportContent.aggregation_service_payloads.forEach((payload) => {
    console.log(
      "Decoded data from debug_cleartext_payload:",
      readDataFromCleartextPayload(payload.debug_cleartext_payload)
    );

    encoder.write({
      payload: convertPayloadToBytes(payload.debug_cleartext_payload),
      key_id: payload.key_id,
      shared_info: reportContent.shared_info,
    });
  });

  encoder.end();

As you can see, at this point I'm printing the decoded data to the console, and I see as expected:
Decoded data from debug_cleartext_payload: { value: 128, bucket: 1369 }
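
For reference, the debug_cleartext_payload is base64-encoded CBOR of the shape {"operation": "histogram", "data": [{"bucket": <16-byte big-endian>, "value": <4-byte big-endian>}]}. A dependency-free Python sketch of what a helper like readDataFromCleartextPayload above might do (the helper itself is the reporter's own; this decoder handles only the small CBOR subset these payloads use):

```python
import base64

def _decode(buf, i):
    """Decode one CBOR item; supports byte/text strings, arrays, maps only."""
    major, info = buf[i] >> 5, buf[i] & 0x1F
    i += 1
    if info < 24:
        n = info
    elif info == 24:          # one-byte length argument
        n, i = buf[i], i + 1
    else:
        raise ValueError("length form not handled in this sketch")
    if major == 2:            # byte string
        return bytes(buf[i:i + n]), i + n
    if major == 3:            # text string
        return buf[i:i + n].decode("utf-8"), i + n
    if major == 4:            # array
        items = []
        for _ in range(n):
            item, i = _decode(buf, i)
            items.append(item)
        return items, i
    if major == 5:            # map
        m = {}
        for _ in range(n):
            k, i = _decode(buf, i)
            m[k], i = _decode(buf, i)
        return m, i
    raise ValueError(f"major type {major} not handled")

def read_data_from_cleartext_payload(b64_payload):
    """Return the histogram contributions as {bucket, value} ints."""
    payload, _ = _decode(base64.b64decode(b64_payload), 0)
    return [{"bucket": int.from_bytes(c["bucket"], "big"),
             "value": int.from_bytes(c["value"], "big")}
            for c in payload["data"]]
```

Running it on the payload above yields [{'bucket': 1369, 'value': 128}], matching the console output.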

However, now I'm trying to generate a summary report with the local test tool by running the following command:

java -jar LocalTestingTool_2.0.0.jar --input_data_avro_file aggregation_report_1698071597075.avro --domain_avro_file output_domain.avro --no_noising --json_output --output_directory ./results

No matter what value I pass to the contributeToHistogram method, I always get 0 in the metric field:

[ {
  "bucket" : "MTM2OQ==", // 1369 base64 encoded
  "metric" : 0
} ]

Am I doing something wrong?

Apart from this issue, I wonder how this would work in a real-life application. Currently this example handles one report at a time, sent instantly because of debug_mode, but in a real situation, how are we supposed to process a large number of reports at once? Can we pass a list of files to --input_data_avro_file? Should we batch the reports before converting them to Avro, based on the shared_info data? If so, based on which fields?

Thank you in advance!

Ability to query key aggregate

Hello,

One interesting evolution of the aggregation service would be the ability to query aggregates of keys. I think this was mentioned in the aggregate attribution API back when the aggregation was supposed to be performed by MPC rather than TEEs.
In other words, I would love to be able to query a bit mask (e.g. for an 8-bit key, 01100*01 would match 01100101 and 01100001).
This would enable greater flexibility in decoding (i.e. choosing which encoded variables to retrieve depending on the number of reports), and remove the need to adapt the encoding to the expected traffic to the destination website.
Thanks!
P.S. I can cross-post on https://github.com/WICG/attribution-reporting-api if needed.

Decode the final `output.json` from the `LocalTestingTool`

Hi,

I have managed to get the full flow running to aggregate debug reports in the browser and process them locally with the provided tool.

The final file output I have is:

[{"bucket": "d0ZHnRzgTJMAAAAAAAAAAA==", "metric": 195000}]

Which looks correct in terms of there should be a single key and the metric value is correct.

The issue I have now is decoding this bucket to get back my original input data. I assumed the steps would be:

  • base64 decode
  • CBOR decode

But this causes the following error:

_cbor2.CBORDecodeEOF: premature end of stream (expected to read 23 bytes, got 15 instead)

Would really appreciate any help on how to get the input data back out of this bucket.
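
For what it's worth, the bucket in this output appears to be the raw 16-byte big-endian aggregation key, base64-encoded, with no CBOR layer at all, which would explain the premature-EOF error. A sketch under that assumption:

```python
import base64

def decode_bucket(b64_bucket):
    """Interpret a summary-report bucket as a 128-bit big-endian key."""
    raw = base64.b64decode(b64_bucket)   # 16 bytes, no CBOR framing
    return int.from_bytes(raw, "big")

key = decode_bucket("d0ZHnRzgTJMAAAAAAAAAAA==")
# Mapping this integer back to the original input data depends entirely on
# the key encoding chosen by your own registration/worklet code.
```

Note that some tool versions have been seen base64-encoding the key's decimal digits as ASCII instead (the MTM2OQ== example in another issue above), so it is worth checking both interpretations.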

Best,
D

Error building AMI: VPCIdNotSpecified: No default VPC for this user

I am trying to follow the instructions to build the AMI because I want it in a different region than us-east-1.

But when I run

aws codebuild start-build --project-name aggregation-service-artifacts-build --region us-west-2

I get this error:

Build 'amazon-ebs.sample-ami' errored after 936 milliseconds 511 microseconds: VPCIdNotSpecified: No default VPC for this user
status code: 400, request id: fffa8013-121f-4855-a665-70e36030a4e7x

Questions

  1. Is having a default VPC a prerequisite for this build to work?
  2. Is it possible to get the build to work using another VPC, or is using the default VPC the recommended way?

How to copy AMI to another region?

In the AWS instructions, there are two options for using the AMI in a region other than us-east-1:

If you like to deploy the aggregation service in a different region you need to copy the released AMI to your account or build it using our provided scripts.

I have been having a lot of trouble building the AMI using the provided scripts, so I would like to try simply copying the AMI (the first option), but I don't see instructions for this. What is the AMI name and where do I get it from? Do I need to change any parameters to point to the new region? What step should I move on to after copying the AMI?

Could you add instructions for copying the AMI and subsequent steps?

Consider migration from origin to site enrollment for Aggregation service: Feedback requested

Hi all!

We are currently exploring migration from origin enrollment to site enrollment for the Aggregation Service (current form using origin here) for the following reasons:

  • Consistency with the client API enrollment that uses site
  • Ease of onboarding for adtechs, so they don't have to enroll each origin individually

As a follow up to this proposal, we would like to support multiple origins in a batch of aggregatable reports. Do adtechs have a preference or blocking concern with either specifying a list of origins or the site in the createJob request?

Aggregation Service: AWS worker build issue and workaround

Hi Aggregation Service testers,

We have discovered an issue that broke the AWS worker build, caused by an incompatible Docker engine version upgrade. We are planning to release a new patch next week. Meanwhile, if you encounter issues building the AWS worker, you can use the following workaround:

  • Create a new patch at <repo_root>/build_defs/shared_libraries/pin_pkr_docker.patch with the following content:
diff --git a/operator/worker/aws/setup_enclave.sh b/operator/worker/aws/setup_enclave.sh
index e4bd30371..8bf2e0fb1 100644
--- a/operator/worker/aws/setup_enclave.sh
+++ b/operator/worker/aws/setup_enclave.sh
@@ -19,7 +19,7 @@ sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/late
 #
 # Builds enclave image inside the /home/ec2-user directory as part of automatic
 # AMI generation.
-sudo yum install docker -y
+sudo yum install docker-24.0.5-1.amzn2023.0.3 -y
 sudo systemctl enable docker
 sudo systemctl start docker
 
  • Add the new patch to the list of patches under the shared_libraries rule in the WORKSPACE file. The shared_libraries rule should now become:
git_repository(
    name = "shared_libraries",
    patch_args = [
        "-p1",
    ],
    remote = "https://github.com/privacysandbox/coordinator-services-and-shared-libraries",
    patches = [
        "//build_defs/shared_libraries:coordinator.patch",
        "//build_defs/shared_libraries:gcs_storage_client.patch",
        "//build_defs/shared_libraries:dependency_update.patch",
        "//build_defs/shared_libraries:key_cache_ttl.patch",
        "//build_defs/shared_libraries:pin_pkr_docker.patch",
    ],
    tag = COORDINATOR_VERSION,
    workspace_file = "@shared_libraries_workspace//file",
)

Thank you!

Unable to test noise addition using the local aggregation service tool

Hi, I work in the Google Ad Traffic Quality Team. I am using the local aggregation service tool to simulate noise on locally generated aggregatable reports. However, due to the contribution budget limits, I am unable to create multiple aggregatable reports that will correctly represent my data. What is the best way for me to test this locally, can I manually create an aggregatable report with very high values (corresponding to a raw summary report) for testing?

Support for aggregation over a set of keys

Hello,

Currently, the aggregation service sums the values over the set of keys declared in the output domain files. This explicit declaration of keys means that the encoding must be well designed at report creation time (e.g. on the source and trigger side for ARA, or in Shared Storage for the Private Aggregation API). This is quite inflexible in its use.

To bring in some flexibility, I propose adding a mechanism to the aggregation service whereby a predeclared set of keys would be summed by the service. These sets of keys would constitute a partition of the key space, so that the service does not violate the DP limit. A simple check by the aggregation service could reject the query if a key appears in two sets.

Here is what the output domain file would look like. I am not sure "super bucket" is a great name, but it is the only one I could think of right now.

Super bucket   Bucket
0x123          0x456
0x123          0x789
0x124          0xaef
0x125          0x12e

The aggregation service would provide the output only on the "super buckets".
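
As a sketch of the proposed semantics (hypothetical data, mirroring the table above): the service would check that the declared sets form a partition, then emit sums only at the super-bucket level.

```python
def aggregate_super_buckets(partition, bucket_values):
    """Sum per-bucket values up to predeclared super buckets.

    partition: super bucket -> list of member buckets; a bucket listed
    under two super buckets rejects the whole query (the DP-limit check
    proposed above).
    bucket_values: bucket -> aggregated value from the reports.
    """
    seen = set()
    for buckets in partition.values():
        for bucket in buckets:
            if bucket in seen:
                raise ValueError(f"bucket {bucket:#x} appears in two sets")
            seen.add(bucket)
    return {sb: sum(bucket_values.get(b, 0) for b in buckets)
            for sb, buckets in partition.items()}
```

Noise would then be added once per super bucket, exactly as it is per bucket today.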

The operational benefits of this added flexibility would be huge. Currently, one has to decide on an encoding before knowing what one can measure. For ARA or PAA for Fledge, this means having a very good idea beforehand of the size and the performance of the campaign. While the campaign is running, adjustments have to be made if the volume estimate was not good (or if the settings of the campaign are changed). Encoding changes can be difficult to track, especially in ARA where sources and triggers both contribute to the keys, but at different points in time. This proposal allows one to keep a fixed encoding, and to adjust the encoding actually used after the fact (using the volume of reports as a proxy).

feedback on deploying aggregation service

I got an API gateway error The API with ID my-api-id doesn’t include a route with path /* having an integration arn:aws:lambda:us-east-1:my-aws-account-id:function:stg-create-job. in the AWS console after deploying the aggregation service using Terraform.

I changed the Source ARN of the Lambda's permission from arn:aws:execute-api:us-east-1:my-aws-account-id:my-api-id/*/** to arn:aws:execute-api:us-east-1:my-aws-account-id:my-api-id/*/*/v1alpha/getJob, and that solved the error.
https://github.com/privacysandbox/control-plane-shared-libraries/blob/9efe5591acc18e46263399d9785432a146d9675c/operator/terraform/aws/modules/frontend/api_gateway.tf#L62

aggregation-service-artifacts-build CodeBuild AMI build fails on yum lock

Running the step "Building artifacts" from https://github.com/privacysandbox/aggregation-service/blob/main/build-scripts/aws/README.md#building-artifacts
The goal was to build the artifacts in region eu-west-1.

CodeBuild failed with the error below:

amazon-ebs.sample-ami: Loaded plugins: extras_suggestions, langpacks, priorities, update-motd

754 | ==> amazon-ebs.sample-ami: Existing lock /var/run/yum.pid: another copy is running as pid 3465.
755 | ==> amazon-ebs.sample-ami: Another app is currently holding the yum lock; waiting for it to exit...

Aggregation service deployment using user provided vpc fails

When specifying "enable_user_provided_vpc = true", creating the environment following the instructions at https://github.com/privacysandbox/aggregation-service/tree/main#set-up-your-deployment-environment
fails with the error:

Out of index vpc[0], 182: dynamodb_vpc_endpoint_id = module.vpc[0].dynamodb_vpc_endpoint_id

In terraform/aws/applications/operator-service/main.tf, lines 182 and 183 refer to module.vpc[0], while module.vpc is not instantiated when "enable_user_provided_vpc = true":

module "vpc" {
  count = var.enable_user_provided_vpc ? 0 : 1

Error using LocalTestingTool_2.0.0.jar with sampledata

I am trying to follow the instructions in Testing locally using Local Testing Tool but when I run the following command with the sampledata:

java -jar LocalTestingTool_2.0.0.jar \
--input_data_avro_file sampledata/output_debug_reports.avro \
--domain_avro_file sampledata/output_domain.avro \
--output_directory .

I get the error below:

2023-10-31 12:21:57:506 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.WorkerPullWorkService - Aggregation worker started
2023-10-31 12:21:57:545 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.WorkerPullWorkService - Item pulled
2023-10-31 12:21:57:555 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Reports shards detected by blob storage client: [output_debug_reports.avro]
2023-10-31 12:21:57:566 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Reports shards to be used: [DataLocation{blobStoreDataLocation=BlobStoreDataLocation{bucket=/Users/jonaquino/projects/aggregation-service/sampledata, key=output_debug_reports.avro}}]
2023-10-31 12:21:57:566 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.domain.OutputDomainProcessor - Output domain shards detected by blob storage client: [output_domain.avro]
2023-10-31 12:21:57:567 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.domain.OutputDomainProcessor - Output domain shards to be used: [DataLocation{blobStoreDataLocation=BlobStoreDataLocation{bucket=/Users/jonaquino/projects/aggregation-service/sampledata, key=output_domain.avro}}]
2023-10-31 12:21:57:575 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor - Job parameters didn't have a report error threshold configured. Taking the default percentage value 10.000000
return_code: "REPORTS_WITH_ERRORS_EXCEEDED_THRESHOLD"
return_message: "Aggregation job failed early because the number of reports excluded from aggregation exceeded threshold."
error_summary {
  error_counts {
    category: "REQUIRED_SHAREDINFO_FIELD_INVALID"
    count: 1
    description: "One or more required SharedInfo fields are empty or invalid."
  }
  error_counts {
    category: "NUM_REPORTS_WITH_ERRORS"
    count: 1
    description: "Total number of reports that had an error. These reports were not considered in aggregation. See additional error messages for details on specific reasons."
  }
}
finished_at {
  seconds: 1698780117
  nanos: 679576000
}

CustomMetric{nameSpace=scp/worker, name=WorkerJobCompletion, value=1.0, unit=Count, labels={Type=Success}}
2023-10-31 12:21:57:732 -0700 [WorkerPullWorkService] INFO com.google.aggregate.adtech.worker.WorkerPullWorkService - No job pulled.

Missing required properties: jobKey

Hello,

While executing a /createJob request with the following payload:

{
  "job_request_id": "Job-1010",
  "input_data_blob_prefix": "reports/inputs/input.avro",
  "input_data_bucket_name": "test-android-sandbox",
  "output_data_blob_prefix": "reports/output/result_1.avro",
  "output_data_bucket_name": "test-android-sandbox",
  "job_parameters": {
    "output_domain_blob_prefix": "reports/domains/domain.avro",
    "output_domain_bucket_name": "test-android-sandbox",
    "debug_privacy_epsilon": 30
  }
}

the response is 202.

When executing /getJob?job_request_id=Job-1010

{ "job_status": "IN_PROGRESS", "request_received_at": "2023-06-12T15:14:17.891601Z", "request_updated_at": "2023-06-12T15:14:23.222830Z", "job_request_id": "Job-1010", "input_data_blob_prefix": "reports/inputs/input.avro", "input_data_bucket_name": "test-android-sandbox", "output_data_blob_prefix": "reports/output/result_1.avro", "output_data_bucket_name": "test-android-sandbox", "postback_url": "", "result_info": { "return_code": "", "return_message": "", "error_summary": { "error_counts": [], "error_messages": [ "Missing required properties: jobKey" ] }, "finished_at": "1970-01-01T00:00:00Z" }, "job_parameters": { "debug_privacy_epsilon": "30", "output_domain_bucket_name": "test-android-sandbox", "output_domain_blob_prefix": "reports/domains/domain.avro" }, "request_processing_started_at": "2023-06-12T15:14:23.133071Z" }

The error is Missing required properties: jobKey
The job stays in status IN_PROGRESS

When running the same /createJob request without the job_request_id property, the response from /createJob is:

{ "code": 3, "message": "Missing required properties: jobRequestId\r\n in: {\n \"input_data_blob_prefix\": \"reports/inputs/input.avro\",\n \"input_data_bucket_name\": \"test-android-sandbox\",\n \"output_data_blob_prefix\": \"reports/output/result_1.avro\",\n \"output_data_bucket_name\": \"test-android-sandbox\",\n \"job_parameters\": {\n \"output_domain_blob_prefix\": \"reports/domains/domain.avro\",\n \"output_domain_bucket_name\": \"test-android-sandbox\"\n }\n}", "details": [ { "reason": "JSON_ERROR", "domain": "", "metadata": {} } ] }

Discussion - debugging support for summary reports

We are working on adding the ability to generate debug summary reports from encrypted aggregatable reports with the AWS-based aggregation service. This capability will be time-limited and phased out at a later date.

We would like to hear from you what capabilities you'd like to see in these debug summary reports.

Some ideas we are considering:

  • return unnoised metric and noise that would have been applied to the metric with the given epsilon
  • return metrics that have not been listed in the output domain with an annotation hinting to the omission
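
On the first idea, here is a sketch of what an "unnoised metric plus the noise that would have been applied" line could look like. It assumes the documented L1 contribution budget of 2^16 and continuous Laplace noise with scale budget/epsilon; the service actually uses a discrete variant, so this is illustrative only.

```python
import math
import random

L1_BUDGET = 1 << 16  # per-report L1 contribution bound (2^16), per the docs

def laplace_noise(epsilon, rng):
    """Draw one Laplace(0, L1_BUDGET / epsilon) sample via inverse CDF."""
    scale = L1_BUDGET / epsilon
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5); u == -0.5 has prob ~0
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def debug_summary_line(unnoised_metric, epsilon, rng=random):
    """What a debug summary report row might expose for one bucket."""
    noise = round(laplace_noise(epsilon, rng))
    return {"unnoised_metric": unnoised_metric,
            "noise": noise,
            "noised_metric": unnoised_metric + noise}
```

Seeing the noise and the unnoised value side by side would let us sanity-check batch contents before spending real privacy budget.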

Questions:

  • What helps you to understand the inner workings of the system better and helps you to develop experiments using the attribution reporting APIs?
  • How are you currently generating your output domains and what tools would help you to simplify this process?

Aggregation Service with Private Aggregation API

When an aggregatable report is created by sendHistogramReport() (i.e. called inside a reportWin function), it contains shared_info without attribution_destination or source_registration_time. This seems logical, as these keys are strictly related to attribution logic. Example:

"shared_info": "{\"api\":\"fledge\",\"debug_mode\":\"enabled\",\"report_id\":\"9ae1a0d0-8cf5-4951-b752-e932bf0f7705\",\"reporting_origin\":\"https://fledge-eu.creativecdn.com\",\"scheduled_report_time\":\"1668771714\",\"version\":\"0.1\"}"

More readable form:

{
 "api": "fledge",
 "debug_mode": "enabled",
 "report_id": "9ae1a0d0-8cf5-4951-b752-e932bf0f7705",
 "reporting_origin": "https://fledge-eu.creativecdn.com",
 "scheduled_report_time": "1668771714",
 "version": "0.1"
}

(note: version is 0.1; values for privacy_budget_key, attribution_destination, and source_registration_time are missing)

At the same time, the aggregation service expects both attribution_destination and source_registration_time for shared_info.version == 0.1 (since aggregation service version 0.4):
see SharedInfo.getPrivacyBudgetKey()

Tested on Chrome:

  • 108.0.5359.48
  • 110.0.5433.0
  • 107.0.5304.110

with local aggregation service 0.4.

The following exception was printed:
CustomMetric{nameSpace=scp/worker, name=WorkerJobError, value=1.0, unit=Count, labels={Type=JobHandlingError}}
2022-11-22 09:10:54:120 +0100 [WorkerPullWorkService] ERROR com.google.aggregate.adtech.worker.WorkerPullWorkService - Exception occurred in worker
com.google.aggregate.adtech.worker.JobProcessor$AggregationJobProcessException: java.util.concurrent.ExecutionException: java.util.NoSuchElementException: No value present
at com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.process(ConcurrentAggregationProcessor.java:400)
at com.google.aggregate.adtech.worker.WorkerPullWorkService.run(WorkerPullWorkService.java:145)
at com.google.common.util.concurrent.AbstractExecutionThreadService$1$2.run(AbstractExecutionThreadService.java:67)
at com.google.common.util.concurrent.Callables.lambda$threadRenaming$3(Callables.java:103)
at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: java.util.concurrent.ExecutionException: java.util.NoSuchElementException: No value present
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:588)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:567)
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:113)
at com.google.aggregate.adtech.worker.aggregation.concurrent.ConcurrentAggregationProcessor.process(ConcurrentAggregationProcessor.java:295)
... 4 more
Caused by: java.util.NoSuchElementException: No value present
at java.base/java.util.Optional.get(Optional.java:143)
at com.google.aggregate.adtech.worker.model.SharedInfo.getPrivacyBudgetKey(SharedInfo.java:161)
at com.google.aggregate.adtech.worker.aggregation.engine.AggregationEngine.accept(AggregationEngine.java:88)
at com.google.aggregate.adtech.worker.aggregation.engine.AggregationEngine.accept(AggregationEngine.java:49)
