
privacy-sandbox-aggregation-service's Introduction

Multi-Browser Aggregation Service Prototype

This repository contains a prototype for aggregating data across browsers with privacy protection. The mechanism is explained here. Note that the current MPC protocol has some known flaws; this prototype is a proof of concept, not the final design.

How to build

To build the code, you need to install Bazel first; the Go files contain detailed instructions for running the binaries. You can follow our Terraform setup to set up an environment.

Main pipelines

The following pipelines are implemented based on IDPF (Incremental Distributed Point Functions) and Apache Beam. As described in the Go files, you can run the pipelines locally or use other runner engines such as Google Cloud Dataflow. For the latter, you need a Google Cloud project first.

There are three main pipelines for the DPF protocol:

  1. pipeline/dpf_aggregate_partial_report_pipeline expands the DPF keys to histograms and combines the histograms to get partial aggregation results.

  2. pipeline/dpf_aggregate_reach_partial_report_pipeline expands the DPF keys to histograms of tuples and combines the histograms.

  3. tools/dpf_merge_partial_aggregation shows an example of how the report origins can obtain the complete aggregation result from the DPF partial results.
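The merge step in the third pipeline can be sketched as follows. This is a minimal illustration, assuming the partial results are additive secret shares of the true histogram over uint64 (so summing with wraparound recovers the plaintext); the function and types are illustrative, not the repository's actual API.

```go
package main

import "fmt"

// mergePartialHistograms combines two helpers' partial aggregation results.
// Assumes both helpers report the same bucket set and each value is an
// additive share mod 2^64, so the wrapped sum recovers the true count.
func mergePartialHistograms(a, b map[uint64]uint64) map[uint64]uint64 {
	merged := make(map[uint64]uint64, len(a))
	for bucket, share := range a {
		merged[bucket] = share + b[bucket] // wraps mod 2^64, matching share arithmetic
	}
	return merged
}

func main() {
	helper1 := map[uint64]uint64{5: 18446744073709551613, 10: 7} // share = value - r (mod 2^64)
	helper2 := map[uint64]uint64{5: 7, 10: 3}
	fmt.Println(mergePartialHistograms(helper1, helper2)) // bucket 5 -> 4, bucket 10 -> 10
}
```

Neither helper's map alone reveals the counts; only the report origin, holding both partial results, sees the plaintext histogram.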

Services

  1. service/collector_server receives the encrypted partial reports sent by the browsers, and batches them according to the specified helper servers.

  2. service/aggregator_server hosts two services: (a) providing the shared helper information, including where the other helper can find the intermediate results for inter-helper communication; and (b) processing aggregation requests passed via Pub/Sub messages.

  3. service/browser_simulator simulates how the browser creates the partial reports and sends them to the collector_server endpoints.
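The core idea behind the partial reports the simulator creates can be sketched as additive secret sharing. This is an illustrative sketch only: the real reports carry DPF keys and are encrypted to each helper, and the function name here is an assumption, not the repository's API.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// splitIntoShares splits a contribution into two additive shares mod 2^64,
// one per helper. Each share alone is uniformly random and reveals nothing
// about the value; only their sum reconstructs it.
func splitIntoShares(value uint64) (uint64, uint64) {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		panic(err)
	}
	r := binary.LittleEndian.Uint64(buf[:])
	return r, value - r // uint64 subtraction wraps, so the shares sum back to value
}

func main() {
	s1, s2 := splitIntoShares(42)
	fmt.Println(s1+s2 == 42) // true: shares recombine to the original value
}
```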

Query models

With the aggregator_server set up, users can query the aggregation results by sending requests with the binary tools/aggregation_query_tool. There are two aggregation modes, depending on the configuration passed to the query tool.

Hierarchical query model

The aggregation is finished in multiple rounds, one per hierarchy level. For each hierarchy, the partial reports are aggregated to prefixes of a certain length of the original bucket IDs. After each round, the two helpers exchange and merge the noised hierarchical results so they can determine which prefixes to expand further at the next level. Users need to specify, for each hierarchy, the prefix length and the threshold used to filter out prefixes with small values. Example configuration (HierarchicalConfig):

{
  prefix_lengths: [5, 10, 20, 25],
  expansion_threshold_per_prefix: [10, 5, 5, 5],
  privacy_budget_per_prefix: [.2, .1, .3, .4]
}
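The expand-or-prune decision made after each round can be sketched as a simple threshold filter over the merged level histogram. The names here are illustrative assumptions, not the repository's API; the real pipeline works on DPF key expansions and noised shares.

```go
package main

import "fmt"

// prefixesToExpand picks, from one hierarchy level's merged (noised)
// histogram, the prefixes whose counts meet the expansion threshold.
// Only these prefixes are expanded at the next level, which keeps the
// expansion tree small.
func prefixesToExpand(merged map[uint64]uint64, threshold uint64) []uint64 {
	var keep []uint64
	for prefix, count := range merged {
		if count >= threshold {
			keep = append(keep, prefix)
		}
	}
	return keep
}

func main() {
	// Hypothetical counts for the 5-bit prefixes at the first hierarchy level.
	level0 := map[uint64]uint64{0b00101: 12, 0b01110: 3, 0b10001: 10}
	// With expansion_threshold_per_prefix[0] = 10, only two prefixes survive.
	fmt.Println(len(prefixesToExpand(level0, 10))) // 2
}
```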

Direct query model

The aggregation is finished in one round. Users need to specify the bucket IDs they want in the results returned by the helpers. IDs not included in the configuration will be ignored, while all IDs in the configuration will have noised results. Example configuration (DirectConfig):

{
  bucket_ids: [5, 10, 20, 25]
}
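The filtering behavior described above can be sketched as follows. This is a minimal illustration under assumed names (not the repository's API); in the real service the returned values would also carry noise.

```go
package main

import "fmt"

// directQueryResult keeps only the bucket IDs listed in a DirectConfig.
// IDs not in the config are dropped; every configured ID appears in the
// output (zero if no reports contributed to it).
func directQueryResult(histogram map[uint64]uint64, bucketIDs []uint64) map[uint64]uint64 {
	out := make(map[uint64]uint64, len(bucketIDs))
	for _, id := range bucketIDs {
		out[id] = histogram[id] // missing IDs report 0
	}
	return out
}

func main() {
	hist := map[uint64]uint64{5: 3, 10: 8, 99: 1}
	// Bucket 99 is ignored; 20 and 25 appear with value 0.
	fmt.Println(directQueryResult(hist, []uint64{5, 10, 20, 25}))
}
```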

Contributing

Contributions to this repository are always welcome and highly encouraged.

See CONTRIBUTING for more information on how to get started.

License

Apache 2.0 - See LICENSE for more information.

Disclaimer

This is not an officially supported Google product.


privacy-sandbox-aggregation-service's Issues

Level-first ordering for the sub-jobs

Got:
Sub-job items are named:
<aggregator>-<level>

Expected:
<level>-<aggregator>, since higher-level sub-jobs can only start after all lower-level sub-jobs finish.

Searching empty list should list all jobs

Got:

  1. Search for a job, e.g. "66662e23-a0cd-4856-866f-2f4abe1f34ca" -> The only matching job is listed
  2. Delete the job ID from the search box -> The previous search result is still shown

Expected: The page should list all jobs. Currently there is no way to return to the full list of jobs after searching.

Consider moving job ID search as part of filter

Got:
On the "filter" panel, users can filter jobs by status, creation time, and last-updated time.
Searching by job ID is a separate function.

Expected:
Searching by ID is effectively another type of filter, especially once "searching multiple jobs" is supported. Consider making job ID search part of the filter panel, so users know where to look for search.
