
privacy-sandbox-aggregation-service's Introduction

Multi-Browser Aggregation Service Prototype

This repository contains a prototype for aggregating data across browsers with privacy protection. The mechanism is explained here. Note that the current MPC protocol has some known flaws; this prototype is a proof of concept, not the final design.

How to build

To build the code, you need to install Bazel first; the Go files contain detailed instructions for running the binaries. You can follow our Terraform setup to set up an environment.

Main pipelines

The following pipelines are implemented based on IDPF (Incremental Distributed Point Functions) and Apache Beam. As described in the Go files, you can run the pipelines locally or use other runner engines such as Google Cloud Dataflow. For the latter, you need a Google Cloud project first.

There are three main pipelines for the DPF protocol:

  1. pipeline/dpf_aggregate_partial_report_pipeline expands the DPF keys to histograms and combines the histograms to get partial aggregation results.

  2. pipeline/dpf_aggregate_reach_partial_report_pipeline expands the DPF keys to histograms of tuples and combines the histograms.

  3. tools/dpf_merge_partial_aggregation shows an example of how the report origins can obtain the complete aggregation result from the DPF partial results.
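The merge step in the third pipeline can be sketched as follows. This is a minimal illustration, assuming the partial results are additive secret shares of the true histogram over uint64 (so summing with wraparound recovers the plaintext); the function and types are illustrative, not the repository's actual API.

```go
package main

import "fmt"

// mergePartialHistograms combines two helpers' partial aggregation results.
// Assumes both helpers report the same bucket set and each value is an
// additive share mod 2^64, so the wrapped sum recovers the true count.
func mergePartialHistograms(a, b map[uint64]uint64) map[uint64]uint64 {
	merged := make(map[uint64]uint64, len(a))
	for bucket, share := range a {
		merged[bucket] = share + b[bucket] // wraps mod 2^64, matching share arithmetic
	}
	return merged
}

func main() {
	helper1 := map[uint64]uint64{5: 18446744073709551613, 10: 7} // share = value - r (mod 2^64)
	helper2 := map[uint64]uint64{5: 7, 10: 3}
	fmt.Println(mergePartialHistograms(helper1, helper2)) // bucket 5 -> 4, bucket 10 -> 10
}
```

Neither helper's map alone reveals the counts; only the report origin, holding both partial results, sees the plaintext histogram.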

Services

  1. service/collector_server receives the encrypted partial reports sent by the browsers, and batches them according to the specified helper servers.

  2. service/aggregator_server hosts two services: (a) providing the shared helper information, including where the other helper can find the intermediate results for inter-helper communication; and (b) processing aggregation requests passed via Pub/Sub messages.

  3. service/browser_simulator simulates how the browser creates the partial reports and sends them to the collector_server endpoints.
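The core idea behind the partial reports the simulator creates can be sketched as additive secret sharing. This is an illustrative sketch only: the real reports carry DPF keys and are encrypted to each helper, and the function name here is an assumption, not the repository's API.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// splitIntoShares splits a contribution into two additive shares mod 2^64,
// one per helper. Each share alone is uniformly random and reveals nothing
// about the value; only their sum reconstructs it.
func splitIntoShares(value uint64) (uint64, uint64) {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		panic(err)
	}
	r := binary.LittleEndian.Uint64(buf[:])
	return r, value - r // uint64 subtraction wraps, so the shares sum back to value
}

func main() {
	s1, s2 := splitIntoShares(42)
	fmt.Println(s1+s2 == 42) // true: shares recombine to the original value
}
```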

Query models

With the aggregator_server set up, users can query the aggregation results by sending requests with the binary tools/aggregation_query_tool. There are two aggregation modes, depending on the configuration passed to the query tool.

Hierarchical query model

The aggregation is finished in multiple rounds, one per hierarchy level. For each hierarchy, the partial reports are aggregated to prefixes of a certain length of the original bucket IDs. After each round, the two helpers exchange and merge the noised hierarchical results so they can determine which prefixes to expand further at the next level. Users need to specify, for each hierarchy, the prefix length and the threshold used to filter out prefixes with small values. Example configuration (HierarchicalConfig):

{
  prefix_lengths: [5, 10, 20, 25],
  expansion_threshold_per_prefix: [10, 5, 5, 5],
  privacy_budget_per_prefix: [.2, .1, .3, .4]
}
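The expand-or-prune decision made after each round can be sketched as a simple threshold filter over the merged level histogram. The names here are illustrative assumptions, not the repository's API; the real pipeline works on DPF key expansions and noised shares.

```go
package main

import "fmt"

// prefixesToExpand picks, from one hierarchy level's merged (noised)
// histogram, the prefixes whose counts meet the expansion threshold.
// Only these prefixes are expanded at the next level, which keeps the
// expansion tree small.
func prefixesToExpand(merged map[uint64]uint64, threshold uint64) []uint64 {
	var keep []uint64
	for prefix, count := range merged {
		if count >= threshold {
			keep = append(keep, prefix)
		}
	}
	return keep
}

func main() {
	// Hypothetical counts for the 5-bit prefixes at the first hierarchy level.
	level0 := map[uint64]uint64{0b00101: 12, 0b01110: 3, 0b10001: 10}
	// With expansion_threshold_per_prefix[0] = 10, only two prefixes survive.
	fmt.Println(len(prefixesToExpand(level0, 10))) // 2
}
```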

Direct query model

The aggregation is finished in one round. Users need to specify the bucket IDs they want in the results returned by the helpers. IDs not included in the configuration will be ignored, while all IDs in the configuration will have noised results. Example configuration (DirectConfig):

{
  bucket_ids: [5, 10, 20, 25]
}
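The filtering behavior described above can be sketched as follows. This is a minimal illustration under assumed names (not the repository's API); in the real service the returned values would also carry noise.

```go
package main

import "fmt"

// directQueryResult keeps only the bucket IDs listed in a DirectConfig.
// IDs not in the config are dropped; every configured ID appears in the
// output (zero if no reports contributed to it).
func directQueryResult(histogram map[uint64]uint64, bucketIDs []uint64) map[uint64]uint64 {
	out := make(map[uint64]uint64, len(bucketIDs))
	for _, id := range bucketIDs {
		out[id] = histogram[id] // missing IDs report 0
	}
	return out
}

func main() {
	hist := map[uint64]uint64{5: 3, 10: 8, 99: 1}
	// Bucket 99 is ignored; 20 and 25 appear with value 0.
	fmt.Println(directQueryResult(hist, []uint64{5, 10, 20, 25}))
}
```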

Contributing

Contributions to this repository are always welcome and highly encouraged.

See CONTRIBUTING for more information on how to get started.

License

Apache 2.0 - See LICENSE for more information.

Disclaimer

This is not an officially supported Google product.


privacy-sandbox-aggregation-service's Issues

Level-first ordering for the sub-jobs

Got:
Sub-job items are named:
<aggregator>-<level>

Expected:
<level>-<aggregator>, since higher-level sub-jobs can only start after all lower-level sub-jobs finish.

Searching empty list should list all jobs

Got:

  1. Search for a job, e.g. "66662e23-a0cd-4856-866f-2f4abe1f34ca" -> The only matching job is listed
  2. Delete the job ID from the search box -> The previous search result is still shown

Expected: The page should list all jobs. Currently there is no way to return to the full list of jobs after searching.

Consider moving job ID search as part of filter

Got:
On the "filter" panel, users can filter jobs by status, creation time, and last-updated time.
Searching by job ID is a separate function.

Expected:
Searching by ID is effectively another type of filter, especially once "searching multiple jobs" is supported. Consider making job ID search part of the filter panel, so users know where to look for search.
