

CGossec avatar CGossec commented on May 19, 2024

At Criteo, we use the Aggregation Service to test the end-to-end pipeline of ARA reports. We have been using it for months and have faced several issues when running aggregation jobs. While the setup documentation is really clear, most of our effort on the Aggregation Service has gone not into deploying or maintaining it, but into debugging it. Below are some feature ideas that we think would greatly improve our visibility when debugging aggregation jobs, along with information we think should be part of the Aggregation Service documentation.

1. More details on PRIVACY_BUDGET_EXHAUSTED errors

Root causes for aggregation jobs failing to execute are currently very obscure, and it’s hard to know where the error lies.

This is especially the case for PRIVACY_BUDGET_EXHAUSTED errors. It would be much easier for us to locate and fix errors if an Aggregation Service failure gave information on either:

  • The report(s) causing the error, or at least the sharedID(s) related to the issue
  • The jobId of the aggregations related to the error: the aggregation that failed, but also any previous aggregation that could have consumed the privacy budget for the faulty sharedIDs
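Until such information is exposed, an adtech can approximate it on its own side. The following is a minimal sketch, not the service's actual algorithm: it assumes each report's `shared_info` has already been decoded into a dict, that `scheduled_report_time` is in epoch seconds, and that the listed fields plus the hour of the scheduled report time identify one privacy-budget unit. `KEY_FIELDS`, `budget_key` and `overlapping_keys` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Assumption: these shared_info fields, together with the hour of
# scheduled_report_time, identify one privacy-budget unit. This is an
# approximation for debugging, not the service's real sharedID computation.
KEY_FIELDS = (
    "api",
    "version",
    "reporting_origin",
    "attribution_destination",
    "source_registration_time",
)


def budget_key(shared_info):
    """Build a comparable key from a decoded shared_info dict."""
    # Assumption: scheduled_report_time is epoch seconds (string or int).
    hour = int(shared_info["scheduled_report_time"]) // 3600 * 3600
    return tuple(shared_info.get(field) for field in KEY_FIELDS) + (hour,)


def overlapping_keys(batches):
    """Return the keys that appear in more than one job.

    `batches` maps a job id to the list of shared_info dicts in that job.
    The result maps each overlapping key to the set of job ids using it.
    """
    seen = defaultdict(set)
    for job_id, reports in batches.items():
        for shared_info in reports:
            seen[budget_key(shared_info)].add(job_id)
    return {key: jobs for key, jobs in seen.items() if len(jobs) > 1}
```

Running `overlapping_keys` over the batches already submitted plus the batch about to be submitted gives a list of candidate jobs that may have consumed the same budget.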

2. Additional documentation on the AWS internal architecture

To simplify the understanding of the AS structure in AWS, it would be helpful to have a document explaining the various components of the aggregation service (job queue on SQS, job status table in DynamoDB, workers on EC2, access through API Gateway, etc.). Knowing what type of information is exposed via AWS tools, its format, and where to look for it would all be useful.

Additionally, once adtechs have made changes to a running AS deployment, a new deployment from Google’s cloned repositories will probably override those settings (although we haven’t tried this ourselves). It would be helpful to expose more options in the <filename>.auto.tfvars files so that the setup is more reproducible.

3. Additional information on optimization of the AS within and without the AWS infrastructure

The sizing guidance provides useful guidelines for choosing EC2 instance types depending on batch sizes. However, in our tests we observed that splitting the aggregation load into thousands of small batches (which is necessary to batch the data per client) leads to long end-to-end execution times, at least when done naively, even if the processing time for each individual batch is short. To help AdTechs tune this process, it would be useful to have:

  • A description of how the processing is parallelized within the Aggregation Service (within a single EC2 instance or across several)
  • Any recommendation on sending parallel batch processing requests (e.g. how many batches of a certain size can be processed simultaneously by an EC2 instance of a given type).
  • Sizing recommendations for AWS components other than EC2, notably DynamoDB.
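To make the second point concrete, here is a minimal sketch of a non-naive client that keeps a bounded number of batches in flight. It is only an illustration: `run_batches`, `submit_and_wait` and `max_in_flight` are hypothetical names, `submit_and_wait` stands in for whatever code creates a job and polls it until completion, and the right value of `max_in_flight` is exactly the recommendation we are asking for.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batches(batches, submit_and_wait, max_in_flight=4):
    """Process batches with at most `max_in_flight` running at once.

    `batches` is an iterable of hashable batch identifiers.
    `submit_and_wait` is a caller-supplied function (hypothetical here)
    that submits one batch and blocks until the job finishes, returning
    its final status.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = {pool.submit(submit_and_wait, b): b for b in batches}
        for future in as_completed(futures):
            batch = futures[future]
            try:
                results[batch] = future.result()
            except Exception as exc:
                # Record the failure and keep processing the other batches.
                results[batch] = f"FAILED: {exc}"
    return results
```

Threads are used because the work is waiting on remote jobs rather than local computation; a failed batch is recorded without stopping the others.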

from trusted-execution-aggregation-service.
