Git Product home page Git Product logo

multi-analyst-dp's Introduction

Budget Sharing for Multi-Analyst Differential Privacy

The arxiv version of this paper [1] is at https://arxiv.org/abs/2011.01192.

The codes in the paper are in the master branch. Please only use the master branch. Other branches are used for developments beyond this paper.

All codes are in Python 3. (We use 3.6.9 for experiments and 3.7.7 Jupyter Notebooks.)

The core codes are in the src directory, including the hdmm package from the HDMM paper [2], which is implemented based on the Ektelo package [3]. All the codes are implemented based on the hdmm package.

The codes for calculating strategies and errors are in src/fair_experiments.py.

The codes for water-filling algorithm is in src/bucket_filling.py and the function we use in our paper is query_sd.

For the variables, n is the size of data, k is the number of analysts, eps is the total privacy budget, rep is the number of iterations for optimizing the strategies. tol is the tolerance in the Water-filling Mechanism.

modes is a list of names of the mechanisms used in the experiment. They can be chosen from 'ind' (Independent), 'iden' (Identity), 'uni' (Utilitarian), 'fsum' (Weighted Utilitarian), 'buc_qsd' (Water-filling).

Ws is the workload set, Wr is the workload set to be evaluated (may be same as or different from Ws). As and Ar are the strategy matrices for Ws and Wr, respectively. Ws, Wr, As, and Ar are all 1-d numpy arrays of Ektelo matrices.

qr, x, ans, sample are only used for data-dependent queries. qr is the name of queries and can be chosen from 'Linear', 'Median', 'Mode', 'Mean', 'Per_x' (x percentile). None and any other value is the same as 'Linear'. x is the data. ans is the true answer. sample is the number of iterations for calculating the expected errors.

We use eps=1.0, rep=10, sample=10000, tol=1e-3 as the default settings in our experiments. Note that the values may be different from the default values in the Python scripts.

The Python scripts and the Slurm job scripts for the experiments shown in the paper are as the follows:

Practical Settings (Section 5.3): practical_interference/race_vary.py, practical_interference/array_workload.sh

Marginal Workloads (Section 5.4): practical_interference/oneway_marginal_vary.py, practical_interference/array_marginal.sh

Tolerance for Water-filling Mechanism (Section 5.5): practical_tol/oneway_marginal_vary.py, practical_tol/array_marginal.sh

Data-dependent Non-linear Queries (Section 5.6): data_interference/mm_per_vary.py, data_interference/array_workload.sh

The Slurm job scripts show the exact experiment settings. The experiments ran on the Duke Computer Science Cluster.

Reference

[1] Pujol, D., Wu, Y., Fain, B., & Machanavajjhala, A. (2020). Budget Sharing for Multi-Analyst Differential Privacy. arXiv preprint arXiv:2011.01192.

[2] McKenna, R., Miklau, G., Hay, M., & Machanavajjhala, A. (2018). Optimizing error of high-dimensional statistical queries under differential privacy. Proceedings of the VLDB Endowment, 11(10), 1206-1219.

[3] Zhang, D., McKenna, R., Kotsogiannis, I., Hay, M., Machanavajjhala, A., & Miklau, G. (2018, May). Ektelo: A framework for defining differentially-private computations. In Proceedings of the 2018 International Conference on Management of Data (pp. 115-130).

multi-analyst-dp's People

Contributors

yikai-wu avatar

Stargazers

Xin Liu avatar Albert Sun avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.