ssoelvsten / mlightdp Goto Github PK
View Code? Open in Web Editor NEWAn extensive implementation of the LightDP language of Zhang and Kifer used to prove algorithms to be privacy-preserving
License: MIT License
An extensive implementation of the LightDP language of Zhang and Kifer used to prove algorithms to be privacy-preserving
License: MIT License
If e1(1) and e1(2) are two adjacent values such that e1(1) −e1(2) = c, then
e1(1) −e1(2) ≡ c mod e2
.
Hence, one could think to allow e1
have distance d
and let the result have distance d % e2
. The function may not be injective in all the integers, but it is in the ring of integers modulo e2
.
We do currently not support to name the return value to make it available in the scope of the post condition. This is currently of no issue, since no postconditions are needed in our examples, but one could expect our tool to be used to simultaneously prove functional correctness together with its privacy guarantees.
Currently we only allow the ^ operator to be used directly on variables with a star-type. In most other cases it is not necessary, but one might as well allow a wider usage.
let eta1 : float[3] = Lap(1/3);
let eta2 : float[^eta1] = Lap(1/3);
let eta3 : float[^(eta1 + eta2)] = Lap(1/3);
As before ^ should only be allowed within a distance, not within the expressions
The PartialSum example partial_sum.mldp
currently contains the preconditions and invariants of Zhang and Kifer [ZK17], but Dafny is not able to verify the output.
Dependant on #6 .
We already track the mutable dependencies of random variables η. Currently we only use this information to discern whether the distance provided by the programmer is in conflict with the mutable dependencies.
Zhang and Kifer [ZK17] differentiate between three cases. In the first two cases, η may have any distance d, while in the third η may only have a distance ∗. One may extend our dependency tracking to change the distance of η in the environment to ∗ when recognising the third case applies to η.
The artifacts for the end-to-end tests are persisted, while they only are supposed to be used to move data between jobs. We need a fourth and last cleanup step.
See:
https://developer.github.com/v3/actions/artifacts/#delete-an-artifact
Currently, everything returned must have distance 0, but after #26 and #27 one can allow non-zero distance for helper functions. The main algorithm still has to return values only of distance 0 to make it provably privacy-preserving, but one should be able to differentiate between them within the syntax.
In #7 we'd leave uninferrable distance variables as ghost arguments to the transformed program. One can find the optimal solution in which the distance variables satisfy all the proof obligations generated. Bjørner and Phan have extended the SMT solver Z3 to also be a MaxSMT solver νZ, which Zhang and Kifer [ZK17] successfully used to infer the optimal distances of the random variables. After integration with Z3 in #10 we should try and extend the integration to use νZ
It seems possible a user would like to define multiple helper functions for their algorithms. To start simple, it's better to start with no mutual recursion and then considering the following design decisions at first.
*
distance variables, and epsilon
argument always first?)Then one can extend this to be mutually recursive functions.
Currently we only support epsilon-private algorithms, but many algorithms, such as SmartSum in the presentation by Zhang and Kifer [ZK17], are c · epsilon-private for some positive constant c = 1. We suggest to add a new annotation that describes the privacy guarantee, which is then used to generate the assertions before all return statements. The SmartSum of [ZK17] would then have the added annotation
privacy: 2 * epsilon
If no such annotation has been provided, then epsilon is the default used.
If one creates an argument of type list(list(float[*]))
the tool will crash with a NotImplemented
exception.
The SmartSum example partial_sum.mldp
currently contains the preconditions and invariants of Zhang and Kifer [ZK17], but Dafny is not able to verify the output.
This may need #14 first to be implemented.
Our tool could immediately call Dafny if the program is successfully transformed. If one also makes a one-to-one mapping between the line numbers of the original program and the transformed program, then Dafny errors can be given with the original line numbers.
To prove GapSparseVector correct Wang et al. [Wan+19] removed the nonlinearity introduced in the absolute value due to the T-Laplace . Their approach was to replace the |d| with its numerical upper bound of 2. This upper bound is true due to the precondition that ∀i < |q| : −1 ≤ ˆq[i] ≤ 1
.
In general, one could automate major parts of this step. If there is a bound on a star typed variable x such that l ≤ ˆx ≤ u, then one can add the bound |l|+|u| to v? rather than |d|. If l and u are provided by the programmer, then one may want to prove Φ =⇒ l ≤ d ≤ u as an assert statement in the transformed program.
One should notice that max(|l|,|u|) ≤ |l|+|u| ≤ 2max(|l|,|u|), which makes it a pretty tight over-approximation. To infer l and u one might need first to look into #11 .
Maybe one should even keep them throughout all processing to still output them everywhere you want?
The Size
function only works on lists of float
s rather than any list.
The mutable dependency tracking in refine.ml
is supposed to track the dependencies in the distance in the type and not the mutable dependencies in the expression assigned.
One might want to wait to fix this until after #6 .
To prove GapSparseVector correct Wang et al. [Wan+19] removed the nonlinearity introduced in the absolute value due to the T-Laplace. Their approach was to replace the |d| with its numerical upper bound of 2. This upper bound is true due to the precondition that
∀i < |q| : −1 ≤ ˆq[i] ≤ 1
.
In general, one could automate major parts of this step. If there is a bound on a star typed variable x
such that
l ≤ ˆx ≤ u
,
then one may provide a variant of the Lap
function that instead adds the bound |l|+|u| to v_epsilon
rather than |d|. If l and u are provided by the programmer, then one may want to prove
Φ =⇒ l ≤ d ≤ u
as an assert statement in the transformed program.
During the differential privacy type checking we currently generate proof obligations of
numerical constraints, which are output into the resulting Dafny file. We experienced that Dafny has trouble with verifying a program, when it is requested to prove too many trivial statements such as 0 = 0
, ˆq[i] − ˆq[i] = 0
etc. As long as no variables are involved, such as in the first expression, then it can be trivially discharged by computing the result. In the second expression one could also discharge this by normalising the equation and make terms cancel out. The proof obligations may become more complicated than that, at which point a much more sophisticated tool like Z3 would have to be used anyway. Hence, we suggest to merely discharge all proof obligations with Z3; if it fails we then output them to Dafny as we currently do.
We currently require all function arguments to be immutable. One goal with our language was to minimise the differential privacy analysis going in the way of the wishes of the programmer; allowing the programmer to specify an argument as mutable will bring our language further in line with other
languages.
The current End-to-End test in .github/workflows/end-to-end.yml
only covers as the compilation of all our examples by the tool. Both because it would be cool, but also to catch more subtle errors in our program we may want to setup Dafny as part of the Github Action.
To prove GapSparseVector correct Wang et al. [Wan+19] removed the nonlinearity introduced in the absolute value due to the T-Laplace . Their approach was to replace the |d| with its numerical upper bound of 2. This upper bound is true due to the precondition that ∀i < |q| : −1 ≤ ˆq[i] ≤ 1
.
In the Visual Studio Code version of Dafny one does not need to do any of this, but rather only replaces the equality in the invariant to be a less or equal. When running Dafny from the console, one then also needs to change the outputted Dafny file to add the following line
assert Abs(if q[i] + eta2 >= t'
then (1 as real) - q_hat0[i]
else (0 as real)) as real <= 2.0;
or add the precondition as an invariant. In general, one could autogenerate this assertion. If there is a bound on a star typed variable x such that l ≤ ˆx ≤ u, then an assertion of the following form can be output
assert l <= q[i] <= u
One may look into this in the following order
One should be able to infer the base type of an argument to further lessen the annotation burden on the programmer. One may be able to infer the distances of function parameters with the refinement rules of Zhang and Kifer [ZK17], though one should be cautious when applying them. It may be safest to start with assuming these distances to be zero if nothing else is specified.
We currently have distances and types combined in a single value with the following type definition in ast.mli
type 'd ty = IntTy of 'd
| FloatTy of 'd
| BoolTy
| ListTy of 'd ty
| Error
Which in retrospect has been giving us more headaches than is worth it. We should completely separate the base type from the distance. We then place the distance in the DeclStmt
as a separate field.
Since no example of ours include a mutable list of type list(float[∗])
, we do currently only support immutable lists of that type. One quickly realises though, that removing this restriction is not an easy endeavour when concatenation and the empty list comes into the picture.
The distance of a function output may rely on the distance of the function input. In other words, functions can, in general, be dependently typed and should support inferring the output distance of a function based on the input.
This most likely completely changes how one deals with functions in dp.ml
if not also elsewhere too.
Zhang and Kifer [ZK17] assume all free variables in a normal variable are immutable. Currently, we do no such check.
One wonders though why they do this assumption. Based on their description of how to handle random variables and their proof of soundness, it seems it all is connected to whether the distance of a variable x
merely is the inferred value or one has to use a ghost
variable ^x
to contain the backup of the old value before mutation.
One should hence think to extend the distance tracking of #8 to also include normal variables.
Zhang and Kifer [ZK17] mention the possibility to improve the T-OTimes rule to be more fine-grained than the requirement of e1 ⊗ e2
has type num[0]
given e1
and e2
do too. They do so by referring to the work of Reed and Pierce [RP10] and Gaboardi et al. [Gab+13]. We forward this reference for integrating into MLightDP this more fine-grained version of the rule. One may especially want to take a look at page 5 in [Gab+13].
Zhang and Kifer [ZK17] mention briefly the ability to extend the notion of distance to boolean values. Based on the PrivBernoulli algorithm [ZK17, Figure 13] they first sample η from a Uniform distribution in [0,1] and then let b be true
if η > t ∈ [0,1] and false
otherwise. This idea is easily extensible to categorical values of more than two values, such as enums, by use of the more general Exponential Mechanism [ZK17].
They provide a differential privacy type-checking rule and transformation for sampling from a uniform distribution, which should be possible to directly incorporate into MLightDP. One may further abstract this from the user to allow one to sample a random boolean that is true with probability p, such that they only need to write
let b := RandBool(p)
which in the refinement step is translated into
let eta = Uniform(0,1)
let b = eta <= p ? true : false
and finally in the differential privacy type checking step is proven differentiably private according to the rule as provided by Zhang and Kifer [ZK17].
With a pretty printer on the original AST one could output a Rust file of the algorithm.
In reflection on #33, we may want to allow assert statements to be part of the source program.
We do currently only support values of type float
to have distance ∗
, which has sufficed for our current examples. Our tool would be more complete if a variable x
could have type int[∗]
, though treating the value of ˆx
as being integral may prove less than trivial.
Both the language of Zhang and Kifer [ZK17] and of our current implementation requires the Lap function to be used in isolated lines. Inspired by the proof obligations created during the differential privacy type checking, we propose to replace Lap functions in an expression with fresh variables η_i . The η_i variables are then declared right before the line they originated on.
Then t’
of the many Sparse Vector examples can be instantiated in the single line
let t’ := t + Lap(3 / epsilon)
which then in the refinement step automatically is converted into
let eta1 := Lap(3 / epsilon)
let t’ := t + eta1
at which point the normal refinement rules can infer the distances of the many newly generated η_i variables. One should notice, that this transformation is sound, as it purposely adheres to the most conservative assumptions of Zhang and Kifer [ZK17].
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.