
PrivacyRaven's Issues

Integrate TensorBoard

Is your feature request related to a problem? Please describe.
TensorBoard enables more transparency into the development of the victim and substitute models.

Describe the solution you'd like.
Make the output as clear and succinct as possible.

Fix HopSkipJump extraction

Is your feature request related to a problem? Please describe.
With the new security updates and other changes, HopSkipJump-based extraction no longer works. We need to fix it so that using HopSkipJump to populate the synthetic dataset for model extraction is possible.

CLI

Build a CLI tool for PrivacyRaven. Details TBD.

Blocked on #14.
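Since the details are TBD, here is a minimal sketch of what the CLI surface could look like, using only the standard library's argparse. The subcommand and flag names are hypothetical, not part of any existing PrivacyRaven API.

```python
# Hypothetical PrivacyRaven CLI skeleton; subcommand and flag names are
# illustrative only, per this issue's "Details TBD".
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="privacyraven")
    subparsers = parser.add_subparsers(dest="command", required=True)

    extract = subparsers.add_parser("extract", help="run a model extraction attack")
    extract.add_argument("--attack", default="copycat", help="attack configuration name")
    extract.add_argument("--query-limit", type=int, default=100)

    infer = subparsers.add_parser("infer", help="run a membership inference attack")
    infer.add_argument("--query-limit", type=int, default=100)
    return parser

# Example: parse a hypothetical invocation
args = build_parser().parse_args(["extract", "--query-limit", "50"])
```

A Click- or Typer-based implementation would be a reasonable alternative once the command set is settled.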

Metrics visualization interface

Is your feature request related to a problem? Please describe.
Instead of merely printing hard-to-understand metrics, PrivacyRaven should generate visuals and easy-to-understand metrics for this tool to be more broadly applicable. This can be solved in a multitude of ways, but user understanding should be the top priority.

Automated hyperparameter optimization

Is your feature request related to a problem? Please describe.
Presently, users have to manually pick different hyperparameters for training a substitute model and other phases of each attack. However, constructing an effective attack often requires these hyperparameters to be optimized.

Describe the solution you'd like.
Optuna is a commonly used library for automated hyperparameter optimization. Different techniques should be tested in order to determine the best optimization strategy for PrivacyRaven.

Describe alternatives you've considered.
Other hyperparameter libraries can be used.

Detail any additional context.
The PyTorch Lightning parameters already contained within PrivacyRaven can be changed if needed. Additionally, the hyperparameters are stored within a dictionary but can be moved to an enum or another suitable structure.
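As a stdlib-only illustration of what Optuna would automate, here is a minimal random-search loop over hypothetical hyperparameters; the objective is a stand-in for substitute-model training and validation. With Optuna, this loop becomes an `objective(trial)` function (using e.g. `trial.suggest_float("lr", 1e-4, 1e-1, log=True)`) passed to `study.optimize`.

```python
# Stdlib-only sketch of the search loop Optuna would automate; the
# objective is a placeholder for substitute-model training/validation.
import random

def objective(params):
    # Placeholder score; in PrivacyRaven this would train a substitute
    # model and return a validation metric to maximize.
    return -(params["lr"] - 0.01) ** 2 - 0.001 * params["batch_size"] / 256

def random_search(n_trials=20, seed=0):
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-4, -1),          # log-uniform learning rate
            "batch_size": rng.choice([32, 64, 128, 256]),
        }
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Optuna's samplers (TPE, CMA-ES) would replace the naive random draws here; the surrounding structure stays the same.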

Add more model extraction attacks

Is your feature request related to a problem? Please describe.
We want every model extraction attack to be achievable in PrivacyRaven. This excludes side-channel, white-box, full or partial prediction, and explanation-based attacks.

Describe the solution you'd like.
PrivacyRaven has three interfaces for attacks:

  1. The core interface defines each attack parameter individually.
  2. The specific interface runs a predefined attack configuration.
  3. The cohesive interface runs every possible attack.

A user should be able to run the attack in every interface; this means that all the building blocks for the attack should be contained within PrivacyRaven. For example, new synthesizers or subset selection strategies for a specific attack should be added, so that it can be applied using the core interface.

If you would like to implement an attack, comment with the name of the paper. Then, create a new issue referencing this issue with the name of the paper in the title.

Detail any additional context.
This is a list of papers describing model extraction attacks that should be added to PrivacyRaven.

  1. Knockoff nets: Stealing functionality of black-box models: Blocked on #10
  2. PRADA: protecting against DNN model stealing attacks: Missing synthesizer
  3. CloudLeak: Large-Scale Deep Learning Models Stealing Through Adversarial Examples: Missing some synthesizers
  4. ACTIVETHIEF: Model Extraction Using Active Learning and Unannotated Public Data: Blocked on #10
  5. Thief, Beware of What Get You There: Towards Understanding Model Extraction Attack
  6. Special-Purpose Model Extraction Attacks: Stealing Coarse Model with Fewer Queries
  7. Model Extraction Attacks on Graph Neural Networks: Taxonomy and Realization
  8. ES Attack: Model Stealing against Deep Neural Networks without Data Hurdles
  9. Simulating Unknown Target Models for Query-Efficient Black-box Attacks
  10. Thieves on Sesame Street! Model Extraction of BERT-based APIs

Create an overlaid diverging histogram for extraction

Is your feature request related to a problem? Please describe.
It is difficult to reason about the difference between a victim and substitute model from a simple label agreement score.

Describe the solution you'd like.
Create an overlaid diverging histogram that showcases the correct and incorrect responses on a single axis.

Describe alternatives you've considered.
If a better solution is available, please comment on this issue.

Detail any additional context.
This was inspired by Figure 3 of Understanding and Visualizing Data Iteration in Machine Learning.
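The per-class counts behind such a diverging histogram could be assembled as follows; this is a stdlib sketch of the data preparation only, with the plotting itself left to a charting library.

```python
# Stdlib sketch of the per-class counts behind a diverging histogram:
# agreements plotted upward, disagreements downward on one shared axis.
from collections import defaultdict

def diverging_counts(victim_labels, substitute_labels):
    counts = defaultdict(lambda: {"agree": 0, "disagree": 0})
    for v, s in zip(victim_labels, substitute_labels):
        key = "agree" if v == s else "disagree"
        counts[v][key] += 1
    # Negate disagreements so a bar chart diverges around zero
    return {c: (d["agree"], -d["disagree"]) for c, d in counts.items()}
```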

Modify the model extraction output

Is your feature request related to a problem? Please describe.
Currently, it is hard to evaluate the risk of a model extraction attack. We need the risk to be more obvious and to have better visuals.

Describe the solution you'd like.
The current output should (at the minimum):

  • Print blocks to separate the different phases
  • Display a sentence describing the label agreement ("Out of 1000 data points, the target model and substitute model agreed upon 900 data points")
  • Clearly delineate whether a metric is for accuracy or fidelity
  • Clean up the current extraction output; in particular, remove as many warnings as possible (the PyTorch Lightning progress bar may need to be changed)
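A minimal sketch of the first two bullets, assuming simple print-based output; the phase titles and numbers are placeholders.

```python
# Sketch of the requested output: phase separator blocks plus a
# plain-language label agreement sentence.
def print_phase(title):
    print("=" * 60)
    print(title.center(60))
    print("=" * 60)

def agreement_sentence(total, agreed):
    return (f"Out of {total} data points, the target model and substitute "
            f"model agreed upon {agreed} data points")

print_phase("Synthesis")
print_phase("Substitute Training")
print(agreement_sentence(1000, 900))
```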

Optimize extraction example

Is your feature request related to a problem? Please describe.
Apply hyperparameter optimization and other tuning to the extraction example to increase its performance. Display as many metrics as possible.

Describe the solution you'd like.
This would be best displayed in a Jupyter Notebook and potentially turned into a tutorial.

Detail any additional context.
If there are any limits on the performance of the attack caused by PrivacyRaven, please raise an issue or fix the issue inside of your PR. It would also be nice if the extraction unit test asserted that the attack's success exceeds a specific value/percentage instead of merely checking for failures.

Use attrs for model extraction

Refactor src/extraction/core.py by building the class for model extraction attacks with attrs instead (as has been done in src/m_inference/core.py).
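A hedged sketch of what the attrs-based class could look like; the attribute names below are illustrative, not the actual fields of the extraction class in src/extraction/core.py.

```python
# Hypothetical attrs-based skeleton for the model extraction class;
# attribute names are illustrative placeholders.
import attr

@attr.s(auto_attribs=True)
class ModelExtractionAttack:
    query: object                        # callable that queries the victim model
    query_limit: int = 100
    victim_input_shape: tuple = (1, 28, 28, 1)
    substitute_model: object = None

    def __attrs_post_init__(self):
        # Setup that previously lived in __init__ can run here
        if self.query_limit <= 0:
            raise ValueError("query_limit must be positive")
```

attrs removes the boilerplate `__init__`/`__repr__`/`__eq__` and keeps validation in one place, matching the pattern used in src/m_inference/core.py.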

Separate model-specific and data-specific hyperparameters

Is your feature request related to a problem? Please describe.
Right now, PrivacyRaven mixes all of the hyperparameters necessary to train a substitute model into a single dictionary. This should be replaced with Lightning Data Modules and/or other structures to clarify the role of each parameter and make it easier to extend PrivacyRaven for different tasks. This may require automatically turning all synthesized data into a Data Module.
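As a stdlib illustration of the separation (independent of whether Lightning Data Modules are adopted), the current single dictionary could be split along model/data lines; the key names here are hypothetical.

```python
# Stdlib sketch of splitting one hyperparameter dictionary into
# model-specific and data-specific mappings; key names are illustrative.
MODEL_KEYS = {"learning_rate", "hidden_size", "epochs"}
DATA_KEYS = {"batch_size", "num_workers", "input_size"}

def split_hyperparameters(params):
    model_params = {k: v for k, v in params.items() if k in MODEL_KEYS}
    data_params = {k: v for k, v in params.items() if k in DATA_KEYS}
    leftover = set(params) - MODEL_KEYS - DATA_KEYS
    if leftover:
        raise KeyError(f"Unclassified hyperparameters: {sorted(leftover)}")
    return model_params, data_params
```

With Lightning, the data-specific half would naturally live in a `LightningDataModule` and the model-specific half in the `LightningModule`'s hparams.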

Create an aggregated embedding for membership inference hot spots

Is your feature request related to a problem? Please describe.
We need a clarifying visual to showcase the privacy risks of membership inference, especially as it varies between classes.

Describe the solution you'd like.
An aggregated embedding similar to Figure 4 of Understanding and Visualizing Data Iteration in Machine Learning would be useful. Highlight the worst case.

Describe alternatives you've considered.
Please comment any alternatives.

Build tests for model extraction and utils

Is your feature request related to a problem? Please describe.
No tests exist for the different model extraction interfaces and for the utilities.

Describe the solution you'd like.
Create unit tests using Pytest and/or Hypothesis.

Describe alternatives you've considered.
N/A

Detail any additional context.
Either traditional unit testing or property-based testing is appropriate as long as the decision is justified. Make sure to add any new libraries to Poetry.
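A stdlib property-style check of a hypothetical label agreement utility illustrates the idea Hypothesis automates with `@given` and its strategies; with Hypothesis, the random input generation below would be replaced by `st.lists(st.integers(...))`.

```python
# Stdlib property-style check: label agreement must always lie between
# zero and the number of data points, whatever the inputs.
import random

def label_agreement(victim_labels, substitute_labels):
    # Hypothetical utility: count matching labels between two models
    return sum(v == s for v, s in zip(victim_labels, substitute_labels))

def test_agreement_bounds(trials=100, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 50)
        a = [rng.randint(0, 9) for _ in range(n)]
        b = [rng.randint(0, 9) for _ in range(n)]
        assert 0 <= label_agreement(a, b) <= n
```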

Add retraining and subset sampling to extraction

Is your feature request related to a problem? Please describe.
PrivacyRaven currently trains a substitute model on synthetic data generated separately from substitute training. However, many model extraction attacks train the substitute model adaptively using subset sampling strategies.

Describe the solution you'd like.
PrivacyRaven should allow users to specify a subset sampling strategy. By default, it should be assumed that a subset sampling strategy is not being used.

Detail any additional context.
Multiple functions in extraction/attacks.py as well as the core classes for membership inference and model extraction need to be changed in addition to the attributes of the core model extraction class.
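One possible shape for the pluggable strategy, sketched with the standard library; the function names are illustrative and `None` preserves the current non-adaptive default described above.

```python
# Stdlib sketch of a pluggable subset sampling strategy; strategy=None
# keeps the current non-adaptive behavior as the default.
import random

def random_subset(pool, k, seed=0):
    # Simplest possible strategy; adaptive ones would rank by e.g.
    # substitute-model uncertainty instead of sampling uniformly.
    return random.Random(seed).sample(pool, k)

def select_queries(pool, k, strategy=None):
    if strategy is None:
        # Default: no subset sampling, use the whole pool as-is
        return list(pool)
    return strategy(pool, k)
```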

Add support for Python 3.6, 3.8, and 3.9

Is your feature request related to a problem? Please describe.
Presently, PrivacyRaven supports only Python 3.7. Supporting 3.6 specifically would enable PrivacyRaven to work on Google Colab.

Describe the solution you'd like.
Minimal changes to existing code are preferred.

Describe alternatives you've considered.
Forcing Colab to run on a 3.8 kernel would increase the complexity of getting started with PrivacyRaven.

Detail any additional context.
This is in response to #51.

Unable to run or use PrivacyRaven in Colab

Hi,

It appears that, due to the use of Poetry to enforce dependencies and create a virtualenv for PrivacyRaven, we are unable to use it in Google Colab or Jupyter Notebook environments. Please suggest a workaround or alternative.

Thanks and Regards,
Sasikanth

Create a differentially private victim model

Is your feature request related to a problem? Please describe.
This would increase the variety of victim models for quick prototyping. Presently, PrivacyRaven only provides a three-layer classifier trained on the MNIST dataset as a victim model.

Describe the solution you'd like.
A lightweight model written using Opacus is preferable.

Describe alternatives you've considered.
The model can also be built with TensorFlow Privacy or PySyft.

Detail any additional context.
This victim model could also be used inside of examples, reinforcing and/or extending the results from this paper among others. Multiple people can work on this issue.

Add GPU detection within attacks

Is your feature request related to a problem? Please describe.
GPU detection should be integrated into all of the attacks. Users will no longer be required to explicitly state how many GPUs they are using. However, they should have the ability to do so. This should output something like:

GPU Available: True; GPU Used: False

This is motivated by the fact that the tests only work on a CPU due to GitHub Actions, so all tests must explicitly define that they are not using a GPU.

Describe the solution you'd like.
Explicit statements take priority over automated detection. When no GPU is used, all GPU-specific behavior, especially .cuda methods, should be disabled.
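The precedence rule could be sketched as a pure function, with the `detect` parameter standing in for `torch.cuda.is_available()` so the logic stays testable on CPU-only CI.

```python
# Sketch of the requested precedence: an explicit user setting overrides
# automated detection; `detect` stands in for torch.cuda.is_available().
def resolve_gpus(explicit=None, detect=lambda: False):
    available = detect()
    if explicit is not None:
        used = explicit > 0        # explicit statement takes priority
    else:
        used = available           # fall back to automated detection
    print(f"GPU Available: {available}; GPU Used: {used}")
    return int(used)
```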

Describe alternatives you've considered.
We could have separate tests on a GPU with a different CI. Self-hosted runners are not an option. We are open to alternatives.

Detail any additional context.
If you include a test for GPU behavior, it must be ignored by GitHub Actions somehow.

Potentially interesting links:

  1. How PyTorch Lightning became the first ML framework to run continuous integration on TPUs
  2. Effective testing for machine learning systems

Create a PyTorch Lightning callback that uses model extraction

Is your feature request related to a problem? Please describe.
Have a callback perform optimized model extraction.

Detail any additional context.
This may be better suited to be stored within the PyTorch Lightning Bolts repository. A callback for #24 would also be of interest.

Add guidance for protecting against these attacks in README

Is your feature request related to a problem? Please describe.
Currently, the tool provides various ways to attack a user's model. It would be beneficial to also inform users of ways in which they can protect their models.

Describe the solution you'd like.
Consult the relevant literature and consolidate advice, such as using differential privacy.

Add "help" function

Is your feature request related to a problem? Please describe.
Currently, there is no option for users to request help on how to use this tool. Adding a "help" function would improve usability.

Blocked on #22

Create a wrapper around PyTorch Lightning Callbacks

Is your feature request related to a problem? Please describe.
PyTorch Lightning callbacks would enable users to extend functionality after an attack has been created. In other words, this should be a function that takes the attack output itself as input.

Detail any additional context.
If there are any callbacks that should be integrated within PrivacyRaven, create an issue or comment on this one.

Determine query complexity of attacks

Is your feature request related to a problem? Please describe.
Currently, users don't have much guidance when choosing the best attack for their use case. One of the most important factors is the query complexity: the number of queries needed to launch an attack, which also serves as a relative estimate of its computational cost.

Describe the solution you'd like.
More research is needed to detail and visualize a solution. Comment on this issue with your solution before submitting a PR.

Detail any additional context.
This will affect metrics visualization issues. This paper might be useful.

Allow membership inference attacks to accept extracted models

Is your feature request related to a problem? Please describe.
The membership inference attack currently performs model extraction, which is redundant if model extraction was already performed. Allow users to provide an extracted model and skip the extraction component of membership inference.
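A hypothetical signature change illustrating the skip; `extracted_model` and `run_extraction` are placeholders, not the actual PrivacyRaven API.

```python
# Hypothetical sketch: accept a pre-extracted model and skip the
# extraction phase of membership inference when one is supplied.
def membership_inference(query, extracted_model=None, run_extraction=None):
    if extracted_model is None:
        # Current behavior: perform model extraction first
        extracted_model = run_extraction(query)
    # ...continue with the inference phases using extracted_model...
    return extracted_model
```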

Add more privacy metrics

Describe the solution you'd like.
I would like a new folder in src with a plethora of metrics.

Detail any additional context.
This is a list of papers. For each metric, create a new issue linking to this one and resolve it there.

Edit CONTRIBUTING.md

Is your feature request related to a problem? Please describe.
Instead of a list of bullet points, it would be nice to reorganize the file into sections. It would be helpful if more attention was paid to installation and setup.

Detail any additional context.
Make sure to include that:

  • Right now, all of the tests run on GitHub Actions and therefore must explicitly be defined as not using a GPU
  • All issues under "needs validation" (like this one) should have a comment with an overview of the solution on the issue itself
  • Commands often need to be run through Poetry, e.g. poetry run python or poetry run pytest
  • It is best to run Nox before making any commits
  • Explain how to look at the GitHub Projects. The most important issues are at the top
  • Take a look at "help wanted" issues first
  • We prefer tests using Python Hypothesis
  • Link to relevant papers
  • Code should adhere to PyTorch and PyTorch Lightning best practices (See: 1, 2, 3, 4, 5)
  • We're interested in new issues that add new synthesizers, attack configurations, strategies, victim models, and substitute model architectures (or, broadly, whatever improves the scope, efficiency, and effectiveness of PrivacyRaven)
  • Some papers are not reproducible

Add more membership inference configurations

Is your feature request related to a problem? Please describe.
Create functions that call the membership inference class with specific configurations as has been done with model extraction.

Detail any additional context.
Blocked on #7 and #8.

PrivacyRaven as a property testing tool

Is your feature request related to a problem? Please describe.
Currently, there is not an easy way to integrate PrivacyRaven as an assurance tool into an existing project.

Describe the solution you'd like.
Adjust PrivacyRaven to function as a property testing tool that can integrate with existing projects. Comment on this issue with your solution before submitting a PR.

Blocked on #22

Create a tabular output for run-all-attacks

Is your feature request related to a problem? Please describe.
When running multiple attacks, it should be easy to compare the effectiveness of each one.

Describe the solution you'd like.
Returning a table of each attack and respective metrics is a start. This could eventually be turned into graphs and other visuals.
A pandas data frame is a potential format.
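A stdlib sketch of such a table (a pandas DataFrame would be a drop-in alternative); the attack names and metric values are placeholders.

```python
# Stdlib sketch of a run-all-attacks results table; attack names and
# metrics are illustrative placeholders.
def format_results_table(results):
    header = f"{'Attack':<20}{'Accuracy':>10}{'Fidelity':>10}"
    rows = [header, "-" * len(header)]
    for name, metrics in results.items():
        rows.append(
            f"{name:<20}{metrics['accuracy']:>10.3f}{metrics['fidelity']:>10.3f}"
        )
    return "\n".join(rows)

print(format_results_table({
    "copycat": {"accuracy": 0.91, "fidelity": 0.88},
    "hopskipjump": {"accuracy": 0.74, "fidelity": 0.70},
}))
```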

Detail any additional context.
TensorFlow Privacy is an excellent example.

Add PrivacyRaven-specific Jupyter Widgets

Is your feature request related to a problem? Please describe.
The title is self-explanatory. Comment and/or create a new issue for the proposed widget. Metrics visualization is the primary use case.
