
Face-Auditor

Introduction

This repository contains the implementation of Face-Auditor, which evaluates the privacy leakage of state-of-the-art few-shot learning pipelines.

Code Structure

.
├── config.py
├── exp
├── lib_classifer
├── lib_dataset
├── lib_metrics
├── lib_model
├── main.py
├── parameter_parser.py
└── README.md

Environment Preparation

conda create --name face_auditor python=3.6.10
conda activate face_auditor
pip install numpy pandas seaborn matplotlib scikit-learn MulticoreTSNE cython facenet_pytorch deepface opacus psutil GPUtil
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
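
As a quick sanity check (not part of the repo), you can verify that the pinned CUDA builds installed correctly:

# Sanity check for the environment above.
import torch
import torchvision

print(torch.__version__)          # expect 1.9.0+cu111
print(torchvision.__version__)    # expect 0.10.0+cu111
print(torch.cuda.is_available())  # True if the CUDA 11.1 build sees a GPU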

Dataset Preparation

The files for downloading the datasets are in lib_dataset/datasets/. In our paper, we mainly focus on four open-source human face image datasets:

  • UMDFaces
  • WebFace
  • VGGFace2
  • CelebA

Other datasets should also work with our Face-Auditor.

Evaluations

Below we give some examples of the experimental configurations; see parameter_parser.py for more details.

Training Shadow and Target Models

exp='class_mem_infer_meta'

python main.py --exp $exp --is_train_target true --is_train_shadow true
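
For intuition, the shadow and target models are few-shot learners that are typically trained on disjoint identity pools, so the auditor can learn member/non-member behavior in the shadow world and transfer it to the target. A rough sketch of such a split (hypothetical names and proportions, not the repo's exact logic):

import random

# Hypothetical identity pool; the real pools come from the face datasets above.
identities = [f"id_{i:04d}" for i in range(200)]
rng = random.Random(0)
rng.shuffle(identities)

half = len(identities) // 2
target_ids, shadow_ids = identities[:half], identities[half:]

# Within each world, "members" are the identities used to train that model;
# the auditor later probes the target model with member and non-member identities.
target_members = set(target_ids[: half // 2])
shadow_members = set(shadow_ids[: half // 2])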

Constructing the Probing Set

shot=5
way=5
probe_num_task=100
probe_num_query=5

python main.py --is_generate_probe true --probe_ways $way --probe_shot $shot --probe_num_task $probe_num_task --probe_num_query $probe_num_query 
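
A minimal sketch (hypothetical helper, not the repo's code) of what one such probe task can look like: way identities with shot support images each, plus probe_num_query extra queries drawn from the audited identity:

import random

def build_probe_task(images_by_id, target_id, way=5, shot=5, num_query=5, seed=0):
    # Pick (way - 1) filler identities alongside the audited one.
    rng = random.Random(seed)
    other_ids = rng.sample([i for i in images_by_id if i != target_id], way - 1)
    # shot support images per identity.
    support = {i: rng.sample(images_by_id[i], shot) for i in [target_id] + other_ids}
    # Queries come from the audited identity, excluding its support images.
    remaining = [x for x in images_by_id[target_id] if x not in support[target_id]]
    queries = rng.sample(remaining, num_query)
    return support, queries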

Configurations Related to Reference Information

## probe controlling parameters ##
python main.py --is_similarity_aided true --is_use_image_similarity true --image_similarity_name cosine
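
To illustrate the idea behind these flags: the auditor can append image-similarity "reference information" to the raw posteriors it feeds the attack model. A hedged NumPy sketch (flattened-pixel cosine is an assumption here; the repo may compare embeddings instead):

import numpy as np

def cosine(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def probe_features(posteriors, query_img, support_imgs):
    # Mean similarity between the query and its support set, appended as an
    # extra feature next to the model's posteriors.
    ref = np.mean([cosine(query_img, s) for s in support_imgs])
    return np.append(posteriors, ref)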

On the Robustness of FACE-AUDITOR


## adv (input) defense parameters ##
python main.py --is_adv_defense true

## dp (training) defense parameters ##
python main.py --is_dp_defense true

## noise (output) defense parameters ##
python main.py --is_noise_defense true

## memguard (adaptive) defense parameters ##
python main.py --is_memguard_defense true
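
For intuition, here is a hedged sketch of the output (noise) defense: perturb the similarity scores before the auditor sees them. The Laplace mechanism and its scale are illustrative assumptions, not the repo's exact implementation:

import numpy as np

def noisy_posteriors(scores, scale=0.1, seed=None):
    # Add Laplace noise to the returned scores, then renormalize so the
    # output still looks like a posterior distribution.
    rng = np.random.default_rng(seed)
    noisy = np.clip(scores + rng.laplace(0.0, scale, size=scores.shape), 0.0, None)
    return noisy / (noisy.sum() + 1e-12)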


Issues

DP implementation for RelationNet

Note: My understanding of using DP-SGD in a situation like this (where two models are trained simultaneously) is limited, so please ignore this if it's not valid:

From what I understand, the Opacus wrapper uses the given model to compute per-sample gradients, and then clips them and adds noise at the optimizer level. While the optimizer attached here contains references to both the feature extractor and the relation model, having only the main model referenced here might mean that the clipping and noise only happen at the relation-model end, and not at the feature extractor. From my understanding, DP noise should be added to both of them (or does some post-processing theory suggest that doing it only for the relation model would be sufficient)?

privacy_engine = PrivacyEngine(
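
One way to address this, assuming the Opacus >= 1.0 API, is to register both sub-networks under a single nn.Module before making it private, so that per-sample gradient clipping and noising cover the feature extractor as well as the relation module. A minimal sketch with toy shapes (not the repo's actual architecture):

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

class RelationNet(nn.Module):
    # Registering BOTH sub-networks under one module lets Opacus compute
    # per-sample gradients (and clip/noise them) for all parameters.
    def __init__(self):
        super().__init__()
        self.feature_encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
        self.relation_module = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())

    def forward(self, x):
        return self.relation_module(self.feature_encoder(x))

model = RelationNet()
optimizer = optim.SGD(model.parameters(), lr=0.01)
data = TensorDataset(torch.randn(64, 32), torch.rand(64, 1))
loader = DataLoader(data, batch_size=8)

privacy_engine = PrivacyEngine()  # Opacus >= 1.0 API
model, optimizer, loader = privacy_engine.make_private(
    module=model, optimizer=optimizer, data_loader=loader,
    noise_multiplier=1.0, max_grad_norm=1.0,
)

for x, target in loader:
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    optimizer.step()  # clipping + noise now cover all registered parameters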

Questions about attack setup

Hey,

I had a couple of questions regarding the code structure (specifically the attack setup)

  1. The following:

    self.attack_f1_score, self.attack_recall, self.attack_precision, self.attack_FPR = self.attack_model.test_model_metrics(self.attack_train_data, self.attack_test_label)

    seems to be computing train-metrics for the shadow model, but uses self.attack_train_data and self.attack_test_label. If I understand correctly, the latter should be self.attack_train_label? Or is it the case that they are equivalent?

    On a side note, .test_model_acc (in the line above) and on L60, 61 seems to be called twice; is there a reason to do so?

  2. From my understanding after looking at the following:

    self.attack_test_label = np.concatenate((np.ones(train_score.shape[0]), np.zeros(test_score.shape[0])))

    the meta-classifier is trained to predict 0 if the user was part of training?

A question about the _sort_proto_similarity function in the ProbeDatasetSort class (task_dataset.py)

Hello Chen,

Thank you for sharing the code. I'm reading your code and paper. From the code, when is_sort_query is true, the probe_num_query data points for the target label are sampled according to their similarities with pre-selected data points of the target label (in the support set). In the current version of _sort_proto_similarity, you use one pre-selected data point to calculate the similarities and comment out the line that uses all the pre-selected data points (the first two lines of the function). Since the similarity calculation is designed for all the pre-selected data points, I'm wondering which case you ran in your experiments. In other words, how many pre-selected data points of the target label are used to find the probe_num_query data points of the target label?
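
If it helps to pin down the two variants, here is a hedged sketch (hypothetical names, NumPy only) of the "all pre-selected points" version: average the shot support embeddings into one prototype and rank candidate queries by cosine similarity to it. The single-point variant would simply replace the mean with one support embedding:

import numpy as np

def sort_by_proto_similarity(support_embs, candidate_embs, num_query):
    # One prototype from ALL pre-selected support points of the target label.
    proto = support_embs.mean(axis=0)
    proto = proto / np.linalg.norm(proto)
    cands = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = cands @ proto
    return np.argsort(-sims)[:num_query]  # most similar candidates first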

Consistent .train() and .eval() calls?

More of a question than an 'issue':


For the above (and subsequent calls to self.model.train()), shouldn't self.feat_encoder also be set to train()?
The same question for when self.model is set to eval():

This should not lead to any issue while training (since gradients are cleared out anyway) but may affect performance if layers like Dropout and BatchNorm are present.
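
For concreteness, a minimal helper along the lines this issue suggests (names hypothetical) that keeps both modules' modes in sync:

def set_mode(model, feat_encoder, training: bool):
    # nn.Module.train(mode) toggles Dropout/BatchNorm behavior; calling it
    # on only one of two jointly used modules leaves them out of sync.
    model.train(training)
    feat_encoder.train(training)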

`train_num_task` and `test_num_task`

Hey,

What do the values of train_num_task and test_num_task correspond to? Looking at the codebase I thought it might be the number of iterations per epoch, but am not sure how these values are picked (default of 100 and 80 respectively) and how they would map to the experimental setup in the paper.

Thanks!
