few_shot_fas's Introduction

Adaptive Transformers for Robust Few-shot Cross-domain Face Anti-spoofing

Contact: Hsin-Ping Huang ([email protected])
[Paper]

Introduction

While recent face anti-spoofing methods perform well under the intra-domain setups, an effective approach needs to account for much larger appearance variations of images acquired in complex scenes with different sensors for robust performance. In this paper, we present adaptive vision transformers (ViT) for robust cross-domain face anti-spoofing. Specifically, we adopt ViT as a backbone to exploit its strength to account for long-range dependencies among pixels. We further introduce the ensemble adapters module and feature-wise transformation layers in the ViT to adapt to different domains for robust performance with a few samples. Experiments on several benchmark datasets show that the proposed models achieve both robust and competitive performance against the state-of-the-art methods.
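
For intuition, the following is a minimal PyTorch sketch of the two components added to the ViT blocks, the ensemble adapters and the feature-wise transformation (FWT) layers. It is an illustration under assumed hyper-parameters (module names, bottleneck width, noise parameterization), not the exact implementation in this repository.

# Illustrative sketch of ensemble adapters and a feature-wise transformation
# (FWT) layer; names and hyper-parameters are assumptions, not the repo's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck MLP with a residual connection (Houlsby-style adapter)."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(F.gelu(self.down(x)))

class EnsembleAdapters(nn.Module):
    """Averages the outputs of K parallel adapters inserted into a ViT block."""
    def __init__(self, dim, num_adapters=2, bottleneck=64):
        super().__init__()
        self.adapters = nn.ModuleList(
            Adapter(dim, bottleneck) for _ in range(num_adapters))

    def forward(self, x):
        return torch.stack([a(x) for a in self.adapters]).mean(dim=0)

class FeatureWiseTransform(nn.Module):
    """Perturbs features as x * gamma + beta, with gamma/beta sampled from
    learned Gaussians at training time (following the FWT idea of
    CrossDomainFewShot); acts as identity at test time."""
    def __init__(self, dim):
        super().__init__()
        self.gamma_scale = nn.Parameter(torch.full((dim,), 0.3))
        self.beta_scale = nn.Parameter(torch.full((dim,), 0.5))

    def forward(self, x):
        if not self.training:
            return x
        gamma = 1 + torch.randn_like(x) * F.softplus(self.gamma_scale)
        beta = torch.randn_like(x) * F.softplus(self.beta_scale)
        return x * gamma + beta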

Paper

Adaptive Transformers for Robust Few-shot Cross-domain Face Anti-spoofing
Hsin-Ping Huang, Deqing Sun, Yaojie Liu, Wen-Sheng Chu, Taihong Xiao, Jinwei Yuan, Hartwig Adam, and Ming-Hsuan Yang

European Conference on Computer Vision (ECCV), 2022

Please cite our paper if you find it useful for your research.

@inproceedings{huang_2022_adaptive,
   title = {Adaptive Transformers for Robust Few-shot Cross-domain Face Anti-spoofing},
   author = {Huang, Hsin-Ping and Sun, Deqing and Liu, Yaojie and Chu, Wen-Sheng and Xiao, Taihong and Yuan, Jinwei and Adam, Hartwig and Yang, Ming-Hsuan},
   booktitle = {ECCV},
   year = {2022}
}

Installation and Usage

Clone this repo.

git clone https://github.com/hhsinping/few_shot_fas.git
cd few_shot_fas

Install the packages.

  • Create conda environment and install required packages.
  1. Python 3.7
  2. PyTorch 1.7.1, Torchvision 0.8.2, timm 0.4.9
  3. Pandas, Matplotlib, OpenCV, scikit-learn
conda create -n fas python=3.7.4 -y
conda activate fas
pip install torch==1.7.1 torchvision==0.8.2 timm==0.4.9
pip install pandas scikit-learn matplotlib opencv-python
  • Clone the external repo timm and copy our code there
git clone https://github.com/rwightman/pytorch-image-models.git
cd pytorch-image-models && git checkout e7f0db8 && cd ../
mv pytorch-image-models/timm ./third_party && rm pytorch-image-models -r && mv vision_transformer.py third_party/models && mv helpers.py third_party/models
  • Download and extract the file lists of external data for protocol 1 and protocol 2
wget https://www.dropbox.com/s/2mxh5r8hf0m8m1n/data.tgz
tar zxvf data.tgz

Datasets

  • To run the code, you will need to request the following datasets.
  1. Protocol 1: MSU-MFSD, CASIA-MFSD, Replay-attack, CelebA-Spoof, OULU-NPU
  2. Protocol 2: WMCA, CASIA-SURF CeFA, CASIA-SURF
  • For protocol 1, we follow the preprocessing steps of SSDG, which detects and aligns the faces using MTCNN.
  1. For each video, we sample only two frames, frame[6] and frame[6+math.floor(total_frames/2)], and save them as videoname_frame0.png/videoname_frame1.png, except for the CelebA-Spoof dataset. (A sketch of this sampling step is given after the file structure below.)
  2. We input the sampled frames into MTCNN to detect, align, and crop the faces. The images are resized to (224,224,3); only the RGB channels are used.
  3. We save the frames into data/MCIO/frame/ following the file names listed in data/MCIO/txt/. The file structure is provided below:
    data/MCIO/frame/
    |-- casia
        |-- train
        |   |--real
        |   |  |--1_1_frame0.png, 1_1_frame1.png 
        |   |--fake
        |      |--1_3_frame0.png, 1_3_frame1.png 
        |-- test
            |--real
            |  |--1_1_frame0.png, 1_1_frame1.png 
            |--fake
               |--1_3_frame0.png, 1_3_frame1.png 
    |-- msu
        |-- train
        |   |--real
        |   |  |--real_client002_android_SD_scene01_frame0.png, real_client002_android_SD_scene01_frame1.png
        |   |--fake
        |      |--attack_client002_android_SD_ipad_video_scene01_frame0.png, attack_client002_android_SD_ipad_video_scene01_frame1.png
        |-- test
            |--real
            |  |--real_client001_android_SD_scene01_frame0.png, real_client001_android_SD_scene01_frame1.png
            |--fake
               |--attack_client001_android_SD_ipad_video_scene01_frame0.png, attack_client001_android_SD_ipad_video_scene01_frame1.png
    |-- replay
        |-- train
        |   |--real
        |   |  |--real_client001_session01_webcam_authenticate_adverse_1_frame0.png, real_client001_session01_webcam_authenticate_adverse_1_frame1.png
        |   |--fake
        |      |--fixed_attack_highdef_client001_session01_highdef_photo_adverse_frame0.png, fixed_attack_highdef_client001_session01_highdef_photo_adverse_frame1.png
        |-- test
            |--real
            |  |--real_client009_session01_webcam_authenticate_adverse_1_frame0.png, real_client009_session01_webcam_authenticate_adverse_1_frame1.png
            |--fake
               |--fixed_attack_highdef_client009_session01_highdef_photo_adverse_frame0.png, fixed_attack_highdef_client009_session01_highdef_photo_adverse_frame1.png
    |-- oulu
        |-- train
        |   |--real
        |   |  |--1_1_01_1_frame0.png, 1_1_01_1_frame1.png
        |   |--fake
        |      |--1_1_01_2_frame0.png, 1_1_01_2_frame1.png
        |-- test
            |--real
            |  |--1_1_36_1_frame0.png, 1_1_36_1_frame1.png
            |--fake
               |--1_1_36_2_frame0.png, 1_1_36_2_frame1.png
    |-- celeb
        |-- real
        |   |--167_live_096546.jpg
        |-- fake
            |--197_spoof_420156.jpg       
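
As referenced in step 1 of the protocol 1 preprocessing above, here is a minimal Python/OpenCV sketch of the two-frame sampling. It is illustrative only (the helper name and output paths are assumptions), and the MTCNN detection/alignment step of SSDG still has to be applied to the saved frames.

# Illustrative two-frame sampling for protocol 1 (not the authors' script).
import math
import os
import cv2

def sample_two_frames(video_path, out_dir):
    """Save frame[6] and frame[6 + floor(total_frames / 2)] as <video>_frame0/1.png."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()

    name = os.path.splitext(os.path.basename(video_path))[0]
    for i, idx in enumerate([6, 6 + math.floor(len(frames) / 2)]):
        # Each saved frame is then passed through MTCNN for detection/alignment
        # and resized to 224x224 RGB before being placed in data/MCIO/frame/.
        cv2.imwrite(os.path.join(out_dir, f"{name}_frame{i}.png"), frames[idx])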
    
  • For protocol 2, we use the original frames, with the black borders cut off, as inputs.
  1. We use all frames of the surf dataset with their original file names. For the cefa and wmca datasets, we sample 10 frames per video equidistantly and save each sampled frame as videoname_XX.jpg, where XX denotes the index of the sampled frame (detailed file names can be found in data/WCS/txt/).
  2. We cut the black borders of the images before feeding them to our code. The images are then resized to (224,224,3); only the RGB channels are used. (A sketch of this step is given after the file structure below.)
  3. We save the frames into data/WCS/frame/ following the file names listed in data/WCS/txt/. The file structure is provided below:
    data/WCS/frame/
    |-- wmca
        |-- train
        |   |--real
        |   |  |--31.01.18_035_01_000_0_01_00.jpg, 31.01.18_035_01_000_0_01_05.jpg
        |   |--fake
        |      |--31.01.18_514_01_035_1_05_00.jpg, 31.01.18_514_01_035_1_05_05.jpg
        |-- test
            |--real
            |  |--31.01.18_036_01_000_0_00_00.jpg, 31.01.18_036_01_000_0_00_01.jpg
            |--fake
               |--31.01.18_098_01_035_3_13_00.jpg, 31.01.18_098_01_035_3_13_01.jpg
    |-- cefa
        |-- train
        |   |--real
        |   |  |--3_499_1_1_1_00.jpg, 3_499_1_1_1_01.jpg
        |   |--fake
        |      |--3_499_3_2_2_00.jpg, 3_499_3_2_2_01.jpg
        |-- test
            |--real
            |  |--3_299_1_1_1_00.jpg, 3_299_1_1_1_01.jpg
            |--fake
               |--3_299_3_2_2_00.jpg, 3_299_3_2_2_01.jpg
    |-- surf
        |-- train
        |   |--real
        |   |  |--Training_real_part_CLKJ_CS0110_real.rssdk_color_91.jpg
        |   |--fake
        |      |--Training_fake_part_CLKJ_CS0110_06_enm_b.rssdk_color_91.jpg
        |-- test
            |--real
            |  |--Val_0007_007243-color.jpg
            |--fake
               |--Val_0007_007193-color.jpg
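
As referenced in step 2 of the protocol 2 preprocessing above, here is a minimal Python/OpenCV sketch of the equidistant sampling and black-border cropping. The function names and the border threshold are illustrative assumptions, not the authors' exact preprocessing.

# Illustrative equidistant sampling and black-border removal for protocol 2
# (cefa/wmca); function names and the border threshold are assumptions.
import os
import cv2
import numpy as np

def cut_black_borders(img, thresh=10):
    """Crop away rows/columns that are (nearly) all black, then resize to 224x224."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    rows = np.where((gray > thresh).any(axis=1))[0]
    cols = np.where((gray > thresh).any(axis=0))[0]
    if len(rows) and len(cols):
        img = img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    return cv2.resize(img, (224, 224))

def sample_equidistant(video_path, out_dir, num_frames=10):
    """Save num_frames equally spaced frames as <video>_00.jpg ... <video>_09.jpg."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    name = os.path.splitext(os.path.basename(video_path))[0]
    for i, idx in enumerate(np.linspace(0, total - 1, num_frames, dtype=int)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            cv2.imwrite(os.path.join(out_dir, f"{name}_{i:02d}.jpg"),
                        cut_black_borders(frame))
    cap.release()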
    

Training example

To train the network you can run

python train.py --config [C/M/I/O/cefa/surf/wmca]

Depending on the config, the saved model can be found as casia_checkpoint.pth.tar, msu_checkpoint.pth.tar, replay_checkpoint.pth.tar, oulu_checkpoint.pth.tar, cefa_checkpoint.pth.tar, surf_checkpoint.pth.tar, or wmca_checkpoint.pth.tar.

Testing example

To evaluate the network you can run

python test.py --config [C/M/I/O/cefa/surf/wmca]

The test script will read the corresponding checkpoint (casia_checkpoint.pth.tar, msu_checkpoint.pth.tar, replay_checkpoint.pth.tar, oulu_checkpoint.pth.tar, cefa_checkpoint.pth.tar, surf_checkpoint.pth.tar, or wmca_checkpoint.pth.tar) and print the test AUC/HTER/TPR@FPR.

Config

The file config.py contains the hyper-parameters used during training/testing.

Acknowledgement

The implementation is partly based on the following projects: SSDG, timm, BERT on STILTs, CrossDomainFewShot.


few_shot_fas's Issues

Cannot reproduce the performance by ViT

Hi, thank you for impressive research.

I have been trying to reproduce the performance of ViT (not ViTAF), but cannot reproduce the results of the ViT experiments on the MCIO protocols.

I referenced the ViT code from this GitHub repo.

I am just wondering if you were able to reproduce the ViT performance; my TPR@FPR differs from the paper by about 2-30%.

Here are the results I got by running python train_vit.py --config [M/C/I/O] for each setting.

Can you give any suggestions about these results?

Run, HTER, AUC, TPR@FPR=1%
0, 6.333333333333332, 98.43333333333332, 76.66666666666667
1, 9.75, 97.425, 63.33333333333333
2, 6.583333333333333, 96.64166666666667, 60.0
3, 5.0, 98.26666666666667, 68.33333333333333
4, 8.416666666666666, 95.35, 65.0
Mean,7.216666666666666, 97.22333333333333, 66.66666666666666
Std dev, 1.670495601776777, 1.134991434133119, 5.676462121975469

Run, HTER, AUC, TPR@FPR=1%
0, 13.426931056110385, 94.66557308502598, 55.319148936170215
1, 10.647947122111256, 96.7948408677892, 63.12056737588653
2, 9.258455155111688, 96.38153133593863, 53.90070921985816
3, 10.647947122111256, 95.23487882150496, 25.53191489361702
4, 9.258455155111688, 96.95726990559817, 60.99290780141844
Mean,10.647947122111255, 96.0068188031714, 51.773049645390074
Std dev, 1.522112187595772, 0.9010634790490837, 13.560758471168365

Run, HTER, AUC, TPR@FPR=1%
0, 12.385730211817169, 94.78595317725753, 24.615384615384617
1, 16.19286510590858, 91.21181716833891, 29.230769230769234
2, 14.626532887402455, 94.17614269788183, 34.61538461538461
3, 15.228539576365662, 92.93422519509475, 30.0
4, 13.87959866220736, 94.22742474916387, 38.46153846153847
Mean,14.462653288740245, 93.46711259754738, 31.384615384615387
Std dev, 1.2853508023630207, 1.2798800867634041, 4.751829295840155

Run, HTER, AUC, TPR@FPR=1%
0, 20.0, 87.72822299651567, 4.366197183098591
1, 20.42032683908328, 86.16719831182216, 2.3943661971830985
2, 20.509643225204886, 87.46181969867989, 14.225352112676056
3, 20.52559257986946, 88.02144574765667, 20.704225352112676
4, 19.73720371006527, 87.68459537714091, 22.3943661971831
Mean,20.23855327084458, 87.41265642636306, 12.816901408450704
Std dev, 0.3153353781341504, 0.6477253172896427, 8.197134698080816
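
For readers comparing numbers, here is a minimal sketch of how AUC, HTER, and TPR@FPR=1% are commonly computed for face anti-spoofing, using scikit-learn; it is illustrative and not necessarily identical to the logic in test.py.

# Illustrative metric computation (may differ from test.py, e.g. in threshold choice).
import numpy as np
from sklearn.metrics import roc_curve, auc

def evaluate(scores, labels):
    """scores: higher means more likely live; labels: 1 = real/live, 0 = fake/spoof."""
    fpr, tpr, _ = roc_curve(labels, scores)
    auc_pct = auc(fpr, tpr) * 100

    # HTER = (FAR + FRR) / 2 at the threshold where they are closest (EER point).
    far, frr = fpr, 1 - tpr
    idx = np.nanargmin(np.abs(far - frr))
    hter_pct = (far[idx] + frr[idx]) / 2 * 100

    # TPR at FPR = 1%, interpolated along the ROC curve.
    tpr_at_fpr1_pct = np.interp(0.01, fpr, tpr) * 100
    return hter_pct, auc_pct, tpr_at_fpr1_pct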

Reproducing paper results

Thank you for providing this repo for anti-spoofing @hhsinping.

I am trying to reproduce the paper results by following the steps you provided.
I have used this MTCNN for face detection and alignment with the default config:
detector = MtcnnDetector(model_folder='model', ctx=mx.cpu(0), num_worker = 4 , accurate_landmark = False)
However, the MTCNN detector could not detect some of the faces listed under data/MCIO/{dataset}/txt, which resulted in NoneType objects during training.

Could you please share the face detection part for reproduction of the results?
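
Not an official answer, but one common workaround for the NoneType problem is to fall back to the full frame when detection fails. A minimal sketch, where detect_and_align is a hypothetical wrapper around whichever MTCNN implementation is used:

# Hypothetical fallback so downstream code never receives None when MTCNN
# fails to find a face; detect_and_align(img) is assumed to return a crop or None.
import cv2

def crop_face_or_fallback(img_bgr, detect_and_align, size=224):
    face = detect_and_align(img_bgr)
    if face is None:
        # No detection: keep a center crop of the full frame instead of failing.
        h, w = img_bgr.shape[:2]
        s = min(h, w)
        face = img_bgr[(h - s) // 2:(h + s) // 2, (w - s) // 2:(w + s) // 2]
    return cv2.resize(face, (size, size))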
