msrocean / tik_tok

This repository contains the code and data for the paper "Tik-Tok: The Utility of Packet Timing in Website Fingerprinting Attacks", published in PETS 2020.

License: MIT License

Topics: website-fingerprinting, tor-network, timing-analysis, attacks, privacy-enhancing-technologies, tik-tok, deep-learning, machine-learning, privacy

Introduction


⚠️ ⚠️ ⚠️ Experimental - PLEASE BE CAREFUL. Intended for research purposes ONLY. ⚠️ ⚠️ ⚠️

This repository contains the code and data needed to demonstrate the experiments and reproduce the results of the Privacy Enhancing Technologies Symposium (PETS) 2020 paper:

Tik-Tok: The Utility of Packet Timing in Website Fingerprinting Attacks (Read the Paper)

Reference Format

@article{rahman2020tik,
  title={{Tik-Tok}: The utility of packet timing in website fingerprinting attacks},
  author={Rahman, Mohammad Saidur and Sirinam, Payap and Mathews, Nate and Gangadhara, Kantha Girish and Wright, Matthew},
  journal={Proceedings on Privacy Enhancing Technologies},
  volume={2020},
  number={3},
  pages={5--24},
  year={2020},
  publisher={Sciendo}
}

Dataset

In this paper, we use five datasets for our experiments. Four of them come from previous research, and we collected the Walkie-Talkie (Real) dataset ourselves. We list the datasets below with brief descriptions and references:

  1. Undefended [1]: The Undefended dataset contains both closed-world (CW) and open-world (OW) data and was collected in 2016. The CW data contains 95 sites with 1,000 instances each, and the OW data contains 40,716 sites with 1 instance each.
  2. WTF-PAD [1]: The WTF-PAD dataset also contains both CW and OW data and was likewise collected in 2016. The CW data contains 95 sites with 1,000 instances each, and the OW data contains 40,716 sites with 1 instance each.
  3. Walkie-Talkie (Simulated) [1]: This dataset contains only CW data and was also collected in 2016. It contains 100 sites with 900 instances each.
  4. Onion Sites [2]: This dataset contains only CW data and was also collected in 2016. It contains 538 sites with 77 instances each.
  5. Walkie-Talkie (Real): This dataset contains 100 sites with over 750 instances each. We collected it in 2019 using our implemented Walkie-Talkie prototype. See the W-T_Experiments subdirectory for additional details.
[1] Payap Sirinam, Mohsen Imani, Marc Juarez, and Matthew Wright. 2018. 
Deep Fingerprinting: Undermining Website Fingerprinting Defenses 
with Deep Learning. In Proceedings of the 2018 ACM Conference on 
Computer and Communications Security (CCS). ACM.

[2] Rebekah Overdorf, Marc Juarez, Gunes Acar, Rachel Greenstadt, and Claudia
Diaz. 2017. How Unique is Your .onion? An Analysis of the Fingerprintability
of Tor Onion Services. In Proceedings of the 2017 ACM Conference on Computer
and Communications Security (CCS). ACM.

Data Representation

We experiment with four types of data representations, explained as follows (a minimal code sketch follows the list):

  • Timing Features: Timing features consist of 160 feature values (20 feature values from each of 8 feature categories). In the model, timing features are represented as a 1-D array of [1 x 160].

  • Direction (D): We represent the direction information of an instance as a sequence of +1 and -1 values, with +1 representing an outgoing packet and -1 representing an incoming packet. The sequences are trimmed or padded with 0’s as needed to reach a fixed length of 5,000 packets. Thus, the input forms a 1-D array of [1 x 5000].

  • Raw Timing (RT): We represent the raw timing information as a sequence of raw timestamps of an instance. The sequences are trimmed or padded with 0’s as needed to reach a fixed length of 5,000 packets. Thus, the input forms a 1-D array of [1 x 5000].

  • Directional Timing (DT): We represent the directional timing information as a sequence in which each element is the product of a packet’s raw timestamp and its direction (+1 for outgoing, -1 for incoming). The sequences are trimmed or padded with 0’s as needed to reach a fixed length of 5,000 packets. Thus, the input forms a 1-D array of [1 x 5000].
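
As a rough illustration, here is a minimal NumPy sketch of how the D, RT, and DT sequences described above could be built from a single trace. The function name and the (timestamps, directions) input format are hypothetical; the repository's own preprocessing scripts are authoritative.

    import numpy as np

    SEQ_LEN = 5000  # fixed input length used by the model

    def build_representations(timestamps, directions, seq_len=SEQ_LEN):
        # timestamps: raw packet timestamps of one instance (seconds, starting at 0)
        # directions: +1 (outgoing) / -1 (incoming) per packet, same length
        t = np.asarray(timestamps, dtype=np.float32)[:seq_len]
        d = np.asarray(directions, dtype=np.float32)[:seq_len]
        pad = seq_len - len(t)            # trimmed above, zero-padded below
        D = np.pad(d, (0, pad))           # Direction: +1/-1 sequence
        RT = np.pad(t, (0, pad))          # Raw Timing: timestamps only
        DT = np.pad(t * d, (0, pad))      # Directional Timing: timestamp * direction
        return D.reshape(1, seq_len), RT.reshape(1, seq_len), DT.reshape(1, seq_len)

    # toy example with three packets
    D, RT, DT = build_representations([0.0, 0.12, 0.30], [+1, -1, -1])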

Reproducibility of the Results

Dependencies & Required Packages

Please make sure you have all the dependencies available and installed before running the models.

  • An NVIDIA GPU should be installed in the machine; running on a CPU will significantly increase training time.
  • Ubuntu 16.04.5
  • Python3-venv
  • Keras version: 2.3.0
  • TensorFlow version: 1.14.0
  • CUDA Version: 10.2
  • CuDNN Version: 7
  • Python Version: 3.6.x

Please install the required packages using:

pip3 install -r requirements.txt
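
To verify that the installed Keras and TensorFlow versions match the ones listed above, a quick check like the following can help (a simple sketch, not part of the repository):

    import tensorflow as tf
    import keras

    print("TensorFlow:", tf.__version__)  # expected: 1.14.0
    print("Keras:", keras.__version__)    # expected: 2.3.0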

We explain how to reproduce each of the experimental results in turn:

1. Timing Features

  • Traditional machine-learning (ML) classifier: For the experiments with k-NN [3], SVM (CUMUL) [4], and k-FP [5], we refer to the classifier from the respective repositories.

     [3] Tao Wang, Xiang Cai, Rishab Nithyanand, Rob Johnson, and 
         Ian Goldberg. 2014. Effective attacks and provable defenses for 
         website fingerprinting. In Proceedings of the 23rd USENIX Conference 
         on Security Symposium.
     
     [4] Andriy Panchenko, Fabian Lanze, Jan Pennekamp, Thomas Engel, 
         Andreas Zinnen, Martin Henze, and Klaus Wehrle. 2016. Website 
         fingerprinting at Internet scale. In Proceedings of the 23rd Network and
         Distributed System Security Symposium (NDSS).
    
     [5] Jamie Hayes and George Danezis. 2016. k-Fingerprinting: A robust 
         scalable website fingerprinting technique. In Proceedings of the 25th 
         USENIX Conference on Security Symposium.
    
  • Timing Features in Deep Fingerprinting [1] model:

    You can either

    i) process the raw data to get the features (google drive url.), or ii) use our processed data from this link (google drive url.). If you are using our processed data, please download it and place it in the Timing_Features/save_data/ directory. Then go to the Timing_Features directory and run the following command, replacing dataset with any of: Undefended, WTF-PAD, W-T-Simulated, Onion-Sites

    python Tik_Tok_timing_features.py dataset
    

    Optional: We have also added a Jupyter notebook (Tik_Tok_timing_features.ipynb) for a more interactive environment.

    A snippet of output for Undefended data:

    python Tik_Tok_timing_features.py Undefended
    
    Using TensorFlow backend.
    76000 train samples
    9500 validation samples
    9500 test samples
    Train on 76000 samples, validate on 9500 samples
    Epoch 1/100
     - 11s - loss: 4.1017 - acc: 0.0593 - val_loss: 2.9626 - val_acc: 0.1926
    Epoch 2/100
     - 7s - loss: 2.9497 - acc: 0.1976 - val_loss: 2.4673 - val_acc: 0.3026
    
    .....
    
    Epoch 99/100
     - 7s - loss: 0.3103 - acc: 0.9109 - val_loss: 0.7414 - val_acc: 0.8216
    Epoch 100/100
     - 7s - loss: 0.3096 - acc: 0.9104 - val_loss: 0.7639 - val_acc: 0.8239
    
    Testing accuracy: 0.843284285
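
    For orientation, the sketch below shows how a [1 x 160] timing-feature vector can be fed to a small Keras 1-D CNN with a 95-way softmax (the CW class count of the Undefended data). This is a toy stand-in for illustration only, not the Deep Fingerprinting architecture used by Tik_Tok_timing_features.py.

    from keras.models import Sequential
    from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, Dropout

    NUM_FEATURES = 160   # 20 values from each of 8 timing-feature categories
    NUM_CLASSES = 95     # closed-world sites in the Undefended dataset

    # toy 1-D CNN over the timing-feature vector; input shape is (160, 1)
    model = Sequential([
        Conv1D(32, 8, activation='relu', input_shape=(NUM_FEATURES, 1)),
        MaxPooling1D(4),
        Flatten(),
        Dense(256, activation='relu'),
        Dropout(0.5),
        Dense(NUM_CLASSES, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.summary()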
    

2. Closed and Open-world Experiments w/ Deep Fingerprinting

See the DL_Experiments directory for the scripts used to perform the Direction, Raw Timing, and Directional Timing experiments.

3. W-T Prototype Experiments

Our W-T crawling software and instructions can be downloaded as a zip file from the following link: gdrive

The scripts used to evaluate the dataset and related instructions are found in the W-T_Experiments subdirectory.

4. Information Leakage Analysis:

For information leakage analysis, we refer to our re-implementation of WeFDE: https://github.com/notem/reWeFDE.

5. Congestion Analysis

See the Congestion_Analysis directory for the scripts used to perform the experiments that use instances of slow circuits or instances of fast circuits as the test set. We processed the data to feed into the model. Please create a sub-directory named datasets inside the Congestion_Analysis directory, download the data from this google drive url, and extract the downloaded files to the datasets sub-directory.

Parameters:

  • --congestion : choices = ['slow', 'fast']
    slow: instances of slow circuits as the test set.
    fast: instances of fast circuits as the test set.
  • --dataset : choices = ['Undefended', 'WTF-PAD', 'Onion-Sites']
  • --data_rep : choices = ['D', 'RT', 'DT']
    Type of data representation to be used.
    D: Direction, RT: Raw Timing, and DT: Directional Timing

Example of Usage:
python Tik_Tok_Congestion.py --congestion slow --dataset Undefended --data_rep D
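
For reference, the options listed above could be parsed with argparse roughly as follows. This is only a hedged sketch of the command-line interface described here; Tik_Tok_Congestion.py in the repository is the actual implementation.

    import argparse

    parser = argparse.ArgumentParser(description="Tik-Tok congestion analysis")
    parser.add_argument("--congestion", choices=["slow", "fast"], required=True,
                        help="use instances of slow or fast circuits as the test set")
    parser.add_argument("--dataset", choices=["Undefended", "WTF-PAD", "Onion-Sites"],
                        required=True, help="dataset to evaluate")
    parser.add_argument("--data_rep", choices=["D", "RT", "DT"], required=True,
                        help="D: Direction, RT: Raw Timing, DT: Directional Timing")
    args = parser.parse_args()
    print(args.congestion, args.dataset, args.data_rep)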

Questions, Comments, & Feedback

Please address any questions, comments, or feedback to the authors of the paper.

Acknowledgements

We thank the anonymous reviewers for their helpful feedback. We give special thanks to Tao Wang for providing details about the technical implementation of the W-T defense, and to Marc Juarez for providing guidelines on developing the W-T prototype. This material is based upon work supported in part by the National Science Foundation (NSF) under Grants No. 1722743 and 1816851.
