Git Product home page Git Product logo

refined-score-atk-sse's Introduction

Implementations of Score attacks

This repository contains Python implementation of the attacks presented in:

M. Damie, F. Hahn, A. Peter, "A Highly Accurate Query-Recovery Attack against Searchable Encryption using Non-Indexed Documents" (USENIX 2021)

About

This repository contains everything necessary to simulate the attacks (Score attack, Refined score attack, with clustering, generalized version). This code was used to obtained all the results presented in the paper mentioned above. It allows a simulation with varying parameters (choice of dataset, countermeasures, document set sizes, etc.).

In the folder score_attacks, you find 3 elements:

  • src: folder containing all the utilitary codes to simulate the attacks
  • simulate_attack.py: Python script to simulate an attack with parameters specified by the user (dataset, countermeasure, vocabulary size, query set size, number of known queries).
  • generate_results.py: launch scripts that were used to obtain the vast majority of the results presented in the paper.

Getting started

To simulate a simple attack, you just have to follow these instructions:

  1. You must have at least Python 3.6 and pip installed. NB: you can use a Python virtual environment if you want to keep a clean Python environment (the setup script will install few dependencies).
  2. Download the repository
  3. Launch the setup script (install dependencies and download the datasets): bash setup.sh. The dataset download and decompression may take a while (~1h) depending on your connection and your computer. Dataset size: Apache emails takes 252 MB and Enron dataset takes 2.6 GB!!
  4. Move to score_attacks folder: cd score_attacks
  5. Launch the simulation script: python3 simulate_attack.py. One attack simulation (including the keyword extraction) takes few minutes.

To explore further the attack, you can change the parameters:

Score attack simulator

optional arguments:
  -h, --help            show this help message and exit
  --similar-voc-size SIMILAR_VOC_SIZE
                        Size of the vocabulary extracted from similar
                        documents.
  --server-voc-size SERVER_VOC_SIZE
                        Size of the 'queryable' vocabulary.
  --queryset-size QUERYSET_SIZE
                        Number of queries which have been observed.
  --nb-known-queries NB_KNOWN_QUERIES
                        Number of queries known by the attacker. Known
                        Query=(Trapdoor, Corresponding Keyword)
  --countermeasure COUNTERMEASURE
                        Which countermeasure will be applied.
  --attack-dataset ATTACK_DATASET
                        Dataset used for the attack

You can find all these information using python3 simulate_attack.py --help. For example, you can launch the following command: python3 simulate_attack.py --queryset-size 500 --nb-known-queries 5. At the start of each simulation, the parameters are systematically printed. Moreover, we developped some logging so you can easily follow the simulation progress. At the end, the accuracies with and without refinement are printed.

Attack code

The attack codes are in the file src/attackers.py. The rest of the files are only used to setup a simulation (keyword extraction, email cleaning, query generation, etc.).

In src/attackers.py are present two classes: ScoreAttacker (for the Score and Refined score attacks) and GeneralizedScoreAttacker (for their "generalized" version cf. the paper). Both have methods predict and predict_with_refinement which needs a list of trapdoors as parameter. These methods will return, for each trapdoor, a keyword. You can also specify a cluster_max_size in the predict_with_refinement so you can perform a refinement with clustering (cf. paper appendices).

If you want to understand how to assess the accuracy of these attackers, you can watch either the simulate_attack.py or the result_procedures.py (this second file is much messier).

Result reproduction

If you want to reproduce most results that are presented in the paper, you have to execute generate_results.py. The complete execution with 50 repetitions of each experiment may take several days depending on your machine specifications. If you want to reduce the number of repetitions, you can change the variable NB_REP in src/result_procedures.py. For convenience purpose, this script also have a file logging (in results.log) so you can analyze the logs later.

Code readability

Before the publication of this repo, we tried to simplify as much as possible our function to only keep what is relevent in the paper. We also added docstrings and typing so it is easier for a reader to understand what the functions/classes do and how to use them.

Further questions

Feel free to contact me (marc [at] damie [dot] eu) if you have some questions about the code or the attack itself (or if you spotted some bugs in our code).

Known issues

If you have the error setup.sh: 1: set: Illegal option - when executing setup.sh, this may be due to the file formatting. It happens when sending (via scp) a file from a Windows computer to a Linux server (Windows formatter using different hidden characters than Unix formatters). To solve this issue, on the Linux server, you just have to install dos2unix (via apt for Debian and Ubuntu) and launch dos2unix setup.sh. Then, the script should have the adequate format and should be executed successfully.

The installation and the code have been tested on Debian and Ubuntu so some issues may occur on other operating systems.

refined-score-atk-sse's People

Contributors

marct0k avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.