Seed Selection for Successful Fuzzing

This repository contains the artifact associated with our ISSTA 2021 paper "Seed Selection for Successful Fuzzing". While our primary artifact is the OptiMin corpus minimizer, we also provide the necessary infrastructure to reproduce our fuzzing experiments.

Getting Started

Set up your environment

The following steps assume a modern Ubuntu OS (>= 18.04 and <= 20.04) and Python (>= 3.6 and <= 3.8):

# Install prerequisites
sudo apt update
sudo apt install -y git docker.io python3-venv 

# Add yourself to the docker group (don't forget to log out and log back in so
# that the group changes take effect)
sudo usermod -aG docker $USER

# Set up the virtualenv
python3 -m venv seed_selection
source seed_selection/bin/activate
pip3 install wheel

# Get this repo
git clone https://github.com/HexHive/fuzzing-seed-selection
pip3 install fuzzing-seed-selection/scripts

Build OptiMin

OptiMin is our SAT-based corpus minimization tool. It supports coverage generated by both AFL and llvm-cov (only AFL is used in the paper). Similarly, OptiMin can use either Z3 or EvalMaxSAT as its solver backend (only EvalMaxSAT is used in the paper). To build:

docker build -t seed-selection/optimin fuzzing-seed-selection/optimin

Run OptiMin

OptiMin takes a large "collection corpus" and selects a subset of seeds to use for fuzzing. The selection is based on the code coverage of each seed in the collection corpus.
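
To make the selection objective concrete, the sketch below finds the cheapest subset of seeds that still covers everything the full corpus covers. It is an illustration only: it uses exhaustive search over a toy corpus, whereas OptiMin encodes the problem for a SAT-based solver. The function name `minimal_cover` and the seed names are hypothetical and not part of this repo.

```python
from itertools import combinations


def minimal_cover(coverage, weights=None):
    """Return the cheapest subset of seeds that preserves all coverage.

    coverage: dict mapping seed name -> set of coverage elements.
    weights:  optional dict mapping seed name -> positive integer weight
              (every seed weighs 1 when omitted).

    Brute force, so only suitable for tiny toy corpora.
    """
    weights = weights or {}
    seeds = sorted(coverage)
    universe = set().union(*coverage.values())
    best, best_cost = None, None
    for size in range(len(seeds) + 1):
        for subset in combinations(seeds, size):
            covered = set().union(*(coverage[s] for s in subset))
            if covered != universe:
                continue  # this subset loses coverage
            cost = sum(weights.get(s, 1) for s in subset)
            if best_cost is None or cost < best_cost:
                best, best_cost = list(subset), cost
    return best


# Toy corpus (hypothetical seed names and edge IDs).
toy = {"a.ttf": {1, 2, 3}, "b.ttf": {1, 2}, "c.ttf": {3, 4}, "d.ttf": {4}}

# Unweighted: any two-seed cover is optimal; the first one found is returned.
print(minimal_cover(toy))  # ['a.ttf', 'c.ttf']

# Weighted: large seeds are penalized, so the cheaper cover wins.
print(minimal_cover(toy, {"a.ttf": 100, "b.ttf": 10, "c.ttf": 20, "d.ttf": 5}))
# ['b.ttf', 'c.ttf']
```

Exhaustive search is exponential in the corpus size, which is why a real collection corpus needs a solver-based or heuristic approach.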

While we provide tools to generate code coverage information for a given corpus (based on afl-showmap), this can be time-consuming, depending on the size of the corpus. We therefore provide precomputed seed traces in HDF5 archives.

For example, to perform a corpus minimization based on Google FTS FreeType2 coverage:

  1. Download the coverage HDF5 from afl-showmap-coverage/fts/freetype2.hdf5 here.

    Alternatively, you can use osfclient (installed into the seed_selection virtualenv), but this can be very slow (10-12 minutes):

    osf -p hz8em fetch afl-showmap-coverage/fts/freetype2.hdf5
  2. Expand the HDF5 using the expand_hdf5_coverage.py script

    expand_hdf5_coverage.py -i freetype2.hdf5 -o /tmp/freetype2
    
    # Expected output:
    #
    # 466 seeds to extract
    # Expanding freetype2.hdf5: 100%
  3. Perform an unweighted minimization based on edges only (not hit counts)

    docker run -v /tmp/freetype2:/tmp/freetype2   \
      seed-selection/optimin -e /tmp/freetype2
    
    # Expected output:
    #
    # afl-showmap corpus minimization
    #
    # [############################################################] 100% Reading seed coverage
    # [############################################################] 100% Generating clauses
    # [*] Running Optimin on /tmp/freetype2
    # [*] Running EvalMaxSAT on WCNF
    # [+] EvalMaxSAT completed
    # [*] Parsing EvalMaxSAT output
    # [+] Solution found for /tmp/freetype2
    # 
    # [+] Total time: 0.01 sec
    # [+] Num. seeds: 37
    #
    # ...
  4. Perform an unweighted minimization including edge hit counts

    docker run -v /tmp/freetype2:/tmp/freetype2  \
      seed-selection/optimin /tmp/freetype2
    
    # Expected output:
    #
    # afl-showmap corpus minimization
    #
    # [############################################################] 100% Reading seed coverage
    # [############################################################] 100% Generating clauses
    # [*] Running Optimin on /tmp/freetype2
    # [*] Running EvalMaxSAT on WCNF
    # [+] EvalMaxSAT completed
    # [*] Parsing EvalMaxSAT output
    # [+] Solution found for /tmp/freetype2
    #
    # [+] Total time: 0.01 sec
    # [+] Num. seeds: 53
    #
    # ...
  5. Download the file weights (i.e., sizes) from weights/ttf.csv here.

    Again, you can use osfclient (this can take up to 10 minutes):

    osf -p hz8em fetch weights/ttf.csv
  6. Perform a weighted minimization based on file size and edges only

    docker run -v /tmp/freetype2:/tmp/freetype2 -v $(pwd):/tmp   \
      seed-selection/optimin -e -w /tmp/ttf.csv /tmp/freetype2
    
    # Expected output:
    #
    # afl-showmap corpus minimization
    #
    # [*] Reading weights from `/tmp/ttf.csv`... 0s
    # [############################################################] 100% Calculating top
    # [############################################################] 100% Reading seed coverage
    # [############################################################] 100% Generating clauses
    # [*] Running Optimin on /tmp/freetype2
    # [*] Running EvalMaxSAT on WCNF
    # [+] EvalMaxSAT completed
    # [*] Parsing EvalMaxSAT output
    # [+] Solution found for /tmp/freetype2
    #
    # [+] Total time: 0.01 sec
    # [+] Num. seeds: 37
    #
    # ...
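
The difference between the edges-only run (-e) and the hit-count run can be illustrated with a small sketch. It assumes the expanded traces use afl-showmap's text format of one `edge_id:hit_count` pair per line; the helper name `parse_trace` and the trace contents are hypothetical.

```python
def parse_trace(text, edges_only=False):
    """Parse `edge_id:hit_count` lines into a set of coverage elements.

    With edges_only=True each element is an edge ID; otherwise each
    element is an (edge ID, hit count) pair, so the same edge hit a
    different number of times counts as distinct coverage.
    """
    elements = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        edge, count = line.split(":")
        elements.add(int(edge) if edges_only else (int(edge), int(count)))
    return elements


# Two hypothetical seeds that exercise the same edges, but seed_b hits
# edge 20 more often than seed_a does.
seed_a = "000010:1\n000020:1\n"
seed_b = "000010:1\n000020:4\n"

# Edges only: the seeds are interchangeable, so one of them is redundant.
print(parse_trace(seed_a, edges_only=True) == parse_trace(seed_b, edges_only=True))  # True

# With hit counts: each seed contributes an element the other lacks.
print(parse_trace(seed_a) == parse_trace(seed_b))  # False
```

Because hit counts split one edge into several distinct elements, fewer seeds are redundant, which is consistent with the hit-count run above keeping more seeds (53) than the edges-only run (37).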

Detailed Description

Additional Files

The sizes of our collection corpora mean that we cannot store them in a Git repo. Instead, we store ancillary data in two locations:

  1. Cloudstor. The actual seed files are stored here.
  2. OSF. This contains the compiled binaries that we fuzzed with, afl-showmap coverage (so that you do not have to trace each seed's coverage yourself), and the various corpora fuzzed in our paper. OSF only stores text files listing the names of the seeds in each corpus; the seeds themselves are stored on Cloudstor.

Tracing Code Coverage

Corpus minimization is typically based on some notion of "code coverage". To ensure a fair and uniform comparison across the three corpus minimization tools (afl-cmin, MinSet, and OptiMin), we use AFL's notion of edge coverage. This coverage information can be generated as follows:

  1. Compile your target with AFL instrumentation. See the AFL documentation for instructions on how to do this.
  2. Run replay_seeds.py with your target program and your collection corpus. This will generate an HDF5 archive containing coverage information that can then be minimized.
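
For orientation, tracing a single seed amounts to one afl-showmap invocation. The sketch below only builds that command line; it is not how replay_seeds.py is implemented. It assumes the target takes the seed path as its final argument, and the helper name `showmap_command`, the target `./ftfuzzer`, and the paths are hypothetical.

```python
from pathlib import Path


def showmap_command(target, seed, out_dir, timeout_ms=None):
    """Build an afl-showmap command that traces one seed.

    The trace is written to <out_dir>/<seed file name>. Assumes the
    AFL-instrumented target reads the seed path from its last argument.
    """
    seed = Path(seed)
    out_file = Path(out_dir) / seed.name
    # -o: trace output file, -m none: no memory limit, -q: quiet mode
    cmd = ["afl-showmap", "-o", str(out_file), "-m", "none", "-q"]
    if timeout_ms is not None:
        cmd += ["-t", str(timeout_ms)]  # per-run timeout in milliseconds
    cmd += ["--", str(target), str(seed)]
    return cmd


print(showmap_command("./ftfuzzer", "/corpus/a.ttf", "/tmp/traces"))
```

The returned list can be passed to `subprocess.run` once per seed in the collection corpus.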

Corpus Minimization

Our paper surveys a number of corpus minimization tools: OptiMin, afl-cmin, and MinSet. A more detailed explanation of how to use these tools and reproduce our results is given below.

OptiMin

Instructions for running OptiMin are given above. As described previously, a weighted minimization can be performed by supplying a weights CSV file to OptiMin's -w option. This weights file has the following format:

FILE_1,WEIGHT
FILE_2,WEIGHT
FILE_3,WEIGHT
FILE_4,WEIGHT
FILE_5,WEIGHT

Here, FILE_1, FILE_2, ... correspond to the names of files within the corpus directory (provide only the filename, not the corpus directory path), and WEIGHT is an unsigned integer >= 1. We provide weights for our collection corpora here (under the weights directory).
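
If you need weights for your own corpus, a file in this format can be generated with a few lines of Python. The sketch below uses each file's size as its weight; measuring size in bytes and clamping empty files to a weight of 1 are assumptions made here to satisfy the >= 1 requirement. The helper name `write_size_weights` is hypothetical.

```python
import csv
from pathlib import Path


def write_size_weights(corpus_dir, csv_path):
    """Write a FILE,WEIGHT CSV using each seed's size in bytes as its weight.

    Only the file name (not the directory path) is written. Empty files
    get weight 1 so that every weight is an integer >= 1.
    """
    rows = []
    for path in sorted(Path(corpus_dir).iterdir()):
        if path.is_file():
            rows.append((path.name, max(1, path.stat().st_size)))
    with open(csv_path, "w", newline="") as f:
        # Plain "\n" line endings; the csv module defaults to "\r\n".
        csv.writer(f, lineterminator="\n").writerows(rows)
    return rows
```

For example, `write_size_weights("/tmp/freetype2", "my_weights.csv")` would produce a file that can be passed to the -w option, provided the directory contains one file per seed.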

afl-cmin

afl-cmin is AFL's inbuilt corpus minimization tool. afl_cmin.py wraps afl-cmin so that it outputs the names of the seeds in the minimized corpus (rather than copying the seeds and wasting storage).
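
For comparison with OptiMin, the following sketch approximates the greedy heuristic that afl-cmin applies: pick the smallest seed as the candidate for each coverage tuple, then walk the tuples from rarest to most common and keep a tuple's candidate whenever that tuple is not yet covered. It is a simplified reading of afl-cmin, not the code in afl_cmin.py, and the name `greedy_cmin` and the toy data are hypothetical. Like the wrapper, it returns seed names rather than copying files.

```python
def greedy_cmin(coverage, sizes):
    """Greedy corpus minimization in the style of afl-cmin.

    coverage: dict mapping seed name -> set of coverage tuples.
    sizes:    dict mapping seed name -> file size.
    Returns the names of the selected seeds, in selection order.
    """
    best = {}        # tuple -> smallest seed that covers it
    popularity = {}  # tuple -> number of seeds that cover it
    for seed in sorted(coverage, key=lambda s: (sizes[s], s)):
        for t in coverage[seed]:
            best.setdefault(t, seed)
            popularity[t] = popularity.get(t, 0) + 1

    selected, covered = [], set()
    # Rarest tuples first, so seeds with unique coverage are kept early.
    for t in sorted(popularity, key=lambda t: (popularity[t], t)):
        if t in covered:
            continue
        seed = best[t]
        selected.append(seed)
        covered |= coverage[seed]
    return selected


toy = {"a": {1, 2, 3, 5}, "b": {1, 2}, "c": {3, 4}, "d": {4}}
sizes = {"a": 300, "b": 100, "c": 200, "d": 50}
print(greedy_cmin(toy, sizes))  # ['a', 'd']
```

The greedy pass is fast but carries no optimality guarantee, which is the gap a solver-based minimizer such as OptiMin is meant to close.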

MinSet

MinSet is the tool developed by Rebert et al. in their paper Optimizing Seed Selection for Fuzzing. While we were able to obtain the tool from the authors, it is not open source and thus we are unable to provide it here. Please contact the authors if you would like to obtain the source code.

If you have access to the source code, you can perform a MinSet minimization as follows:

  1. Generate code coverage as described here
  2. Expand the generated HDF5 archive using expand_hdf5_coverage.py
  3. Convert the expanded coverage to a set of bitvector traces using MoonBeam
  4. Run the qminset.py wrapper on the bitvector traces
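
To illustrate what a bitvector trace is in step 3, the sketch below maps each seed's edge set to an integer bitmask over the corpus-wide set of edges. This shows the general idea only; MoonBeam's actual on-disk format is not described here and may differ. The name `to_bitvectors` and the toy data are hypothetical.

```python
def to_bitvectors(coverage):
    """Convert per-seed edge sets into integer bitmasks.

    coverage: dict mapping seed name -> set of edge IDs.
    Returns (universe, vectors), where universe is the sorted list of all
    edge IDs and bit i of vectors[seed] is set iff the seed covers
    universe[i].
    """
    universe = sorted(set().union(*coverage.values()))
    index = {edge: i for i, edge in enumerate(universe)}
    vectors = {}
    for seed, edges in coverage.items():
        bits = 0
        for edge in edges:
            bits |= 1 << index[edge]
        vectors[seed] = bits
    return universe, vectors


universe, vectors = to_bitvectors({"a": {10, 30}, "b": {20}})
print(universe)                                  # [10, 20, 30]
print({s: bin(v) for s, v in vectors.items()})   # {'a': '0b101', 'b': '0b10'}
```

With this representation, the combined coverage of a set of seeds is the bitwise OR of their masks.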

Fuzzing Experiments

In addition to the OptiMin tool, we also provide the necessary infrastructure to reproduce our fuzzing experiments. Detailed instructions are provided here.
