Git Product home page Git Product logo

kvm-fuzz's Introduction

kvm-fuzz

The goal is to emulate and fuzz x86_64 binaries using KVM. The original idea was simply running userspace code inside the virtual machine, generating a VM exit when it called a syscall, which was then satisfied by the hypervisor. However, performance was bad when there were a significant amount of syscalls because VM exits are very expensive. In order to solve this, syscalls are now handled in the kernel inside the VM, so the only VM exit is at the end of each run.

Code coverage is achieved either using breakpoint-based coverage (something like mesos but much smaller), or using Intel Processor Trace. Intel PT is implemented using KVM-PT, a modified linux kernel used by kAFL which allows tracing virtual CPUs (you can see the paper here). Packet decoding is done using libxdc, also released by Sergej (@ms_s3c) and Cornelius (@is_eqv).

kvm-fuzz: fuzz x86_64 closed-source applications with hardware acceleration

Usage:
  kvm-fuzz [ options ] -- /path/to/fuzzed_binary [ args ]

Available options:
      --minimize-corpus    Set corpus minimization mode
      --minimize-crashes   Set crashes minimization mode
  -j, --jobs arg           Number of threads to use (default: 8)
  -m, --memory arg         Virtual machine memory limit (default: 8M)
  -k, --kernel path        Kernel path (default: ./kernel/kernel)
  -i, --input dir          Input folder (initial corpus) (default: ./in)
  -o, --output dir         Output folder (corpus, crashes, etc) (default: ./out)
  -f, --file path          Memory loaded files for the target. Set once for
                           each file, or as a list: -f file1,file2
  -b, --basic-blocks path  Path to file containing a list of basic blocks for
                           code coverage. Default value is
                           basic_blocks_<BinaryMD5Hash>.txt
  -s, --single-input path  Path to single input file. A single run will be
                           performed with this input.
  -h, --help               Print usage

Is this fast?

It should be. As it uses KVM virtualization, execution speed should be near-native. However, it doesn't run Linux, but a much smaller kernel that attempts to emulate it. This results in less time spent executing in kernel mode, simply because we execute less instructions. As an example of this, this graph represents how many instructions are executed in two different runs of readelf and tiff2rgba in both kernel and user mode, running natively vs inside the VM. Every measure is from main until process calls exit.

In average, in each execution the VM ran 78% less of kernel instructions, and 37% less of total instructions than Linux. The experiment was probably not very rigorous, but it gives an idea.

I've also compared the fuzzing speed of kvm-fuzz to AFL++ targetting libtiff. AFL++ ran libtiff compiled with afl-clang-fast, using persistent mode and shared memory (as described here), while kvm-fuzz followed a similar setup (just running the fuzzed function and writing the input file into the VM's memory) with the non-instrumented library using basic block coverage. AFL++ was between 20% and 50% faster depending on the run. It should also be noted that there are a lot of differences between both: kvm-fuzz resets the guest memory after each execution and doesn't require source; AFL++ gets full-edge coverage and has a better mutator, etc.

Notes

This isn't ready to be used yet. Building the kernel requires a cross compiler, and using Intel PT requires building a custom Linux kernel (KVM-PT with some little changes). Wait for it! :)

kvm-fuzz's People

Contributors

klecko avatar fengjixuchui avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.