Git Product home page Git Product logo
============================================
Fast LLVM-based instrumentation for afl-fuzz
============================================

  (See ../docs/README for the general instruction manual.)

1) Introduction
---------------

The code in this directory allows you to instrument programs for AFL using
true compiler-level instrumentation, instead of the more crude
assembly-level rewriting approach taken by afl-gcc and afl-clang. This has
several interesting properties:

  - The compiler can make many optimizations that are hard to pull off when
    manually inserting assembly. As a result, some slow, CPU-bound programs will
    run up to around 2x faster.

    The gains are less pronounced for fast binaries, where the speed is limited
    chiefly by the cost of creating new processes. In such cases, the gain will
    probably stay within 10%.

  - The instrumentation is CPU-independent. At least in principle, you should
    be able to rely on it to fuzz programs on non-x86 architectures (after
    building afl-fuzz with AFL_NO_X86=1).

  - The instrumentation can cope a bit better with multi-threaded targets.

  - Because the feature relies on the internals of LLVM, it is clang-specific
    and will *not* work with GCC.

Once this implementation is shown to be sufficiently robust and portable, it
will probably replace afl-clang. For now, it can be built separately and
co-exists with the original code.

The idea and much of the implementation comes from Laszlo Szekeres.

2) How to use
-------------

In order to leverage this mechanism, you need to have clang installed on your
system. You should also make sure that the llvm-config tool is in your path
(or pointed to via LLVM_CONFIG in the environment).

Unfortunately, some systems that do have clang come without llvm-config or the
LLVM development headers; one example of this is FreeBSD. FreeBSD users will
also run into problems with clang being built statically and not being able to
load modules (you'll see "Service unavailable" when loading afl-llvm-pass.so).

To solve all your problems, you can grab pre-built binaries for your OS from:

  http://llvm.org/releases/download.html

...and then put the bin/ directory from the tarball at the beginning of your
$PATH when compiling the feature and building packages later on. You don't need
to be root for that.

To build the instrumentation itself, type 'make'. This will generate binaries
called afl-clang-fast and afl-clang-fast++ in the parent directory. Once this
is done, you can instrument third-party code in a way similar to the standard
operating mode of AFL, e.g.:

  CC=/path/to/afl/afl-clang-fast ./configure [...options...]
  make

Be sure to also include CXX set to afl-clang-fast++ for C++ code.

The tool honors roughly the same environmental variables as afl-gcc (see
../docs/env_variables.txt). This includes AFL_INST_RATIO, AFL_USE_ASAN,
AFL_HARDEN, and AFL_DONT_OPTIMIZE.

Note: if you want the LLVM helper to be installed on your system for all
users, you need to build it before issuing 'make install' in the parent
directory.

3) Gotchas, feedback, bugs
--------------------------

This is an early-stage mechanism, so field reports are welcome. You can send bug
reports to <[email protected]>.

4) Bonus feature #1: deferred instrumentation
---------------------------------------------

AFL tries to optimize performance by executing the targeted binary just once,
stopping it just before main(), and then cloning this "master" process to get
a steady supply of targets to fuzz.

Although this approach eliminates much of the OS-, linker- and libc-level
costs of executing the program, it does not always help with binaries that
perform other time-consuming initialization steps - say, parsing a large config
file before getting to the fuzzed data.

In such cases, it's beneficial to initialize the forkserver a bit later, once
most of the initialization work is already done, but before the binary attempts
to read the fuzzed input and parse it; in some cases, this can offer a 10x+
performance gain. You can implement delayed initialization in LLVM mode in a
fairly simple way.

First, find a suitable location in the code where the delayed cloning can 
take place. This needs to be done with *extreme* care to avoid breaking the
binary. In particular, the program will probably malfunction if you select
a location after:

  - The creation of any vital threads or child processes - since the forkserver
    can't clone them easily.

  - The initialization of timers via setitimer() or equivalent calls.

  - The creation of temporary files, network sockets, offset-sensitive file
    descriptors, and similar shared-state resources - but only provided that
    their state meaningfully influences the behavior of the program later on.

  - Any access to the fuzzed input, including reading the metadata about its
    size.

With the location selected, add this code in the appropriate spot:

#ifdef __AFL_HAVE_MANUAL_CONTROL
  __AFL_INIT();
#endif

You don't need the #ifdef guards, but including them ensures that the program
will keep working normally when compiled with a tool other than afl-clang-fast.

Finally, recompile the program with afl-clang-fast (afl-gcc or afl-clang will
*not* generate a deferred-initialization binary) - and you should be all set!

5) Bonus feature #2: persistent mode
------------------------------------

Some libraries provide APIs that are stateless, or whose state can be reset in
between processing different input files. When such a reset is performed, a
single long-lived process can be reused to try out multiple test cases,
eliminating the need for repeated fork() calls and the associated OS overhead.

The basic structure of the program that does this would be:

  while (__AFL_LOOP(1000)) {

    /* Read input data. */
    /* Call library code to be fuzzed. */
    /* Reset state. */

  }

  /* Exit normally */

The numerical value specified within the loop controls the maximum number
of iterations before AFL will restart the process from scratch. This minimizes
the impact of memory leaks and similar glitches; 1000 is a good starting point,
and going much higher increases the likelihood of hiccups without giving you
any real performance benefits.

A more detailed template is shown in ../experimental/persistent_demo/.
Similarly to the previous mode, the feature works only with afl-clang-fast;
#ifdef guards can be used to suppress it when using other compilers.

Note that as with the previous mode, the feature is easy to misuse; if you
do not fully reset the critical state, you may end up with false positives or
waste a whole lot of CPU power doing nothing useful at all. Be particularly
wary of memory leaks and of the state of file descriptors.

PS. Because there are task switches still involved, the mode isn't as fast as
"pure" in-process fuzzing offered, say, by LLVM's LibFuzzer; but it is a lot
faster than the normal fork() model, and compared to in-process fuzzing,
should be a lot more robust.

6) Bonus feature #3: new 'trace-pc-guard' mode
----------------------------------------------

Recent versions of LLVM are shipping with a built-in execution tracing feature
that provides AFL with the necessary tracing data without the need to
post-process the assembly or install any compiler plugins. See:

  http://clang.llvm.org/docs/SanitizerCoverage.html#tracing-pcs-with-guards

As of this writing, the feature is only available on SVN trunk, and is yet to
make it to an official release of LLVM. Nevertheless, if you have a
sufficiently recent compiler and want to give it a try, build afl-clang-fast
this way:

  AFL_TRACE_PC=1 make clean all

Note that this mode is currently about 20% slower than "vanilla" afl-clang-fast,
and about 5-10% slower than afl-clang. This is likely because the
instrumentation is not inlined, and instead involves a function call. On systems
that support it, compiling your target with -flto should help.


xiaozhuanzi123's Projects

afl icon afl

american fuzzy lop - a security-oriented fuzzer

aflplusplus icon aflplusplus

The fuzzer afl++ is afl with community patches, qemu 5.1 upgrade, collision-free coverage, enhanced laf-intel & redqueen, AFLfast++ power schedules, MOpt mutators, unicorn_mode, and a lot more!

aflsmart icon aflsmart

Smart Greybox Fuzzing (https://thuanpv.github.io/publications/TSE19_aflsmart.pdf)

cve-2020-16898 icon cve-2020-16898

CVE-2020-16898 (Bad Neighbor) Microsoft Windows TCP/IP Vulnerability Detection Logic and Rule

damn_vulnerable_c_program icon damn_vulnerable_c_program

An example C program which contains vulnerable code for common types of vulnerabilities. It can be used to show fuzzing concepts.

engine-sim icon engine-sim

Combustion engine simulator that generates realistic audio.

fuzzgoat icon fuzzgoat

A vulnerable C program for testing fuzzers.

login-system icon login-system

In this project you can learn how to create a simple registration and login system using the PHP and MySQL. This tutorial is comprised of two parts: in the first part we'll create a user registration form, and in the second part we'll create a login form, as well as a welcome page and a logout script.

tensorfuzz icon tensorfuzz

A library for performing coverage guided fuzzing of neural networks

wasbook icon wasbook

《Web应用安全权威指南》源代码中文版

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.