centipede's Introduction

Public announcement

Centipede has been merged into FuzzTest to consolidate fuzzing development - see documentation here for user migration.

This repository is now empty and archived.

centipede's Issues

Would there be a file to record all the dependencies in a machine readable way?

Recording all dependencies in a machine-readable file could be very helpful for users who always want to build and use the latest version of Centipede (e.g., FuzzBench, OSS-Fuzz).
For example, I just noticed that Centipede now requires OpenSSL (libssl-dev) to build and updated FuzzBench accordingly. I would really appreciate it if we could do that automatically : )

(I really like the way you use clang-flags.txt to store flags.)

How to run tests with AddressSanitizer?

Hi Sergey and Kostya,
I am working on a unit test that checks for patterns in ASan crash log output that ClusterFuzz relies on.
For example, the crash-path pattern `[sS]aving input to:? [\n]?(.*)`.
However, I am unsure of the best way to use the existing multi_sanitizer_fuzz_target.

The BUILD file mentions that the sanitizer fuzz target is only here for manual tests.
Would you mind showing me how it is generally used for manual tests?
This will be helpful for me when writing automated tests for it accordingly : )

Thanks!
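A test of the kind described could apply the ClusterFuzz pattern directly to captured runner output. The sketch below uses `std::regex` with its default ECMAScript grammar; the helper name `ExtractCrashPath` and the sample log lines are hypothetical and are not taken from Centipede's actual output.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: applies the ClusterFuzz crash-path pattern
// `[sS]aving input to:? [\n]?(.*)` to a chunk of log text and returns the
// captured path, if any.
std::optional<std::string> ExtractCrashPath(const std::string& log) {
  // The raw string keeps `\n` as a regex escape; inside the bracket
  // expression it matches a newline character.
  static const std::regex kCrashPath(R"([sS]aving input to:? [\n]?(.*))");
  std::smatch match;
  if (!std::regex_search(log, match, kCrashPath)) return std::nullopt;
  return match[1].str();  // Group 1 is the path (up to the end of the line).
}
```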

Is there a way to disable the limit on the number of runs?

The default value of num_runs (1000000000) is very large yet still insufficient for long experiments (e.g., 24 hours).
For example, Centipede passed that number in around 4 hours when running on benchmark curl_fuzzer_http.

Here are a log file showing that Centipede reached that number and exited when fuzzing curl_fuzzer_http, and a CSV file showing that this happened after around 4 hours (14400 seconds), with no data recorded afterwards.
The second file also shows this happened on almost all benchmarks within 24 hours.

A temporary solution could be increasing the value of num_runs, but that is not reliable, given we cannot tell how many runs will be sufficient for all benchmarks.

Would there be a way to disable this limit? Thanks!
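For scale, 1,000,000,000 runs in about 14,400 seconds is roughly 69,000 executions per second, so on a fast target any fixed run count will be reached long before a 24-hour budget ends. One way to express "no limit" is to treat 0 as disabled and pair the run count with a time budget. The sketch below is a hypothetical stop policy (`StopPolicy`, `ShouldStop`), not an existing Centipede flag or function.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stop policy: a value of 0 disables the corresponding limit.
struct StopPolicy {
  uint64_t max_runs = 0;                 // 0 = unlimited number of runs
  std::chrono::seconds max_duration{0};  // 0 = unlimited wall-clock time
};

// Returns true once any enabled limit has been reached.
bool ShouldStop(const StopPolicy& policy, uint64_t runs_so_far,
                std::chrono::seconds elapsed) {
  if (policy.max_runs != 0 && runs_so_far >= policy.max_runs) return true;
  if (policy.max_duration.count() != 0 && elapsed >= policy.max_duration) {
    return true;
  }
  return false;
}
```

With `max_runs` left at 0 and `max_duration` set to 24 hours, the run-count cutoff described above can never end an experiment early.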

`Centipede` cannot detect/report `AddressSanitizer ` errors

Description

Centipede cannot detect ASAN errors. For example, given the following target program:

#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  char *x = (char*)malloc(size * sizeof(char*));
  free(x);
  x[1] = 'a';
  return data[size];
}

ASAN should find two errors (although by default it aborts after reporting the first one):

  1. heap-use-after-free at x[1] = 'a';
  2. stack-buffer-overflow at return data[size];

However, Centipede fails to report either.

Reproduction

This behaviour can be reproduced with Scarecrow (branch asan_target):

CENTIPEDE=</path/to/centipede>
FUZZ_TARGET=scarecrow

git clone --branch asan_target git@github.com:Alan32Liu/github-scarecrow.git
cd github-scarecrow

clang++ @$CENTIPEDE/clang-flags.txt -c ./$FUZZ_TARGET.cc -o ./$FUZZ_TARGET.o
clang++ ./$FUZZ_TARGET.o $CENTIPEDE/bazel-bin/libcentipede_runner.pic.a -ldl -lrt -lpthread -o ${FUZZ_TARGET}
clang++ ./$FUZZ_TARGET.o $CENTIPEDE/bazel-bin/libcentipede_runner.pic.a -fsanitize=address -ldl -lrt -lpthread -o ${FUZZ_TARGET}_asan

mkdir workdir
$CENTIPEDE/bazel-bin/centipede --workdir="./workdir" --exit_on_crash=1 --binary=./scarecrow --extra_binaries=./scarecrow_asan

Centipede runs without reporting any error.

Notes

(Hope this can help debug)

  1. To prove that ASAN can detect and report the errors above, I used the following simple PoE:
/* 
 * scarecrow_main.cc 
 * Build&run: clang++ -fsanitize=address scarecrow_main.cc; ./a.out
*/
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  char *x = (char*)malloc(size * sizeof(char*));
  free(x);
  x[1] = 'a';
  return data[size];
}

int main() {
  const uint8_t data[2] = {1, 2};
  LLVMFuzzerTestOneInput(data, 2);
  return 0;
}

  2. Centipede CAN detect and report LeakSanitizer errors in Scarecrow (branch lsan_target) with the same build & run steps as in the Reproduction section.

Please let me know if there is anything else I can do to help : )

Does not build cleanly with `-Wsign-compare`

I am experimenting with building parts of Centipede in Chromium, which uses -Wsign-compare:

../../third_party/centipede/src/runner.cc:598:29: error: comparison of integers of different signs: 'int' and 'uint64_t' (aka 'unsigned long') [-Werror,-Wsign-compare]
        rand_r(&seed) % 100 < state.run_time_flags.crossover_level) {
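The warning fires because `rand_r()` returns `int` while `crossover_level` is a `uint64_t`. Since `rand_r(&seed) % 100` is never negative, an explicit cast to the unsigned type is value-preserving and silences `-Wsign-compare`. A minimal sketch, with a hypothetical wrapper function `ShouldCrossOver` standing in for the surrounding runner code (this is not the upstream fix):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>  // rand_r (POSIX)

// Hypothetical helper mirroring the flagged expression: returns true with
// probability crossover_level percent.
bool ShouldCrossOver(unsigned int* seed, uint64_t crossover_level) {
  // rand_r() returns a value in [0, RAND_MAX], so `% 100` lies in [0, 99]
  // and the cast to an unsigned type cannot change the value. Comparing two
  // unsigned operands keeps -Wsign-compare quiet.
  const uint64_t roll = static_cast<uint64_t>(rand_r(seed) % 100);
  return roll < crossover_level;
}
```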

An argument to customise the directory of crash reproducers

Currently, Centipede saves all inputs that crash the target program into crashes/ of workdir/.
Would it be possible to add a feature to allow users to customise which directory to save those inputs?
For example, it would be great if Centipede could take a flag --crashes-dir=</path/to/crashes> and save crash reproducers to the dir </path/to/crashes> at the users' preference.
Thanks!
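The requested behaviour amounts to a small path-resolution rule: keep `<workdir>/crashes` as the default and use the user's directory when the proposed flag is set. A sketch with `std::filesystem`, assuming a hypothetical string flag value passed in as `crashes_dir_flag`; neither the function nor the flag exists in Centipede as described here.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical: decide where crash reproducers are saved. An empty flag
// keeps the current behaviour (<workdir>/crashes); otherwise the
// user-supplied directory is used as-is.
fs::path ResolveCrashesDir(const fs::path& workdir,
                           const std::string& crashes_dir_flag) {
  if (crashes_dir_flag.empty()) return workdir / "crashes";
  return fs::path(crashes_dir_flag);
}
```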

May I contribute code to this repo?

I was wondering if there would be any chance that you could kindly allow me to create PRs for this repo?
I am planning to push 1) a fix to keep linkopts and 2) a unit test to check whether the output format matches the regex used in ClusterFuzz, and I would really appreciate it if I could create PRs directly : )

If that is inconvenient, I am still happy to create CLs as before (which might take me a bit more time).

Thanks!

conflict between Centipede's and ASAN's interceptors

Centipede intercepts memcmp (and will intercept more in future).
Sanitizers (ASAN, etc) also intercept the same functions.
The way we currently build binaries for Centipede's --extra_binaries= makes these interceptors conflict.

% cat memcmp_fuzz.cc 
#include <cstdint>
#include <cstddef>
#include <cstring>

volatile auto M = &memcmp;

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  char x[3] = {'z', 'z', 'z'};
  if (size != 2) return 0;
  M(x, data, 3);  // should trigger an ASAN report.
  return 0;
}

% clang -O2 -fsanitize=address,fuzzer ./memcmp_fuzz.cc -o lf
% ./lf zz  2>&1 | grep ERROR 
==3371320==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000032 at pc 0x56499c281f3c bp 0x7ffc82201d00 sp 0x7ffc822014b0

libFuzzer+ASAN can easily detect a buffer overflow inside memcmp.

% clang++ @$CENTIPEDE/clang-flags.txt -c memcmp_fuzz.cc 
% clang++ memcmp_fuzz.o ./bazel-bin/libcentipede_runner.pic.a -fsanitize=address -ldl -lrt -lpthread -o memcmp_centipede
% ./memcmp_centipede zz
Centipede fuzz target runner; argv[0]: ./memcmp_centipede flags: (null)
Not using RLIMIT_AS; VmSize is 20480Gb, suspecting ASAN/MSAN/TSAN
<no report>

Internally, we build Centipede's runner together with the target, and the #if !defined(ADDRESS_SANITIZER) inside runner_interceptors.cc avoids this problem.
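Because preprocessor checks are evaluated per translation unit, a guard like the one in `runner_interceptors.cc` only helps when the runner itself is compiled with `-fsanitize=address`; a prebuilt `libcentipede_runner.pic.a` compiled without it still contains the interceptor, which is consistent with the missing report above. The sketch shows one way a build could derive `ADDRESS_SANITIZER` from compiler-provided signals (GCC's `__SANITIZE_ADDRESS__`, Clang's `__has_feature(address_sanitizer)`); the derivation and the constant names are assumptions, not a copy of Centipede's code.

```cpp
#include <cassert>

// Derive ADDRESS_SANITIZER when the build system did not define it.
#if !defined(ADDRESS_SANITIZER)
#  if defined(__SANITIZE_ADDRESS__)  // GCC defines this under -fsanitize=address.
#    define ADDRESS_SANITIZER 1
#  elif defined(__has_feature)
#    if __has_feature(address_sanitizer)  // Clang's feature check.
#      define ADDRESS_SANITIZER 1
#    endif
#  endif
#endif

// Evaluated per translation unit: a library compiled without
// -fsanitize=address sees `false` here even if the final binary links ASan.
#if defined(ADDRESS_SANITIZER)
constexpr bool kBuiltWithAsan = true;
#else
constexpr bool kBuiltWithAsan = false;
#endif

// Hypothetical: compile our own memcmp interceptor only when ASan, which
// intercepts memcmp itself, is absent from this translation unit.
constexpr bool kInstallOwnMemcmpInterceptor = !kBuiltWithAsan;
```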

Creating work directory and corpus directory automatically.

Centipede assumes the existence of workdir and corpus_dir.
It will exit due to a check failure if workdir is missing:

centipede.cc:149] Check failed: !!((appender->Open(env.MakeCorpusPath(shard))).ok())!=false [0!=0]

Or it terminates with an uncaught exception if corpus_dir is missing:

libc++abi: terminating with uncaught exception of type std::__1::__fs::filesystem::filesystem_error: filesystem error: in recursive_directory_iterator: No such file or directory ["/out/corpus_not_exist"]

Would it be a good idea to let Centipede create those two directories when they do not exist?
This will help OSS-Fuzz (and probably other users) : )
Thanks!
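Creating missing directories up front is a single call to `std::filesystem::create_directories`, and its `std::error_code` overload avoids an uncaught `filesystem_error` like the one shown above. A minimal sketch; `EnsureDirectory` is a hypothetical helper, and where Centipede would call it is left open.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: create `dir` (and any missing parents) if needed.
// Returns true if the directory exists afterwards. The non-throwing
// overloads let the caller report a failure instead of aborting.
bool EnsureDirectory(const fs::path& dir) {
  std::error_code ec;
  fs::create_directories(dir, ec);  // No error if `dir` already exists.
  if (ec) return false;
  return fs::is_directory(dir, ec) && !ec;
}
```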

Would there be a chance to include *time* in the log messages?

Centipede has very informative log messages, e.g.:

centipede.cc:177] [74000000] new-feature: ft: 16886 cov: 717 cnt: 717 df: 17 cmp: 16152 path: 0 pair: 0 corp: 10002/10061 max/avg 998242 21191 d202/f0 exec/s: 1111.6 mb: 823

Would there be any chance that it could also include how long it has been running?
I would really appreciate that, as it would make it much easier for us to identify whether Centipede has terminated earlier than expected during an evaluation.

Thanks : )
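A running-time field could be computed from a monotonic clock started when the process launches, so that wall-clock adjustments do not distort it. The sketch uses `std::chrono::steady_clock`; the `ElapsedClock` class, `FormatElapsed`, and the `1h02m03s` format are hypothetical choices, not Centipede's log format.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>

// Formats a duration as <hours>h<minutes>m<seconds>s, e.g. "1h02m03s".
std::string FormatElapsed(std::chrono::seconds elapsed) {
  const long long total = static_cast<long long>(elapsed.count());
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%lldh%02lldm%02llds", total / 3600,
                (total % 3600) / 60, total % 60);
  return buf;
}

// Records its construction time and reports whole seconds since then.
class ElapsedClock {
 public:
  ElapsedClock() : start_(std::chrono::steady_clock::now()) {}
  std::chrono::seconds Elapsed() const {
    return std::chrono::duration_cast<std::chrono::seconds>(
        std::chrono::steady_clock::now() - start_);
  }

 private:
  std::chrono::steady_clock::time_point start_;
};
```

A log line could then be prefixed with, for example, `FormatElapsed(timer.Elapsed())`.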

Segmentation Fault when using `AddressSanitizer` with `address_space_limit_mb`

Segmentation Fault

Centipede runs into the following segmentation fault after a check failure:

centipede.cc:543] Batch execution failed; exit code: 1
Log of batch follows: [[[==================
Centipede fuzz target runner; argv[0]: /path/to/target/scarecrow.address flags: :timeout_in_seconds=1200::address_space_limit_mb=8192::rss_limit_mb=4096::crossover_level=50::shmem:arg1=/centipede-shm1-1824572-139817135887936:arg2=/centipede-shm2-1824572-139817135887936:failure_description_path=/tmp/centipede-1824572-139817135887936/failure_description:
timeout_in_seconds: 1200 rss_limit_mb: 4096
=================================================================
mmap failed: Cannot allocate memory
ERROR: Failed to mmap
==================]]]
centipede.cc:552] ReportCrash[0]: the crash occurred when running  /path/to/target/scarecrow.address on 1000 inputs
centipede.cc:594] ReportCrash[0]: executing inputs one-by-one, trying to find the reproducer
centipede_callbacks.cc:143] Check failed: !!(batch_result.Read(outputs_blobseq_))!=false [[1]    1824572 segmentation fault  $CENTIPEDE/bazel-bin/centipede --workdir= /path/to/target/workdir --corpus_dir= /path/to/target/corpus

Reproduce

The SegFault can be reproduced with Scarecrow, a minimal target with a planted memory leak that AddressSanitizer can detect.
The sanitized and unsanitized targets are respectively built by:

# Build target with ASAN
clang++ -fsanitize-coverage=trace-loads -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -fno-builtin -fsanitize-coverage=trace-pc-guard,pc-table,trace-cmp -O2 -gline-tables-only  -ldl -lrt -lpthread -std=c++11 -Ilib/ -fsanitize=address scarecrow.cc  -o scarecrow.address  $CENTIPEDE/bazel-bin/libcentipede_runner.pic.a
# Build target without ASAN
clang++ -fsanitize-coverage=trace-loads -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION -fno-builtin -fsanitize-coverage=trace-pc-guard,pc-table,trace-cmp -O2 -gline-tables-only  -ldl -lrt -lpthread -std=c++11 -Ilib/ scarecrow.cc  -o scarecrow  $CENTIPEDE/bazel-bin/libcentipede_runner.pic.a

Then the SegFault can be reproduced by:

mkdir -p workdir corpus
$CENTIPEDE/bazel-bin/centipede --workdir=$PWD/workdir --corpus_dir=$PWD/corpus --fork_server=1 --exit_on_crash=0 --timeout=1200 --rss_limit_mb=4096 --address_space_limit_mb=8192 --binary=$PWD/scarecrow --extra_binaries=$PWD/scarecrow.address

While the following two commands are free from the SegFault and work fine:

# Disable address_space_limit
$CENTIPEDE/bazel-bin/centipede --workdir=$PWD/workdir --corpus_dir=$PWD/corpus --fork_server=1 --exit_on_crash=0 --timeout=1200 --rss_limit_mb=4096 --address_space_limit_mb=0 --binary=$PWD/scarecrow --extra_binaries=$PWD/scarecrow.address
# No sanitized binary
$CENTIPEDE/bazel-bin/centipede --workdir=$PWD/workdir --corpus_dir=$PWD/corpus --fork_server=1 --exit_on_crash=0 --timeout=1200 --rss_limit_mb=4096 --address_space_limit_mb=8192 --binary=$PWD/scarecrow

My Guess

Empty corpus file.
The check failure and the SegFault happened on the following line:

CHECK(batch_result.Read(outputs_blobseq_));

which involves batch_result (presumably the result of the batch execution).
File corpus.0 and directory corpus/ are both empty when the SegFault happens; they were not empty in the other cases above.
Maybe the test case was not successfully saved into corpus.0 and corpus/?

Please let me know if you could reproduce the SegFault or if more information is required, thanks!
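A plausible cause, consistent with the `mmap failed: Cannot allocate memory` line, is that ASan reserves a very large virtual address range for shadow memory, so an `RLIMIT_AS` of 8192 MB can leave it unable to map more memory. The runner message `Not using RLIMIT_AS; VmSize is 20480Gb, suspecting ASAN/MSAN/TSAN` from the interceptor issue suggests the runner is meant to skip the limit in that situation. The sketch below illustrates that kind of check: read `VmSize` from `/proc/self/status` and apply `RLIMIT_AS` only if the process is still below the requested limit. The function names and the exact rule are assumptions, not the runner's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <sys/resource.h>

// Parses the "VmSize:" line (reported in kB) from /proc/<pid>/status text.
std::optional<uint64_t> ParseVmSizeKb(const std::string& status_text) {
  std::istringstream in(status_text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind("VmSize:", 0) == 0) {
      // std::stoull skips leading whitespace and stops at " kB".
      return std::stoull(line.substr(7));
    }
  }
  return std::nullopt;
}

// Hypothetical rule: apply RLIMIT_AS only if a limit was requested and the
// process is not already above it (as under ASan/MSan/TSan, whose shadow
// memory reservations inflate VmSize).
bool ShouldApplyAddressSpaceLimit(uint64_t vm_size_kb, uint64_t limit_mb) {
  if (limit_mb == 0) return false;  // 0 disables the limit.
  return vm_size_kb / 1024 < limit_mb;
}

// Reads this process's VmSize and applies RLIMIT_AS if the rule allows it.
// Returns true if the limit was applied.
bool MaybeApplyAddressSpaceLimit(uint64_t limit_mb) {
  std::ifstream status("/proc/self/status");
  std::stringstream buf;
  buf << status.rdbuf();
  const std::optional<uint64_t> vm_size_kb = ParseVmSizeKb(buf.str());
  if (!vm_size_kb || !ShouldApplyAddressSpaceLimit(*vm_size_kb, limit_mb)) {
    return false;
  }
  struct rlimit limit;
  limit.rlim_cur = limit.rlim_max = static_cast<rlim_t>(limit_mb) << 20;
  return setrlimit(RLIMIT_AS, &limit) == 0;
}
```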

libpng: Could not get PCTable

When trying to fuzz libpng, after compiling it as described in the README and with clang-flags.txt, I always get this error:
I0302 13:38:17.196240 400288 control_flow.cc:81] Fall back to GetPcTableFromBinaryWithTracePC
I0302 13:38:17.271314 400288 centipede_callbacks.cc:52] Could not get PCTable, exiting (override with --require_pc_table=0)

I am calling Centipede with:
centipede/bazel-bin/centipede --binary=libpng_read_fuzzer --workdir=test_fuzzers/workdir_centipede

Did I do something wrong in calling Centipede this way, or with the Clang instrumentation flags in clang-flags.txt? Or is a specific Clang version required? I tried Clang 17.0 and 12.0.
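A PC table is emitted by Clang's `-fsanitize-coverage=pc-table`, which must be combined with `trace-pc-guard`, `inline-8bit-counters`, or `inline-bool-flag`; the build commands in the SegFault report use `trace-pc-guard,pc-table,trace-cmp`. If none of the code linked into the fuzzer binary was compiled with these flags, there is no table to find. One Centipede-independent way to check a build is to link a tiny hook into a standalone test binary (not together with the Centipede runner, which may define the same callback) and see whether SanitizerCoverage reports any PCs at startup. The sketch implements the documented `__sanitizer_cov_pcs_init` callback; `NumPcsSeen` and the printed message are illustrative additions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

namespace {
size_t g_num_pcs = 0;  // Number of {PC, flags} entries seen so far.
}  // namespace

// SanitizerCoverage calls this once per instrumented module when it was
// compiled with -fsanitize-coverage=pc-table. The table is an array of
// {PC, PCFlags} pairs; bit 0 of PCFlags marks a function entry block.
extern "C" void __sanitizer_cov_pcs_init(const uintptr_t* pcs_beg,
                                         const uintptr_t* pcs_end) {
  const size_t entries = static_cast<size_t>(pcs_end - pcs_beg) / 2;
  g_num_pcs += entries;
  std::fprintf(stderr, "pc-table: %zu PCs in this module\n", entries);
}

// Total number of PCs reported so far; 0 means no pc-table was seen.
size_t NumPcsSeen() { return g_num_pcs; }
```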

build fails with '__builtin_ia32_crc32di' needs target feature sse4.2

Building centipede seems to currently fail with:

#7 20.99 Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
#7 21.00 In file included from centipede.cc:44:
#7 21.00 In file included from ././centipede.h:26:
#7 21.00 ././corpus.h:77:12: error: '__builtin_ia32_crc32di' needs target feature sse4.2
#7 21.00     return __builtin_ia32_crc32di(feature,
#7 21.00            ^
#7 21.00 ././corpus.h:78:35: error: '__builtin_ia32_crc32di' needs target feature sse4.2
#7 21.00                                   __builtin_ia32_crc32di(feature >> 32, 0)) %
#7 21.00                                   ^
#7 21.00 2 errors generated.
#7 21.11 Target //:centipede failed to build
#7 21.11 Use --verbose_failures to see the command lines of failed build steps.
#7 21.17 INFO: Elapsed time: 16.628s, Critical Path: 2.71s
#7 21.17 INFO: 458 processes: 132 internal, 326 processwrapper-sandbox.
#7 21.17 FAILED: Build did NOT complete successfully
#7 21.18 FAILED: Build did NOT complete successfully
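`__builtin_ia32_crc32di` maps to the SSE4.2 `crc32` instruction, so the file must be compiled with that target feature enabled, for example with `-msse4.2` (which also makes the resulting binary require an SSE4.2-capable CPU). An alternative is to use the instruction only when the compiler defines `__SSE4_2__` and otherwise fall back to a software CRC-32C that computes the same value. A sketch of that guard; `Crc32cByte` and `Crc32cU64` are hypothetical names, and this is not a patch for `corpus.h`, whose full hash expression is only partly visible in the log.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE4_2__) && defined(__x86_64__)
#include <nmmintrin.h>  // _mm_crc32_u64
#endif

// Software CRC-32C (Castagnoli, reflected polynomial 0x82F63B78) over one
// byte, without initial/final inversion, matching the x86 crc32 instruction.
inline uint32_t Crc32cByte(uint32_t crc, uint8_t byte) {
  crc ^= byte;
  for (int i = 0; i < 8; ++i) {
    crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc;
}

// CRC-32C update with a 64-bit value, using the hardware instruction only
// when this translation unit is compiled with SSE4.2 enabled.
inline uint64_t Crc32cU64(uint64_t crc, uint64_t value) {
#if defined(__SSE4_2__) && defined(__x86_64__)
  return _mm_crc32_u64(crc, value);
#else
  uint32_t c = static_cast<uint32_t>(crc);
  for (int i = 0; i < 8; ++i) {  // Bytes in little-endian order, like crc32q.
    c = Crc32cByte(c, static_cast<uint8_t>(value >> (8 * i)));
  }
  return c;
#endif
}
```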

absl is too old for us in Chromium

centipede is using absl from June, while Chromium last updated its absl in late January.

We have some build errors which I assume are caused by this:

../../third_party/centipede/src/environment.cc:442:11: error: no matching function for call to 'StrCat'
    ret = absl::StrCat(".", annotation);
          ^~~~~~~~~~~~
../../third_party/abseil-cpp/absl/strings/str_cat.h:387:34: note: candidate function not viable: no known conversion from 'std::string_view' (aka 'basic_string_view<char>') to 'const AlphaNum' for 2nd argument
ABSL_MUST_USE_RESULT std::string StrCat(const AlphaNum& a, const AlphaNum& b);
                                 ^
../../third_party/abseil-cpp/absl/strings/str_cat.h:383:41: note: candidate function not viable: requires single argument 'a', but 2 arguments were provided
ABSL_MUST_USE_RESULT inline std::string StrCat(const AlphaNum& a) {
                                        ^
../../third_party/abseil-cpp/absl/strings/str_cat.h:388:34: note: candidate function not viable: requires 3 arguments, but 2 were provided
ABSL_MUST_USE_RESULT std::string StrCat(const AlphaNum& a, const AlphaNum& b,
                                 ^
../../third_party/abseil-cpp/absl/strings/str_cat.h:381:41: note: candidate function not viable: requires 0 arguments, but 2 were provided
ABSL_MUST_USE_RESULT inline std::string StrCat() { return std::string(); }
                                        ^
../../third_party/abseil-cpp/absl/strings/str_cat.h:390:34: note: candidate function not viable: requires 4 arguments, but 2 were provided
ABSL_MUST_USE_RESULT std::string StrCat(const AlphaNum& a, const AlphaNum& b,
                                 ^
../../third_party/abseil-cpp/absl/strings/str_cat.h:395:41: note: candidate function template not viable: requires at least 5 arguments, but 2 were provided
ABSL_MUST_USE_RESULT inline std::string StrCat(

and

../../third_party/abseil-cpp/absl/container/internal/raw_hash_set.h:2156:22: error: no matching function for call to object of type 'hasher' (aka 'absl::container_internal::StringHash')
    return find(key, hash_ref()(key));
                     ^~~~~~~~~~
../../third_party/centipede/src/environment.cc:596:31: note: in instantiation of function template specialization 'absl::container_internal::raw_hash_set<absl::container_internal::FlatHashMapPolicy<std::string, bool *>, absl::container_internal::StringHash, absl::container_internal::StringEq, std::allocator<std::pair<const std::string, bool *>>>::find<std::string_view>' requested here
  auto bool_iter = bool_flags.find(name);
                              ^
../../third_party/abseil-cpp/absl/container/internal/hash_function_defaults.h:73:10: note: candidate function not viable: no known conversion from 'const key_arg<string_view>' (aka 'const std::string_view') to 'absl::string_view' for 1st argument
  size_t operator()(absl::string_view v) const {
         ^
../../third_party/abseil-cpp/absl/container/internal/hash_function_defaults.h:76:10: note: candidate function not viable: no known conversion from 'const key_arg<string_view>' (aka 'const std::string_view') to 'const absl::Cord' for 1st argument
  size_t operator()(const absl::Cord& v) const {
         ^
In file included from ../../third_party/centipede/src/environment.cc:24:
In file included from ../../third_party/abseil-cpp/absl/container/flat_hash_map.h:42:
In file included from ../../third_party/abseil-cpp/absl/container/internal/raw_hash_map.h:24:

I haven't confirmed that this is the reason. It might be due to different absl build options in Chromium - yet to check.

Centipede eventually filling /dev/shm and crashing

Hi! I'm running Centipede on a single machine. I'm able to fuzz for around a day, with a large --num_runs, --batch_size=1000, and --j=12. While the run is happening, /dev/shm grows until it reaches a large fraction of the physical RAM of the machine, and eventually dies with SIGBUS on a write to shm. What's the best practice for avoiding this happening? Thanks!
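/dev/shm is normally a RAM-backed tmpfs, so any shared-memory segments that are never unlinked (for example, segments left behind by processes that were killed) keep consuming memory until the file system is full; whether that is what happens here cannot be determined from the report. As a stopgap, a wrapper could watch the file system and stop or alert before writes start failing. The sketch uses POSIX `statvfs`; `FreeBytes` and `UsedFraction` are hypothetical helpers, and any alert threshold is up to the user.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sys/statvfs.h>

// Bytes available to unprivileged processes on the file system holding `path`.
std::optional<uint64_t> FreeBytes(const char* path) {
  struct statvfs st;
  if (statvfs(path, &st) != 0) return std::nullopt;
  return static_cast<uint64_t>(st.f_bavail) * st.f_frsize;
}

// Fraction of the file system's blocks in use, in [0, 1].
std::optional<double> UsedFraction(const char* path) {
  struct statvfs st;
  if (statvfs(path, &st) != 0) return std::nullopt;
  if (st.f_blocks == 0) return 0.0;
  return static_cast<double>(st.f_blocks - st.f_bfree) /
         static_cast<double>(st.f_blocks);
}
```

For example, a supervising process could poll `UsedFraction("/dev/shm")` every few seconds and stop the run once it exceeds a chosen value such as 0.8.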

How to provide valid initial seeds to fuzz a binary?

The wiki says:
Input
A sequence of bytes that can be fed to a target. The input can be an arbitrary bag of bytes, or some structured data, e.g., serialized proto.

Is there a mechanism to provide valid initial inputs to the fuzz binary, which Centipede would then mutate as it finds new coverage?
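In libFuzzer-style workflows a seed is simply a file containing one input's raw bytes, and seeds are kept together in a directory. Assuming Centipede can be pointed at such a directory (the `--corpus_dir` flag in the SegFault report is the obvious candidate, but its documentation should confirm whether inputs there are loaded as seeds), preparing seeds is just writing files. `WriteSeed` and the file name below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: store one seed input as a file of raw bytes in `dir`,
// creating the directory if needed. Returns true on success.
bool WriteSeed(const fs::path& dir, const std::string& name,
               const std::vector<uint8_t>& bytes) {
  std::error_code ec;
  fs::create_directories(dir, ec);
  if (ec) return false;
  std::ofstream out(dir / name, std::ios::binary | std::ios::trunc);
  out.write(reinterpret_cast<const char*>(bytes.data()),
            static_cast<std::streamsize>(bytes.size()));
  return static_cast<bool>(out);
}
```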

--cxxopt=libc++ in .bazelrc?

With the OSS-Fuzz toolchain, the build complains unless we add --cxxopt=libc++ --linkopt=-lc++ to .bazelrc.

Would it make sense to add this by default?

Bundle runtime dependencies?

It looks like currently, centipede relies on:

  • libssl
  • objdump
  • anything else?

Would it make sense to vendor these and output these binaries as part of the bazel build? (or at least statically link libssl).

This would minimize hidden runtime dependencies and make integration into fuzzing infrastructure a bit nicer, without requiring us to install additional dependencies in the runtime environment that later cause issues due to version mismatches.
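Until such dependencies are bundled, a preflight check can at least fail early with a clear message. The sketch searches `$PATH` for an executable such as `objdump`; it does not cover shared libraries like libssl, which the dynamic loader resolves separately. `FindInPath` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <sstream>
#include <string>
#include <system_error>
#include <unistd.h>

namespace fs = std::filesystem;

// Returns the first executable named `tool` found on $PATH, or an empty path.
fs::path FindInPath(const std::string& tool) {
  const char* path_env = std::getenv("PATH");
  if (path_env == nullptr) return {};
  std::istringstream dirs(path_env);
  std::string dir;
  while (std::getline(dirs, dir, ':')) {
    // POSIX treats an empty entry as the current directory; skipped here.
    if (dir.empty()) continue;
    const fs::path candidate = fs::path(dir) / tool;
    std::error_code ec;
    if (fs::is_regular_file(candidate, ec) &&
        access(candidate.c_str(), X_OK) == 0) {
      return candidate;
    }
  }
  return {};
}
```

For example, `FindInPath("objdump").empty()` signals that `objdump` is missing from the runtime environment.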

Compilation error

Hey, thanks for the project.

It looks like I'm having trouble compiling it:

Latest Ubuntu:

symeon@symeon-virtual-machine:~/centipede$ bazel build -c opt :all
Starting local Bazel server and connecting to it...
ERROR: /home/symeon/.cache/bazel/_bazel_symeon/5ec8276f5bdcf7b99a907654e1e3434f/external/com_google_absl/absl/time/BUILD.bazel:74:8: @com_google_absl//absl/time:time_test: no such attribute 'env' in 'cc_test' rule
ERROR: /home/symeon/.cache/bazel/_bazel_symeon/5ec8276f5bdcf7b99a907654e1e3434f/external/com_google_absl/absl/time/BUILD.bazel:123:8: @com_google_absl//absl/time:time_benchmark: no such attribute 'env' in 'cc_test' rule
ERROR: /home/symeon/centipede/BUILD:170:11: Target '@com_google_absl//absl/time:time' contains an error and its package is in error and referenced by '//:rusage_profiler'
ERROR: Analysis of target '//:rusage_profiler' failed; build aborted: Analysis failed
INFO: Elapsed time: 11.779s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (27 packages loaded, 53 targets configured)
    Fetching @local_config_cc; fetching

and Kali:

┌──(kali㉿kali)-[~/Desktop/centipede]
└─$ bazel build -c opt :all
ERROR: /home/kali/.cache/bazel/_bazel_kali/fad804b26592531cd34735550fb1eab2/external/com_google_absl/absl/BUILD.bazel:92:15: no such target '@platforms//cpu:wasm32': target 'wasm32' not declared in package 'cpu' defined by /home/kali/.cache/bazel/_bazel_kali/fad804b26592531cd34735550fb1eab2/external/platforms/cpu/BUILD and referenced by '@com_google_absl//absl:platforms_wasm32'
ERROR: While resolving configuration keys for @com_google_absl//absl:wasm_3: Analysis failed
ERROR: While resolving configuration keys for @com_google_absl//absl/synchronization:synchronization: Analysis failed
ERROR: Analysis of target '//:centipede_binary_test' failed; build aborted: Analysis failed
INFO: Elapsed time: 0.348s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 7 targets configured)
    Fetching @remote_java_tools_linux; fetching

Am I doing something terribly wrong?
Thanks
