
atheris's Introduction

Atheris: A Coverage-Guided, Native Python Fuzzer

Atheris is a coverage-guided Python fuzzing engine. It supports fuzzing of Python code as well as native extensions written for CPython. Atheris is based on libFuzzer. When fuzzing native code, Atheris can be used in combination with Address Sanitizer or Undefined Behavior Sanitizer to catch extra bugs.

Installation Instructions

Atheris supports Linux (32- and 64-bit) and Mac OS X, with Python versions 3.6-3.10.

You can install prebuilt versions of Atheris with pip:

pip3 install atheris

These wheels come with a built-in libFuzzer, which is fine for fuzzing Python code. If you plan to fuzz native extensions, you may need to build from source to ensure the libFuzzer version in Atheris matches your Clang version.

Building from Source

Atheris relies on libFuzzer, which is distributed with Clang. If you have a sufficiently new version of clang on your path, installation from source is as simple as:

# Build latest release from source
pip3 install --no-binary atheris atheris
# Build development code from source
git clone https://github.com/google/atheris.git
cd atheris
pip3 install .

If you don't have clang installed or it's too old, you'll need to download and build the latest version of LLVM. Follow the instructions in Installing Against New LLVM below.

Mac

Apple Clang doesn't come with libFuzzer, so you'll need to install a new version of LLVM from head. Follow the instructions in Installing Against New LLVM below.

Installing Against New LLVM

# Building LLVM
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir build
cd build
cmake -DLLVM_ENABLE_PROJECTS='clang;compiler-rt' -G "Unix Makefiles" ../llvm
make -j 10  # This step is very slow

# Installing Atheris
CLANG_BIN="$(pwd)/bin/clang" pip3 install <whatever>

Using Atheris

Example

#!/usr/bin/python3

import atheris

with atheris.instrument_imports():
  import some_library
  import sys

def TestOneInput(data):
  some_library.parse(data)

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()

When fuzzing Python, Atheris will report a failure if the Python code under test throws an uncaught exception.
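Because any uncaught exception counts as a failure, a harness usually catches the exceptions that merely signal rejected input and lets everything else propagate. The following is a minimal sketch of that pattern; it uses the standard-library json module as a stand-in for some_library, which is an assumption made purely for illustration.

```python
import json


def TestOneInput(data: bytes) -> None:
    """Fuzz entry point: expected rejections are swallowed, anything else propagates."""
    try:
        json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Malformed input is documented behavior of the code under test,
        # not a bug, so it should not be reported as a failure.
        return
```

Passed to atheris.Setup() as in the example above, an entry point written this way would only report exception types other than the two listed.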

Python coverage

Atheris collects Python coverage information by instrumenting bytecode. There are 3 options for adding this instrumentation to the bytecode:

  • You can instrument the libraries you import:

    with atheris.instrument_imports():
      import foo
      from bar import baz

    This will cause instrumentation to be added to foo and bar, as well as any libraries they import.

  • Or, you can instrument individual functions:

    @atheris.instrument_func
    def my_function(foo, bar):
      print("instrumented")

  • Or finally, you can instrument everything:

    atheris.instrument_all()

    Put this right before atheris.Setup(). This will find every Python function currently loaded in the interpreter, and instrument it. This might take a while.

Atheris can additionally instrument regular expression checks, e.g. re.search. To enable this feature, add atheris.enabled_hooks.add("RegEx") to your script before your code calls re.compile. Internally, this will import the re module and instrument the necessary functions. This is currently an experimental feature.

Similarly, Atheris can instrument str methods; currently only str.startswith and str.endswith are supported. To enable this feature, add atheris.enabled_hooks.add("str"). This is currently an experimental feature.

Why am I getting "No interesting inputs were found"?

You might see this error:

ERROR: no interesting inputs were found. Is the code instrumented for coverage? Exiting.

You'll get this error if the first 2 calls to TestOneInput didn't produce any coverage events. Even if you have instrumented some Python code, this can happen if the instrumentation isn't reached in those first 2 calls. (For example, because you have a nontrivial TestOneInput). You can resolve this by adding an atheris.instrument_func decorator to TestOneInput, using atheris.instrument_all(), or moving your TestOneInput function into an instrumented module.
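One of the fixes listed above, instrumenting TestOneInput itself, can be sketched as follows. Since instrument_func returns the function it was given, this sketch applies it at the Setup() call rather than as a decorator. The FUZZ prefix check and the main() wrapper are illustrative assumptions, and atheris is imported inside main() only so that the file can be loaded on a machine without Atheris installed.

```python
import sys


def TestOneInput(data: bytes) -> None:
    # Nontrivial entry point: this branch is only visible to the fuzzer
    # if the function itself is instrumented.
    if len(data) >= 4 and data[:4] == b"FUZZ":
        raise RuntimeError("reached the guarded branch")


def main() -> None:
    import atheris  # lazy import; a real harness would import at the top

    # instrument_func instruments in place and returns the same function.
    atheris.Setup(sys.argv, atheris.instrument_func(TestOneInput))
    atheris.Fuzz()
```

Calling main() starts the fuzzer; it is deliberately not called at import time in this sketch.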

Visualizing Python code coverage

Examining which lines are executed is helpful for understanding the effectiveness of your fuzzer. Atheris is compatible with coverage.py: you can run your fuzzer using the coverage.py module as you would for any other Python program. Here's an example:

python3 -m coverage run your_fuzzer.py -atheris_runs=10000  # Times to run
python3 -m coverage html
(cd htmlcov && python3 -m http.server 8000)

Coverage reports are only generated when your fuzzer exits gracefully. This happens if:

  • you specify -atheris_runs=<number>, and that many runs have elapsed.
  • your fuzzer exits by Python exception.
  • your fuzzer exits by sys.exit().

No coverage report will be generated if your fuzzer exits due to a crash in native code, or due to libFuzzer's -runs flag (use -atheris_runs). If your fuzzer exits via other methods, such as SIGINT (Ctrl+C), Atheris will attempt to generate a report but may be unable to (depending on your code). For consistent reports, we recommend always using -atheris_runs=<number>.

If you'd like to examine coverage when running with your corpus, you can do that with the following command:

python3 -m coverage run your_fuzzer.py corpus_dir/* -atheris_runs=$(( 1 + $(ls corpus_dir | wc -l) ))

This will cause Atheris to run on each file in corpus_dir, then exit. Note: Atheris uses an empty input as the first input even if there is no empty file in corpus_dir, which is why the run count is one more than the number of corpus files. Importantly, if you leave off the -atheris_runs=$(( 1 + $(ls corpus_dir | wc -l) )) argument, no coverage report will be generated.
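The run count in that command is simply the number of corpus files plus one for the initial empty input. As a rough sketch, the same command line could be assembled in Python; the helper name coverage_command is hypothetical, and unlike ls | wc -l it counts only regular files, not subdirectories.

```python
import os
import sys


def coverage_command(fuzzer: str, corpus_dir: str) -> list:
    """Build the argv for a coverage.py run over every file in corpus_dir."""
    corpus_files = sorted(
        entry.path for entry in os.scandir(corpus_dir) if entry.is_file()
    )
    # One extra run accounts for the empty input Atheris executes first.
    runs = len(corpus_files) + 1
    return [
        sys.executable, "-m", "coverage", "run", fuzzer,
        *corpus_files,
        f"-atheris_runs={runs}",
    ]
```

The resulting list can be handed to subprocess.run(); passing the files explicitly mirrors the corpus_dir/* glob in the shell version.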

Using coverage.py will significantly slow down your fuzzer, so only use it for visualizing coverage; don't use it all the time.

Fuzzing Native Extensions

In order for fuzzing native extensions to be effective, your native extensions must be instrumented. See Native Extension Fuzzing for instructions.

Structure-aware Fuzzing

Atheris is based on a coverage-guided mutation-based fuzzer (LibFuzzer). This has the advantage of not requiring any grammar definition for generating inputs, making its setup easier. The disadvantage is that it will be harder for the fuzzer to generate inputs for code that parses complex data types. Often the inputs will be rejected early, resulting in low coverage.

Atheris supports custom mutators (as offered by LibFuzzer) to produce grammar-aware inputs.

Example (Atheris-equivalent of the example in the LibFuzzer docs):

@atheris.instrument_func
def TestOneInput(data):
  try:
    decompressed = zlib.decompress(data)
  except zlib.error:
    return

  if len(decompressed) < 2:
    return

  try:
    if decompressed.decode() == 'FU':
      raise RuntimeError('Boom')
  except UnicodeDecodeError:
    pass

To reach the RuntimeError crash, the fuzzer needs to be able to produce inputs that are valid compressed data and satisfy the checks after decompression. It is very unlikely that Atheris will be able to produce such inputs: mutations on the input data will most probably result in invalid data that will fail at decompression-time.

To overcome this issue, you can define a custom mutator function (equivalent to LLVMFuzzerCustomMutator). This example produces valid compressed data. To enable Atheris to make use of it, pass the custom mutator function to the invocation of atheris.Setup.

def CustomMutator(data, max_size, seed):
  try:
    decompressed = zlib.decompress(data)
  except zlib.error:
    decompressed = b'Hi'
  else:
    decompressed = atheris.Mutate(decompressed, len(decompressed))
  return zlib.compress(decompressed)

atheris.Setup(sys.argv, TestOneInput, custom_mutator=CustomMutator)
atheris.Fuzz()

As seen in the example, the custom mutator may request Atheris to mutate data using atheris.Mutate() (this is equivalent to LLVMFuzzerMutate).
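The mutator above depends on atheris.Mutate, which makes it awkward to exercise outside a fuzzing run. One way to structure it, sketched here under the assumption of a hypothetical factory named make_custom_mutator that receives the mutation function as a parameter, keeps the same logic while letting a stub stand in for atheris.Mutate:

```python
import zlib


def make_custom_mutator(mutate):
    """Build a custom mutator around mutate(data, max_size) -> bytes."""

    def CustomMutator(data, max_size, seed):
        try:
            decompressed = zlib.decompress(data)
        except zlib.error:
            # Not valid compressed data yet: start from a known-good payload.
            decompressed = b"Hi"
        else:
            decompressed = mutate(decompressed, len(decompressed))
        # Whatever happened above, the output is always valid zlib data.
        return zlib.compress(decompressed)

    return CustomMutator
```

In a real harness the factory would be called as make_custom_mutator(atheris.Mutate) and the result passed to atheris.Setup via custom_mutator, as in the example above.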

You can experiment with custom_mutator_example.py and see that without the mutator Atheris would not be able to find the crash, while with the mutator this is achieved in a matter of seconds.

$ python3 example_fuzzers/custom_mutator_example.py --no_mutator
[...]
#2      INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 37Mb
#524288 pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 262144 rss: 37Mb
#1048576        pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 349525 rss: 37Mb
#2097152        pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 299593 rss: 37Mb
#4194304        pulse  cov: 2 ft: 2 corp: 1/1b lim: 4096 exec/s: 279620 rss: 37Mb
[...]

$ python3 example_fuzzers/custom_mutator_example.py
[...]
INFO: found LLVMFuzzerCustomMutator (0x7f9c989fb0d0). Disabling -len_control by default.
[...]
#2      INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 37Mb
#3      NEW    cov: 4 ft: 4 corp: 2/11b lim: 4096 exec/s: 0 rss: 37Mb L: 10/10 MS: 1 Custom-
#12     NEW    cov: 5 ft: 5 corp: 3/21b lim: 4096 exec/s: 0 rss: 37Mb L: 10/10 MS: 7 Custom-CrossOver-Custom-CrossOver-Custom-ChangeBit-Custom-
 === Uncaught Python exception: ===
RuntimeError: Boom
Traceback (most recent call last):
  File "example_fuzzers/custom_mutator_example.py", line 62, in TestOneInput
    raise RuntimeError('Boom')
[...]

Custom crossover functions (equivalent to LLVMFuzzerCustomCrossOver) are also supported. You can pass the custom crossover function to the invocation of atheris.Setup. See its usage in custom_crossover_fuzz_test.py.
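As a rough illustration of what such a function might look like, the sketch below splices two inputs together. It assumes the callback mirrors LLVMFuzzerCustomCrossOver and receives (data1, data2, max_out_size, seed), and that the Setup keyword is named custom_crossover; both are assumptions to check against custom_crossover_fuzz_test.py.

```python
def CustomCrossOver(data1: bytes, data2: bytes, max_out_size: int, seed: int) -> bytes:
    """Splice the first half of data1 onto the second half of data2."""
    spliced = data1[: len(data1) // 2] + data2[len(data2) // 2 :]
    # Never hand back more bytes than the fuzzer asked for.
    return spliced[:max_out_size]
```

Under those assumptions, registration would read atheris.Setup(sys.argv, TestOneInput, custom_crossover=CustomCrossOver).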

Structure-aware Fuzzing with Protocol Buffers

libprotobuf-mutator has bindings to use it together with Atheris to perform structure-aware fuzzing using protocol buffers.

See the documentation for atheris_libprotobuf_mutator.

Integration with OSS-Fuzz

Atheris is fully supported by OSS-Fuzz, Google's continuous fuzzing service for open source projects. For integrating with OSS-Fuzz, please see https://google.github.io/oss-fuzz/getting-started/new-project-guide/python-lang.

API

The atheris module provides three key functions: instrument_imports(), Setup() and Fuzz().

In your source file, import all libraries you wish to fuzz inside a with atheris.instrument_imports():-block, like this:

# library_a will not get instrumented
import library_a

with atheris.instrument_imports():
    # library_b will get instrumented
    import library_b

Generally, it's best to import atheris first and then import all other libraries inside of a with atheris.instrument_imports() block.

Next, define a fuzzer entry point function and pass it to atheris.Setup() along with the fuzzer's arguments (typically sys.argv). Finally, call atheris.Fuzz() to start fuzzing. You must call atheris.Setup() before atheris.Fuzz().

instrument_imports(include=[], exclude=[])

  • include: A list of fully-qualified module names that shall be instrumented.
  • exclude: A list of fully-qualified module names that shall NOT be instrumented.

This should be used together with a with-statement. All modules imported in said statement will be instrumented. However, because Python imports all modules only once, this cannot be used to instrument any previously imported module, including modules required by Atheris. To add coverage to those modules, use instrument_all() instead.

A full list of unsupported modules can be retrieved as follows:

import sys
import atheris
print(sys.modules.keys())
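A variation on the snippet above checks specific modules before relying on instrument_imports(). This is a small sketch; the helper name already_imported is hypothetical and not part of Atheris.

```python
import sys


def already_imported(module_names):
    """Return the names that are already in sys.modules, in the order given.

    Modules in this list were loaded before the with-block ran, so importing
    them again inside instrument_imports() would not re-execute them.
    """
    return [name for name in module_names if name in sys.modules]
```

Calling it right after import atheris with the modules you intend to fuzz shows which of them would need instrument_all() instead.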

instrument_func(func)

  • func: The function to instrument.

This will instrument the specified Python function and then return func. This is typically used as a decorator, but can be used to instrument individual functions too. Note that the func is instrumented in-place, so this will affect all call points of the function.

This cannot be called on a bound method - call it on the unbound version.
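In Python 3, the unbound version of a method is the plain function stored on the class, which is also reachable from a bound method through its __func__ attribute. A minimal sketch, with Parser and the helper unbound as hypothetical names:

```python
class Parser:
    def parse(self, data: bytes) -> int:
        return len(data)


def unbound(func):
    """Return the plain function behind a bound method, or func itself."""
    return getattr(func, "__func__", func)


parser = Parser()
# Both spellings reach the same function object that the class defines.
assert unbound(parser.parse) is Parser.parse
assert unbound(Parser.parse) is Parser.parse
```

With this helper, atheris.instrument_func(unbound(parser.parse)) would receive the function rather than the bound method.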

instrument_all()

This will scan over all objects in the interpreter and call instrument_func on every Python function. This works even on core Python interpreter functions, something which instrument_imports cannot do.

This function is experimental.

Setup(args, test_one_input, internal_libfuzzer=None)

  • args: A list of strings: the process arguments to pass to the fuzzer, typically sys.argv. This argument list may be modified in-place, to remove arguments consumed by the fuzzer. See the LibFuzzer docs for a list of such options.
  • test_one_input: your fuzzer's entry point. Must take a single bytes argument. This will be repeatedly invoked with a single bytes container.
  • internal_libfuzzer: Indicates whether libfuzzer will be provided by atheris or by an external library (see native_extension_fuzzing.md). If unspecified, Atheris will determine this automatically. If fuzzing pure Python, leave this as True.

Fuzz()

This starts the fuzzer. You must have called Setup() before calling this function. This function does not return.

In many cases Setup() and Fuzz() could be combined into a single function, but they are separated because you may want the fuzzer to consume the command-line arguments it handles before passing any remaining arguments to another setup function.
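The split described above might be used as in the following sketch, where the harness parses its own flag from whatever remains in the argument list after Setup(). The --no_mutator flag is borrowed from the custom mutator example; parse_harness_flags and main are hypothetical names, and the placeholder entry point does nothing.

```python
import argparse
import sys


def parse_harness_flags(args):
    """Parse the harness's own flags from whatever the fuzzer left in args."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--no_mutator", action="store_true")
    # parse_known_args tolerates leftovers such as a corpus directory.
    flags, _unparsed = parser.parse_known_args(args[1:])
    return flags


def TestOneInput(data: bytes) -> None:
    pass  # placeholder entry point


def main() -> None:
    import atheris  # lazy import; a real harness would import at the top

    args = list(sys.argv)
    atheris.Setup(args, TestOneInput)  # may remove consumed flags from args
    flags = parse_harness_flags(args)  # second setup step sees the remainder
    print("mutator disabled:", flags.no_mutator)
    atheris.Fuzz()
```

Using parse_known_args keeps the second step robust regardless of exactly which arguments the fuzzer removed.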

FuzzedDataProvider

Often, a bytes object is not convenient input to your code being fuzzed. Similar to libFuzzer, we provide a FuzzedDataProvider to translate these bytes into other input forms.

You can construct the FuzzedDataProvider with:

fdp = atheris.FuzzedDataProvider(input_bytes)

The FuzzedDataProvider then supports the following functions:

def ConsumeBytes(count: int)

Consume count bytes.

def ConsumeUnicode(count: int)

Consume unicode characters. Might contain surrogate pair characters, which according to the specification are invalid in this situation. However, many core software tools (e.g. Windows file paths) support them, so other software often needs to too.

def ConsumeUnicodeNoSurrogates(count: int)

Consume unicode characters, but never generate surrogate pair characters.

def ConsumeString(count: int)

Alias for ConsumeBytes in Python 2, or ConsumeUnicode in Python 3.

def ConsumeInt(bytes: int)

Consume a signed integer of the specified size (when written in two's complement notation).

def ConsumeUInt(bytes: int)

Consume an unsigned integer of the specified size.
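The size argument is a byte count, which fixes the range of values each call can return. As a plain arithmetic illustration (it does not model how Atheris maps input bytes to values), the bounds of an n-byte integer can be computed like this; int_bounds is a hypothetical helper:

```python
def int_bounds(num_bytes: int, signed: bool):
    """Return (min, max) for an integer stored in num_bytes bytes."""
    bits = 8 * num_bytes
    if signed:
        # Two's complement: one bit is spent on the sign.
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1
```

For example, a 1-byte signed value lies in [-128, 127] and a 1-byte unsigned value in [0, 255].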

def ConsumeIntInRange(min: int, max: int)

Consume an integer in the range [min, max].

def ConsumeIntList(count: int, bytes: int)

Consume a list of count integers of size bytes.

def ConsumeIntListInRange(count: int, min: int, max: int)

Consume a list of count integers in the range [min, max].

def ConsumeFloat()

Consume an arbitrary floating-point value. Might produce weird values like NaN and Inf.

def ConsumeRegularFloat()

Consume an arbitrary numeric floating-point value; never produces a special type like NaN or Inf.

def ConsumeProbability()

Consume a floating-point value in the range [0, 1].

def ConsumeFloatInRange(min: float, max: float)

Consume a floating-point value in the range [min, max].

def ConsumeFloatList(count: int)

Consume a list of count arbitrary floating-point values. Might produce weird values like NaN and Inf.

def ConsumeRegularFloatList(count: int)

Consume a list of count arbitrary numeric floating-point values; never produces special types like NaN or Inf.

def ConsumeProbabilityList(count: int)

Consume a list of count floats in the range [0, 1].

def ConsumeFloatListInRange(count: int, min: float, max: float)

Consume a list of count floats in the range [min, max].

def PickValueInList(l: list)

Given a list, pick a random value.

def ConsumeBool()

Consume either True or False.
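Put together, several of these methods can turn one bytes input into the typed arguments of a function under test. The sketch below is illustrative only: repeat_and_pad is a toy target invented for this example, and atheris is imported lazily so that the file loads without Atheris installed.

```python
def repeat_and_pad(text: str, times: int, width: int) -> str:
    """Toy function under test: repeat text, then left-justify to width."""
    return (text * times).ljust(width)


def TestOneInput(data: bytes) -> None:
    # A real harness would import atheris at the top of the file.
    import atheris

    fdp = atheris.FuzzedDataProvider(data)
    times = fdp.ConsumeIntInRange(0, 8)
    width = fdp.ConsumeIntInRange(0, 64)
    if fdp.ConsumeBool():
        text = fdp.PickValueInList(["", "ab", "xyz"])
    else:
        text = fdp.ConsumeUnicodeNoSurrogates(16)
    result = repeat_and_pad(text, times, width)
    # Property check: padding never leaves the result shorter than width.
    assert len(result) >= width
```

Consuming the fixed-size values first and the variable-length string last is a design choice that keeps small mutations of the input from shifting every field at once.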

atheris's People

Contributors

aidenrhall, babenek, carl-smith, charfa, davidkorczynski, disconnect3d, fanquake, fmeum, jmhodges, jonathanmetzman, jvoisin, kapilt, kinow, ligurio, pd-fkie, raphaelts3, rchen152, rwgk, ryroe, theshiftedbit, volker-weissmann, zac-hd

atheris's Issues

GCOV instrumentation `.gcda` fails when run by atheris

We found that:

  1. We have a Python library whose kernel is implemented in C/C++ and built with GCOV instrumentation.
  2. Running it natively produces the .gcda coverage data.
  3. Running it under Atheris does not produce the .gcda coverage data.

Since Atheris is a fuzzing tool and coverage information is very important as feedback, could Atheris support dumping .gcda files while it runs? Thanks!

A reproducer is available here:

https://colab.research.google.com/drive/1LQ69TIQqDZeuSC7FYOQxnAIGAwNFNQ6P?usp=sharing

Unable to cast Python instance to C++

Hi,

I am exploring Atheris and followed all the steps described in the repo. However, when I run python3 example_fuzzers/custom_mutator_example.py, the following error appears:

Traceback (most recent call last):
  File "/Users/alduck/Documents/GitHub/Exploring-atheris/example_fuzzers/custom_mutator_example.py", line 75, in <module>
    atheris.Setup(sys.argv, TestOneInput, custom_mutator='CustomMutator')
RuntimeError: Unable to cast Python instance to C++ type (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)

I searched around and found one answer. It suggests pinning the Atheris version in OSS-Fuzz to the latest; however, I am not using OSS-Fuzz. I am just running the example from the Atheris repo.

Fails to import base64

INFO: Instrumenting base64
Traceback (most recent call last):
  File "/home/hex/fuzz/test.py", line 5, in <module>
    import base64
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 879, in exec_module
  File "/home/hex/.local/lib/python3.10/site-packages/atheris/import_hook.py", line 196, in get_code
    return patch_code(code, self._trace_dataflow)
  File "/home/hex/.local/lib/python3.10/site-packages/atheris/instrument_bytecode.py", line 756, in patch_code
    inst.consts[i] = patch_code(inst.consts[i], trace_dataflow, nested=True)
  File "/home/hex/.local/lib/python3.10/site-packages/atheris/instrument_bytecode.py", line 758, in patch_code
    return inst.to_code()
  File "/home/hex/.local/lib/python3.10/site-packages/atheris/instrument_bytecode.py", line 457, in to_code
    self._check_state()
  File "/home/hex/.local/lib/python3.10/site-packages/atheris/instrument_bytecode.py", line 372, in _check_state
    listing[i].check_state()
  File "/home/hex/.local/lib/python3.10/site-packages/atheris/instrument_bytecode.py", line 208, in check_state
    assert jump_arg_bytes(self.arg) == self.reference
AssertionError

import sys
import atheris

with atheris.instrument_imports():
  import base64

def testinput(data):
  base64.encode(data)

atheris.Setup(sys.argv, testinput)
atheris.Fuzz()

Installation doesn't work with clang 15

First, it fails to find libFuzzer because libFuzzer's new location is clang/15.0.0/lib/x86_64-unknown-linux-gnu/libclang_rt.fuzzer_no_main.a
Even after passing this as LIBFUZZER_LIB, compilation fails again (I think because a replace in setup.py fails):

 ...
    /usr/bin/ld: /tmp/tmp.K7iUCoxQjJ/sanitizer.a(fuzzer_no_main.o): in function `__sanitizer_cov_8bit_counters_init':
    (.text.__sanitizer_cov_8bit_counters_init+0x0): multiple definition of `__sanitizer_cov_8bit_counters_init'; /usr/local/lib/clang/15.0.0/lib/x86_64-unknown-linux-gnu/libclang_rt.fuzzer_no_main.a(fuzzer_no_main.o):(.text.__sanitizer_cov_8bit_counters_init+0x0): first defined here
    /usr/bin/ld: /tmp/tmp.K7iUCoxQjJ/sanitizer.a(fuzzer_no_main.o): in function `__sanitizer_weak_hook_strcasestr':
    (.text.__sanitizer_weak_hook_strcasestr+0x0): multiple definition of `__sanitizer_weak_hook_strcasestr'; /usr/local/lib/clang/15.0.0/lib/x86_64-unknown-linux-gnu/libclang_rt.fuzzer_no_main.a(fuzzer_no_main.o):(.text.__sanitizer_weak_hook_strcasestr+0x0): first defined here
    /usr/bin/ld: /tmp/tmp.K7iUCoxQjJ/sanitizer.a(fuzzer_no_main.o): in function `__sanitizer_cov_trace_cmp4':
    (.text.__sanitizer_cov_trace_cmp4+0x0): multiple definition of `__sanitizer_cov_trace_cmp4'; /usr/local/lib/clang/15.0.0/lib/x86_64-unknown-linux-gnu/libclang_rt.fuzzer_no_main.a(fuzzer_no_main.o):(.text.__sanitizer_cov_trace_cmp4+0x0): first defined here
    /usr/bin/ld: /tmp/tmp.K7iUCoxQjJ/sanitizer.a(fuzzer_no_main.o): in function `LLVMFuzzerRunDriver':
    (.text.LLVMFuzzerRunDriver+0x0): multiple definition of `LLVMFuzzerRunDriver'; /usr/local/lib/clang/15.0.0/lib/x86_64-unknown-linux-gnu/libclang_rt.fuzzer_no_main.a(fuzzer_no_main.o):(.text.LLVMFuzzerRunDriver+0x0): first defined here
    clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
    Command '['/tmp/pip-req-build-57t5_brl/setup_utils/merge_libfuzzer_sanitizer.sh', '/usr/local/lib/clang/15.0.0/lib/x86_64-unknown-linux-gnu/libclang_rt.fuzzer_no_main.a', '/usr/local/lib/clang/15.0.0/lib/x86_64-unknown-linux-gnu/libclang_rt.fuzzer_no_main.a', 'ubsan_init_standalone_preinit.cc.o ubsan_init_standalone_preinit.cpp.o']' returned non-zero exit status 1

To reproduce you can install atheris in this docker image: gcr.io/oss-fuzz-base/base-builder-testing-roll-clang

Regarding Documentation

Summary:

Is there any documentation on how to integrate a Django project with Atheris?

Generate wheels for all platforms via cibuildwheel

What

Use https://github.com/pypa/cibuildwheel/ to build wheels for major platforms.

Why

This would make it easier to install and use this project when you're not on a Linux machine and are trying to fuzz Python code.

In my case, I'm trying to fuzz a piece of code that serves as a foundational piece of the Python packaging ecosystem (for example, parsing requirements within pip): https://github.com/pypa/packaging/. I'd like macOS/Linux wheels that cover all the supported Python versions. :)

Coverage for TensorFlow does not show up

Hi, I am working on using Atheris to measure the test coverage of TensorFlow.
The test code is here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/security/fuzzing/abs_fuzz.py
I used the commands below:

python3 -m coverage run abs_fuzz.py -atheris_runs=10000
python3 -m coverage report

The results show only the following:

Name                Stmts   Miss  Cover
---------------------------------------
abs_fuzz.py            15      0   100%
python_fuzzing.py      52     14    73%
---------------------------------------
TOTAL                  67     14    79%

Why does the coverage report not show how the abs function covered the TensorFlow library itself (for example, going through thousands of lines)? Even when I try to instrument everything, it does not show up.
Is there any way to make TensorFlow's coverage show up?
Thank you!!!

Aggregate all string literals during instrumentation

Hi,

as we discussed yesterday at my talk about fuzzing in Python, it may be good if Atheris extracted all string literals so it can use them later when mutating an input.

Apparently, Atheris currently extracts literals that are directly compared with a variable, like this: if x == "abc", but it doesn't extract literals that are used in other ways, like: if x.startswith("some string").

On the other hand, one downside of extracting all string literals is that logging and string-formatting messages get picked up too, and those may not be that useful for fuzzing. But I am not sure how big of a problem that is, and maybe there should be an option to inspect the extracted strings and influence them.
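To make the proposal concrete, here is a source-level approximation using the standard ast module on Python 3.8 or newer. It is not how Atheris works (Atheris instruments bytecode); string_literals is a hypothetical helper that only shows what extracting every literal would collect.

```python
import ast


def string_literals(source: str):
    """Return the set of string constants appearing anywhere in source."""
    tree = ast.parse(source)
    return {
        node.value
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant) and isinstance(node.value, str)
    }
```

Because docstrings and log messages are string constants too, the result also illustrates the downside mentioned above: the set can include text that is of little use as mutation material.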

How to keep fuzzing after finding one bug/exception?

I am interested in starting a fuzzing process that does not stop after coming across one bug and instead keeps running, looking for other possible defects. Is there a way to accomplish this?

I am not very familiar with Python, so if the answer I overlooked is fairly obvious, my apologies.
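Since Atheris reports a failure when the Python code under test throws an uncaught exception, one possible approach is to catch exceptions inside the entry point and record them instead. The sketch below is only one way to do that; keep_going and the findings dictionary are hypothetical names, and the wrapper has no effect on crashes in native code.

```python
import traceback


def keep_going(test_one_input, findings):
    """Wrap an entry point so Python exceptions are recorded instead of raised."""

    def wrapper(data: bytes) -> None:
        try:
            test_one_input(data)
        except Exception as exc:
            # Key on the exception type and the innermost frame so that the
            # same defect reached by different inputs is stored only once.
            frame = traceback.extract_tb(exc.__traceback__)[-1]
            key = (type(exc).__name__, frame.filename, frame.lineno)
            findings.setdefault(key, data)

    return wrapper
```

The wrapped function would be passed to atheris.Setup() in place of the original. The trade-off is that the fuzzer no longer sees these inputs as failures, so saving or printing the contents of findings is left to the harness.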

Coverage from regexp

I'm playing with using atheris on some code that uses regexp (stdlib re) for control flow. This is a common pattern for things like HTTP routing in Python.

Unfortunately, because the _sre module is written in C (and isn't compiled with -fsanitize=fuzzer-no-link), there's no coverage for it. This means that atheris really struggles to make progress through it, effectively just throwing darts at it.

If atheris was somehow "regexp aware" that could alleviate this situation. I don't have a well thought out proposal, but off the cuff something that might work is:

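In the tracer, if you are calling a method and self is a regexp instance, grab the .pattern attribute off of it, pass that to sre_parse.parse(), and then take any repeated sequences of LITERAL opcodes and feed them into the dictionary. Pseudocode (insert_to_corpus is a placeholder):

def handle_re(r):
    handle_ops(sre_parse.parse(r.pattern))

def handle_ops(ops):
    idx = 0
    while idx < len(ops):
        op, val = ops[idx]
        if op == LITERAL:
            # Collect the whole run of consecutive LITERAL opcodes.
            end_idx = idx
            while end_idx < len(ops) and ops[end_idx][0] == LITERAL:
                end_idx += 1
            # LITERAL values are code points, so convert them with chr().
            insert_to_corpus("".join(chr(v) for _, v in ops[idx:end_idx]))
            idx = end_idx
        else:
            if op == SUBPATTERN:
                handle_ops(val[3])  # val[3] is the nested subpattern
            idx += 1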

Atheris for micropython

I have the following open question:
If I compile micropython for Ubuntu (the port/unix) using clang with -fsanitizer passed in CFLAGS and LDFLAGS (linked with libFuzzer), would it be possible to use Atheris to test the API of modules compiled into this micropython for Ubuntu?
If so, what would be the approach for a small proof of concept?

Thanks

PS: I'm not very knowledgeable about micropython, but according to blogs it has CPython 3.4's features.
There's a sys.settrace(), but I don't know whether it offers opcode tracing like CPython 3.8 does.

Potential file conflict with other Python packages

When packaging this for Arch Linux, I noticed that there is the potential for file conflicts with other Python packages.

Here are the contents of the 3.10 wheel from PyPI (which corresponds to a distribution-level Python package):

atheris-2.0.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
atheris_no_libfuzzer.py
ubsan_with_fuzzer.so
asan_with_fuzzer.so
ubsan_cxx_with_fuzzer.so
libclang_rt.fuzzer_no_main-x86_64.a
atheris-2.0.12.dist-info/WHEEL
atheris-2.0.12.dist-info/RECORD
atheris-2.0.12.dist-info/top_level.txt
atheris-2.0.12.dist-info/LICENSE
atheris-2.0.12.dist-info/entry_points.txt
atheris-2.0.12.dist-info/METADATA
atheris/instrument_bytecode.py
atheris/native.cpython-310-x86_64-linux-gnu.so
atheris/custom_mutator_and_crossover_fuzz_test.py
atheris/pyinstaller_coverage_test.py
atheris/import_hook.py
atheris/version_dependent.py
atheris/fuzzed_data_provider_test.py
atheris/coverage_test.py
atheris/utils.py
atheris/fuzz_test_lib.py
atheris/custom_mutator_fuzz_test.py
atheris/custom_mutator.cpython-310-x86_64-linux-gnu.so
atheris/custom_crossover.cpython-310-x86_64-linux-gnu.so
atheris/regex_match_generation_test.py
atheris/function_hooks.py
atheris/fuzz_test.py
atheris/__init__.py
atheris/core_without_libfuzzer.cpython-310-x86_64-linux-gnu.so
atheris/hook-atheris.py
atheris/custom_crossover_fuzz_test.py
atheris/coverage_test_helper.py
atheris/core_with_libfuzzer.cpython-310-x86_64-linux-gnu.so

Of interest are these particular files at the package root:

  • ubsan_with_fuzzer.so
  • asan_with_fuzzer.so
  • ubsan_cxx_with_fuzzer.so
  • libclang_rt.fuzzer_no_main-x86_64.a

I'm wondering if these are leftover build artifacts. Are you able to confirm whether these are supposed to be present?

Wrong crash line in new python version

Hi! I've been fuzzing with Atheris and found a problem: the traceback reports the wrong crash line on newer Python versions.

As an example, I will use this simple wrapper, fuzz.py:

import atheris

with atheris.instrument_imports():
    import sys
    import module

def TestOneInput(data):
    module.crash(data)

def main():
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()

if __name__ == "__main__":
    main()

with this simple module module.py:

def crash(data):
    print("wrong line!")
    return 1/0

I will use an empty file as the crash file.

The output when I run python3.8 ./fuzz.py crash is:

INFO: Instrumenting module
INFO: Using built-in libfuzzer
WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".
INFO: Seed: 663688448
./fuzz.py: Running 1 inputs 1 time(s) each.
Running: crash
wrong line!

 === Uncaught Python exception: ===
ZeroDivisionError: division by zero
Traceback (most recent call last):
  File "./fuzz.py", line 8, in TestOneInput
    module.crash(data)
  File "/home/hkctkuy/atheris/module.py", line 3, in crash
    return 1/0

==3581292== ERROR: libFuzzer: fuzz target exited
SUMMARY: libFuzzer: fuzz target exited

The output when I run python3.10 ./fuzz.py crash is:

INFO: Instrumenting module
INFO: Using built-in libfuzzer
WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 32997823
./fuzz.py: Running 1 inputs 1 time(s) each.
Running: crash
wrong line!

 === Uncaught Python exception: ===
ZeroDivisionError: division by zero
Traceback (most recent call last):
  File "/home/hkctkuy/atheris/./fuzz.py", line 8, in TestOneInput
    module.crash(data)
  File "/home/hkctkuy/atheris/module.py", line 2, in crash
    print("wrong line!")
ZeroDivisionError: division by zero

==3581304== ERROR: libFuzzer: fuzz target exited
SUMMARY: libFuzzer: fuzz target exited

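As you can see, the traceback under Python 3.10 is off by one line: it points at line 2 of module.py (the print call) instead of line 3 (the division).

The offsets are larger in more complex projects.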

WARNING: Failed to find function "__sanitizer_acquire_crash_state".

I tried to reproduce the example, but failed:

$ pip install atheris
Defaulting to user installation because normal site-packages is not writeable
Collecting atheris
  Using cached atheris-2.0.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (29.2 MB)
Installing collected packages: atheris
Successfully installed atheris-2.0.12
$ python myfuzz.py
INFO: Using built-in libfuzzer
WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 925967955
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2	INITED exec/s: 0 rss: 35Mb
ERROR: no interesting inputs were found. Is the code instrumented for coverage? Exiting.
$    

Building from source shows the same problem.

System Information:

OS: ArchLinux

$ python --version
Python 3.10.6
$ llvm-config --version                                                                
14.0.6
$ clang --version
clang version 14.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

The llvm fuzzer seems to be working:

$ clang main.cpp -fsanitize=fuzzer,undefined,address                                   
$ ./a.out
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 2166017756
INFO: Loaded 1 modules   (1 inline 8-bit counters): 1 [0x56553ef3fbe0, 0x56553ef3fbe1), 
INFO: Loaded 1 PC tables (1 PCs): 1 [0x56553ef3fbe8,0x56553ef3fbf8), 
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2	INITED cov: 1 ft: 1 corp: 1/1b exec/s: 0 rss: 31Mb
^J==18337== libFuzzer: run interrupted; exiting
$

Pass the target to Fuzz() instead of Setup()

For my projects, I implemented a generic fuzzer based on Atheris that can target any function with

$ myfuzzer [-libFuzzer_args ...] --target=package.module:function [corpus ...]

The options with a single dash are consumed by Atheris (for libFuzzer), the options with two dashes are used by my fuzzer, and the positional arguments are the corpus (used by both libFuzzer and my fuzzer). Depending on the options, my fuzzer either calls atheris.Fuzz() or performs another action using the target and the corpus files. Here is a simplified version that only calls atheris.Fuzz():

#!/usr/bin/env python
import argparse
import sys

import atheris


@atheris.instrument_func
def fuzz_target(data):
    """Dummy target."""
    return data


@atheris.instrument_func
def fuzz_proxy(buffer):
    fuzz_target(buffer)


def load_target(spec):
    module_name, _, function_name = spec.partition(":")
    with atheris.instrument_imports():
        import importlib
        module = importlib.import_module(module_name)
    import functools
    return functools.reduce(getattr, function_name.split("."), module)


def main(args=None):
    global fuzz_target
    parser = argparse.ArgumentParser()
    parser.add_argument("--target", type=load_target, required=True)
    parser.add_argument("--action", default="fuzz")
    parser.add_argument("corpus", nargs="*")
    args = parser.parse_args(args)
    if args.action == "fuzz":
        fuzz_target = args.target
        atheris.Fuzz()


if __name__ == '__main__':
    main(atheris.Setup(sys.argv, fuzz_proxy)[1:])

The separation between Setup() and Fuzz() is useful for my use case. However, I cannot understand why the target must be passed to Setup(). I have read the code of Setup() and Fuzz() and found nothing explaining why the target is required as early as Setup(): the target is simply stored by Setup() until Fuzz() is called.

As seen above, I have to jump through hoops to fuzz the real target by setting up a proxy target. By passing the target to Fuzz() instead of Setup(), the code would be greatly simplified:

#!/usr/bin/env python
import argparse
import sys

import atheris


def load_target(spec):
    module_name, _, function_name = spec.partition(":")
    with atheris.instrument_imports():
        import importlib
        module = importlib.import_module(module_name)
    import functools
    return functools.reduce(getattr, function_name.split("."), module)


def main(args=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--target", type=load_target, required=True)
    parser.add_argument("--action", default="fuzz")
    parser.add_argument("corpus", nargs="*")
    args = parser.parse_args(args)
    if args.action == "fuzz":
        atheris.Fuzz(args.target)


if __name__ == '__main__':
    main(atheris.Setup(sys.argv)[1:])

This change wouldn't make things more complicated when the target is hard-coded, and could be introduced in a backward compatible way.

Coverage doesn't increase

I'm running the following fuzzer for mat2 in a virtualenv:

import os
import sys

import atheris

with atheris.instrument_imports():
    from libmat2 import parser_factory

def TestOneInput(data):
    with open('/tmp/mat2_fuzz', 'wb') as f:
        f.write(data)
    try:
        p, _ = parser_factory.get_parser('/tmp/mat2_fuzz')
        if p:
            p.get_meta()
            p.remove_all()
            p, _ = parser_factory.get_parser('/tmp/mat2_fuzz')
            p.get_meta()
    except ValueError:
        pass
    os.remove('/tmp/mat2_fuzz')

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()

and got the following results:

(ven) jvoisin@grimhilde 18:08 ~/dev/mat2 python3 fuzz.py ./tests/data/
INFO: Instrumenting libmat2
INFO: Instrumenting libmat2.exiftool
INFO: Instrumenting json
INFO: Instrumenting json.decoder
INFO: Instrumenting json.scanner
INFO: Instrumenting json.encoder
INFO: Instrumenting logging
INFO: Instrumenting traceback
INFO: Instrumenting linecache
INFO: Instrumenting tokenize
INFO: Instrumenting token
INFO: Instrumenting weakref
INFO: Instrumenting _weakrefset
INFO: Instrumenting string
INFO: Instrumenting _string
WARNING: It looks like this module is imported by a custom loader. Atheris has experimental support for this. However, it may be incompatible with certain libraries. If you experience unusual errors or poor coverage collection, try atheris.instrument_all() instead, add enable_loader_override=False to instrument_imports(), or file an issue on GitHub.
INFO: Instrumenting threading
INFO: Instrumenting atexit
INFO: Instrumenting shutil
INFO: Instrumenting fnmatch
INFO: Instrumenting errno
INFO: Instrumenting zlib
INFO: Instrumenting bz2
INFO: Instrumenting _compression
INFO: Instrumenting lzma
INFO: Instrumenting pwd
INFO: Instrumenting grp
INFO: Instrumenting subprocess
INFO: Instrumenting signal
INFO: Instrumenting _posixsubprocess
INFO: Instrumenting select
INFO: Instrumenting selectors
INFO: Instrumenting math
INFO: Instrumenting libmat2.abstract
INFO: Instrumenting libmat2.bubblewrap
INFO: Instrumenting tempfile
INFO: Instrumenting random
INFO: Instrumenting bisect
INFO: Instrumenting _bisect
INFO: Instrumenting _random
INFO: Instrumenting _sha512
INFO: Instrumenting libmat2.video
INFO: Instrumenting libmat2.parser_factory
INFO: Instrumenting glob
INFO: Instrumenting mimetypes
INFO: Instrumenting urllib
INFO: Instrumenting urllib.parse
INFO: Instrumenting libmat2.images
INFO: Instrumenting imghdr
INFO: Instrumenting cairo
INFO: Instrumenting gi
INFO: Instrumenting pkgutil
INFO: Instrumenting gi._error
INFO: Instrumenting gi.repository
INFO: Instrumenting gi.importer
INFO: Instrumenting gi.module
INFO: Instrumenting gi.types
INFO: Instrumenting gi._constants
INFO: Instrumenting gi.docstring
INFO: Instrumenting gi._propertyhelper
INFO: Instrumenting gi._signalhelper
INFO: Instrumenting gi.overrides
INFO: Instrumenting gi.overrides.GLib
INFO: Instrumenting gi.overrides.GLib
INFO: Instrumenting socket
INFO: Instrumenting _socket
INFO: Instrumenting array
INFO: Instrumenting gi._ossighelper
INFO: Instrumenting __future__
INFO: Instrumenting gi._option
INFO: Instrumenting optparse
INFO: Instrumenting textwrap
INFO: Instrumenting gettext
INFO: Instrumenting locale
INFO: Instrumenting gi.overrides.GObject
INFO: Instrumenting gi.overrides.GObject
INFO: Instrumenting gi.overrides.Gio
INFO: Instrumenting gi.overrides.Gio
INFO: Instrumenting gi.overrides.GdkPixbuf
INFO: Instrumenting gi.overrides.GdkPixbuf
INFO: Instrumenting libmat2.epub
INFO: Instrumenting uuid
INFO: Instrumenting platform
INFO: Instrumenting zipfile
INFO: Instrumenting binascii
INFO: Instrumenting struct
INFO: Instrumenting _struct
INFO: Instrumenting xml
INFO: Instrumenting xml.etree
INFO: Instrumenting xml.etree.ElementTree
INFO: Instrumenting xml.etree.ElementPath
INFO: Instrumenting _elementtree
INFO: Instrumenting copy
INFO: Instrumenting pyexpat
INFO: Instrumenting libmat2.archive
INFO: Instrumenting datetime
INFO: Instrumenting _datetime
INFO: Instrumenting tarfile
INFO: Instrumenting libmat2.office
INFO: Instrumenting libmat2.torrent
INFO: Instrumenting libmat2.harmless
INFO: Instrumenting libmat2.audio
INFO: Instrumenting mutagen
INFO: Instrumenting mutagen._util
INFO: Instrumenting decimal
INFO: Instrumenting numbers
INFO: Instrumenting mutagen._file
INFO: Instrumenting mutagen._tags
INFO: Instrumenting libmat2.pdf
INFO: Instrumenting distutils
INFO: Instrumenting distutils.version
INFO: Instrumenting libmat2.web
INFO: Instrumenting html
INFO: Instrumenting html.entities
INFO: Instrumenting html.parser
INFO: Instrumenting _markupbase
INFO: Using built-in libfuzzer
WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 3911405642
INFO: Loaded 1 modules   (14598 inline 8-bit counters): 14598 [0x10d4970, 0x10d8276), 
INFO: Loaded 1 PC tables (14598 PCs): 14598 [0x10f0650,0x11296b0), 
INFO:       50 files found in ./tests/data/
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: seed corpus: files: 50 min: 1b max: 4383613b total: 10698367b rss: 55Mb
#51	INITED cov: 20 ft: 20 corp: 1/1b exec/s: 0 rss: 59Mb
#32768	pulse  cov: 20 ft: 20 corp: 1/1b lim: 325 exec/s: 10922 rss: 59Mb
#65536	pulse  cov: 20 ft: 20 corp: 1/1b lim: 652 exec/s: 9362 rss: 59Mb
#131072	pulse  cov: 20 ft: 20 corp: 1/1b lim: 1300 exec/s: 9362 rss: 59Mb
#262144	pulse  cov: 20 ft: 20 corp: 1/1b lim: 2611 exec/s: 9362 rss: 59Mb
#524288	pulse  cov: 20 ft: 20 corp: 1/1b lim: 5212 exec/s: 9362 rss: 59Mb
…

I tried with enable_loader_override=False, but it didn't change anything.

Am I doing something wrong?

User exit callback feature?

Is it possible to give a callback that will be run at exit? I'm using custom mutators and crossovers, and I want to save their (random-number generators') state before Atheris exits. My goal is to be able to resume fuzzing from the saved state.
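One possible workaround is sketched below, assuming the mutators draw from a dedicated random.Random instance. The helper names and the state file path are hypothetical, and nothing here is an Atheris API. Whether handlers registered with atexit actually run when libFuzzer ends the process is not confirmed by anything above, so saving the state periodically from inside the mutator may be the safer choice.

```python
import atexit
import pickle
import random
from pathlib import Path

# Hypothetical default location for the saved generator state.
DEFAULT_STATE_FILE = Path("mutator_rng_state.pkl")


def save_rng_state(rng, path=DEFAULT_STATE_FILE):
    """Write the generator's internal state to disk."""
    Path(path).write_bytes(pickle.dumps(rng.getstate()))


def load_rng_state(rng, path=DEFAULT_STATE_FILE):
    """Restore a previously saved state; return False if none exists."""
    path = Path(path)
    if not path.exists():
        return False
    rng.setstate(pickle.loads(path.read_bytes()))
    return True


def register_state_saver(rng, path=DEFAULT_STATE_FILE):
    """Ask the interpreter to save the state at normal shutdown.

    Assumption: the process exits in a way that runs atexit handlers.
    """
    atexit.register(save_rng_state, rng, path)
```

In a fuzzer script, load_rng_state() would be called once before atheris.Fuzz(), and save_rng_state() could also be called every N mutations as a fallback.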

"Permission denied" when using -merge=1

I was fuzzing an interpreter using the following command successfully:

python fuzz/fuzz_interpreter.py fuzz/interpreter_corpus

I decided to attempt to minimize the corpus, writing it to a new location, using the following command:

mkdir -p fuzz/corpus/interpreter
python fuzz/fuzz_interpreter.py -merge=1 fuzz/corpus/interpreter fuzz/interpreter_corpus

And got a series of errors like

...
sh: 1: fuzz/fuzz_interpreter.py: Permission denied
MERGE-OUTER: attempt 684
sh: 1: fuzz/fuzz_interpreter.py: Permission denied
MERGE-OUTER: attempt 685
sh: 1: fuzz/fuzz_interpreter.py: Permission denied
MERGE-OUTER: the control file has 45216 bytes
MERGE-OUTER: consumed 0Mb (39Mb rss) to parse the control file
MERGE-OUTER: 0 new files with 0 new features added; 0 new coverage edges

After re-running with strace -f, I found the issue is that somewhere there's an attempt to rerun the script directly:

[pid 2581944] execve("fuzz/fuzz_interpreter.py", ["fuzz/fuzz_interpreter.py", "-artifact_prefix=fuzz/artifacts/"..., "fuzz/corpus/interpreter/", "fuzz/interpreter_corpus/", "-merge_control_file=/tmp/libFuzz"..., "-merge_inner=1"], 0x556aa3cfb5c8 /* 72 vars */) = -1 EACCES (Permissão negada)

After I added the shebang line #!/usr/bin/env python and made the script executable, the merge worked (the corpus shrank from 685 samples to 242!).

If this is intended behavior, then the documentation should call out this requirement, and the examples should always include the shebang line.
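The two conditions found above (a shebang line and an execute bit) can be checked before starting a merge. This is a minimal sketch for POSIX systems; the function names are made up for illustration and are not part of Atheris or libFuzzer.

```python
import stat
from pathlib import Path

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def check_directly_executable(script):
    """Return a list of reasons why running the script directly may fail."""
    path = Path(script)
    problems = []
    with path.open("rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        problems.append("missing shebang line")
    if not path.stat().st_mode & EXEC_BITS:
        problems.append("no execute bit set")
    return problems


def make_executable(script):
    """Add execute permission for user, group and others (like chmod +x)."""
    path = Path(script)
    path.chmod(path.stat().st_mode | EXEC_BITS)
```

For the script in this report, check_directly_executable("fuzz/fuzz_interpreter.py") would have listed both problems before the fix.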

Are separate calls to `atheris.Setup()` and `atheris.Fuzz()` really necessary?

Given that Fuzz() accepts no arguments, and Setup() is only useful if you then call Fuzz(), do these really need to be exposed as separate functions at the Python level?

Why not (the equivalent of):

def fuzz(
    args: typing.List[str],
    test_one_input: typing.Callable[[bytes], object],
    *,
    enable_python_coverage: bool = True,
    enable_python_opcode_coverage: bool = True,
) -> typing.NoReturn:
    atheris.Setup(args, test_one_input, enable_python_coverage, enable_python_opcode_coverage)
    atheris.Fuzz()

This could even be provided as a convenience method, with the current API left in place for backwards-compatibility.

Please add a tag for the latest release

Hello, package maintainer for Arch Linux here.

I'm just wondering if the maintainers of this fine project would be able to add a git tag for the latest release (2.0.12)?

Integrate Slipcover to Atheris

From the beginning, Atheris used sys.settrace-like instrumentation [1], the same instrumentation used in Coverage.py [2, 3]:

Atheris is a native Python extension, and is typically compiled with libFuzzer linked in. When you initialize Atheris, it registers a tracer with CPython to collect information about Python code flow. This tracer can keep track of every line reached and every function executed.

In commit e76f637, sys.settrace was replaced with bytecode instrumentation [4].

There is a Python library, SlipCover, that tracks a Python program as it runs and reports on the parts that executed and those that didn't. SlipCover uses just-in-time instrumentation and de-instrumentation. It has been shown to give precise coverage with near-zero overhead [5, 6].

I propose to reuse slipcover source code in Atheris.

Footnotes

  1. https://security.googleblog.com/2020/12/how-atheris-python-fuzzer-works.html

  2. https://explog.in/notes/settrace.html

  3. https://nedbatchelder.com/text/trace-function.html

  4. https://github.com/google/atheris/blob/33a1322dadaf4d561f7c99b5641cb33748d4036c/hooking.md?plain=1#L14

  5. https://github.com/plasma-umass/slipcover

  6. SlipCover: Near Zero-Overhead Code Coverage for Python -- Juan Altmayer Pizzorno, Emery D. Berger

publish wheels for macOS Python 3.10

The PyPI package is missing macOS wheels for Python 3.10 (but includes versions up to 3.9)

Could you please build and publish Python 3.10 wheels for macOS users?

Instrumentation takes a long time

Hi, I am currently collecting coverage of the TensorFlow library under different tests with Atheris. However, instrumenting the library takes a long time on every run.
Is there a way to make test1.py, test2.py, ... share the same instrumented library, so that instrumentation only needs to run once for all the test files?

Atheris in unittests?

Hi guys,

Is there any way to integrate Atheris into Python unit tests?
For example, Django tests, pytest, etc.
Could you provide some examples of how to integrate it into test frameworks?
Or are there plans to support running the fuzzer via test frameworks?

Thanks in advance,
Andrei

Possible performance hack

Hi, only just noticed this project. I built something similar a couple of years ago (https://github.com/risicle/cpytraceafl) and thought I'd point out the performance hack it uses, seeing as atheris will likely get far more use than cpytraceafl ever will.

The trick it uses is to simply rewrite the bytecode's lnotab so that the "line number" increments only at the beginning of every new BB. Then it's just a matter of adding the tiny native trace function via sys.settrace() (which internally uses the lnotab to detect the beginning of new "lines").

It's ugly, but it means zero extra bytecode needs to be executed at a trace point, no attributes looked up, and the entire instrumentation process is done in 165 lines.

All based on a Ned Batchelder trick of course.
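To make "the beginning of every new BB" concrete, here is a rough sketch that finds basic-block start offsets with the standard dis module. It does not rewrite the line table, it is not the cpytraceafl implementation, and it uses a simplified definition of a block: the first instruction, every jump target, and every instruction that follows a jump. Instructions after a return or raise are not treated specially.

```python
import dis


def basic_block_leaders(code):
    """Return sorted bytecode offsets at which a basic block begins."""
    jump_opcodes = set(dis.hasjrel) | set(dis.hasjabs)
    leaders = set()
    previous_was_jump = True  # the first instruction always starts a block
    for instr in dis.get_instructions(code):
        if previous_was_jump or instr.is_jump_target:
            leaders.add(instr.offset)
        previous_was_jump = instr.opcode in jump_opcodes
    return sorted(leaders)


def sample(x):
    if x:
        return 1
    return 2


if __name__ == "__main__":
    # Exact offsets differ between CPython versions.
    print(basic_block_leaders(sample.__code__))
```

A line-table rewrite as described above would then assign a fresh "line number" at each of these offsets, so that the trace function fires once per block.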

Bytecode instrumentation vs. settrace

Hello atheris team,

I would like to propose an improvement of atheris
that can at least double the execution speed of your fuzzer.

In atheris you use sys.settrace for coverage
collection but this is not the fastest approach.
It is possible to instrument .py files the way AFL
instruments .c files in order to get rid of all the runtime overhead.
At the beginning of each basic
block a call to atheris.log(idx) can be inserted
that like __afl_maybe_log(idx) treats idx as an
offset into the counter-region and increments the
corresponding byte. This way the number of instrumented
locations is known at compile-time and a call to
atheris.reg(num) can be inserted at the very beginning
of the module that tells atheris how many counters
this module needs.
At the beginning of a fuzz target where all modules
are imported atheris collects the atheris.reg calls
and keeps track of the overall number of counters needed.
In atheris.Fuzz() a memory region of a suitable size
can be allocated and used as the region for counters.
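The bookkeeping described above can be modeled in pure Python. The class below is only an illustration: reg and log mirror the names proposed in this issue and are not existing Atheris functions. As one possible design, reg returns a base offset so that each module's indices do not overlap in the shared region.

```python
class CounterRegistry:
    """Toy model of a shared region of 8-bit coverage counters."""

    def __init__(self):
        self._total = 0
        self.counters = None  # allocated once all modules have registered

    def reg(self, num):
        """Reserve num counters for a module and return its base offset."""
        if self.counters is not None:
            raise RuntimeError("counter region already allocated")
        base = self._total
        self._total += num
        return base

    def allocate(self):
        """Allocate the region; in the proposal this happens in Fuzz()."""
        self.counters = bytearray(self._total)

    def log(self, idx):
        """Increment one counter, wrapping around like an 8-bit value."""
        self.counters[idx] = (self.counters[idx] + 1) & 0xFF


if __name__ == "__main__":
    registry = CounterRegistry()
    base_a = registry.reg(3)  # module A has 3 basic blocks
    base_b = registry.reg(2)  # module B has 2 basic blocks
    registry.allocate()
    registry.log(base_b + 1)  # second block of module B was executed
    print(list(registry.counters))
```

In a real implementation the region would be native memory handed to libFuzzer, and log would be a cheap native call inserted at the start of each basic block.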

This is perfectly compatible with your
approach of fuzzing C/C++ extensions. The extensions
just have to be built with -fsanitize-coverage=inline-8bit-counters.

However this is not perfect. It does not support

  1. data-flow guided fuzzing
  2. differential fuzzing since the counters of each module start at 0

But I can imagine these limitations are not hard to overcome.

Here is a POC I've built that implements the concept described above: python-fuzz-poc
I've used the POC to fuzz some libraries and found quite a few bugs with it, and the results are very promising. On average I get a 5x performance boost compared to sys.settrace.

I am very interested in your opinion about this approach. What do you think of this idea?

Several issues/questions

I have spent some time trying to fuzz a native library with Atheris, however, I seem to have some issues.

Consider the PR here google/oss-fuzz#4754

Some of the questions I have:

  1. When fuzzing an extension, the Python code I target does some initial processing on the data, so the native code is not hit in the first two iterations of libFuzzer. libFuzzer then complains that there is no coverage and exits. I feel this is somewhat of a limitation: the fuzzer should be allowed to run for a while, i.e. naturally explore the Python code and eventually reach the native code. I am not sure if I am completely off here, but this has caused issues for me for a while.

  2. What is the expected behaviour when providing command-line arguments to Atheris, in particular corpus and seed files?

  3. Finally, in relation to a compilation on OSS-Fuzz, what is the expected linking approach? Do we need to do the final linking of the native code with clang++?

MAP_ANONYMOUS not available on older MacOS

Thanks for atheris, and congrats on the 2.0.8 release (and a tag on the repo pointed to it)!

It looks like the use of MAP_ANONYMOUS in counters.cc seems to raise the minimum supported OSX to 10.10. While this doesn't strictly bother me, downstream on conda-forge we generally try to keep the compatibility window as wide as possible. To get something out the door, this patch uses the apparently equivalent MAP_ANON and passes the simple smoke test from the repo, but we only apply it on OSX.

I guess it would be good to either:

  • handle these somewhat older OSX versions (not sure what it would take to support both)
  • declare the minimum OSX (in which case I can bump the minimum deployment target)

Thanks again!

Instrumentation fails for the "calendar" module from the standard library

Code:

import atheris

with atheris.instrument_imports():
    import calendar

Output:

INFO: Instrumenting calendar
Traceback (most recent call last):
  File "/home/cepe/repos/croniter-fuzzing/atheris_test.py", line 4, in <module>
    import calendar
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 879, in exec_module
  File "/home/cepe/venvs/croniter-fuzzing/lib/python3.10/site-packages/atheris/import_hook.py", line 196, in get_code
    return patch_code(code, self._trace_dataflow)
  File "/home/cepe/venvs/croniter-fuzzing/lib/python3.10/site-packages/atheris/instrument_bytecode.py", line 756, in patch_code
    inst.consts[i] = patch_code(inst.consts[i], trace_dataflow, nested=True)
  File "/home/cepe/venvs/croniter-fuzzing/lib/python3.10/site-packages/atheris/instrument_bytecode.py", line 758, in patch_code
    return inst.to_code()
  File "/home/cepe/venvs/croniter-fuzzing/lib/python3.10/site-packages/atheris/instrument_bytecode.py", line 457, in to_code
    self._check_state()
  File "/home/cepe/venvs/croniter-fuzzing/lib/python3.10/site-packages/atheris/instrument_bytecode.py", line 372, in _check_state
    listing[i].check_state()
  File "/home/cepe/venvs/croniter-fuzzing/lib/python3.10/site-packages/atheris/instrument_bytecode.py", line 208, in check_state
    assert jump_arg_bytes(self.arg) == self.reference
AssertionError

More useful output on NEW_FUNC -- include function names

libFuzzer prints a handy NEW_FUNC line the first time it executes a new function. This is very helpful when developing a fuzzer to get a sense of the coverage you're achieving. Unfortunately, with Atheris it seems to always be an address only, with no function name:

#201019 NEW    cov: 7502 ft: 11769 corp: 91/1040b lim: 32 exec/s: 665 rss: 44Mb L: 24/32 MS: 3 ChangeBinInt-ChangeBit-ShuffleBytes-
        NEW_FUNC[1/2]: 0x238e9e5
        NEW_FUNC[2/2]: 0x238e9e9

I imagine this will require some wiring up to get libFuzzer to know about Python function names, but if there were a way to make it work, that'd be a boon for fuzzer development.

Issue in installing package atheris-libprotobuf-mutator on python base image

Hi team,

We are trying to install the atheris-libprotobuf-mutator package on a Python base image, but the installation fails. We have already installed the atheris and bazel packages.

Command: pip3 install atheris-libprotobuf-mutator

Below is the error:
[184 / 197] Compiling src/google/protobuf/descriptor.cc; 10s processwrapper-sandbox ... (12 actions running)
#0 38.09 [190 / 197] Compiling src/google/protobuf/descriptor.cc; 11s processwrapper-sandbox ... (6 actions running)
#0 38.49 ERROR: /root/.cache/bazel/_bazel_root/6bd533e1255e6b9a4a9b8d4c9dccd8c0/external/com_google_pybind11_protobuf/pybind11_protobuf/BUILD:45:15: Compiling pybind11_protobuf/native_proto_caster.cc failed: (Exit 1): gcc failed: error executing command (from target @com_google_pybind11_protobuf//pybind11_protobuf:native_proto_caster) /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 60 arguments skipped)

subprocess.CalledProcessError: Command '['/usr/bin/bazel', 'build', '--define', "PYTHON_BIN_PATH='/usr/local/bin/python'", '-c', 'opt', '--cxxopt=-std=c++17', '//:_mutator.so']' returned non-zero exit status 1.
#0 38.70 error: subprocess-exited-with-error
#0 38.70
#0 38.70 × Building wheel for atheris-libprotobuf-mutator (pyproject.toml) did not run successfully.
#0 38.70 │ exit code: 1
#0 38.70 ╰─> See above for output.

#0 38.71 ERROR: Could not build wheels for atheris-libprotobuf-mutator, which is required to install pyproject.toml-based projects

Help with installation on ARM

Hi,

I'm just wondering if anyone might be able to point me in the right direction, with installing
atheris on ARM (using a raspberry pi 4, with the 32 bit OS).

I installed llvm/clang 11.

And am doing:

LIBFUZZER_LIB=/usr/lib/llvm-11/lib/libFuzzer.a CLANG_BIN=/usr/bin/clang CC=/usr/bin/clang CXX=/usr/bin/clang++ pip3 install .

But get:

Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Processing /home/pi/Repositories/atheris
Building wheels for collected packages: atheris
  Building wheel for atheris (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-rsosxi4i/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-rsosxi4i/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-tvtqegpo
       cwd: /tmp/pip-req-build-rsosxi4i/
  Complete output (70 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-armv7l-3.9
  copying atheris_no_libfuzzer.py -> build/lib.linux-armv7l-3.9
  creating build/lib.linux-armv7l-3.9/atheris
  copying src/coverage_test_helper.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/version_dependent.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/regex_match_generation_test.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/coverage_test.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/fuzzed_data_provider_test.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/custom_mutator_fuzz_test.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/fuzz_test_lib.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/hook-atheris.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/import_hook.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/function_hooks.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/__init__.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/fuzz_test.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/utils.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/pyinstaller_coverage_test.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/custom_mutator_and_crossover_fuzz_test.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/instrument_bytecode.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/custom_crossover_fuzz_test.py -> build/lib.linux-armv7l-3.9/atheris
  running build_ext
  Your libFuzzer version is too old, but it's possible to attempt an in-place upgrade. Trying that now.
  Your libFuzzer is up-to-date.
  creating tmp
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.9 -c /tmp/tmppjc2uk99.cpp -o tmp/tmppjc2uk99.o -std=c++14
  building 'atheris.native' extension
  creating build/temp.linux-armv7l-3.9
  creating build/temp.linux-armv7l-3.9/src
  creating build/temp.linux-armv7l-3.9/src/native
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO='2.0.12' -DATHERIS_MODULE_NAME=native -I/tmp/pip-req-build-rsosxi4i/.eggs/pybind11-2.10.0-py3.9.egg/pybind11/include -I/usr/include/python3.9 -c src/native/atheris.cc -o build/temp.linux-armv7l-3.9/src/native/atheris.o -Wno-deprecated-declarations -Wno-attributes -std=c++14
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO='2.0.12' -DATHERIS_MODULE_NAME=native -I/tmp/pip-req-build-rsosxi4i/.eggs/pybind11-2.10.0-py3.9.egg/pybind11/include -I/usr/include/python3.9 -c src/native/fuzzed_data_provider.cc -o build/temp.linux-armv7l-3.9/src/native/fuzzed_data_provider.o -Wno-deprecated-declarations -Wno-attributes -std=c++14
  src/native/fuzzed_data_provider.cc:161:23: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned int') [-Wsign-compare]
      for (int i = 0; i < bytes; ++i) {
                      ~ ^ ~~~~~
  src/native/fuzzed_data_provider.cc:212:23: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned int') [-Wsign-compare]
      for (int i = 0; i < bytes; ++i) {
                      ~ ^ ~~~~~
  src/native/fuzzed_data_provider.cc:284:44: warning: implicit conversion from 'unsigned long long' to 'const double' changes value from 18446744073709551615 to 18446744073709551616 [-Wimplicit-const-int-float-conversion]
  const double kUInt64ToProbabilityDivisor = std::numeric_limits<uint64_t>::max();
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  src/native/fuzzed_data_provider.cc:370:21: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned int') [-Wsign-compare]
    for (int i = 0; i < count; ++i) {
                    ~ ^ ~~~~~
  4 warnings generated.
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO='2.0.12' -DATHERIS_MODULE_NAME=native -I/tmp/pip-req-build-rsosxi4i/.eggs/pybind11-2.10.0-py3.9.egg/pybind11/include -I/usr/include/python3.9 -c src/native/util.cc -o build/temp.linux-armv7l-3.9/src/native/util.o -Wno-deprecated-declarations -Wno-attributes -std=c++14
  /usr/bin/clang++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-z,relro -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-armv7l-3.9/src/native/atheris.o build/temp.linux-armv7l-3.9/src/native/fuzzed_data_provider.o build/temp.linux-armv7l-3.9/src/native/util.o -o build/lib.linux-armv7l-3.9/atheris/native.cpython-39-arm-linux-gnueabihf.so
  building 'atheris.core_with_libfuzzer' extension
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO='2.0.12' -DATHERIS_MODULE_NAME=core_with_libfuzzer -I/tmp/pip-req-build-rsosxi4i/.eggs/pybind11-2.10.0-py3.9.egg/pybind11/include -I/usr/include/python3.9 -c src/native/core.cc -o build/temp.linux-armv7l-3.9/src/native/core.o -Wno-deprecated-declarations -Wno-attributes -std=c++14
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO='2.0.12' -DATHERIS_MODULE_NAME=core_with_libfuzzer -I/tmp/pip-req-build-rsosxi4i/.eggs/pybind11-2.10.0-py3.9.egg/pybind11/include -I/usr/include/python3.9 -c src/native/counters.cc -o build/temp.linux-armv7l-3.9/src/native/counters.o -Wno-deprecated-declarations -Wno-attributes -std=c++14
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO='2.0.12' -DATHERIS_MODULE_NAME=core_with_libfuzzer -I/tmp/pip-req-build-rsosxi4i/.eggs/pybind11-2.10.0-py3.9.egg/pybind11/include -I/usr/include/python3.9 -c src/native/timeout.cc -o build/temp.linux-armv7l-3.9/src/native/timeout.o -Wno-deprecated-declarations -Wno-attributes -std=c++14
  src/native/timeout.cc:122:6: error: non-constant-expression cannot be narrowed from type 'long long' to '__time_t' (aka 'long') in initializer list [-Wc++11-narrowing]
      {timeout_secs / 2 + 1, 0}, { timeout_secs / 2 + 1, 0 }
       ^~~~~~~~~~~~~~~~~~~~
  src/native/timeout.cc:122:6: note: insert an explicit cast to silence this issue
      {timeout_secs / 2 + 1, 0}, { timeout_secs / 2 + 1, 0 }
       ^~~~~~~~~~~~~~~~~~~~
       static_cast<__time_t>( )
  src/native/timeout.cc:122:34: error: non-constant-expression cannot be narrowed from type 'long long' to '__time_t' (aka 'long') in initializer list [-Wc++11-narrowing]
      {timeout_secs / 2 + 1, 0}, { timeout_secs / 2 + 1, 0 }
                                   ^~~~~~~~~~~~~~~~~~~~
  src/native/timeout.cc:122:34: note: insert an explicit cast to silence this issue
      {timeout_secs / 2 + 1, 0}, { timeout_secs / 2 + 1, 0 }
                                   ^~~~~~~~~~~~~~~~~~~~
                                   static_cast<__time_t>( )
  2 errors generated.
  error: command '/usr/bin/clang' failed with exit code 1
  ----------------------------------------
  ERROR: Failed building wheel for atheris
  Running setup.py clean for atheris

By changing:

-    {timeout_secs / 2 + 1, 0}, { timeout_secs / 2 + 1, 0 }
+    {static_cast<__time_t>(timeout_secs / 2 + 1), 0}, { static_cast<__time_t>(timeout_secs / 2 + 1), 0 }
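A note on what that cast does: the log reports `__time_t` as `long`, which is 32 bits on this armv7l target, while `timeout_secs` is a `long long`. The sketch below imitates the cast in Python with `ctypes.c_int32` to show that ordinary timeouts are unaffected and only very large values wrap. The helper names are hypothetical and are not part of Atheris.

```python
import ctypes


def as_32bit_time_t(seconds: int) -> int:
    """Imitate static_cast<__time_t>(seconds) when __time_t is a 32-bit signed long."""
    return ctypes.c_int32(seconds).value


def timer_interval(timeout_secs: int) -> int:
    # Same expression as in timeout.cc (for non-negative values), followed by the cast.
    return as_32bit_time_t(timeout_secs // 2 + 1)


print(timer_interval(300))      # a typical timeout fits easily
print(timer_interval(2 ** 32))  # an extreme timeout wraps around after the cast
```

So the explicit cast should be harmless for realistic timeout values; it only silences the narrowing diagnostic.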

The error seems to go away, but I get a linker error:

Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Processing /home/pi/Repositories/atheris
Building wheels for collected packages: atheris
  Building wheel for atheris (setup.py) ... /
/
error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-d8qayg97/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-d8qayg97/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-k04l70wj
       cwd: /tmp/pip-req-build-d8qayg97/
  Complete output (87 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-armv7l-3.9
  copying atheris_no_libfuzzer.py -> build/lib.linux-armv7l-3.9
  creating build/lib.linux-armv7l-3.9/atheris
  copying src/coverage_test_helper.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/version_dependent.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/regex_match_generation_test.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/coverage_test.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/fuzzed_data_provider_test.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/custom_mutator_fuzz_test.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/fuzz_test_lib.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/hook-atheris.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/import_hook.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/function_hooks.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/__init__.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/fuzz_test.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/utils.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/pyinstaller_coverage_test.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/custom_mutator_and_crossover_fuzz_test.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/instrument_bytecode.py -> build/lib.linux-armv7l-3.9/atheris
  copying src/custom_crossover_fuzz_test.py -> build/lib.linux-armv7l-3.9/atheris
  running build_ext
  Your libFuzzer version is too old, but it's possible to attempt an in-place upgrade. Trying that now.
  Your libFuzzer is up-to-date.
  creating tmp
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.9 -c /tmp/tmpk1dbls7t.cpp -o tmp/tmpk1dbls7t.o -std=c++14
  building 'atheris.native' extension
  creating build/temp.linux-armv7l-3.9
  creating build/temp.linux-armv7l-3.9/src
  creating build/temp.linux-armv7l-3.9/src/native
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO='2.0.12' -DATHERIS_MODULE_NAME=native -I/tmp/pip-req-build-d8qayg97/.eggs/pybind11-2.10.0-py3.9.egg/pybind11/include -I/usr/include/python3.9 -c src/native/atheris.cc -o build/temp.linux-armv7l-3.9/src/native/atheris.o -Wno-deprecated-declarations -Wno-attributes -std=c++14
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO='2.0.12' -DATHERIS_MODULE_NAME=native -I/tmp/pip-req-build-d8qayg97/.eggs/pybind11-2.10.0-py3.9.egg/pybind11/include -I/usr/include/python3.9 -c src/native/fuzzed_data_provider.cc -o build/temp.linux-armv7l-3.9/src/native/fuzzed_data_provider.o -Wno-deprecated-declarations -Wno-attributes -std=c++14
  src/native/fuzzed_data_provider.cc:161:23: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned int') [-Wsign-compare]
      for (int i = 0; i < bytes; ++i) {
                      ~ ^ ~~~~~
  src/native/fuzzed_data_provider.cc:212:23: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned int') [-Wsign-compare]
      for (int i = 0; i < bytes; ++i) {
                      ~ ^ ~~~~~
  src/native/fuzzed_data_provider.cc:284:44: warning: implicit conversion from 'unsigned long long' to 'const double' changes value from 18446744073709551615 to 18446744073709551616 [-Wimplicit-const-int-float-conversion]
  const double kUInt64ToProbabilityDivisor = std::numeric_limits<uint64_t>::max();
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  src/native/fuzzed_data_provider.cc:370:21: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned int') [-Wsign-compare]
    for (int i = 0; i < count; ++i) {
                    ~ ^ ~~~~~
  4 warnings generated.
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO='2.0.12' -DATHERIS_MODULE_NAME=native -I/tmp/pip-req-build-d8qayg97/.eggs/pybind11-2.10.0-py3.9.egg/pybind11/include -I/usr/include/python3.9 -c src/native/util.cc -o build/temp.linux-armv7l-3.9/src/native/util.o -Wno-deprecated-declarations -Wno-attributes -std=c++14
  /usr/bin/clang++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-z,relro -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-armv7l-3.9/src/native/atheris.o build/temp.linux-armv7l-3.9/src/native/fuzzed_data_provider.o build/temp.linux-armv7l-3.9/src/native/util.o -o build/lib.linux-armv7l-3.9/atheris/native.cpython-39-arm-linux-gnueabihf.so
  building 'atheris.core_with_libfuzzer' extension
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO='2.0.12' -DATHERIS_MODULE_NAME=core_with_libfuzzer -I/tmp/pip-req-build-d8qayg97/.eggs/pybind11-2.10.0-py3.9.egg/pybind11/include -I/usr/include/python3.9 -c src/native/core.cc -o build/temp.linux-armv7l-3.9/src/native/core.o -Wno-deprecated-declarations -Wno-attributes -std=c++14
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO='2.0.12' -DATHERIS_MODULE_NAME=core_with_libfuzzer -I/tmp/pip-req-build-d8qayg97/.eggs/pybind11-2.10.0-py3.9.egg/pybind11/include -I/usr/include/python3.9 -c src/native/counters.cc -o build/temp.linux-armv7l-3.9/src/native/counters.o -Wno-deprecated-declarations -Wno-attributes -std=c++14
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO='2.0.12' -DATHERIS_MODULE_NAME=core_with_libfuzzer -I/tmp/pip-req-build-d8qayg97/.eggs/pybind11-2.10.0-py3.9.egg/pybind11/include -I/usr/include/python3.9 -c src/native/timeout.cc -o build/temp.linux-armv7l-3.9/src/native/timeout.o -Wno-deprecated-declarations -Wno-attributes -std=c++14
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO='2.0.12' -DATHERIS_MODULE_NAME=core_with_libfuzzer -I/tmp/pip-req-build-d8qayg97/.eggs/pybind11-2.10.0-py3.9.egg/pybind11/include -I/usr/include/python3.9 -c src/native/tracer.cc -o build/temp.linux-armv7l-3.9/src/native/tracer.o -Wno-deprecated-declarations -Wno-attributes -std=c++14
  src/native/tracer.cc:65:21: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned int') [-Wsign-compare]
    for (int i = 0; i < n; ++i) {
                    ~ ^ ~
  1 warning generated.
  /usr/bin/clang -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION_INFO='2.0.12' -DATHERIS_MODULE_NAME=core_with_libfuzzer -I/tmp/pip-req-build-d8qayg97/.eggs/pybind11-2.10.0-py3.9.egg/pybind11/include -I/usr/include/python3.9 -c src/native/util.cc -o build/temp.linux-armv7l-3.9/src/native/util.o -Wno-deprecated-declarations -Wno-attributes -std=c++14
  /usr/bin/clang++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-z,relro -g -fwrapv -O2 -g -ffile-prefix-map=/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-armv7l-3.9/src/native/core.o build/temp.linux-armv7l-3.9/src/native/counters.o build/temp.linux-armv7l-3.9/src/native/timeout.o build/temp.linux-armv7l-3.9/src/native/tracer.o build/temp.linux-armv7l-3.9/src/native/util.o -o build/lib.linux-armv7l-3.9/atheris/core_with_libfuzzer.cpython-39-arm-linux-gnueabihf.so /tmp/tmp.wm7rnraAGA.a
  /usr/bin/ld: /tmp/tmp.wm7rnraAGA.a(FuzzerTracePC.o)(.text+0x18fc): R_ARM_TLS_LE32 relocation not permitted in shared object
  /tmp/tmp.wm7rnraAGA.a(FuzzerTracePC.o): in function `fuzzer::TracePC::RecordInitialStack()':
  (.text+0x18fc): dangerous relocation: unsupported relocation
  /usr/bin/ld: /tmp/tmp.wm7rnraAGA.a(FuzzerTracePC.o)(.text+0x1920): R_ARM_TLS_LE32 relocation not permitted in shared object
  /tmp/tmp.wm7rnraAGA.a(FuzzerTracePC.o): in function `fuzzer::TracePC::GetMaxStackOffset() const':
  (.text+0x1920): dangerous relocation: unsupported relocation
  /usr/bin/ld: /tmp/tmp.wm7rnraAGA.a(FuzzerTracePC.o)(.text._ZTW21__sancov_lowest_stack[_ZTW21__sancov_lowest_stack]+0x14): R_ARM_TLS_LE32 relocation not permitted in shared object
  /tmp/tmp.wm7rnraAGA.a(FuzzerTracePC.o): in function `TLS wrapper function for __sancov_lowest_stack':
  (.text._ZTW21__sancov_lowest_stack[_ZTW21__sancov_lowest_stack]+0x14): dangerous relocation: unsupported relocation
  /usr/bin/ld: /tmp/tmp.wm7rnraAGA.a(FuzzerLoop.o)(.text+0x7e0): R_ARM_TLS_LE32 relocation not permitted in shared object
  /tmp/tmp.wm7rnraAGA.a(FuzzerLoop.o): in function `fuzzer::Fuzzer::Fuzzer(int (*)(unsigned char const*, unsigned int), fuzzer::InputCorpus&, fuzzer::MutationDispatcher&, fuzzer::FuzzingOptions)':
  (.text+0x7e0): dangerous relocation: unsupported relocation
  /usr/bin/ld: /tmp/tmp.wm7rnraAGA.a(FuzzerLoop.o)(.text+0xd54): R_ARM_TLS_LE32 relocation not permitted in shared object
  /tmp/tmp.wm7rnraAGA.a(FuzzerLoop.o): in function `fuzzer::Fuzzer::AlarmCallback()':
  (.text+0xd54): dangerous relocation: unsupported relocation
  /usr/bin/ld: /tmp/tmp.wm7rnraAGA.a(FuzzerLoop.o)(.text+0x2294): R_ARM_TLS_LE32 relocation not permitted in shared object
  /tmp/tmp.wm7rnraAGA.a(FuzzerLoop.o): in function `fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned int)':
  (.text+0x2294): dangerous relocation: unsupported relocation
  /usr/bin/ld: /tmp/tmp.wm7rnraAGA.a(FuzzerLoop.o)(.text+0x2518): R_ARM_TLS_LE32 relocation not permitted in shared object
  /tmp/tmp.wm7rnraAGA.a(FuzzerLoop.o): in function `fuzzer::Fuzzer::GetCurrentUnitInFuzzingThead(unsigned char const**) const':
  (.text+0x2518): dangerous relocation: unsupported relocation
  /usr/bin/ld: /tmp/tmp.wm7rnraAGA.a(FuzzerLoop.o)(.text._ZTWN6fuzzer6Fuzzer10IsMyThreadE[_ZTWN6fuzzer6Fuzzer10IsMyThreadE]+0x14): R_ARM_TLS_LE32 relocation not permitted in shared object
  /tmp/tmp.wm7rnraAGA.a(FuzzerLoop.o): in function `TLS wrapper function for fuzzer::Fuzzer::IsMyThread':
  (.text._ZTWN6fuzzer6Fuzzer10IsMyThreadE[_ZTWN6fuzzer6Fuzzer10IsMyThreadE]+0x14): dangerous relocation: unsupported relocation
  clang: error: linker command failed with exit code 1 (use -v to see invocation)
  error: command '/usr/bin/clang++' failed with exit code 1
  ----------------------------------------
  ERROR: Failed building wheel for atheris
  Running setup.py clean for atheris
Failed to build atheris

Any suggestions would be most appreciated!

Hypothesis produces poor results

I'm in the process of writing end-to-end tests to make sure Python coverage is high-quality. In doing so, I discovered that Hypothesis structured fuzzing causes really poor fuzz quality - even the example in the readme doesn't work:

import sys

import atheris
from hypothesis import given, strategies as st

@given(st.from_regex(r"\w+!?", fullmatch=True))
@atheris.instrument_func
def test(string):
  assert string != "bad"

atheris.Setup(sys.argv, atheris.instrument_func(test.hypothesis.fuzz_one_input))
atheris.Fuzz()

I checked, and this isn't caused by the new coverage method - this works poorly with old coverage too. Doing this with regular Atheris, however, works excellently.

@Zac-HD, as the original contributor of the Hypothesis examples: do you have any suggestions here? I was thinking something along the lines of an external mutator for libFuzzer might work to fix the issues here. That's how libprotobuf-mutator for C++ works.
@nedwill your input might also be helpful here.

Unable to execute target in fork mode

I am trying to achieve parallel fuzzing with -fork=1. When the invocation looks like

/usr/bin/python3.9 /fuzz_harness.py -shuffle=1 -fork=1 -ignore_crashes=1 -ignore_timeouts=1 -ignore_oom=1

I am getting the following error:

INFO: Loaded 1 PC tables (2722 PCs): 2722 [0xfbf410,0xfc9e30), 
INFO: -fork=1: fuzzing in separate process(s)
INFO: -fork=1: 0 seed inputs, starting to fuzz in /tmp/libFuzzerTemp.FuzzWithFork8.dir
#0: cov: 0 ft: 0 corp: 0 exec/s 0 oom/timeout/crash: 0/0/0 time: 0s job: 1 dft_time: 0
INFO: log from the inner process:
sh: 1: fuzz_harness.py: not found
INFO: exiting: 127 time: 0s

This is due to sys.argv being set to ['fuzz_harness.py'] in atheris.Setup(sys.argv, TestOneInput). The error is even worse with -ignore_crashes=1 where the fuzzer keeps on restarting and reporting 0 stats.
I was able to work around this by adding a shebang line at the top of the Python file and making it executable. Can this be handled in some other way?
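The workaround described above can be scripted. The sketch below assumes a POSIX system; `make_directly_executable` and `is_directly_executable` are hypothetical helper names, not part of Atheris or libFuzzer. It prepends a shebang if one is missing and sets the executable bits, so a shell can launch the script by its path.

```python
import os
import stat
import sys


def make_directly_executable(script_path: str, interpreter: str = sys.executable) -> None:
    """Ensure the script starts with a shebang line and has its executable bits set."""
    with open(script_path, "r", encoding="utf-8") as f:
        source = f.read()
    if not source.startswith("#!"):
        with open(script_path, "w", encoding="utf-8") as f:
            f.write(f"#!{interpreter}\n{source}")
    mode = os.stat(script_path).st_mode
    os.chmod(script_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def is_directly_executable(script_path: str) -> bool:
    """Check for a shebang line and the owner-executable permission bit."""
    with open(script_path, "rb") as f:
        has_shebang = f.read(2) == b"#!"
    owner_can_execute = bool(os.stat(script_path).st_mode & stat.S_IXUSR)
    return has_shebang and owner_can_execute
```

Even then, the shell resolves a command name without a slash through `PATH`, so the harness probably still needs to be started by a path the child process can resolve.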

Order of calls to libFuzzer messes up output

Atheris calls __sanitizer_cov_8bit_counters_init() for bytecode instrumentation counters from TestOneInput().

libFuzzer expects counters to be initialized before the fuzzing loop.

As a result, libFuzzer's output lacks the "INFO:" section about modules (unless a native, instrumented module was loaded earlier by Python code). There may be other consequences, but I am not aware of them. Without this output, we do not know how many counters are in use.

Output in question:

INFO: Loaded 2 modules   (5643 inline 8-bit counters): 243 [0x7f7930a96b93, 0x7f7930a96c86), 5400 [0x7f79306b0000, 0x7f79306b1518), 
INFO: Loaded 2 PC tables (5643 PCs): 243 [0x7f7930a96c88,0x7f7930a97bb8), 5400 [0x7f792f6b0000,0x7f792f6c5180), 

This happens due to TracePC::NumModules increase in https://github.com/llvm/llvm-project/blob/cfb702676cc181877482a282fe7e07109a24dc9d/compiler-rt/lib/fuzzer/FuzzerTracePC.cpp#L39 not happening before a call to https://github.com/llvm/llvm-project/blob/cfb702676cc181877482a282fe7e07109a24dc9d/compiler-rt/lib/fuzzer/FuzzerTracePC.cpp#L80 that is invoked from __sanitizer_cov_8bit_counters_init().

[afl++ atheris-crashes-4809e3f9-hgbmt] /workdir # gdb /usr/bin/python3
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
(gdb) run target.py
INFO: Instrumenting pathlib
INFO: Instrumenting fnmatch
INFO: Instrumenting ntpath
INFO: Instrumenting urllib
INFO: Instrumenting urllib.parse
INFO: Using built-in libfuzzer
WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 2374636383
[New Thread 0x7ffff5cbc640 (LWP 34934)]
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2      INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 39Mb

8<=================================

(gdb) b __sanitizer_cov_8bit_counters_init
Breakpoint 1 at 0x7ffff7428df0: file /root/llvm-project/compiler-rt/lib/fuzzer/FuzzerTracePC.cpp, line 40.
(gdb) run target.py
INFO: Instrumenting pathlib
INFO: Instrumenting fnmatch
INFO: Instrumenting ntpath
INFO: Instrumenting urllib
INFO: Instrumenting urllib.parse
INFO: Using built-in libfuzzer
WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1189053150
[New Thread 0x7ffff5cbc640 (LWP 36037)]
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes

Thread 1 "python3" hit Breakpoint 1, __sanitizer_cov_8bit_counters_init (Start=0x7ffff53bc000 "", Stop=0x7ffff53bc5b2 "") at /root/llvm-project/compiler-rt/lib/fuzzer/FuzzerTracePC.cpp:465
465     /root/llvm-project/compiler-rt/lib/fuzzer/FuzzerTracePC.cpp: No such file or directory.
(gdb) bt
#0  __sanitizer_cov_8bit_counters_init (Start=0x7ffff53bc000 "", Stop=0x7ffff53bc5b2 "") at /root/llvm-project/compiler-rt/lib/fuzzer/FuzzerTracePC.cpp:465
#1  0x00007ffff73d8165 in atheris::TestOneInput (data=0x555555ca0be0 "\360,\347\367\377\177", size=0) at src/native/core.cc:138
#2  0x00007ffff740f4a5 in fuzzer::Fuzzer::ExecuteCallback (this=this@entry=0x555555c8a090, Data=Data@entry=0x7fffffffc6df "", Size=Size@entry=0) at /root/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:617
#3  0x00007ffff741557e in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora (this=this@entry=0x555555c8a090, CorporaFiles=std::vector of length 0, capacity 0) at /root/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:810
#4  0x00007ffff7415c07 in fuzzer::Fuzzer::Loop (this=this@entry=0x555555c8a090, CorporaFiles=std::vector of length 0, capacity 0) at /root/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:870
#5  0x00007ffff73fdac7 in fuzzer::FuzzerDriver (argc=<optimized out>, argv=<optimized out>, Callback=<optimized out>) at /root/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:912
#6  0x00007ffff73d8c75 in atheris::start_fuzzing(std::vector<std::string, std::allocator<std::string> > const&, std::function<void (pybind11::bytes)> const&) (args=std::vector of length 0, capacity 2, test_one_input=...) at src/native/core.cc:226

8<===============================================

#36 0x000055555577f225 in _start ()
(gdb) b TracePC::PrintModuleInfo()
Breakpoint 2 at 0x7ffff7427bd0: file /root/llvm-project/compiler-rt/lib/fuzzer/FuzzerTracePC.cpp, line 81.
(gdb) run target.py
INFO: Instrumenting pathlib
INFO: Instrumenting fnmatch
INFO: Instrumenting ntpath
INFO: Instrumenting urllib
INFO: Instrumenting urllib.parse
INFO: Using built-in libfuzzer
WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1525613820

Breakpoint 2, fuzzer::TracePC::PrintModuleInfo (this=this@entry=0x7ffff7454400 <fuzzer::TPC>) at /root/llvm-project/compiler-rt/lib/fuzzer/FuzzerTracePC.cpp:81
81      /root/llvm-project/compiler-rt/lib/fuzzer/FuzzerTracePC.cpp: No such file or directory.
(gdb) bt
#0  fuzzer::TracePC::PrintModuleInfo (this=this@entry=0x7ffff7454400 <fuzzer::TPC>) at /root/llvm-project/compiler-rt/lib/fuzzer/FuzzerTracePC.cpp:81
#1  0x00007ffff740d5c8 in fuzzer::Fuzzer::Fuzzer (this=0x555555c8a090, CB=<optimized out>, Corpus=..., MD=..., Options=...) at /root/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:153
#2  0x00007ffff73fbe2f in fuzzer::FuzzerDriver (argc=<optimized out>, argv=<optimized out>, Callback=0x7ffff73d8100 <atheris::TestOneInput(unsigned char const*, unsigned long)>) at /root/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:813
#3  0x00007ffff73d8c75 in atheris::start_fuzzing(std::vector<std::string, std::allocator<std::string> > const&, std::function<void (pybind11::bytes)> const&) (args=std::vector of length 0, capacity 2, test_one_input=...) at src/native/core.cc:226
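
The ordering problem shown by the two backtraces can be reduced to a toy model. The class below is a stand-in written for illustration only, not libFuzzer's real `TracePC`. It assumes, as the report describes, that the module banner is printed once when the fuzzer object is constructed, and only if at least one counter region is already registered.

```python
class ToyTracePC:
    """Toy stand-in for libFuzzer's TracePC; not the real implementation."""

    def __init__(self):
        self.modules = []  # registered (start, stop) counter regions

    def counters_init(self, start: int, stop: int) -> None:
        # Plays the role of __sanitizer_cov_8bit_counters_init().
        self.modules.append((start, stop))

    def module_info(self) -> str:
        # Plays the role of TracePC::PrintModuleInfo(): silent when no modules are known.
        if not self.modules:
            return ""
        total = sum(stop - start for start, stop in self.modules)
        return f"INFO: Loaded {len(self.modules)} modules ({total} inline 8-bit counters)"


def run(register_before_startup: bool) -> str:
    tpc = ToyTracePC()
    if register_before_startup:
        tpc.counters_init(0, 5400)
    banner = tpc.module_info()      # evaluated once, at fuzzer construction time
    if not register_before_startup:
        tpc.counters_init(0, 5400)  # too late: happens inside the first TestOneInput call
    return banner


print(repr(run(register_before_startup=True)))
print(repr(run(register_before_startup=False)))
```

In the second run the counters are still registered and used, but the banner has already been skipped, which matches the missing "INFO: Loaded ... modules" line.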

Cannot Install Atheris using PIP on Ubuntu, Alpine, ArchLinux and Windows

I've tried installing (and building from source) Atheris using pip on fresh installs of all of the platforms listed above and get one of two errors:

  • error: [Errno 2] No such file or directory: '/tmp/pip-install-hhn1h51j/atheris_b195119dd32f487a9665296255a04b99/setup_utils/find_libfuzzer.sh' (with different hashes)
  • error: [WinError 193] %1 is not a valid Win32 application

While I could take time to figure out these errors and find workarounds for them, why should I?

Preloaded libFuzzer doesn't allow using a custom mutator

Hi! We've been fuzzing with Atheris and ran into a problem: when we use LD_PRELOAD=/path/to/preload/asan_with_fuzzer.so to fuzz C extensions, the custom mutator written in Python is not linked and, as a result, is not used.

I will demonstrate the problem on this atheris example.
When I run the target like this: /custom_mutator_example.py, I get the following:

INFO: Using built-in libfuzzer 
WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".    
INFO: found LLVMFuzzerCustomMutator (0x7ffff767d9b0). Disabling -len_control by default.
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 352984491
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2	INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 35Mb

And the LLVMFuzzerCustomMutator is found.

When I run LD_PRELOAD="/usr/local/lib/python3.8/dist-packages/asan_with_fuzzer.so" /custom_mutator_example.py, I get this:

INFO: Using preloaded libfuzzer
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 129126802
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2	INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 44Mb

And the LLVMFuzzerCustomMutator is not found and not used.

How can I use a custom mutator while using asan_with_fuzzer.so for external C extensions?

Atheris 2.2.2 fails to instrument a while loop in Python 3.11.0

Trying to instrument a function like this fails with an assertion error in check_state on Python 3.11:

def foo(data):
    while data:
        pass

Interestingly, a function like this instruments correctly:

def foo(data):
    while False:
        pass

Full reproduction:

~/tmp/python/mylib$ python --version
Python 3.11.0
~/tmp/python/mylib$ pip list
Package    Version
---------- -------
atheris    2.2.2
pip        23.0.1
setuptools 65.5.0
~/tmp/python/mylib$ cat foo.py 
import atheris


@atheris.instrument_func
def foo(data):
    while data:
        pass
~/tmp/python/mylib$ python foo.py 
Traceback (most recent call last):
  File "~/tmp/python/mylib/foo.py", line 4, in <module>
    @atheris.instrument_func
     ^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.pyenv/versions/tmp/lib/python3.11/site-packages/atheris/instrument_bytecode.py", line 886, in instrument_func
    func.__code__ = patch_code(func.__code__, True, True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.pyenv/versions/tmp/lib/python3.11/site-packages/atheris/instrument_bytecode.py", line 878, in patch_code
    return inst.to_code()
           ^^^^^^^^^^^^^^
  File "~/.pyenv/versions/tmp/lib/python3.11/site-packages/atheris/instrument_bytecode.py", line 529, in to_code
    self._check_state()
  File "~/.pyenv/versions/tmp/lib/python3.11/site-packages/atheris/instrument_bytecode.py", line 435, in _check_state
    listing[i].check_state()
  File "~/.pyenv/versions/tmp/lib/python3.11/site-packages/atheris/instrument_bytecode.py", line 236, in check_state
    assert (
AssertionError
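
To see why the two functions in this report might be treated differently, it can help to compare the bytecode CPython compiles for each. The sketch below uses only the standard `dis` module; the function names are illustrative. The exact opcodes vary by Python version, so no particular listing is claimed here.

```python
import dis


def loop_on_data(data):
    while data:
        pass


def loop_on_false(data):
    while False:
        pass


def opnames(func):
    """Return the opcode names of a function's compiled bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)]


for func in (loop_on_data, loop_on_false):
    print(func.__name__, opnames(func))
```

The data-dependent loop needs jump instructions, whereas a constant-false loop is typically optimized away by the compiler, which would explain why only the first form reaches the failing check.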

Why are instance methods not instrumented?

# Bound methods can't be instrumented - instrument the real func instead

I have observed that code instrumentation does not apply to instance methods. I wonder whether this is a TODO for the project or whether there is a reason such functions should not be instrumented. I am trying to learn about program testing and about this project. Many thanks for any explanations.
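
One plausible reason, shown here with plain CPython and no Atheris involved: the traceback in the previous report shows `instrument_func` assigning to `func.__code__`, and a bound method does not allow that assignment, while the function it wraps does. The class and helper below are hypothetical, and whether this is the actual motivation for the quoted comment is an assumption.

```python
class Parser:
    def parse(self, data):
        return len(data)


def try_replace_code(obj) -> bool:
    """Attempt the kind of __code__ assignment that bytecode patching relies on."""
    try:
        obj.__code__ = obj.__code__
        return True
    except (AttributeError, TypeError):
        return False


bound = Parser().parse

print(try_replace_code(bound))           # bound method object
print(try_replace_code(bound.__func__))  # the underlying plain function
print(bound.__func__ is Parser.parse)    # both names refer to the same function object
```

Under that reading, instrumenting `Parser.parse` (the "real func") also covers every instance, since all bound methods share the same underlying function object.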
