Git Product home page Git Product logo

Comments (5)

kspalaiologos avatar kspalaiologos commented on August 24, 2024

According to Ilya:

This input is not valid BWT string. I attempted to decode it with different algorithm (see below) and it failed as well. libsais uses variation of BWT with sentinel symbol ($) as oppose to variation with cyclic permutation (like in bzip2). This BWT variations are not compatible and behavior of libsais_unbwt is not defined for such inputs. (#10)

Hence I doubt that the undefined behaviour in libsais on invalid inputs will be fixed.

from libsais.

IlyaGrebnov avatar IlyaGrebnov commented on August 24, 2024

I am currently dedicating my efforts on implementing further enhancements for libcubwt. However, I would like to assure that I will investigate this issue, likely within the next few weeks.

from libsais.

IlyaGrebnov avatar IlyaGrebnov commented on August 24, 2024

@kspalaiologos and @asarubbo I would like to provide an update on the fuzz testing using libFuzzer I conducted on libsais.

Correctness Issues

After extensive testing, no correctness issues were identified for libsais. It was able to successfully encode and decode any arbitrary string generated by the fuzzer.

Undefined Behaviors

However, during the testing, I did identify two undefined behaviors with libsais:

  1. Invalid input for BWT Decoding: Not all string plus index pairs are valid inputs for BWT decoding because there are no strings that could generate them. As a result, libsais will crash on this inputs.

  2. Uninitialized read errors: Sometimes, libsais attempts to read and prefetch look-ahead values that might not be populated yet. While this is not a correctness issue since prefetch instructions are non-faulting and merely a hardware hint, memory sanitizer triggers uninitialized read errors.

Recommendation

Based on fuzzy testing, both of these undefined behaviors could be eliminated by clearing the input SA and A arrays before calling libsais.

Investigation into bzip3 Source Code

In addition to the above, I also looked into the bzip3 source code to determine the root cause of the discovered bugs. After analyzing the code, it appears that calls to libsais are made with less allocated space than required by the algorithm. Specifically, the SA array needs to be at least n bytes and A array at least n + 1 bytes. However, from my investigation, it seems that arrays are allocated by block_size + 128 bytes, while the lzp_size is bound by input_size + input_size / 50 + 32 bytes. Assuming that block size and input size are the same, the LZP could inflate the input by 2%, which leads to the inflate size being larger than the allocated space for libsais (input_size + 2% > block_size + 128), ultimately resulting in the crash.

I hope this details is sufficient to improve the reliability of bzip3. Please let me know if you have any questions or concerns.

from libsais.

kspalaiologos avatar kspalaiologos commented on August 24, 2024

Thanks for all the valuable input! I have implemented your recommendations and now I am fuzzing bzip3 with AFL on my machines. If there's anything left related to libsais I will make sure to let you know.

from libsais.

asarubbo avatar asarubbo commented on August 24, 2024

@kspalaiologos and @asarubbo I would like to provide an update on the fuzz testing using libFuzzer I conducted on libsais.

Thanks for the update and for you valuable work. If you want to share the code used for fuzz testing, I (and other people) can use it too to report further issues, or you can consider https://github.com/google/oss-fuzz

from libsais.

Related Issues (16)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.