trufflehog_v3_loc_bug's Introduction

This repository documents line of code calculation bugs that are present in the TruffleHog secrets scanner, as reported in the following issues in the TruffleHog GitHub issue tracker:

Bug #2502 - Line of code calculation is wrong for sequential identical secrets

Description

When the same secret occurs multiple times in a contiguous sequence, TruffleHog assigns every subsequent instance the same line of code value as the first occurrence. The first instance is therefore reported on the correct line, while every following instance is reported on that same line, which is incorrect.

Affected versions

Analysis indicates that this bug was introduced in PR #520, which was merged on May 04 2022, and that it first appeared in TruffleHog v3.4.3 on May 05 2022.

Root Cause

TruffleHog uses data chunking internally to optimise performance, by capping the amount of data that a secret detector processes at a time. However, the implementation loses some context, including the occurrence number (index) of the found secret.

As a result, if the exact same raw secret value is present sequentially in a data chunk, the line of code is calculated using the first instance of that value as the reference point, rather than the location where the secret was actually found.
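To illustrate what retaining the occurrence index would make possible, the following minimal Go sketch resolves the line number of the n-th occurrence of a value within a chunk. The function name lineOfNthMatch and the sample data are hypothetical; this is neither TruffleHog's code nor a proposed patch, only an illustration of the context that the root cause describes as lost.

```go
package main

import (
	"bytes"
	"fmt"
)

// lineOfNthMatch returns the 1-based line number of the n-th (0-based)
// occurrence of raw in chunk. It is a hypothetical helper used only to
// illustrate the idea of an occurrence index.
func lineOfNthMatch(chunk, raw []byte, n int) (int, bool) {
	if len(raw) == 0 || n < 0 {
		return 0, false
	}
	offset := 0
	for i := 0; ; i++ {
		idx := bytes.Index(chunk[offset:], raw)
		if idx < 0 {
			return 0, false // fewer than n+1 occurrences in this chunk
		}
		pos := offset + idx
		if i == n {
			// Count the newlines that precede this specific occurrence.
			return bytes.Count(chunk[:pos], []byte("\n")) + 1, true
		}
		// Continue searching after the occurrence just found.
		offset = pos + len(raw)
	}
}

func main() {
	// Illustrative data: the same value on three consecutive lines.
	chunk := []byte("key=SECRET\nkey=SECRET\nkey=SECRET\n")
	raw := []byte("SECRET")

	for n := 0; n < 3; n++ {
		line, ok := lineOfNthMatch(chunk, raw, n)
		fmt.Printf("occurrence %d: line %d (found=%v)\n", n, line, ok)
	}
}
```

With the occurrence index available, each of the three identical values resolves to its own line (1, 2 and 3) instead of all three resolving to line 1.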

Source

In the file /pkg/engine.go, in the FragmentLineOffset method, on line 900, the data chunk is broken up into three values (before, after, and found) using the bytes.Cut function:

code screenshot

bytes.Cut splits the data around the first occurrence of the supplied separator, which in this case is the raw secret value (result.Raw). Consequently, each time the engine calculates the line of code value, it calculates it from the same position, regardless of which occurrence the calculation is intended for.
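The effect can be reproduced outside TruffleHog with a few lines of Go. The sketch below is a simplified stand-in, not the actual FragmentLineOffset code: the helper name lineOfFirstMatch and the sample chunk are assumptions made for illustration. Only bytes.Cut and bytes.Count are real standard library calls.

```go
package main

import (
	"bytes"
	"fmt"
)

// lineOfFirstMatch cuts the chunk at the FIRST occurrence of raw and counts
// the newlines that precede it. It is a hypothetical helper that mimics the
// behaviour described in the text, not TruffleHog's implementation.
func lineOfFirstMatch(chunk, raw []byte) (int, bool) {
	before, _, found := bytes.Cut(chunk, raw)
	if !found {
		return 0, false
	}
	return bytes.Count(before, []byte("\n")) + 1, true
}

func main() {
	// Illustrative data: the same value on three consecutive lines.
	chunk := []byte("key=SECRET\nkey=SECRET\nkey=SECRET\n")
	raw := []byte("SECRET")

	// Three findings are expected on lines 1, 2 and 3, but every lookup
	// starts from the beginning of the chunk and stops at the first match,
	// so each finding is reported on line 1.
	for i := 1; i <= 3; i++ {
		line, _ := lineOfFirstMatch(chunk, raw)
		fmt.Printf("finding %d reported on line %d\n", i, line)
	}
}
```

Because the lookup takes no account of which finding it is called for, it cannot return anything other than the line of the first match.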

Before TruffleHog v3.28.0, similar behaviour was implemented using the bytes.Split function. It was first introduced in v3.4.3, on line 234:

code screenshot 2

Bug #2504 - Presence of 'line of code' values are inconsistently presented in results, depending upon the data source configured

Description

TruffleHog reports line of code values inconsistently, depending on the data source selected. The filesystem data source has been observed to omit the line of code value from some of its results.

In the screenshot below, which shows /results/filesystem_loc_inaccurate.json, only 5 of the 10 results presented (lines 2-11) contain a line number, even though all of the findings were produced from the same file, using the same regular expression pattern, with similar 'raw secret' results:

code screenshot 3

By comparison, the screenshot below, taken from /results/git_loc_inaccurate.json, shows that every finding produced with the 'git' data source contains a line of code value. These values are incorrect due to bug #2502 described above, but they are present for the 'git' data source and not for the 'filesystem' data source, which is the issue at hand in bug #2504.

code screenshot 4

Affected versions

Observed in TruffleHog v3.68.0. No other versions have been tested at this point.

Root Cause

Unknown.

Contributors

0x736e
