Comments (7)
There are also some `AKIAblahblah` keys that do not have high enough entropy, but I suppose that should be a different issue :)
from detect-secrets.
I would like to work on this issue. After a little digging, I came up with two proposed solutions and would really appreciate any comments on them:
- Can we expect the client to include a hyphen within the charset? If yes, then I believe we just need to use `re.escape(charset)` instead of `charset` on this line.
- If clients should continue to use the existing charset, we need to either enrich just the regex, or both the regex and charset (internally). Unless we append the hyphen to the charset in the constructor, the entropy calculation will not use hyphens. So should hyphens be included in the entropy calculation?
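To illustrate the first option, here is a minimal sketch of why an unescaped hyphen inside a regex character class is a problem. The `charset` value and variable names are made up for the example and are not taken from the detect-secrets code.

```python
import re

# Hypothetical charset with a hyphen in the middle (assumption for illustration).
charset = "abc+-/xyz"

# Unescaped, "+-/" inside [...] is read as the range '+' through '/',
# which silently admits ',' and '.' as well as '-'.
unescaped = re.compile(r"[" + charset + r"]+")

# re.escape() escapes the hyphen, so it is matched as a literal character.
escaped = re.compile(r"[" + re.escape(charset) + r"]+")

sample = "a,b.c-x"
unescaped_match = unescaped.fullmatch(sample) is not None  # ',' and '.' slip through
escaped_match = escaped.fullmatch(sample) is not None      # ',' and '.' are rejected

print(unescaped_match, escaped_match)
```

With the escaped version, only the characters actually listed in the charset are accepted, including the literal hyphen.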
Finally, I believe tests should go here, right?
Thanks!
from detect-secrets.
Those are good questions @mohit-surana! The short answers are:
- No (we should be able to support both, holistically)
- Yes?
Based on the entropy algorithm, it seems that the more characters in the `charset`, the higher the entropy can be.
Following this logic, it would suggest that a more liberal `charset` may require a different entropy configuration level, seeing that the same level may produce more false positives.
However, if this is true, then any additions to the `charset` would require a completely separate plugin (e.g. adding hyphens and percentage signs `-%`), and the maintenance of these potential plugins could get very messy.
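The reasoning above can be checked with a small sketch of Shannon entropy computed only over characters in a given charset. This is an illustrative function, assumed to be similar in spirit to the plugin's calculation but not copied from it; `BASE64` and `token` are example values.

```python
import math

def shannon_entropy(data: str, charset: str) -> float:
    """Shannon entropy in bits per character, counting only characters in `charset`."""
    if not data:
        return 0.0
    entropy = 0.0
    for ch in set(charset):
        p = data.count(ch) / len(data)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy

# Example charset and token (assumptions for illustration).
BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
token = "abcd-efgh-ijkl"

# The same string scores higher once the hyphen is part of the charset.
without_hyphen = shannon_entropy(token, BASE64)
with_hyphen = shannon_entropy(token, BASE64 + "-")

# The ceiling, log2 of the charset size, also rises with every added symbol.
max_base64 = math.log2(len(set(BASE64)))
max_extended = math.log2(len(set(BASE64 + "-")))

print(without_hyphen, with_hyphen, max_base64, max_extended)
```

So a fixed limit becomes easier to exceed as the charset grows, which is the source of the extra false positives.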
Any thoughts on this?
from detect-secrets.
Theoretically, yes. It would increase the false positive rate while reducing the false negative rate, so ultimately it is a trade-off between false positives and false negatives. Do we have any statistics on the current system's false positive rate?
How can we design good tests, with enough coverage, to measure the new FP/FN rates?
As for new plugins, it seems to me that changing the entropy calculation itself is a big NO, as it may affect current clients. And if we make one plugin for each small difference, we end up with a combinatorial number of plugins. Would it be better to allow clients to pass an additional argument indicating whether they would like to include additional preset or client-specified symbols?
Bottom line: if FP increases a lot, we need to have the client make a conscious decision to move to a new version that supports hyphens.
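One possible shape for the additional argument, as a rough sketch only. `EntropyScannerSketch` and `extra_symbols` are hypothetical names, not the existing detect-secrets plugin interface; the point is that the default leaves behaviour unchanged for current clients.

```python
import math
import re
from typing import List

class EntropyScannerSketch:
    """Hypothetical scanner for illustration; not the detect-secrets plugin API."""

    def __init__(self, charset: str, limit: float, extra_symbols: str = ""):
        # extra_symbols defaults to "", so existing clients see no change.
        self.charset = charset + extra_symbols
        self.limit = limit
        # Escape the charset so symbols such as '-' are matched literally.
        self.regex = re.compile(r"[" + re.escape(self.charset) + r"]+")

    def entropy(self, data: str) -> float:
        if not data:
            return 0.0
        total = 0.0
        for ch in set(self.charset):
            p = data.count(ch) / len(data)
            if p > 0:
                total -= p * math.log2(p)
        return total

    def find(self, line: str) -> List[str]:
        # Keep only candidate strings whose entropy exceeds the limit.
        return [m for m in self.regex.findall(line) if self.entropy(m) > self.limit]

HEX = "0123456789abcdef"
line = "key = 0a1b2c3d-4e5f6a7b-8c9d0e1f"

default_hits = EntropyScannerSketch(HEX, limit=3.0).find(line)
opt_in_hits = EntropyScannerSketch(HEX, limit=3.0, extra_symbols="-").find(line)
print(default_hits, opt_in_hits)
```

In this example the default scanner sees three short hex chunks that each stay at the limit, while the opt-in scanner treats the hyphenated string as one candidate and flags it.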
from detect-secrets.
I'm in favor of the additional argument, but I don't know what that might look like in the user interface. It would certainly increase the scope of this issue (and perhaps it would no longer be a "good first issue")! If you still want to take it on, we'd more than welcome the contribution!
Otherwise, the `AKIA`-prefixed issue that @KevinHock mentioned may be a good start. Though it doesn't strictly find the AWS secret, it gives a good indication that there might be a secret there, on the same principle as "where there's smoke, there's fire".
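A pattern-based check for the `AKIA` prefix could be sketched as follows. It assumes the commonly documented access key ID format of `AKIA` followed by 16 uppercase letters or digits; the function name is made up, and the sample value is the placeholder key ID used in AWS documentation.

```python
import re

# Assumed format: "AKIA" followed by 16 uppercase letters or digits.
AKIA_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

def find_access_key_ids(line: str):
    """Return substrings that look like AWS access key IDs (hypothetical helper)."""
    return AKIA_PATTERN.findall(line)

hits = find_access_key_ids("aws_access_key_id = AKIAIOSFODNN7EXAMPLE")
misses = find_access_key_ids("AKIAblahblah")
print(hits, misses)
```

This only flags the key ID, not the secret itself, so a hit is a hint to look nearby rather than proof of a leaked secret.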
As for testing FP/FN rates, we are building a large internal collection of various different secrets that we use to experiment with our new plugins. We can certainly run your plugin on our corpus, and help tweak its default sensitivity.
from detect-secrets.
Hey @domanchi. I am interested in implementing the additional argument version of the solution. I am a bit caught up with stuff at the moment and I'll get back to it as soon as time permits!
The internal corpus sounds like a really good idea, and in general will help attract more users as well. As for the `AKIA` prefix, I will need to think further to understand how we can incorporate patterns along with the entropy calculation. Let me get back to you!
from detect-secrets.
We're going to close this issue as it hasn't received any update in a very long time. Feel free to re-open it if you think it's still relevant.
from detect-secrets.