doyensec / regexploit
Find regular expressions which are vulnerable to ReDoS (Regular Expression Denial of Service)
License: Apache License 2.0
I have two expressions that take too long to analyze:
$ time bin/regexploit
Welcome to Regexploit. Enter your regexes:
(?i:(?:j|&#x?0*(?:74|4A|106|6A);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:a|&#x?0*(?:65|41|97|61);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:v|&#x?0*(?:86|56|118|76);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:a|&#x?0*(?:65|41|97|61);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:s|&#x?0*(?:83|53|115|73);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:c|&#x?0*(?:67|43|99|63);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:r|&#x?0*(?:82|52|114|72);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:i|&#x?0*(?:73|49|105|69);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:p|&#x?0*(?:80|50|112|70);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:t|&#x?0*(?:84|54|116|74);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?::|&(?:#x?0*(?:58|3A);?|colon;)).)(regexp)
^C
real 3m34,572s
user 3m33,582s
sys 0m0,016s
As you can see, in this case I terminated the process after 3 minutes and 34 seconds.
$ echo "(?i:(?:v|&#x?0*(?:86|56|118|76);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:b|&#x?0*(?:66|42|98|62);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:s|&#x?0*(?:83|53|115|73);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:c|&#x?0*(?:67|43|99|63);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:r|&#x?0*(?:82|52|114|72);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:i|&#x?0*(?:73|49|105|69);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:p|&#x?0*(?:80|50|112|70);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?:t|&#x?0*(?:84|54|116|74);?)(?:\t|&(?:#x?0*(?:9|13|10|A|D);?|tab;|newline;))*(?::|&(?:#x?0*(?:58|3A);?|colon;)).)(regexp)" | time bin/regexploit
16.84user 0.00system 0:16.84elapsed 99%CPU (0avgtext+0avgdata 12892maxresident)k
0inputs+0outputs (0major+1834minor)pagefaults 0swaps
So this one took 16 seconds, which is much longer than the others. I run about 300 different checks, and apart from these two expressions all of them finish within a few tens or hundreds of milliseconds.
The second one wouldn't be a problem, but it's a bit odd. The first one is problematic, and it would be good to find the reason for this behavior, since I would like to integrate this tool into a CI/CD environment.
Could you help me to investigate these?
Thanks.
Why is (a+)+ not detected?
Welcome to Regexploit. Enter your regexes:
(a+)+
No ReDoS found.
While scanning a large set of regular expressions I found some particularly nasty lines which hung my scripted scan. Perhaps a timeout flag could be added?
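Until such a flag exists, a scripted scan can enforce its own deadline from the outside. The following is a minimal sketch, assuming the scanner reads one regex from stdin (as in the `echo ... | bin/regexploit` invocation shown in an earlier report). The function name `scan_with_timeout` and its parameters are hypothetical and not part of regexploit.

```python
import subprocess


def scan_with_timeout(command, regex, timeout_seconds):
    """Feed one regex to a scanner process on stdin and enforce a deadline.

    Returns the scanner's stdout, or None if the deadline passed.
    `command` is an argument list, for example ["bin/regexploit"].
    """
    try:
        completed = subprocess.run(
            command,
            input=regex + "\n",
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return None
    return completed.stdout
```

A caller could treat a `None` result as "not analyzed" and log the regex for manual review, so that a single slow pattern does not hang the whole scan.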
Hello,
Thank you for the amazing tool.
I want to cite your work. However, I cannot find the cff file for this repository.
Is there an entry for correctly citing this work?
Thanks a lot!
Best,
Chengsong
Hi,
I ran into a problem with two regular expressions:
$ bin/regexploit
Welcome to Regexploit. Enter your regexes:
(?i)(?:(?:(?:(?:trunc|cre|upd)at|renam)e|d(?:e(?:lete|sc)|rop)|(?:inser|selec)t|alter|load)\s*?\(\s*?space\s*?\(|,.*?[)\da-f\"'`][\"'`](?:[\"'`].*?[\"'`]|(?:\r?\n)?\z|[^\"'`]+)|\Wselect.+\W*?from)(regexp)
Error parsing: (?i)(?:(?:(?:(?:trunc|cre|upd)at|renam)e|d(?:e(?:lete|sc)|rop)|(?:inser|selec)t|alter|load)\s*?\(\s*?space\s*?\(|,.*?[)\da-f\"'`][\"'`](?:[\"'`].*?[\"'`]|(?:\r?\n)?\z|[^\"'`]+)|\Wselect.+\W*?from)(regexp) bad escape \z at position 164
No ReDoS found.
$ bin/regexploit
Welcome to Regexploit. Enter your regexes:
(?i)(?:[\"'`]\s*?(?:(?:n(?:and|ot)|(?:x?x)?or|between|\|\||and|div|&&)\s+[\s\w]+=\s*?\w+\s*?having\s+|like(?:\s+[\s\w]+=\s*?\w+\s*?having\s+|\W*?[\"'`\d])|[^?\w\s=.,;)(]++\s*?[(@\"'`]*?\s*?\w+\W+\w|\*\s*?\w+\W+[\"'`])|(?:union\s*?(?:distinct|[(!@]*?|all)?\s*?[([]*?\s*?select|select\s+?[\[\]()\s\w\.,\"'`-]+from)\s+|\w\s+like\s+[\"'`]|find_in_set\s*?\(|like\s*?[\"'`]%)(regexp)
Error parsing: (?i)(?:[\"'`]\s*?(?:(?:n(?:and|ot)|(?:x?x)?or|between|\|\||and|div|&&)\s+[\s\w]+=\s*?\w+\s*?having\s+|like(?:\s+[\s\w]+=\s*?\w+\s*?having\s+|\W*?[\"'`\d])|[^?\w\s=.,;)(]++\s*?[(@\"'`]*?\s*?\w+\W+\w|\*\s*?\w+\W+[\"'`])|(?:union\s*?(?:distinct|[(!@]*?|all)?\s*?[([]*?\s*?select|select\s+?[\[\]()\s\w\.,\"'`-]+from)\s+|\w\s+like\s+[\"'`]|find_in_set\s*?\(|like\s*?[\"'`]%)(regexp) multiple repeat at position 170
No ReDoS found.
Both of them give an "Error parsing" result, followed by "No ReDoS found".
These expressions work with PCRE.
I looked for an explanation of this behavior in the documentation, but didn't find any information about "Error parsing" results.
What does it mean? How reliable is the "No ReDoS found" result after such an error?
Thanks.
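The error texts quoted above ("bad escape \z", "multiple repeat") have the same shape as the messages raised by Python's `re` module. Assuming the tool relies on Python's regex parser, a pattern that uses PCRE-only syntax may never be analyzed at all, so "No ReDoS found" after a parse error would not say anything about that pattern. The sketch below separates the two cases before scanning. The helper names `precheck` and `triage` are hypothetical. Which constructs parse also depends on the Python version: possessive quantifiers and atomic groups, for example, are accepted from Python 3.11 on.

```python
import re


def precheck(pattern):
    """Return None if Python's re module can parse the pattern, else the error text."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


def triage(patterns):
    """Split patterns into those that parse and those that need manual review."""
    parsed, unparsed = [], []
    for pattern in patterns:
        error = precheck(pattern)
        if error is None:
            parsed.append(pattern)
        else:
            # Keep the error text so the report can say why the pattern was skipped.
            unparsed.append((pattern, error))
    return parsed, unparsed
```

In a CI pipeline, the `unparsed` list could be reported separately, so that a skipped pattern is not mistaken for a clean one.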
Good tool! Will it support scanning Java in the future?
When a ReDoS vulnerability is found, the exploit string is a regular expression itself. Would it be possible to generate concrete exploit strings, maybe with the possibility to set the max string length?
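One way to picture the request: if a finding can be reduced to three literal pieces (a prefix, a repeated part, and a suffix that forces backtracking), a concrete string with a length cap is straightforward to build. This is only a sketch under that assumption. The function and parameter names are hypothetical and do not reflect regexploit's actual output format.

```python
def build_attack_string(prefix, pump, suffix, max_length):
    """Repeat `pump` as often as fits so the whole string stays within max_length."""
    if not pump:
        raise ValueError("pump must be non-empty")
    room = max_length - len(prefix) - len(suffix)
    if room < 0:
        raise ValueError("max_length is too small for prefix and suffix")
    repeats = room // len(pump)
    return prefix + pump * repeats + suffix
```

For example, `build_attack_string("", "ab", "-", 9)` yields `"abababab-"`: four repetitions of the pump, followed by the character that makes the match fail.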
It would be useful to run this extension automatically on save for supported files.
Should we expect regexploit to warn about the kind of regular expression backtracking which caused an outage on StackOverflow (^[\s\u200c]+|[\s\u200c]+$), or is it out of scope for this tool?
Cheeky feature request: could support for JSON or SARIF be added for output? This would allow for easier consumption in continuous integration and in vulnerability management tools.
JSON example: https://gitlab.com/gitlab-org/security-products/security-report-schemas/-/blob/master/dist/sast-report-format.json
SARIF example: https://www.oasis-open.org/committees/sarif/charter.php
Thanks!
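To make the request concrete, here is a rough sketch of what wrapping findings in a minimal SARIF 2.1.0 log could look like. The shape of the input (dicts with the keys `file`, `line`, `pattern`, and `message`) and the rule identifier `redos` are assumptions made for illustration. They are not regexploit's data model.

```python
import json


def to_sarif(findings):
    """Wrap findings in a minimal SARIF 2.1.0 log.

    Each finding is a dict with the hypothetical keys
    "file", "line", "pattern", and "message".
    """
    results = []
    for finding in findings:
        results.append({
            "ruleId": "redos",  # hypothetical rule identifier
            "level": "warning",
            "message": {"text": f'{finding["message"]}: {finding["pattern"]}'},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": finding["file"]},
                    "region": {"startLine": finding["line"]},
                }
            }],
        })
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "regexploit"}},
            "results": results,
        }],
    }


def dump_sarif(findings):
    """Serialize the SARIF log as indented JSON text."""
    return json.dumps(to_sarif(findings), indent=2)
```

A CI job could write the returned text to a file and hand it to whatever tool consumes SARIF in that pipeline.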
I found this regex, ^\w*a\w*b\w*$, which should lead to cubic complexity with a string of the form (ab)*-.
I'm surprised it doesn't get detected, since it is almost identical to the first example in the README, even if it needs twice as many bytes to achieve about the same execution time.
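The growth can be checked empirically with a small timing loop. This sketch assumes Python's backtracking `re` engine, since the report does not say which engine was used, and it keeps the inputs small so that it finishes quickly. The function name `time_match` is hypothetical.

```python
import re
import time

PATTERN = re.compile(r"^\w*a\w*b\w*$")


def time_match(pairs):
    """Time one failed match against 'ab' * pairs followed by '-'."""
    subject = "ab" * pairs + "-"
    start = time.perf_counter()
    result = PATTERN.search(subject)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Doubling the input length should increase the time roughly eightfold
# if the behavior is cubic; exact timings vary by machine.
for pairs in (20, 40, 80):
    result, elapsed = time_match(pairs)
    print(f"{pairs * 2 + 1:4d} chars: matched={result is not None} {elapsed:.4f}s")
```

The trailing `-` is what forces the failure: it is not a word character, so `$` can never succeed and the engine has to try every way of placing the `a` and the `b`.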
I was scanning a bunch of regular expressions, some of which contain atomic groups. Every expression containing an atomic group triggered a parsing error indicating that ?> is an unknown extension. Is this the desired behavior of regexploit, or will atomic groups be supported in future versions?
Welcome to Regexploit. Enter your regexes:
([<>]*(?:(?R)[^<>]*)*)>
Error parsing: ([<>]*(?:(?R)[^<>]*)*)> unknown extension ?R at position 10
No ReDoS found.
Hi, thanks for this library, it's very useful. Having the output on stdout is great, but for bigger projects it may not be easy to parse for further analysis or automation.
This is a request for an option like --extract [file.json] for regexploit-py (to start with; it could be extended to the other scripts). Let me know if you are open to such a feature, and I can send a PR for it.