vuln-regex-detector's Introduction

Summary

This project provides tools to scan your projects for vulnerable regexes. These are regexes whose matching can trigger catastrophic backtracking, making them a vector for regular expression denial of service (ReDoS).
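
To make "catastrophic backtracking" concrete, here is a minimal sketch in Python. The pattern (a+)+$ is a textbook example with nested quantifiers, not one taken from this project, and the helper names match_seconds and evil_input are made up for illustration. On input that almost matches, the time to reject it roughly doubles with each extra character.

```python
import re
import time

# Textbook example of a vulnerable pattern (nested quantifiers).
# Chosen for illustration; it does not come from this project.
VULNERABLE = re.compile(r"(a+)+$")


def match_seconds(pattern: re.Pattern, text: str) -> float:
    """Return how long pattern.match(text) takes, in seconds."""
    start = time.perf_counter()
    pattern.match(text)
    return time.perf_counter() - start


def evil_input(pump_count: int) -> str:
    """Many 'a's followed by one character that forces the match to fail."""
    return "a" * pump_count + "!"


if __name__ == "__main__":
    # Each step adds only four characters, yet the time grows sharply.
    for n in (10, 14, 18):
        print(f"n={n:2d}  {match_seconds(VULNERABLE, evil_input(n)):.4f}s")
```

Exact timings depend on your machine and regex engine, so none are quoted here. The point is the growth rate, not the absolute numbers.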

Getting started

Local queries

  1. Set the environment variable VULN_REGEX_DETECTOR_ROOT to wherever you cloned the repo.
  2. Run the configure script to install dependencies and build the detectors.
  3. Use the scripts in bin/. See the README in that directory for details.

Remote queries

If you don't want to install and run the detectors locally, you can use the vuln-regex-detector npm module. This module uses the src/cache/client/npm code to query a server hosted at Virginia Tech. The server is running the src/cache/server code.

See the corresponding README for more details.

How it works

Scanning a project has three stages:

  1. Regex extraction
  2. Vulnerability detection
  3. Vulnerability validation

Regex extraction

In this stage regexes are statically extracted from the project's source code. See here for more details.
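
As a rough illustration of what static extraction means, the sketch below walks a Python syntax tree and collects string literals passed to functions of the re module. It is an assumption-laden toy, not the project's extractor: the function name extract_static_regexes and the SAMPLE source are hypothetical. Note that the pattern built from a variable is not found, because its value is only known at run time.

```python
import ast

# Functions of the re module whose first argument is a pattern.
REGEX_FUNCS = {"compile", "match", "search", "fullmatch", "findall", "sub"}


def extract_static_regexes(source: str) -> list[str]:
    """Collect string literals passed as the first argument to re.<func>(...).

    Hypothetical helper for illustration; not the project's extractor.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "re"
            and node.func.attr in REGEX_FUNCS
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append(node.args[0].value)
    return found


SAMPLE = '''
import re
STATIC = re.compile(r"(a+)+$")
def build(user_pattern):
    return re.compile(user_pattern)  # dynamic: invisible to static extraction
'''

if __name__ == "__main__":
    print(extract_static_regexes(SAMPLE))
```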

Vulnerability detection

In this stage the regexes are tested for vulnerability. See here for more details.

Testing regexes for vulnerability is expensive. As a result, the default configuration of this repo is to query a server to see if the regex has previously been tested for safety. See here for more details.

If you would rather not send your regexes to a remote server, you can turn this off or direct queries to your own server by editing src/cache/.config.json in your clone. The source for the server is included in src/cache.
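
The idea of consulting a cache before running the expensive detectors can be sketched as follows. This is a generic cache-aside pattern under stated assumptions, not the project's client: a plain dict stands in for the remote server, and check_with_cache, detect, and use_cache are hypothetical names.

```python
from typing import Callable, Dict


def check_with_cache(
    pattern: str,
    cache: Dict[str, bool],
    detect: Callable[[str], bool],
    use_cache: bool = True,
) -> bool:
    """Return whether `pattern` is vulnerable, consulting `cache` first.

    `cache` is a stand-in for a remote result store (hypothetical).
    `detect` is a stand-in for the expensive detection stage (hypothetical).
    """
    if use_cache and pattern in cache:
        return cache[pattern]      # cheap: this regex was already tested
    result = detect(pattern)       # expensive: run the detectors
    if use_cache:
        cache[pattern] = result    # remember the answer for next time
    return result
```

Passing use_cache=False corresponds to turning the cache off: every query pays the full detection cost, but no pattern leaves your machine.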

Vulnerability validation

In this stage the results of the vulnerability tests are validated.

The vulnerability detectors are not always correct. Happily, each emits an evil input that it believes will trigger catastrophic backtracking. We have vulnerability validators that check these proposed inputs in the language(s) in which you will use the regexes.

See here for more details.
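
One way to picture validation is to run the match in a separate process and see whether it finishes within a time budget. The sketch below does this for Python's own regex engine. It is an illustration under assumptions, not the project's validator: the function name validate, the 2-second default budget, and the use of a child Python process are all choices made for this example.

```python
import subprocess
import sys

# Child program: compile the pattern (argv[1]) and search the input (argv[2]).
_CHILD = "import re, sys; re.compile(sys.argv[1]).search(sys.argv[2])"


def validate(pattern: str, evil_input: str, timeout_s: float = 2.0) -> bool:
    """Return True if matching `evil_input` does not finish within `timeout_s`.

    Raises subprocess.CalledProcessError if the child fails, for example
    because the pattern does not compile.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, evil_input],
            timeout=timeout_s,
            check=True,
        )
    except subprocess.TimeoutExpired:
        return True   # the match blew past the budget: claim confirmed
    return False      # the match finished in time: claim not confirmed
```

Running the match in a child process matters because a regex match stuck in backtracking cannot be interrupted from within the same Python thread. subprocess.run kills the child when the timeout expires.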

Pipelining

  1. The extraction stage produces a list of regexes. Each regex should be fed to the detection stage.
  2. The detection stage produces evil input from each detector. Each evil input should be fed in turn to the validation stage.

The scripts in bin/ implement this pipeline.
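
The data flow described above can be sketched as three nested loops. This is a schematic only, not the code in bin/: the type aliases and the scan function are hypothetical, and each stage is passed in as a plain callable so that the wiring is visible.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stage signatures, for illustration only.
Extractor = Callable[[str], Iterable[str]]   # source code -> regexes
Detector = Callable[[str], Iterable[str]]    # regex -> proposed evil inputs
Validator = Callable[[str, str], bool]       # (regex, evil input) -> confirmed?


def scan(
    source: str,
    extract: Extractor,
    detectors: Iterable[Detector],
    validate: Validator,
) -> List[Tuple[str, str]]:
    """Return (regex, evil input) pairs that survived validation."""
    detectors = list(detectors)  # allow re-use across regexes
    confirmed = []
    for regex in extract(source):            # stage 1: extraction
        for detect in detectors:             # stage 2: detection
            for evil in detect(regex):
                if validate(regex, evil):    # stage 3: validation
                    confirmed.append((regex, evil))
    return confirmed
```

Only pairs that pass validation are returned, which mirrors why the final report contains confirmed vulnerabilities rather than every detector claim.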

Caveats

In brief, let's review how the analysis works:

  1. Identify all statically-declared regexes used anywhere in your source code.
  2. Ask detectors what they think about each regex.
  3. For any regexes that any detectors flagged as vulnerable, validate in the appropriate language.

Here are the shortcomings of the analysis.

  1. Regex extraction: It is static. If you dynamically define regexes (e.g. new Regex(patternAsAVariable)) we do not know about it.
  2. Regex extraction: It is input agnostic, so it detects vulnerable regexes whether or not they are currently exploitable. As long as a vulnerable regex is only used on trusted input, it will not be exploited. If a vulnerable regex is only used in test code, then it is not currently a problem. Judge for yourself how comfortable you feel about keeping non-exploitable vulnerable regexes in your code.
  3. Vulnerability detection: It is detector dependent. All of the detectors have their flaws, and none has received careful testing. Thanks to the validation stage, we only report truly vulnerable regexes (high precision/no false positives), but there may be unreported vulnerabilities (risk of low recall/false negatives), e.g. due to bugs or timeouts in the detection stage.

Supported OSes

The configuration code supports Ubuntu directly (tested on Ubuntu 16). For other distros/OSes, a container can be used (see Docker below). Everything else should work on any Linux. Open an issue if you want support for other distros/OSes and we can discuss.

Docker

A Dockerfile is provided to make the code easier to configure on non-Ubuntu systems. The image can be built and used as follows:

$ docker build -t vuln-regex-detector .
$ docker run --rm -v /tmp/query:/query vuln-regex-detector bin/check-regex.pl /query/unsafe-1.json

where /tmp/query/unsafe-1.json contains the pattern to be checked.
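
A query file like this could be generated with a few lines of Python. The sketch assumes the file is a JSON object with a single "pattern" key. That schema is an assumption made for this example, so check the README in bin/ for the fields the script actually expects. The helper name write_query is hypothetical.

```python
import json
from pathlib import Path


def write_query(path: Path, pattern: str) -> None:
    """Write a query file holding `pattern`.

    Assumes a JSON object with a "pattern" key; verify against the
    README in bin/ before relying on it.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"pattern": pattern}) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Textbook vulnerable pattern, used here only as sample content.
    write_query(Path("/tmp/query/unsafe-1.json"), "(a+)+$")
```

Letting json.dumps do the encoding avoids hand-escaping backslashes and quotes, which are common in regex patterns.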

Contributing

Contributions welcome!

  • If you find a bug, please open an issue.
  • If you want to add a feature, open an issue to discuss first and to "claim the territory".

Enhancing the scan

If you want to enhance the scan, here are the instructions.

  1. If you want to add support for a new language, here are the instructions for regex extraction and for vulnerability validation.
  2. If you want to add a new vulnerability detector, see the instructions.

Related projects

  1. https://github.com/olivo/redos-detector
  2. https://github.com/substack/safe-regex
  3. https://github.com/google/re2

Contributors

davisjam, du201, ewmson, jamesdonoh, meekdenzo