entropy's Introduction

It's just a prototype of a WAF core which makes use of mathematical algorithms to determine whether the input is malicious.
Play with it by entering payloads (preferably XSS for now) and let me know about your experience.

Detection Methods

  • Entropy
  • Shannon Entropy
  • Levenshtein Distance
  • Special Character Ratio
  • Some regex (I don't think it's necessary but still...)

How does it work?

Entropy gets its name from the scientific term "entropy".

Entropy is basically a measure of how random something is.

But how does it apply to the detection of malicious payloads?
Take a look at these two strings and their entropy:

String: black pens & red caps
Entropy: 0.000302964443769

String: <svg onload=alert()>
Entropy: 53.4044125463

Does it make sense now?
Let me introduce you to all the algorithms used.

Entropy

Here's how we calculate entropy:

(log(score)/log(2)) * len(payload)

Where score is the number of special characters in the string.
The higher the entropy, the higher the probability that the string is malicious.
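
The formula above can be sketched in Python as follows. This is only an illustration under stated assumptions: the README does not say which characters count as "special", so the hypothetical helper count_special treats anything that is not a letter, digit, or whitespace as special, and naive_entropy returns 0.0 when there are no special characters (log(0) is undefined). Because of those assumptions, the sketch will not necessarily reproduce the exact figures quoted earlier.

```python
from math import log


def count_special(payload):
    # Assumption: "special" means not a letter, not a digit, not whitespace.
    return sum(1 for ch in payload if not ch.isalnum() and not ch.isspace())


def naive_entropy(payload):
    # score is the number of special characters, as in the formula above.
    score = count_special(payload)
    if score == 0:
        # Assumption: log(0) is undefined, so treat "no special chars" as 0.
        return 0.0
    return (log(score) / log(2)) * len(payload)


for sample in ("black pens & red caps", "<svg onload=alert()>"):
    print(sample, naive_entropy(sample))
```

Even with this rough definition, the benign string scores far lower than the XSS payload, which is the whole point of the metric.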

Shannon Entropy
from math import log

entropy = 0.0
for number in range(256):
    # fraction of the payload made up of this character
    result = float(payload.count(chr(number)))/len(payload)
    if result != 0:
        entropy = entropy - result * log(result, 2)

For a better understanding, take a look at the source code.
Unlike the plain entropy above, Shannon entropy takes repeating patterns into account.
Take a look at these three strings and their Shannon entropies:

String: s0md3v
Entropy: 2.58496250072

String: ../../../../
Entropy: 0.918295834054

String: //////////////
Entropy: 0.0

The first string has no repeated characters and hence has the highest Shannon entropy. The second string has a repeating pattern, which lowers its entropy to just under one. The last string consists of a single repeated character, has no randomness, and hence has a Shannon entropy of 0.
So again, the higher the Shannon entropy, the higher the probability that the string is malicious.
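
To try this on your own strings, here is a minimal, self-contained sketch of the same calculation. It is a variation and not the project's code: the function name shannon_entropy is made up for illustration, and it counts characters with collections.Counter, so it covers every character in the payload instead of only chr(0) to chr(255).

```python
from collections import Counter
from math import log2


def shannon_entropy(payload):
    # Assumption: an empty payload has no randomness, so return 0.0.
    if not payload:
        return 0.0
    entropy = 0.0
    for count in Counter(payload).values():
        # probability of this character appearing in the payload
        p = count / len(payload)
        entropy -= p * log2(p)
    return entropy


for sample in ("s0md3v", "../../../../", "//////////////"):
    print(sample, shannon_entropy(sample))
```

For the three sample strings it gives the same values as listed above, up to floating-point rounding.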

Special Character Ratio
(len(payload) - score) <= len(payload)/2

Where score is again the number of special characters in the string.
We are just checking whether 50% or more of the string is made up of special characters.
The higher the special character ratio, the higher the probability that the string is malicious.
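
As a rough sketch, the check could be wrapped in a function like the one below. The names count_special and mostly_special are hypothetical, the definition of "special" (not a letter, digit, or whitespace) is an assumption, and so is the choice to return False for an empty payload.

```python
def count_special(payload):
    # Assumption: "special" means not a letter, not a digit, not whitespace.
    return sum(1 for ch in payload if not ch.isalnum() and not ch.isspace())


def mostly_special(payload):
    # Assumption: an empty payload is not flagged.
    if not payload:
        return False
    score = count_special(payload)
    # True when half or more of the characters are special.
    return (len(payload) - score) <= len(payload) / 2


for sample in ("black pens & red caps", "<>()", "'\"><"):
    print(repr(sample), mostly_special(sample))
```

Note that a string made up of exactly half special characters is flagged too, because the comparison is <= and not <.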

Levenshtein Distance

Most WAFs check whether the input matches a regex or a payload in their signature database. Instead of looking for identical payloads in a signature database, Entropy looks for similar payloads using the Levenshtein distance algorithm. Rather than reinventing the wheel and writing the algorithm myself, I used the FuzzyWuzzy module, but as this project is developed further, I may use my own code.

That's all, folks.
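
To give a feel for the idea without any dependency, here is a small pure-Python sketch of Levenshtein distance and a similarity lookup. It is a stand-in and not what the project runs: the project uses FuzzyWuzzy, whose scoring may differ from this simple normalisation. The SIGNATURES list and the function names levenshtein, similarity, and closest_signature are all made up for illustration.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, keeping one previous row.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    # Assumption: normalise by the longer string, so 1.0 means identical.
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical signature database, for illustration only.
SIGNATURES = ["<svg onload=alert()>", "<script>alert(1)</script>"]


def closest_signature(payload):
    # Return the (signature, similarity) pair with the highest similarity.
    return max(((sig, similarity(payload, sig)) for sig in SIGNATURES),
               key=lambda pair: pair[1])


print(closest_signature("<svg onload=alert(1)>"))
```

A payload that differs from a known signature by a single character still scores very close to 1.0, whereas an exact-match lookup would miss it.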

License & Other Stuff

This project has no license, which means default copyright applies and you are not allowed to modify or redistribute it, but as it's hosted on GitHub, you are free to view and use the code ;)
Do you think this is a great idea? Do you know something which can make it better? Mail me at s0md3v(at)gmail(dot)com
