#ENT - view string metrics and entropy of arbitrary files

As a command line tool, .NET library or web-application (coming)

author: lo sauer 2011-12; www.lsauer.net
website: https://github.com/lsauer/entropy
license: MIT license
description: quickly plot entropy information and string metrics of arbitrary files or strings from the console Input/Output

Analysing a file:

Cross-Platform:

Plots as Vector graphics (default: .svg):

plot from the 1st screencast

Analysing a twitter-live stream

  • Windows Usage:
    > type file1.ext file2.ext file3.ext | ent -b 2.15
    
    > "teststringdata" | ent -b 2.15 -s
  • Linux, Mac Usage:
    $ cat file1.ext file2.ext file3.ext | ent -b 2.15
    
    $ echo "sometextdata" | ent -b 64 -s

###Background

In information theory, entropy is a measure of the uncertainty associated with a random variable. In computing, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message (a specific instance of the random variable), in units of bits. (See: Entropy)

A file with high entropy shows few repeated patterns and is typically compressed or otherwise optimized. Such a highly compressed data stream will show an entropy index greater than 5 with logarithm base 2.
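
As a minimal sketch of the definition above (not the tool's own code), the Shannon entropy of a byte string can be estimated from its symbol frequencies. The function name `shannon_entropy` and its `base` parameter are assumptions made for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes, base: float = 2.0) -> float:
    """Entropy of the byte-frequency distribution of `data`, in units of log `base`."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log(1/p) over all observed byte values, with p = count / total.
    return sum((c / total) * math.log(total / c, base) for c in counts.values())


print(shannon_entropy(b"aaaaaaaa"))        # a single repeated symbol: no uncertainty
print(shannon_entropy(b"abababab"))        # two equally frequent symbols
print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
```

With base 2 the result is in bits per byte, so it ranges from 0 for a constant stream up to about 8 when all 256 byte values are equally frequent.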

###Purpose

The purpose of this tool is to quickly analyze arbitrary files, for instance biological sequence files or serialized JSON data. Researchers naturally already have powerful investigation options in the form of the R statistical language, Matlab and Wolfram Mathematica.

Ad hoc investigation, however, is much faster with a dedicated command line tool, along with the features the console environment provides, such as autocompletion (pressing Tab). There is virtually no startup time, and plots can be output as web-compatible vector graphics.

###Usage

    Usage: shantropy [<filename1> <fname2>...1st param!] [-f fromBy] [-t toBy] [-o <outfile>] [-h help]
    [-e efficiency] [-m 1,2.. 1st,2nd order markov] [-b base-alphabet]
    [-w width plot] [-h height plot] [-z zoom%] [-fp fileposition]
    [-p plot permutation entropy] [-s <string> as last param!]
    Press CTRL+C or Q to Quit!
    Press PAGEUP / PAGEDOWN to zoom in or out of the file
    Press LEFT / RIGHT Arrow to navigate to the next or previous file-segment

###Parameters

  • -m 0 zero-order Markov source: default (practically identical to the Shannon entropy when the log base is 2)
  • -m 1 first-order Markov source: the number of linked characters is one
  • -m 2 second-order Markov source, e.g. ent -m 2
  • -m <n> n-th order Markov source
  • -b <decimal> "b-ary entropy": a different base can be set with e.g. ent -b 2,15 , default is 256 for ASCII; use 64 for literature-text
  • -s <stringdata> arbitrary string passing: -s must be passed as the last argument!
  • -w <int> width of the plot
  • -h <int> height of the plot
  • -f,-t <int> define a file segment in bytes (from/to). Both are optional
  • -z file-segment zoom in percent
  • -fp file-segment position (0-n)
  • -o <outfile> write the plot data to the given file; the file is created, or appended to if it already exists
  • use ent .... > myfilen.out to capture the entire console output
  • use ent .... > mygraphics.svg to plot to an svg file
  • -e ...plot the efficiency of the data
  • -p ...compute and plot the permutation entropy
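
The `-m` and `-e` options above can be illustrated with the textbook definitions. This is a sketch under assumptions: the tool's exact formulas are not documented here, and the names `markov_entropy` and `efficiency` are hypothetical. The n-th order value is taken to be the conditional entropy of the next byte given the previous n bytes, and efficiency is taken to be the order-0 entropy divided by its maximum for the observed alphabet.

```python
import math
from collections import Counter


def markov_entropy(data: bytes, order: int = 0, base: float = 2.0) -> float:
    """Conditional entropy of the next byte given the previous `order` bytes."""
    n = len(data) - order
    if n <= 0:
        return 0.0
    # Count every block (context followed by one symbol) and every context alone.
    blocks = Counter(data[i:i + order + 1] for i in range(n))
    contexts = Counter(data[i:i + order] for i in range(n))
    # p(block) * log(1 / p(symbol | context)), summed over all observed blocks.
    return sum(
        (count / n) * math.log(contexts[block[:order]] / count, base)
        for block, count in blocks.items()
    )


def efficiency(data: bytes) -> float:
    """Order-0 entropy relative to its maximum (assumed meaning of `-e`)."""
    distinct = len(set(data))
    if distinct < 2:
        return 0.0
    # Using the alphabet size as the log base normalises the result to 0..1.
    return markov_entropy(data, order=0, base=distinct)


print(markov_entropy(b"abababab", order=0))  # symbols look random one at a time
print(markov_entropy(b"abababab", order=1))  # fully predictable from the previous byte
print(efficiency(b"aaab"))
```

With `order=0` the context is empty and the function reduces to the plain Shannon entropy, which matches the description of `-m 0` above.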

note: files have to be passed as the first arguments. To calculate metrics for several files, list them in sequence, e.g. ent explain.nfo markdownsharp-20100703-v113.7z -b 3,6

###Todo:

  • make and use a console argument hash map or struct params
  • code cleanup

###Fixes

  • slow -> fixed; Readline loop was replaced by ReadAll; up to 200x speedup
  • navigation of the file for chunked data processing
  • incorrect results -> fixed: for text files set -b 64 to get meaningful results

###Example

Example for a typical info (.nfo) file: the ordinate (y-axis) shows the entropy and the abscissa (x-axis) shows the file-segment position in percent

result: the text is highly compressible and clearly shows structuring

0,60 |                                  ▓▓▓▓▓▓▓▓▓      ▓▓▓▓▓
0,54 |                                  ▓▓▓▓▓▓▓▓▓    ▓▓▓▓▓▓▓    ▓▓
0,48 |                                  ▓▓▓▓▓▓▓▓▓    ▓▓▓▓▓▓▓    ▓▓▓
0,42 |           ▓    ▓▓  ▓             ▓▓▓▓▓▓▓▓▓▓   ▓▓▓▓▓▓▓    ▓▓▓
0,36 |           ▓    ▓▓▓▓▓    ▓        ▓▓▓▓▓▓▓▓▓▓   ▓▓▓▓▓▓▓    ▓▓▓
0,30 |           ▓▓   ▓▓▓▓▓    ▓ ▓      ▓▓▓▓▓▓▓▓▓▓   ▓▓▓▓▓▓▓    ▓▓▓
0,24 | ▓       ▓ ▓▓ ▓ ▓▓▓▓▓  ▓ ▓ ▓   ▓ ▓▓▓▓▓▓▓▓▓▓▓   ▓▓▓▓▓▓▓ ▓  ▓▓▓
0,18 | ▓       ▓▓▓▓ ▓ ▓▓▓▓▓  ▓▓▓ ▓▓  ▓ ▓▓▓▓▓▓▓▓▓▓▓ ▓ ▓▓▓▓▓▓▓ ▓  ▓▓▓
0,12 | ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓ ▓▓▓▓▓▓▓▓▓▓▓ ▓ ▓▓▓▓▓▓▓ ▓  ▓▓▓
0,06 | ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓ ▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓ ▓▓ ▓▓▓
0,00 | ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓ ▓▓▓▓▓▓
------------------------------------------------------------------
     0%        16%        33%        50%        66%        83%
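
A profile like the one plotted above can be approximated by splitting the input into equal segments and computing the entropy of each. The following is only a sketch: the segment count, the helper names `entropy` and `entropy_profile`, and the choice of log base are assumptions, not the tool's actual behaviour.

```python
import math
from collections import Counter


def entropy(chunk: bytes, base: float = 2.0) -> float:
    """Shannon entropy of the byte frequencies in `chunk`."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return sum((c / total) * math.log(total / c, base)
               for c in Counter(chunk).values())


def entropy_profile(data: bytes, segments: int = 60, base: float = 2.0):
    """Return (position in percent, entropy) pairs for consecutive segments."""
    if not data or segments < 1:
        return []
    size = max(1, math.ceil(len(data) / segments))
    return [
        (100.0 * start / len(data), entropy(data[start:start + size], base))
        for start in range(0, len(data), size)
    ]


# Synthetic input: a highly repetitive run followed by evenly spread byte values.
sample = b"A" * 500 + bytes(range(256)) * 2
for position, value in entropy_profile(sample, segments=4):
    print(f"{position:5.1f}%  {value:.2f}")
```

Low values mark repetitive, compressible segments; values near the maximum for the chosen base mark segments with little visible structure.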

###Useful links:

###Case studies:

Fork it on github: https://github.com/lsauer/entropy

Have fun! In fact don't use this program for anything else yet...
