Git Product home page Git Product logo

benford's Introduction

Benford

Fraud detection with Benford's law in election data

Was just going through a few awk examples when came across Benford law which states that:

In real-world data the most frequent leading digit is 1, then 2, and so on.

So tried it out on some random numeric datasets from recent state elections in India

Repo description:

fabfile.py - Code that can be run with fab analyze --hide=running

Eg.

westbengal_ls2009
{1: 11, 2: 8, 3: 4, 4: 3, 5: 3, 6: 0, 7: 0, 8: 1, 9: 2}
karnataka
{1: 75, 2: 42, 3: 34, 4: 25, 5: 11, 6: 10, 7: 10, 8: 10, 9: 6}
gujrat
{1: 53, 2: 39, 3: 27, 4: 18, 5: 13, 6: 14, 7: 6, 8: 8, 9: 4}
goa
{1: 8, 2: 11, 3: 3, 4: 7, 5: 2, 6: 3, 7: 1, 8: 2, 9: 3}
uttarpradesh_ls2009
{1: 17, 2: 23, 3: 10, 4: 4, 5: 14, 6: 8, 7: 0, 8: 5, 9: 0}
maharastra
{1: 103, 2: 55, 3: 43, 4: 15, 5: 20, 6: 17, 7: 14, 8: 11, 9: 11}

The dictionaries tell the number of times vote margins start with 1, 2, .., 9 respectively.

So for Maharastra, the number of vote margins where the difference is either 1xx, 1xxx or 1xxx is 103. So Benford's law kind of holds true here. Similar observations can be made for other states.

The one liner that does all this analysis is

cat <file> |awk '{print $NF}'|tr -sc [0-9] '\n'|sort -rn|grep ^<i>|wc -l

Note that all the data sources have the number of interest (i.e vote margins) as the last number on each line of the respective files.

Explanation:

  • cat <file> - Prints the file to the standard output
  • awk '{print $NF}' - Pipes the output of cat to awk which prints the last token specified by $NF for every line
  • tr -sc [0-9] '\n' - Pipes output of awk which is the last token in the file to tr which compresses (i.e elimintates repetition) and retains only the tokens that match [0-9] i.e numbers or \n i.e the newline character
  • sort -rn - Reverse, numeric sort
  • grep ^i - Matches all tokens that start with the character i.
  • wc -l - Count of all lines that have matched i

The fabfile is a python fabric utility to avoid writing more bash.

random.txt is a piece of random text from [http://randomtextgenerator.com/]

plots hold the plotted data with/without logarithmic scale on one axes.

Generally it can be observed that 1 has the highest frequency, folowed by 2 and so on. In some cases a rise-fall pattern is observed which is also another derivative of this same law. So if Benford's law is to be held as the sole criteria for election fraud detection, than one can say that the elections analyzed had no fraud as far as the votes were concerned.

Data sources below:

Goa - http://en.wikipedia.org/wiki/Goa_legislative_assembly_election,_2012 Maharastra - http://en.wikipedia.org/wiki/Maharashtra_state_assembly_elections,_2009 Gujrat - http://en.wikipedia.org/wiki/Gujarat_legislative_assembly_election,_2012 Karnataka - http://en.wikipedia.org/wiki/Karnataka_Legislative_Assembly_election,_2013

Could not find the 2012 state election detailed report.

UP Lok Sabha 2009 - http://www.rediff.com/election/ls04up.htm West Bengal Lok Sabha 2009 - http://www.rediff.com/election/ls04wb.htm

benford's People

Contributors

ratpik avatar

Stargazers

 avatar Angie Chang avatar Joel Dean avatar

Watchers

 avatar James Cloos avatar  avatar

Forkers

theangelsofwar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.