jsp-voters's Introduction

AP, JSP voter data extraction and conversion utilities

To-Do

  • Document the process

Features

  • Connects to the election commission website, authenticates, and pulls the PDF images
  • Converts downloaded or supplied PDF files to text
  • Converts the extracted (or supplied) text file to voter data in CSV format
  • Loads the data into a MySQL database
  • Output can also be saved to S3 or a MySQL database (--s3 and --db arguments)
  • Basic validation of which data is missing at the district or AC level
  • Supports proxybroker for rotating requests through white-label IPs
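
The missing-data check in the list above can be pictured as a set difference between the booths expected for each assembly constituency (AC) and the booths actually processed. The following is only a minimal sketch under that assumption; `find_missing`, its arguments, and the sample IDs are hypothetical and are not taken from the project's code.

```python
from typing import Dict, List, Set


def find_missing(expected: Dict[str, Set[int]],
                 processed: Dict[str, Set[int]]) -> Dict[str, List[int]]:
    """Return, per AC, the sorted booth IDs that are expected but not processed."""
    missing = {}
    for ac, booths in expected.items():
        # An AC with no processed data at all is treated as fully missing.
        gap = booths - processed.get(ac, set())
        if gap:
            missing[ac] = sorted(gap)
    return missing


# Hypothetical sample: AC "1" lacks booth 3, AC "2" has no data at all.
expected = {"1": {1, 2, 3}, "2": {1, 2}}
processed = {"1": {1, 2}}
print(find_missing(expected, processed))
```

The same idea extends to the district level by keying the dictionaries on district instead of AC.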

Files

Website

  • Simple server for uploading files to be processed or for downloading files

Usage

python3 convert-voters.py --help
usage: convert-voters.py [-h] [--debug] [--district DISTRICT] [--ac AC] [--booths BOOTHS] [--threads THREADS] [--dry-run] [--skip-voters] [--skip-proxy] [--enable-lookups] [--text] [--overwrite] [--skip-cleanup] [--stop-on-error] [--limit LIMIT] [--stdout] [--input INPUT] [--csv] [--xls] [--db] [--output OUTPUT] [--s3 S3] [--list-missing] [--metadata]

Parse voters data from image file to CSV

optional arguments:
  -h, --help           show this help message and exit
  --debug              Enable debug mode
  --district DISTRICT  Specific district to be dumped (default None)
  --ac AC              Specific assembly constituency to be dumped (comma separated, default all constituencies)
  --booths BOOTHS      Limit search to the specific booth IDs, separated by comma= (default None)
  --threads THREADS    Max threads (default 1)
  --dry-run            Dry run to test
  --skip-voters        Skip voters data processing (limit to BOOTH details)
  --skip-proxy         Skip proxy to be used for requests
  --enable-lookups     Enable lookups DB with cache (default False)
  --text               Process input text files (default pdf)
  --overwrite          Overwite if file already exists, if not skip processing
  --skip-cleanup       Skip deleting intermediate files post processing
  --stop-on-error      Skip processing upon an error
  --limit LIMIT        Limit total booths (default all booths)
  --stdout             Write output to stdout instead of CSV file
  --input INPUT        Use the input file specified instead of downloading
  --csv                Create CSV file, default False
  --xls                Create XLS file, default False
  --db                 Write to database, default False
  --output OUTPUT      Output folder to store extracted files (default "output")
  --s3 S3              s3 bucket name to store final csv file
  --list-missing       List missing district, AC or booth data
  --metadata           Parse metadata from first page
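
For readers who want to see how options like these are typically declared, here is a minimal argparse sketch covering a handful of the flags listed above. It is an illustration built from the help text only and may not match the project's actual parser; the `build_parser` and `split_ids` helper names and the sample district and AC values are assumptions.

```python
import argparse


def split_ids(value):
    """Split a comma-separated option value into a list of non-empty strings."""
    return [part.strip() for part in value.split(",") if part.strip()]


def build_parser():
    parser = argparse.ArgumentParser(
        prog="convert-voters.py",
        description="Parse voters data from image file to CSV",
    )
    # Defaults mirror the ones stated in the help output.
    parser.add_argument("--district", default=None)
    parser.add_argument("--ac", type=split_ids, default=None)
    parser.add_argument("--booths", type=split_ids, default=None)
    parser.add_argument("--threads", type=int, default=1)
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--csv", action="store_true")
    parser.add_argument("--output", default="output")
    return parser


# Hypothetical invocation: one district, two ACs, CSV output.
args = build_parser().parse_args(["--district", "10", "--ac", "101,102", "--csv"])
print(args.district, args.ac, args.threads, args.csv, args.output)
```

Parsing the comma-separated values at the argument level keeps the rest of the code working with plain lists rather than raw strings.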
