PY-RETROSHEET

Python scripts for Retrosheet data downloading and parsing.

YE REQUIREMENTS

USAGE

Setup

cp scripts/config.ini.dist scripts/config.ini

Edit scripts/config.ini as needed. See the steps below for what might need to be changed.
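If you script your setup, a minimal Python sketch of the copy step above might look like this. The function name init_config is hypothetical and not part of py-retrosheet. Unlike a bare cp, it leaves an existing config.ini untouched so your edits are never overwritten.

```python
import shutil
from pathlib import Path


def init_config(dist_path: str = "scripts/config.ini.dist",
                config_path: str = "scripts/config.ini") -> bool:
    """Copy the template to config.ini unless config.ini already exists.

    Hypothetical helper: returns True if a new config.ini was created,
    False if an existing one was kept.
    """
    dist = Path(dist_path)
    config = Path(config_path)
    if config.exists():
        # Never overwrite a config.ini that may already contain your edits.
        return False
    shutil.copyfile(dist, config)
    return True
```

Call init_config() from the repository root, then edit scripts/config.ini by hand.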

Download

python download.py [-y <4-digit-year> | --year <4-digit-year>]

The scripts/download.py script downloads Retrosheet data. Use config.ini to choose which types of files are downloaded, and optionally pass -y/--year on the command line to choose the year to download.

  • download > dl_eventfiles determines whether Retrosheet Event Files are downloaded. These are currently the only files parse.py can process.

  • download > dl_gamelogs determines whether Retrosheet Game Logs are downloaded. parse.py cannot process these yet.
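The command-line and config options above can be sketched in Python as follows. This is a hypothetical illustration, not py-retrosheet's actual code: the function names are invented, and it assumes dl_eventfiles and dl_gamelogs live in a [download] section and hold boolean-like values (True/False, yes/no, 1/0).

```python
import argparse
import configparser


def four_digit_year(value: str) -> int:
    """argparse type check: accept only a 4-digit year such as 2014."""
    if len(value) != 4 or not value.isdigit():
        raise argparse.ArgumentTypeError(f"expected a 4-digit year, got {value!r}")
    return int(value)


def parse_args(argv=None):
    """Parse the optional -y/--year argument from the usage line."""
    parser = argparse.ArgumentParser(description="Download Retrosheet data")
    parser.add_argument("-y", "--year", type=four_digit_year, default=None,
                        help="4-digit year to download")
    return parser.parse_args(argv)


def download_plan(config_text: str) -> dict:
    """Report which file types the [download] section enables.

    Assumes boolean-like values; a missing section or key counts as False.
    """
    config = configparser.ConfigParser()
    config.read_string(config_text)
    return {
        "eventfiles": config.getboolean("download", "dl_eventfiles", fallback=False),
        "gamelogs": config.getboolean("download", "dl_gamelogs", fallback=False),
    }
```

For example, parse_args(["-y", "2014"]).year is 2014, while an argument such as -y 14 makes argparse exit with a usage error.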

Parse into SQL

python parse.py [-y <4-digit-year>]

After the files have been downloaded, parse them into SQL with parse.py.

  1. Create a database called retrosheet (or any name you prefer).

  2. Load the schema into the database with the included SQL script: the .postgres.sql file works with PostgreSQL, the other one with MySQL.

  3. Configure config.ini with the appropriate ENGINE, USER, HOST, PASSWORD, and DATABASE values. If you're using PostgreSQL, you can optionally define SCHEMA and the download directory.

    • Valid values for ENGINE are SQLAlchemy engine names, e.g. 'mysql', 'postgresql', or 'sqlite'.

    • If you have your server configured to allow passwordless connections, you don't need to define USER and PASSWORD.

    • If you are using SQLite, set DATABASE in the config to the path of your database file.

    • Specify the directory Retrosheet files are downloaded to; it must exist before the script runs.

  4. Run parse.py to parse the files and insert the data into the database (optionally pass -y YYYY to import just one year).
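Steps 3 and 4 come down to turning the config.ini values into a database connection. The sketch below is hypothetical: the function names are invented and this is not necessarily how parse.py does it. It assembles an SQLAlchemy-style URL from ENGINE, USER, PASSWORD, HOST, and DATABASE (percent-encoding the user and password, as SQLAlchemy's documentation recommends), treats DATABASE as a file path for SQLite, and fails early if the download directory is missing.

```python
from pathlib import Path
from urllib.parse import quote_plus


def build_db_url(engine: str, database: str, user: str = "",
                 password: str = "", host: str = "") -> str:
    """Assemble an SQLAlchemy-style URL from config.ini-style values."""
    if engine == "sqlite":
        # For SQLite, DATABASE is a file path: "sqlite:///" + path gives three
        # slashes for a relative path and four for an absolute one.
        return f"sqlite:///{database}"
    credentials = ""
    if user:
        # Passwordless setups may omit USER/PASSWORD entirely; in this sketch
        # a PASSWORD without a USER is ignored.
        credentials = quote_plus(user)
        if password:
            credentials += ":" + quote_plus(password)
        credentials += "@"
    return f"{engine}://{credentials}{host}/{database}"


def check_download_dir(path: str) -> Path:
    """Raise if the download directory is missing, since it must already exist."""
    directory = Path(path)
    if not directory.is_dir():
        raise FileNotFoundError(f"download directory does not exist: {directory}")
    return directory
```

The resulting string could then be passed to sqlalchemy.create_engine(). SCHEMA is not handled in this sketch.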

YE GRATITUDE

GitHub user jeffcrow made many fixes and additions, and added SQLite support.

JUST THE DATA

If you're using PostgreSQL (and you should be), you can get a dump of all data up through 2014 (warning: 502MB) here
