BaseballDBR

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Install

  • Install from CRAN
install.packages("baseballDBR")
  • Or, install the latest development version from GitHub:
devtools::install_github("keberwein/baseballDBR")

Gathering Data

The baseballDBR package requires data that is formatted similarly to the Baseball Databank or Sean Lahman's Baseball Database. The package also contains the get_bbdb() function, which downloads the most up-to-date tables directly from the Chadwick Bureau's GitHub repository. For example, we can easily load the "Batting" table into our R environment.

library(baseballDBR)

get_bbdb(table = "Batting")
head(Batting)
#>    playerID yearID stint teamID lgID  G  AB  R  H X2B X3B HR RBI SB CS BB
#> 1 abercda01   1871     1    TRO <NA>  1   4  0  0   0   0  0   0  0  0  0
#> 2  addybo01   1871     1    RC1 <NA> 25 118 30 32   6   0  0  13  8  1  4
#> 3 allisar01   1871     1    CL1 <NA> 29 137 28 40   4   5  0  19  3  1  2
#> 4 allisdo01   1871     1    WS3 <NA> 27 133 28 44  10   2  2  27  1  1  0
#> 5 ansonca01   1871     1    RC1 <NA> 25 120 29 39  11   3  0  16  6  2  2
#> 6 armstbo01   1871     1    FW1 <NA> 12  49  9 11   2   1  0   5  0  1  0
#>   SO IBB HBP SH SF GIDP
#> 1  0  NA  NA NA NA   NA
#> 2  0  NA  NA NA NA   NA
#> 3  5  NA  NA NA NA   NA
#> 4  2  NA  NA NA NA   NA
#> 5  1  NA  NA NA NA   NA
#> 6  1  NA  NA NA NA   NA

Use with the Lahman Package

library(Lahman)
library(baseballDBR)

Batting <- Lahman::Batting
head(Batting)
#>    playerID yearID stint teamID lgID  G  AB  R  H X2B X3B HR RBI SB CS BB
#> 1 abercda01   1871     1    TRO   NA  1   4  0  0   0   0  0   0  0  0  0
#> 2  addybo01   1871     1    RC1   NA 25 118 30 32   6   0  0  13  8  1  4
#> 3 allisar01   1871     1    CL1   NA 29 137 28 40   4   5  0  19  3  1  2
#> 4 allisdo01   1871     1    WS3   NA 27 133 28 44  10   2  2  27  1  1  0
#> 5 ansonca01   1871     1    RC1   NA 25 120 29 39  11   3  0  16  6  2  2
#> 6 armstbo01   1871     1    FW1   NA 12  49  9 11   2   1  0   5  0  1  0
#>   SO IBB HBP SH SF GIDP
#> 1  0  NA  NA NA NA   NA
#> 2  0  NA  NA NA NA   NA
#> 3  5  NA  NA NA NA   NA
#> 4  2  NA  NA NA NA   NA
#> 5  1  NA  NA NA NA   NA
#> 6  1  NA  NA NA NA   NA

Adding Basic Metrics

Simple batting metrics can be easily added to any batting data frame. For example, we can add slugging percentage, on-base percentage, and on-base plus slugging. Note that OPS and OBP appear as "NA" for the years before IBB was tracked.

library(baseballDBR)

Batting$SLG <- SLG(Batting)

Batting$OBP <- OBP(Batting)

head(Batting, 3)
#>    playerID yearID stint teamID lgID  G  AB  R  H X2B X3B HR RBI SB CS BB
#> 1 abercda01   1871     1    TRO   NA  1   4  0  0   0   0  0   0  0  0  0
#> 2  addybo01   1871     1    RC1   NA 25 118 30 32   6   0  0  13  8  1  4
#> 3 allisar01   1871     1    CL1   NA 29 137 28 40   4   5  0  19  3  1  2
#>   SO IBB HBP SH SF GIDP   SLG OBP
#> 1  0  NA  NA NA NA   NA 0.000  NA
#> 2  0  NA  NA NA NA   NA 0.322  NA
#> 3  5  NA  NA NA NA   NA 0.394  NA
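
As a rough cross-check, the SLG values above can be reproduced in base R from the textbook definitions. This is only a sketch: the toy data frame copies the three rows shown above, and the formulas are the standard ones, which may differ in detail from the package's internal implementation.

```r
# Toy data frame copying the three 1871 rows shown above.
bat <- data.frame(
  playerID = c("abercda01", "addybo01", "allisar01"),
  AB  = c(4, 118, 137),
  H   = c(0, 32, 40),
  X2B = c(0, 6, 4),
  X3B = c(0, 0, 5),
  HR  = c(0, 0, 0),
  BB  = c(0, 4, 2),
  HBP = NA_real_,  # missing for these rows
  SF  = NA_real_   # missing for these rows
)

# Total bases: every hit counts once, plus the extra bases for 2B, 3B and HR.
tb <- bat$H + bat$X2B + 2 * bat$X3B + 3 * bat$HR

# Standard slugging percentage: total bases per at-bat.
slg <- round(tb / bat$AB, 3)

# Standard on-base percentage; any NA input propagates to the result.
obp <- (bat$H + bat$BB + bat$HBP) / (bat$AB + bat$BB + bat$HBP + bat$SF)

print(data.frame(playerID = bat$playerID, SLG = slg, OBP = obp))
```

Under these definitions SLG only needs columns that are present in every season, while OBP depends on HBP and SF, so a single missing input turns the whole result into NA.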

Advanced Metrics

The package includes a suite of advanced metrics such as wOBA, RAA, and FIP, among others. Many of the advanced metrics require multiple tables. For example, the wOBA metric requires the Batting, Pitching, and Fielding tables in order to establish a player's regular defensive position.

library(baseballDBR)

get_bbdb(table = c("Batting", "Pitching", "Fielding"))

Batting$wOBA <- wOBA(Batting, Pitching, Fielding, Fangraphs = TRUE)
head(Batting, 3)
#>    playerID yearID stint teamID lgID  G  AB  R  H X2B X3B HR RBI SB CS BB
#> 1 abercda01   1871     1    TRO <NA>  1   4  0  0   0   0  0   0  0  0  0
#> 2  addybo01   1871     1    RC1 <NA> 25 118 30 32   6   0  0  13  8  1  4
#> 3 allisar01   1871     1    CL1 <NA> 29 137 28 40   4   5  0  19  3  1  2
#>   SO IBB HBP SH SF GIDP      wOBA
#> 1  0  NA  NA NA NA   NA 0.0000000
#> 2  0  NA  NA NA NA   NA 0.2855902
#> 3  5  NA  NA NA NA   NA 0.3078849

The code above uses Fangraphs wOBA values. The default behavior is to use Tom Tango's adapted SQL formula. Other options include Sep.Leagues, which may act as a buffer against any bias created by the designated hitter.

library(baseballDBR)

get_bbdb(table = c("Batting", "Pitching", "Fielding"))

Batting$wOBA <- wOBA(Batting, Pitching, Fielding, Fangraphs = FALSE, Sep.Leagues = TRUE)
head(Batting, 3)
#>    playerID yearID stint teamID lgID  G  AB  R  H X2B X3B HR RBI SB CS BB
#> 1 abercda01   1871     1    TRO <NA>  1   4  0  0   0   0  0   0  0  0  0
#> 2  addybo01   1871     1    RC1 <NA> 25 118 30 32   6   0  0  13  8  1  4
#> 3 allisar01   1871     1    CL1 <NA> 29 137 28 40   4   5  0  19  3  1  2
#>   SO IBB HBP SH SF GIDP wOBA
#> 1  0  NA  NA NA NA   NA   NA
#> 2  0  NA  NA NA NA   NA   NA
#> 3  5  NA  NA NA NA   NA   NA

We can also produce a data frame that shows only the wOBA multipliers. Notice that the Fangraphs wOBA multipliers differ slightly from the Tango multipliers.

library(baseballDBR)

get_bbdb(table = c("Batting", "Pitching", "Fielding"))

fangraphs_woba <- wOBA_values(Batting, Pitching, Fielding, Fangraphs = TRUE)
head(fangraphs_woba, 3)
#>   yearID lg_woba woba_scale   wBB  wHBP   w1B   w2B   w3B   wHR runSB
#> 1   2017   0.320      1.192 0.693 0.723 0.878 1.236 1.558 1.989   0.2
#> 2   2016   0.318      1.212 0.691 0.721 0.878 1.242 1.569 2.015   0.2
#> 3   2015   0.313      1.251 0.687 0.718 0.881 1.256 1.594 2.065   0.2
#>    runCS lg_r_pa lg_r_w  cFIP
#> 1 -0.421   0.121 10.007 3.126
#> 2 -0.410   0.118  9.778 3.147
#> 3 -0.392   0.113  9.421 3.134

tango_woba <- wOBA_values(Batting, Pitching, Fielding, Fangraphs = FALSE)
head(tango_woba, 3)
#> # A tibble: 3 x 35
#> # Groups:   yearID, RperOut, runBB, runHBP, run1B, run2B, run3B, runHR,
#> #   runSB, runCS [3]
#>   yearID    AB     R     H   X2B   X3B    HR    SB    CS    BB    SO   IBB
#>    <int> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <int> <dbl> <dbl>
#> 1   1871 23179  5659  6616   950   495   101   948   270   817   371     0
#> 2   1872 34755  7487 10003  1212   293    88   536   264   477   532     0
#> 3   1873 40346  8487 11832  1308   472   102   395   253   747   552     0
#> # ... with 23 more variables: HBP <dbl>, SF <dbl>, RperOut <dbl>,
#> #   runBB <dbl>, runHBP <dbl>, run1B <dbl>, run2B <dbl>, run3B <dbl>,
#> #   runHR <dbl>, runSB <dbl>, runCS <dbl>, runMinus <dbl>, runPlus <dbl>,
#> #   lg_woba <dbl>, woba_scale <dbl>, wBB <dbl>, wHBP <dbl>, w1B <dbl>,
#> #   w2B <dbl>, w3B <dbl>, wHR <dbl>, wSB <dbl>, wCS <dbl>
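
To make the multipliers more concrete, here is a minimal sketch of how one row of weights turns a stat line into a wOBA value. It uses the commonly published Fangraphs form of the formula and the 2017 weights from the first row of fangraphs_woba above. The season line is hypothetical and exists only for illustration, and the package's own wOBA() may handle edge cases differently.

```r
# 2017 Fangraphs weights, copied from the first row of fangraphs_woba above.
w <- c(wBB = 0.693, wHBP = 0.723, w1B = 0.878,
       w2B = 1.236, w3B = 1.558, wHR = 1.989)

# Hypothetical season line, for illustration only (not a real player).
p <- list(AB = 500, H = 150, X2B = 30, X3B = 5, HR = 20,
          BB = 60, IBB = 5, HBP = 5, SF = 5)

singles <- p$H - p$X2B - p$X3B - p$HR  # hits that were not extra-base hits
ubb     <- p$BB - p$IBB                # unintentional walks

# Weighted sum of offensive events.
numerator <- w[["wBB"]] * ubb + w[["wHBP"]] * p$HBP + w[["w1B"]] * singles +
  w[["w2B"]] * p$X2B + w[["w3B"]] * p$X3B + w[["wHR"]] * p$HR

# Plate appearances that count toward wOBA.
denominator <- p$AB + p$BB - p$IBB + p$SF + p$HBP

woba <- numerator / denominator
round(woba, 3)
#> [1] 0.371
```

Because the weights change every season, the same stat line produces a slightly different wOBA depending on which yearID row is used.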

Create Local Database

A relational database is not needed to work with these data. However, we may want to store the data to be called more quickly at a later time. We can download all of the tables at once with the get_bbdb() function and then write them to an empty schema in our favorite database. The example uses a newly created PostgreSQL instance, but other database tools can be used assuming an appropriate R package exists.

library(baseballDBR)
library(RPostgreSQL)

# Load all tables into the Global Environment.
get_bbdb(AllTables = TRUE)

# Make a list of all data frames.
dbTables <- names(Filter(isTRUE, eapply(.GlobalEnv, is.data.frame)))

# Load the database driver, connect, and write every data frame in a loop.
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, host = "localhost", dbname = "lahman", user = "YOUR_USERNAME", password = "YOUR_PASSWORD")

# seq_along() is safe even if dbTables is empty, unlike 1:length(dbTables).
for (i in seq_along(dbTables)) {
    dbWriteTable(con, name = dbTables[i], value = get0(dbTables[i]), overwrite = TRUE)
}

# Disconnect from database.
dbDisconnect(con)
rm(con, drv)
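
The same write loop works with other DBI back ends. The sketch below assumes the DBI and RSQLite packages are installed and uses an in-memory SQLite database, so no server is needed. The two small data frames are hypothetical stand-ins for the real downloaded tables.

```r
library(DBI)

# Toy stand-ins for the downloaded tables (hypothetical values).
tables <- list(
  Batting  = data.frame(playerID = c("a01", "b02"),
                        yearID = c(1871L, 1872L),
                        HR = c(0L, 2L)),
  Pitching = data.frame(playerID = "c03", yearID = 1871L, SO = 5L)
)

# In-memory SQLite database; pass a file path instead to persist it.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Write each data frame to a table named after its list element.
for (nm in names(tables)) {
  dbWriteTable(con, name = nm, value = tables[[nm]], overwrite = TRUE)
}

# Check what was stored and read one table back.
stored <- sort(dbListTables(con))
batting_back <- dbReadTable(con, "Batting")

dbDisconnect(con)
```

Keeping the tables in a named list, rather than scanning the Global Environment, avoids accidentally writing unrelated data frames to the database.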

baseballdbr's Issues

Master.csv not available

library(baseballDBR)
get_bbdb(table = "Master")
cannot open URL 'https://raw.githubusercontent.com/chadwickbureau/baseballdatabank/master/core/Master.csv': HTTP status was '404 Not Found'
Error in file(file, "rt") : cannot open the connection

Other tables I have tried work smoothly.

If I download all tables, I get 27 (as stated in the get_bbdb() info), but Master is not one of them.

> get_bbdb(AllTables = TRUE)
trying URL 'https://github.com/chadwickbureau/baseballdatabank/archive/master.zip'
Content type 'application/zip' length unknown
downloaded 8.5 MB
 [1] "Teams"               "Appearances"         "Parks"               "FieldingPost"       
 [5] "Batting"             "AwardsSharePlayers"  "SeriesPost"          "AwardsPlayers"      
 [9] "People"              "FieldingOFsplit"     "Managers"            "AwardsManagers"     
[13] "Pitching"            "BattingPost"         "PitchingPost"        "Fielding"           
[17] "AwardsShareManagers" "HomeGames"           "FieldingOF"          "TeamsHalf"          
[21] "AllstarFull"         "HallOfFame"          "Schools"             "ManagersHalf"       
[25] "TeamsFranchises"     "Salaries"            "CollegePlaying"

On a side note, your tables include 2016, whilst the latest Lahman package still stops at 2015. Are they still testing their data before issuing?

Advanced Metrics - Any more in pipeline

I'm pretty ignorant about how some of these are calculated and whether the base data allows you to do more, but, for example, a recent article on 538 about Adrian Beltre included RAR (Runs Above Replacement).

Your package has the somewhat similar RAA, but it does not allow me to directly replicate or expand on the 538 analyses.

The Lahman "Master" table is called "People" in the BBDB

There was a previous issue that alerted me to possible confusion here. I would like to keep the BBDB naming scheme; however, we need to incorporate a warning when a user attempts to use the code get_bbdb(table = "Master"). Also, we should probably address this, and any other small differences, in the README.
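
One possible shape for such a warning is sketched below. The helper resolve_table_name() is hypothetical and not part of the package's API. It only illustrates mapping the Lahman name to the BBDB name before any download is attempted.

```r
# Hypothetical helper: map the Lahman table name "Master" to the BBDB
# name "People", warning the caller whenever the substitution happens.
resolve_table_name <- function(table) {
  is_master <- table == "Master"
  if (any(is_master)) {
    warning('The Lahman "Master" table is called "People" in the BBDB; ',
            'using "People" instead.', call. = FALSE)
    table[is_master] <- "People"
  }
  table
}

# Capture the warning so the example shows both the message and the result.
resolved <- withCallingHandlers(
  resolve_table_name(c("Batting", "Master")),
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
resolved
#> [1] "Batting" "People"
```

A warning rather than an error keeps existing Lahman-style scripts running while still telling the user about the renamed table.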
