cfbscrapr's Introduction

cfbscrapR [archived]

This Repository is Archived – Use cfbfastR

A scraping and aggregating package using the CollegeFootballData API

cfbscrapR is an R package for working with CFB data. It is an R API wrapper around https://collegefootballdata.com/ that lets users retrieve data from many endpoints and supplement that data with additional metrics (Expected Points Added and Win Probability Added).

Note: The API ingests data from ESPN as well as other sources. For details on those sources, please go to the website linked above. Sometimes there are inconsistencies in the underlying data itself. Please report issues here or to https://collegefootballdata.com/.

Installation

You can install cfbscrapR from GitHub with:

# If devtools is not installed yet, install it first:
# install.packages("devtools")
# Then install cfbscrapR from either of the following repositories:
devtools::install_github(repo = "saiemgilani/cfbscrapR")
# or (these are the exact same packages):
devtools::install_github(repo = "meysubb/cfbscrapR")

Documentation

For more information on the package and function reference, please see the cfbscrapR documentation website.

Expected Points and Win Probability models

If you would like to learn more about the Expected Points and Win Probability models, please refer to the cfbscrapR tutorials, or see the code repository where the models are built.

Expected Points model calibration plots

(1.31% 1.15% 0.94% Calibration Error)

ep_fg_cv_loso_calibration_results.png

Win Probability model calibration plots

(0.89% 0.787% 0.669% Calibration Error)

wp_cv_loso_calibration_results.png

cfbscrapR 1.0.5

cfbscrapR 1.0.4

cfbscrapR 1.0.3

This was a big update!

  • Updated expected points models and win probability models
  • Added player and yardage columns to the cfb_pbp_data() pull, thanks to a great deal of help from @NickTice
  • Added spread values to the cfb_pbp_data() pull
  • Added a detailed drive result, an attempt at creating more accurate drive result labels
  • Added series and first down variables
  • Added handling so that San Jose State can be entered without the accent in the team argument of cfb_pbp_data()

cfbscrapr's People

Contributors

ehess, kazink36, meysubb, natemanzo, saiemgilani, schooleyaaron, spfleming

cfbscrapr's Issues

Issue with ep_after and EPA variables after failure to convert on 4th down

Hello!

I was working with the play-by-play data, and I think I ran across an issue with how the expected points variable is reported after a failure to convert on 4th down. The EPA came back as positive for almost all failed 4th down conversions, which seemed bizarre.

I have attached a screenshot that I think shows the problem. In the 4th row, Baylor goes for it on 4th down and fails to convert, losing 5 yards. I believe that the ep_after should be -3.756, the negative value of Oklahoma's ep_before on the next play. Instead, it is reported as 1.073, which I am guessing is equivalent to the EP if Baylor had a 1st and 10 from this new location.

cfbscrapr_problem

I hope that made sense, let me know if it doesn't or if you need more info!
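
The sign flip described in this report can be sketched in a few lines of R. This is only an illustration of the reasoning, not the package's actual code: `flip_ep_on_possession_change()` and `compute_epa()` are hypothetical helpers, and the `ep_before` value of 0.5 is a made-up placeholder, since the issue does not state Baylor's `ep_before` on the failed play.

```r
# Hypothetical helper (not part of cfbscrapR): the next play's EP is stated
# from the new offense's point of view, so it is negated when possession changes.
flip_ep_on_possession_change <- function(next_ep_before, possession_changed) {
  ifelse(possession_changed, -next_ep_before, next_ep_before)
}

# Hypothetical helper: EPA is the change in expected points across the play.
compute_epa <- function(ep_before, ep_after) {
  ep_after - ep_before
}

# Oklahoma's ep_before on the following play, taken from the issue text.
next_ep_before <- 3.756
ep_after <- flip_ep_on_possession_change(next_ep_before, possession_changed = TRUE)

# Placeholder ep_before for the failed 4th down (assumed for illustration only).
epa <- compute_epa(ep_before = 0.5, ep_after = ep_after)
```

Under this logic, a failed 4th-down conversion yields a negative EPA whenever the offense's `ep_before` is higher than the negated EP of the opponent's next play.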

Help with Python Port of EPA Model

Hey y'all:

Thanks for putting out this library -- it's really helped me learn more about the game and working with sports data!

I have a bit of an odd request; it's not really a feature request or a bug report -- it's more of a request for code review. I'm the developer behind the iOS version of College Football Coach and as part of an update I'm working on, I'm interested in implementing the EPA model in cfbscrapR in the game. Parker's old EPA model code (thanks for that, btw -- I made sure to give you attribution in the game) helped guide me to the original model I used, but I wanted to update what I had with the latest and greatest.

The tricky part here is that, as far as I can tell, you can only convert Python-based models into the format that Apple wants people to use to integrate pre-trained models into apps. I'm a bit of an R novice, but I've been scouring through the model creation code at Meyappan's repo, and with some tinkering, I was able to cobble together Parker's older code and the cfbscrapR model code from there to produce a third version in Python using XGBoost (using its multi:softprob objective instead of R's NNet, but I'm fairly sure the method I used should produce effectively the same result -- at least, based on what I was reading) that can be converted into the right format for iOS.

The main problem I'm having here is that the average EPA for pass and rush plays looks...weird. I'm planning on cross-checking this with the results from y'all's model in R when I get the chance, but right now, things look pretty wonky -- for example, I'm fairly sure any rush play shouldn't cost a team 0.13 expected points.

Any help y'all can provide in cross-checking my implementation against y'all's would be much appreciated. If we can get this working, I'll make sure to add proper attribution (as well as whatever else you guys would deem necessary, of course). Please let me know if this is the right channel for this somewhat oddball request; I can send it wherever necessary if not. Thanks for your help!
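
One way to cross-check the two implementations is to compare them at the probability level first, and only then at the EP level. The sketch below assumes the model predicts probabilities for seven next-score classes and that EP is the probability-weighted sum of point values. The class names, their order, and the point values are assumptions for illustration and are not taken from the cfbscrapR source; `expected_points()` is a hypothetical helper.

```r
# Assumed point values for seven next-score classes (illustrative only).
score_values <- c(td = 7, fg = 3, safety = 2, no_score = 0,
                  opp_safety = -2, opp_fg = -3, opp_td = -7)

# Hypothetical helper: probs is a numeric matrix with one row per play and
# one column per class, in the same order as `values`.
expected_points <- function(probs, values = score_values) {
  stopifnot(ncol(probs) == length(values))
  as.numeric(probs %*% values)
}

# One made-up play whose class probabilities sum to 1.
probs <- matrix(c(0.30, 0.20, 0.01, 0.09, 0.01, 0.14, 0.25), nrow = 1)
ep <- expected_points(probs)
```

If the Python port and the R model return class probabilities in a different column order, the weighted sum will silently produce wrong EP values, so confirming the class order is worth doing before looking at average EPA by play type.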

Error using cfb_pbp_data

Hey,

I tried using your package in Google Colab for the first time yesterday and could not get it to run after installing.

I tried using this code to get the play-by-play data:

library(cfbscrapR)
library(dplyr)
# Pull each regular-season week of 2019 and stack the results
pbp_2019 <- data.frame()
for (i in 1:15) {
  week_data <- cfb_pbp_data(year = 2019, season_type = "regular", week = i, epa_wpa = TRUE) %>% mutate(week = i)
  pbp_2019 <- bind_rows(pbp_2019, data.frame(week_data))
}

Unfortunately, I keep receiving the same error message:

Warning message:
“prediction from discrete bam models prior to 1.8-32 is deprecated, please refit”
Error in 1:dk$nr[i]: NA/NaN argument

I'm not quite sure what the error message refers to. Any help would be greatly appreciated.
Thanks a lot.
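
To narrow down which week triggers the error, each pull can be wrapped in tryCatch() so that one failure does not stop the whole loop. This is a minimal sketch, not a fix for the underlying error: `collect_weeks()` is a hypothetical helper, and it takes the fetch function as an argument so it can be tried out without the package installed.

```r
# Hypothetical helper: calls fetch_week(wk) for every week, keeps the weeks
# that succeed, and records the weeks that raised an error.
collect_weeks <- function(weeks, fetch_week) {
  results <- lapply(weeks, function(wk) {
    tryCatch(
      {
        week_df <- as.data.frame(fetch_week(wk))
        week_df$week <- rep(wk, nrow(week_df))
        week_df
      },
      error = function(e) {
        message("Week ", wk, " failed: ", conditionMessage(e))
        NULL
      }
    )
  })
  failed <- weeks[vapply(results, is.null, logical(1))]
  succeeded <- Filter(Negate(is.null), results)
  # rbind() assumes every week has the same columns.
  list(data = do.call(rbind, succeeded), failed_weeks = failed)
}

# Intended usage with the call from the report (left commented out, needs cfbscrapR):
# out <- collect_weeks(1:15, function(wk) {
#   cfb_pbp_data(year = 2019, season_type = "regular", week = wk, epa_wpa = TRUE)
# })
# out$failed_weeks
```

If the weekly pulls do not all return the same columns, dplyr::bind_rows() can be swapped in for do.call(rbind, ...).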

Install failure

Hello, hopefully this is not a problem on my end, but when I run devtools::install_github(repo = "meysubb/cfbscrapR") (or when I use the other option), I get the following error: "Failed to install 'cfbscrapR' from GitHub: variable names are limited to 10000 bytes".

From what I can tell this may be an issue with the variable names in the data. Thanks,

Function needs season_type added

The cfb_stats_game_advanced function appears to actually only have two season type options: 'postseason' and 'both' and doesn't have a 'regular' option.

The following
cfb_stats_game_advanced( 2019, week = 1, team = NULL, opponent = NULL, excl_garbage_time = FALSE, season_type = "regular" )
produced this error
Error: Enter valid season_type (String): regular, postseason, or both
Traceback:
1. cfb_stats_game_advanced(2019, week = 1, team = NULL, opponent = NULL,
. excl_garbage_time = FALSE, season_type = "regular")
2. assertthat::assert_that(season_type %in% c("postseason", "both"),
. msg = "Enter valid season_type (String): regular, postseason, or both")
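
The traceback suggests the check only accepts "postseason" and "both", while the message also advertises "regular". A check that matches the message could look like the sketch below. `validate_season_type()` is a hypothetical stand-alone helper written in base R for illustration. It is not the package's code, which uses assertthat::assert_that().

```r
# Hypothetical helper: accepts exactly the three values named in the error message.
validate_season_type <- function(season_type) {
  valid <- c("regular", "postseason", "both")
  if (!is.character(season_type) || length(season_type) != 1 ||
      !season_type %in% valid) {
    stop("Enter valid season_type (String): regular, postseason, or both",
         call. = FALSE)
  }
  invisible(season_type)
}

validate_season_type("regular")  # passes silently
```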

Some column types in pbp_players_pos_2020.rds don't match earlier RDS files

I cloned the cfbscrapR-data repository to avoid reading the play-by-play files over the internet, which takes a while. When trying to bind all of the play-by-play files together, I get an error that the column types don't match.

Here's the code that generates the errors (I combined several runs to show all the errors generated as I fixed them one-by-one):

library(dplyr)
library(purrr)
library(tidyr)
# Read each season's RDS file from the local clone, then stack them
pbp <- tibble(year = 2014:2020) %>%
  mutate(pbp_years = map(year, ~ readRDS(paste0("/path/to/clone/cfbscrapR-data/data/rds/pbp_players_pos_", .x, ".rds")))) %>%
  select(-c(year)) %>%
  unnest(pbp_years) %>%
  tibble::as_tibble()
Error: Can't combine `..1$id_play` <double> and `..7$id_play` <character>.
Error: Can't combine `..1$half` <integer> and `..7$half` <factor<dc2fe>>.
Error: Can't combine `..1$down_end` <integer> and `..7$down_end` <factor<9364a>>.
Error: Can't combine `..1$ppa` <double> and `..7$ppa` <character>.
Error: Can't combine `..1$id_drive` <double> and `..7$id_drive` <character>.

The 2020 dataset is the one causing the errors — all the others have the same data type.

I fixed this manually by casting the relevant column to the proper data type before unnesting. Can the 2020 column types be made consistent with the other years so they can all be combined automatically?
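
The manual workaround can be sketched roughly as follows. The target types are the ones the error messages report for the earlier years (double for id_play, id_drive and ppa; integer for half and down_end). `coerce_pbp_types()` is a hypothetical helper written in base R, and the two toy data frames stand in for the real RDS files.

```r
# Hypothetical helper: casts the five mismatched columns to the types used
# in the earlier seasons. Factors go through as.character() first so the
# labels are converted, not the internal level codes.
coerce_pbp_types <- function(df) {
  numeric_cols <- intersect(c("id_play", "id_drive", "ppa"), names(df))
  integer_cols <- intersect(c("half", "down_end"), names(df))
  df[numeric_cols] <- lapply(df[numeric_cols], function(x) as.numeric(as.character(x)))
  df[integer_cols] <- lapply(df[integer_cols], function(x) as.integer(as.character(x)))
  df
}

# Toy stand-ins for an earlier season and for the 2020 file (made-up values).
earlier <- data.frame(id_play = 1, half = 1L, down_end = 2L, ppa = 0.5, id_drive = 10)
season_2020 <- data.frame(id_play = "2", half = factor("2"), down_end = factor("3"),
                          ppa = "-0.25", id_drive = "11", stringsAsFactors = FALSE)

combined <- rbind(coerce_pbp_types(earlier), coerce_pbp_types(season_2020))
```

In the pipeline above, the same helper could be applied inside map() right after readRDS(), before unnest(). One caveat: doubles only represent integers exactly up to 2^53, so if the real ID values are longer than about 15 digits, casting every year's ID columns to character may be the safer direction.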

Thanks for the consideration. You've done great work on this package — I'm learning a lot about advanced CFB stats.

R 4.0.2, MacOS 10.15.7.

Season Team Stats

Describe the bug
The output of the cfb_stats_season_team() function doesn't include a "Team" column like the Season Advanced Stats function does.

Expected behavior
I expected there to be a Team column so that if I want to compare, say, 4th down conversion percentage between 2 teams, I know which teams I'm comparing.

Desktop (please complete the following information):
RStudio on a Macbook Air

