saiemgilani / cfbscrapr

A scraping and aggregating package using the CollegeFootballData API

Home Page: https://saiemgilani.github.io/cfbscrapR/

License: Other

Languages: R 99.82%, CSS 0.18%
Topics: college-football, college-football-players, football-data, sports-analytics, sports-data, sports-stats, sportsanalytics, football, footballdata, sports-betting, sport, sports

cfbscrapr's Issues

Help with Python Port of EPA Model

Hey y'all:

Thanks for putting out this library -- it's really helped me learn more about the game and working with sports data!

I have a bit of an odd request. It's not really a feature request or a bug report; it's more of a request for code review. I'm the developer behind the iOS version of College Football Coach, and for an update I'm working on, I'd like to bring the cfbscrapR EPA model into the game. Parker's old EPA model code (thanks for that, btw; I made sure to give you attribution in the game) guided me to the original model I used, but I wanted to update it with the latest and greatest.

The tricky part is that, as far as I can tell, only Python-based models can be converted into the format Apple wants developers to use when integrating pre-trained models into apps. I'm a bit of an R novice, but I've been combing through the model creation code in Meyappan's repo. With some tinkering, I combined Parker's older code with the cfbscrapR model code to produce a third version in Python that can be converted into the right format for iOS. It uses XGBoost with the multi:softprob objective instead of R's nnet. Based on what I've read, I'm fairly sure this should produce effectively the same result.

The main problem is that the average EPA for pass and rush plays looks... weird. I plan to cross-check this against the results from y'all's model in R when I get the chance, but right now things look pretty wonky. For example, I'm fairly sure rush plays shouldn't cost a team 0.13 expected points on average.
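One way to cross-check a port is to compute EP from the next-score class probabilities the same way on both sides. The R sketch below does this. It assumes seven next-score classes with nflscrapR-style point values: TD 7, FG 3, Safety 2, No_Score 0, and the negatives of those for the opponent. The class names, their order, and the values are all assumptions. Verify them against the cfbscrapR model code before comparing numbers.

```r
# Minimal sketch: convert next-score class probabilities into expected points (EP).
# ASSUMPTION: seven next-score classes with nflscrapR-style point values;
# check names, order, and values against the cfbscrapR model code.
score_values <- c(
  TD = 7, FG = 3, Safety = 2, No_Score = 0,
  Opp_Safety = -2, Opp_FG = -3, Opp_TD = -7
)

# probs: one row per play, one column per class (rows sum to 1), e.g. the
# prediction matrix from an R multinomial model or from XGBoost multi:softprob.
expected_points <- function(probs) {
  # Select columns by name so a different class order cannot silently skew EP.
  probs <- as.matrix(probs)[, names(score_values), drop = FALSE]
  as.vector(probs %*% score_values)
}

# EPA is the change in EP over the play.
epa <- function(ep_before, ep_after) ep_after - ep_before
```

The sketch indexes columns by name rather than by position because XGBoost's multi:softprob returns one column per integer label, in label order. If the label encoding used in training does not match the order of the point values, every EP value shifts.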

Any help y'all can provide in cross-checking my implementation against yours would be much appreciated. If we can get this working, I'll make sure to add proper attribution (as well as whatever else you would deem necessary, of course). Please let me know if this is the right channel for this somewhat oddball request; I can send it wherever necessary if not. Thanks for your help!

Issue with ep_after and EPA variables after failure to convert on 4th down

Hello!

I was working with the play-by-play data, and I think I ran across an issue with how the expected points variable is reported after a failure to convert on 4th down. The EPA came back as positive for almost all failed 4th down conversions, which seemed bizarre.

I have attached a screenshot that I think shows the problem. In the 4th row, Baylor goes for it on 4th down and fails to convert, losing 5 yards. I believe that the ep_after should be -3.756, the negative value of Oklahoma's ep_before on the next play. Instead, it is reported as 1.073, which I am guessing is equivalent to the EP if Baylor had a 1st and 10 from this new location.

[Screenshot: cfbscrapr_problem]

I hope that made sense, let me know if it doesn't or if you need more info!
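The correction described in this issue can be sketched in base R. The helper sets ep_after on a failed 4th-down play to the negative of the next play's ep_before, then recomputes EPA. It is hypothetical and not part of cfbscrapR. It assumes:

- rows are ordered play by play within a single game;
- ep_before and ep_after are from the offense's point of view;
- the column name `turnover_on_downs` is an illustrative flag for failed 4th-down tries.

```r
# Hypothetical helper (not a cfbscrapR function): on turnover-on-downs plays,
# replace ep_after with the negative of the next play's ep_before.
# ASSUMPTIONS: rows ordered within one game; `turnover_on_downs` is an
# illustrative logical column (NA is treated as FALSE).
fix_turnover_on_downs <- function(pbp) {
  # ep_before of the following play (NA for the last row)
  next_ep_before <- c(pbp$ep_before[-1], NA)
  flip <- pbp$turnover_on_downs %in% TRUE
  # The next play belongs to the other team, so its EP flips sign.
  pbp$ep_after[flip] <- -next_ep_before[flip]
  pbp$EPA <- pbp$ep_after - pbp$ep_before
  pbp
}
```

With the numbers from the screenshot, Oklahoma's ep_before of 3.756 on the next play gives Baylor's failed try an ep_after of -3.756 instead of 1.073.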

Install failure

Hello, hopefully this is not a problem on my end, but when I run devtools::install_github(repo = "meysubb/cfbscrapR") (or use the other install option), I get the following error: "Failed to install 'cfbscrapR' from GitHub: variable names are limited to 10000 bytes".

From what I can tell, this may be an issue with the variable names in the data. Thanks,

Season Team Stats

Describe the bug
The data returned by cfb_stats_season_team() doesn't include a "Team" column, unlike the season advanced stats function.

Expected behavior
I expected there to be a Team column so that if I want to compare, say, 4th down conversion percentage between 2 teams, I know which teams I'm comparing.

Desktop:
RStudio on a Macbook Air

Error using cfb_pbp_data

Hey,

I tried using your package in Google Colab for the first time yesterday and couldn't get it to run after installing it.

I tried using the following code to get the play-by-play data:

library(cfbscrapR)
library(dplyr)

pbp_2019 <- data.frame()
for (i in 1:15) {
  # Pull regular-season play-by-play (with EPA/WPA) for week i and tag the week
  data <- cfb_pbp_data(year = 2019, season_type = "regular", week = i, epa_wpa = TRUE) %>%
    mutate(week = i)
  df <- data.frame(data)
  pbp_2019 <- bind_rows(pbp_2019, df)
}

Unfortunately, I keep receiving the same error message:

Warning message:
“prediction from discrete bam models prior to 1.8-32 is deprecated, please refit”
Error in 1:dk$nr[i]: NA/NaN argument

I'm not quite sure what the error message refers to. Any help would be greatly appreciated.
Thanks a lot.

Some column types in pbp_players_pos_2020.rds don't match earlier RDS files

I cloned the cfbscrapR-data repository to avoid reading the play-by-play files over the internet, which takes a while. When I try to bind all of the play-by-play files together, I get an error that the column types don't match.

Here's the code that generates the errors (I combined several runs to show all the errors generated as I fixed them one-by-one):

library(dplyr)
library(purrr)
library(tidyr)

pbp <- tibble(year = 2014:2020) %>%
  mutate(pbp_years = map(year, ~ readRDS(paste0("/path/to/clone/cfbscrapR-data/data/rds/pbp_players_pos_", .x, ".rds")))) %>%
  select(-c(year)) %>%
  unnest(pbp_years) %>%
  tibble::as_tibble()
Error: Can't combine `..1$id_play` <double> and `..7$id_play` <character>.
Error: Can't combine `..1$half` <integer> and `..7$half` <factor<dc2fe>>.
Error: Can't combine `..1$down_end` <integer> and `..7$down_end` <factor<9364a>>.
Error: Can't combine `..1$ppa` <double> and `..7$ppa` <character>.
Error: Can't combine `..1$id_drive` <double> and `..7$id_drive` <character>.

The 2020 dataset is the one causing the errors; all the other years use the same data types.

I fixed this manually by casting the relevant columns to the proper data types before unnesting. Can the 2020 column types be made consistent with the other years so they can all be combined automatically?
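The manual cast can be sketched in base R. The function name is hypothetical. The target types (double for id_play, id_drive, and ppa; integer for half and down_end) are taken from the error messages above. Factors go through as.character() first, because as.integer() on a factor returns its level codes rather than its labels. Doubles represent integers exactly only up to 2^53, so if the play IDs are longer than about 15 digits, character may be the safer common type for the ID columns in every year.

```r
# Hypothetical helper: coerce 2020 columns to the types used in 2014-2019.
# Target types are taken from the "Can't combine" errors.
harmonize_2020_types <- function(df) {
  # Go through character so factor labels (not level codes) are converted.
  to_num <- function(x) as.numeric(as.character(x))
  to_int <- function(x) as.integer(as.character(x))
  for (col in intersect(c("id_play", "id_drive", "ppa"), names(df))) {
    df[[col]] <- to_num(df[[col]])
  }
  for (col in intersect(c("half", "down_end"), names(df))) {
    df[[col]] <- to_int(df[[col]])
  }
  df
}
```

Applying this to the 2020 data inside the map() call, before unnest(), should let all seven years bind.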

Thanks for the consideration. You've done great work on this package, and I'm learning a lot about advanced CFB stats.

R 4.0.2, MacOS 10.15.7.

Function needs season_type added

The cfb_stats_game_advanced function appears to accept only two season_type values, 'postseason' and 'both', and rejects 'regular', even though its error message lists 'regular' as valid.

The following call:
cfb_stats_game_advanced(2019, week = 1, team = NULL, opponent = NULL, excl_garbage_time = FALSE, season_type = "regular")
produces this error:
Error: Enter valid season_type (String): regular, postseason, or both
Traceback:
1. cfb_stats_game_advanced(2019, week = 1, team = NULL, opponent = NULL,
. excl_garbage_time = FALSE, season_type = "regular")
2. assertthat::assert_that(season_type %in% c("postseason", "both"),
. msg = "Enter valid season_type (String): regular, postseason, or both")
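The traceback shows why: the assertion checks against c("postseason", "both"), while its message also lists "regular". A minimal fix would presumably add "regular" to that vector. The sketch below is a hypothetical standalone version of the intended check. It uses base R instead of assertthat so it runs without extra packages.

```r
# Hypothetical sketch of the intended season_type check (base R, not the
# package's code): the allowed values match the ones listed in the error message.
valid_season_types <- c("regular", "postseason", "both")

check_season_type <- function(season_type) {
  if (!(is.character(season_type) && length(season_type) == 1 &&
        season_type %in% valid_season_types)) {
    stop("Enter valid season_type (String): regular, postseason, or both")
  }
  invisible(season_type)
}
```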
