saiemgilani / cfbscrapr

A scraping and aggregating package using the CollegeFootballData API

Home Page: https://saiemgilani.github.io/cfbscrapR/

License: Other

R 99.82% CSS 0.18%

college-football college-football-players football-data sports-analytics sports-data sports-stats sportsanalytics football footballdata sports-betting sport sports

cfbscrapr's Issues

Help with Python Port of EPA Model

Hey y'all:

Thanks for putting out this library -- it's really helped me learn more about the game and working with sports data!

I have a bit of an odd request; it's not really a feature request or a bug report -- it's more of a request for code review. I'm the developer behind the iOS version of College Football Coach and as part of an update I'm working on, I'm interested in implementing the EPA model in cfbscrapR in the game. Parker's old EPA model code (thanks for that, btw -- I made sure to give you attribution in the game) helped guide me to the original model I used, but I wanted to update what I had with the latest and greatest.

The tricky part here is that, as far as I can tell, you can only convert Python-based models into the format that Apple wants people to use to integrate pre-trained models into apps. I'm a bit of an R novice, but I've been scouring through the model creation code at Meyappan's repo, and with some tinkering, I was able to cobble together Parker's older code and the cfbscrapR model code from there to produce a third version in Python using XGBoost (using its multi:softprob objective instead of R's NNet, but I'm fairly sure the method I used should produce effectively the same result -- at least, based on what I was reading) that can be converted into the right format for iOS.
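For reference, a multi-class next-score model (whether fit with nnet or with XGBoost's multi:softprob) is usually turned into expected points by taking a probability-weighted sum over the scoring classes. The sketch below is only an illustration: the class names, their point values, and the probabilities are assumptions, not values taken from the cfbscrapR source.

```r
# Assumed point value per next-score class (illustrative convention,
# not confirmed against the cfbscrapR model code).
points <- c(TD = 7, FG = 3, Safety = 2, No_Score = 0,
            Opp_Safety = -2, Opp_FG = -3, Opp_TD = -7)

# Expected points: probability-weighted sum over the classes.
# `probs` has one row per play and its columns must be in the
# same order as `points`.
expected_points <- function(probs, points) {
  stopifnot(ncol(probs) == length(points))
  as.vector(probs %*% points)
}

# Made-up probabilities for a single play, purely for illustration.
probs <- matrix(c(0.30, 0.20, 0.01, 0.10, 0.01, 0.13, 0.25), nrow = 1)
ep <- expected_points(probs, points)
ep
```

When comparing two implementations, the column order of the predicted probabilities has to match the order of the point values in both, since a mismatch there would shift every EP value.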

The main problem I'm having here is that the average EPA for pass and rush plays looks...weird. I'm planning on cross-checking this with the results from y'all's model in R when I get the chance, but right now, things look pretty wonky -- for example, I'm fairly sure any rush play shouldn't cost a team 0.13 expected points.

Any help y'all can provide in cross-checking my implementation against yours would be much appreciated. If we can get this working, I'll make sure to add proper attribution (as well as whatever else you guys would deem necessary, of course). Please let me know if this is the right channel for this somewhat oddball request; I can send it wherever necessary if not. Thanks for your help!

Issue with ep_after and EPA variables after failure to convert on 4th down

Hello!

I was working with the play-by-play data, and I think I ran across an issue with how the expected points variable is reported after a failure to convert on 4th down. The EPA came back as positive for almost all failed 4th down conversions, which seemed bizarre.

I have attached a screenshot that I think shows the problem. In the 4th row, Baylor goes for it on 4th down and fails to convert, losing 5 yards. I believe that the ep_after should be -3.756, the negative value of Oklahoma's ep_before on the next play. Instead, it is reported as 1.073, which I am guessing is equivalent to the EP if Baylor had a 1st and 10 from this new location.
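The sign flip described here can be written out as a small sketch. The helper name is hypothetical and this is not the package's code; it only restates the arithmetic using the numbers from the report.

```r
# Hypothetical helper: ep_after from the original offense's point of view.
# `next_ep_before` is the EP on the following play for whichever team
# has the ball at that point.
ep_after_for_offense <- function(next_ep_before, change_of_possession) {
  ifelse(change_of_possession, -next_ep_before, next_ep_before)
}

# From the report: Oklahoma's ep_before on the next play is 3.756,
# and possession changed after the failed 4th down.
baylor_ep_after <- ep_after_for_offense(3.756, change_of_possession = TRUE)
baylor_ep_after
```

Since EPA is ep_after minus ep_before, a negative ep_after on a turnover on downs would generally make the EPA negative as well, which is what the report expects.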

I hope that made sense, let me know if it doesn't or if you need more info!

Install failure

Hello, hopefully this is not a problem on my end, but when I run devtools::install_github(repo = "meysubb/cfbscrapR") (or when I use the other option), I get the following error: "Failed to install 'cfbscrapR' from GitHub: variable names are limited to 10000 bytes".

From what I can tell this may be an issue with the variable names in the data. Thanks,

Season Team Stats

Describe the bug
Using the cfb_stats_season_team() function doesn't include a "Team" column like the Season Advanced Stats function does.

Expected behavior
I expected there to be a Team column so that if I want to compare, say, 4th down conversion percentage between 2 teams, I know which teams I'm comparing.

Desktop (please complete the following information):
RStudio on a Macbook Air

No link provided in "intro tutorial" for R and R studio

cfbscrapR/docs/articles/intro.html

Line 166 in c0ee549

 <p>First things first, you need to install R and download R Studio. I’ll direct you to this link LINK HERE. Go do that now, then come back.</p> 

The intro.html file says "LINK HERE" where a hyperlink should be.

Switch all GET requests to RETRY requests for robustness

Is your feature request related to a problem? Please describe.
Best-practice suggestion: RETRY is more robust to accidental API unavailability, per https://blog.r-hub.io/2020/04/07/retry-wheel/

Describe the solution you'd like
Should be as simple as swapping out all instances of httr::GET(...) with httr::RETRY("GET", ...), since it is a drop-in replacement.
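A minimal sketch of what the swap might look like, assuming a small wrapper is used instead of editing each call site. The wrapper name and the retry settings are hypothetical; only httr::RETRY() and its times, pause_base, and quiet arguments come from httr itself.

```r
# Hypothetical wrapper around httr::RETRY(); the name and the retry
# settings below are illustrative, not part of cfbscrapR.
cfb_get <- function(url, ...) {
  httr::RETRY(
    "GET", url, ...,
    times = 3,        # total attempts before giving up
    pause_base = 1,   # base for the exponential backoff, in seconds
    quiet = TRUE      # suppress the retry messages
  )
}
```

Existing code that reads the response (for example with httr::content()) should not need to change, because RETRY returns the same kind of response object as GET.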

Error using cfb_pbp_data

Hey,

I tried using your package in Google Colab for the first time yesterday and could not get it to run after installing.

I tried using this code

pbp_2019 <- data.frame()
for(i in 1:15){
  data <- cfb_pbp_data(year = 2019, season_type = "regular", week = i, epa_wpa = TRUE) %>% mutate(week = i)
  df <- data.frame(data)
  pbp_2019 <- bind_rows(pbp_2019, df)
}

to get the play-by-play data. Unfortunately, I keep receiving the same error message:

Warning message:
“prediction from discrete bam models prior to 1.8-32 is deprecated, please refit”
Error in 1:dk$nr[i]: NA/NaN argument

I'm not quite sure what the error message refers to. Any help would be greatly appreciated.
Thanks a lot.

Some column types in pbp_players_pos_2020.rds don't match earlier RDS files

I cloned the cfbscrapR-data library to avoid reading the play-by-play files over the internet, which takes a bit. When trying to bind all of the play-by-play files together, I get an error that the column types don't match.

Here's the code that generates the errors (I combined several runs to show all the errors generated as I fixed them one-by-one):

pbp <- tibble(year = 2014:2020) %>%
+   mutate(pbp_years = map(year, ~ readRDS(paste0("/path/to/clone/cfbscrapR-data/data/rds/pbp_players_pos_", .x, ".rds")))) %>%
+   select(-c(year)) %>%
+   unnest(pbp_years) %>%
+   tibble::as.tibble()
Error: Can't combine `..1$id_play` <double> and `..7$id_play` <character>.
Error: Can't combine `..1$half` <integer> and `..7$half` <factor<dc2fe>>.
Error: Can't combine `..1$down_end` <integer> and `..7$down_end` <factor<9364a>>.
Error: Can't combine `..1$ppa` <double> and `..7$ppa` <character>.
Error: Can't combine `..1$id_drive` <double> and `..7$id_drive` <character>.

The 2020 dataset is the one causing the errors — all the others have the same data type.

I fixed this manually by casting the relevant column to the proper data type before unnesting. Can the 2020 column types be made consistent with the other years so they can all be combined automatically?
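The manual workaround might look roughly like the sketch below. The helper name is hypothetical, and the target types are taken from the error messages above (double for id_play, id_drive, and ppa; integer for half and down_end). It uses base R only, so it makes no assumptions about how the files were written.

```r
# Hypothetical helper: coerce the five columns named in the errors to
# the types used by the earlier files.
fix_pbp_types <- function(df) {
  num_cols <- c("id_play", "id_drive", "ppa")
  int_cols <- c("half", "down_end")
  # Go through character first: calling as.integer() directly on a
  # factor returns its internal level codes, not the labels.
  df[num_cols] <- lapply(df[num_cols], function(x) as.numeric(as.character(x)))
  df[int_cols] <- lapply(df[int_cols], function(x) as.integer(as.character(x)))
  df
}
```

Applied to each file as it is read (for example, wrapping the readRDS() call in fix_pbp_types()), every year would then share the same column types before unnesting.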

Thanks for the consideration. You've done great work on this package — I'm learning a lot about advanced CFB stats.

R 4.0.2, MacOS 10.15.7.

Function needs season_type added

The cfb_stats_game_advanced function appears to actually only have two season type options: 'postseason' and 'both' and doesn't have a 'regular' option.

The following
cfb_stats_game_advanced( 2019, week = 1, team = NULL, opponent = NULL, excl_garbage_time = FALSE, season_type = "regular" )
produced this error
Error: Enter valid season_type (String): regular, postseason, or both
Traceback:
1. cfb_stats_game_advanced(2019, week = 1, team = NULL, opponent = NULL,
. excl_garbage_time = FALSE, season_type = "regular")
2. assertthat::assert_that(season_type %in% c("postseason", "both"),
. msg = "Enter valid season_type (String): regular, postseason, or both")
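The traceback suggests the check compares season_type against c("postseason", "both") while the message lists three options. A sketch of a consistent check follows; it uses base R's stop() in a hypothetical standalone function rather than assertthat, purely to stay dependency-free. In the package, the equivalent change would presumably be adding "regular" to the vector passed to assert_that.

```r
# Hypothetical standalone version of the validation, with "regular" included.
check_season_type <- function(season_type) {
  valid <- c("regular", "postseason", "both")
  if (!(length(season_type) == 1 && season_type %in% valid)) {
    stop("Enter valid season_type (String): regular, postseason, or both",
         call. = FALSE)
  }
  invisible(season_type)
}

check_season_type("regular")  # passes silently
```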

CFBData API key required by April 1

Is your feature request related to a problem? Please describe.

As per Bill’s latest:
https://blog.collegefootballdata.com/api-keys-available/

Some handling needs to happen here for the new API keys, on/before April 1 (ideally sooner).

Describe the solution you'd like
A bit of refactoring required. I can chip in if that’s helpful.
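One possible shape for the key handling is sketched below, assuming the key is read from an environment variable and sent as a Bearer token in the Authorization header. The function name and the CFBD_API_KEY variable name are hypothetical, and the header format should be checked against the CollegeFootballData documentation.

```r
# Hypothetical helper: build the auth header from an environment variable.
# The variable name CFBD_API_KEY is an assumption for illustration.
cfb_auth <- function(key = Sys.getenv("CFBD_API_KEY")) {
  if (!nzchar(key)) {
    stop("No API key found; set the CFBD_API_KEY environment variable.",
         call. = FALSE)
  }
  # Assumed header format: "Authorization: Bearer <key>"
  httr::add_headers(Authorization = paste("Bearer", key))
}
```

Each request function could then pass cfb_auth() along with its httr call, which keeps the key out of function arguments and out of scripts.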
