
hoopr's Introduction

hoopR


hoopR is an R package for working with men’s basketball data.

The package has functions to access live play by play and box score data from ESPN with shot locations when available. As of version 1.3.0, hoopR is also a full NBA Stats API wrapper with 127 functions added in this release.

It is additionally a scraping and aggregating interface for Ken Pomeroy’s men’s college basketball statistics website, kenpom.com. It provides users with an active subscription the capability to scrape the website tables and analyze the data for themselves.

Installation

You can install the CRAN version of hoopR with:

install.packages("hoopR")

You can install the latest version of hoopR from GitHub with:

# You can install using the pacman package using the following code:
if (!requireNamespace('pacman', quietly = TRUE)){
  install.packages('pacman')
}
pacman::p_load_current_gh("sportsdataverse/hoopR", dependencies = TRUE, update = TRUE)

Quick Start

NBA full play-by-play seasons (2002-2024) ~ 1-2 minutes

tictoc::tic()
progressr::with_progress({
  nba_pbp <- hoopR::load_nba_pbp()
})
tictoc::toc()
## 1.02 sec elapsed

## 109783 rows of NBA play-by-play data from 230 games.

Men’s college basketball full play-by-play seasons (2006-2024) ~ 2-3 minutes

tictoc::tic()
progressr::with_progress({
  mbb_pbp <-  hoopR::load_mbb_pbp()
})
tictoc::toc()
## 1.62 sec elapsed

## 360517 rows of men's college basketball play-by-play data from 1074 games.

Documentation

For more information on the package and function reference, please see the hoopR documentation website.

Breaking Changes

Full News on Releases

Follow the SportsDataverse (@SportsDataverse) on Twitter and star this repo


Our Authors

  • Saiem Gilani (@saiemgilani)

@saiemgilani

Our Contributors (they’re awesome)

  • Jason Lee (@theFirmAISports)

    @papagorgio23

  • Billy Fryer (@_b4billy_)

    @billyfryer

  • Ross Drucker (@rossdrucker9)

    @rossdrucker

  • Vladislav Shufinskiy (@vshufinskiy)

    @shufinskiy

Citations

To cite the hoopR R package in publications, use:

BibTeX Citation

@misc{gilani_2021_hoopR,
  author = {Gilani, Saiem},
  title = {hoopR: The SportsDataverse's R Package for Men's Basketball Data.},
  url = {https://hoopR.sportsdataverse.org},
  year = {2021}
}

hoopr's People

Contributors

andreweatherman, john-b-edwards, jpatak16, rossdrucker, saiemgilani, shufinskiy


hoopr's Issues

Features for 2.0

kp_winprob Error:

Error with kp_winprob:
I am running into an error when using kp_winprob. The function worked as recently as early last month. Running the latest {hoopR} version (1.3).

Error message:

Error: Problem with mutate() column pd.
pd = stringr::str_remove(.data$pd, "'Pd':").
x Column pd not found in .data

To Reproduce
Steps to reproduce the behavior:

  1. library(hoopR)
  2. kp_winprob(4316, 2019) (or any mix of game and year)

Doesn't build DB when using with RJDBC

When trying to run the function as defaults, it runs perfectly, but when trying to write to my database via JDBC, it breaks with the following error:

Error in .local(conn, statement, ...) : 
  execute JDBC update query failed in dbSendUpdate
  JDBC ERROR: [250006:SQL_EXECUTION_BIND_FAILED] Unmatch bind parameter count, expect=55, actual=0 on executing statement (prepared)  (sql="INSERT INTO hoopr VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)") (db='public') (user='admin') (clientId='af24157a-6ffe-45eb-871a-a8567ff0ad14:4') (clientNd='{clientId=71, address=192.168.1.206:38050}') (address=192.168.1.206:20001, partitionId=9381)
  Statement: INSERT INTO hoopr VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)

Edit: another error that comes up if running with debug on:

Error in .jcall(bx, "V", "addStrings", as.character(o)) : 
  Java Exception <no description because toString() failed>.local(conn, statement, ...)new("jobjRef", jobj = <pointer: 0x55e61f91c110>, jclass = "java/lang/Throwable")

I suspect this is the actual reason the call fails. Is something not playing nicely with rJava?

Some more context: I am using GridDB with its JDBC jar file.

And this is what I am calling:

drv <- JDBC("com.toshiba.mwcloud.gs.sql.Driver",
            "/usr/share/java/gridstore-jdbc-5.0.0.jar",
             identifier.quote = "`")

conn <- dbConnect(drv, "jdbc:gs://239.0.0.1:41999/defaultCluster/public", "admin", "admin")

load_nba_pbp(
  seasons = 2006,
  tablename = "hoopr",
  dbConnection = conn
)

I've tried to work out how to change the prepared statement, but I can't even tell what this error is reporting.

Missing multiple seasons of team box-score data

It appears that the hoopR data repository is missing the 2006-2011 data files for team box scores:

hoopR::load_mbb_team_box(2003:hoopR:::most_recent_mbb_season())
#> Warning in readRDS(con): cannot open URL 'https://raw.githubusercontent.com/
#> saiemgilani/hoopR-data/master/mbb/team_box/rds/team_box_2006.rds': HTTP status
#> was '404 Not Found'
#> Warning: Failed to readRDS from <https://raw.githubusercontent.com/saiemgilani/
#> hoopR-data/master/mbb/team_box/rds/team_box_2006.rds>
#> Warning in readRDS(con): cannot open URL 'https://raw.githubusercontent.com/
#> saiemgilani/hoopR-data/master/mbb/team_box/rds/team_box_2007.rds': HTTP status
#> was '404 Not Found'
#> Warning: Failed to readRDS from <https://raw.githubusercontent.com/saiemgilani/
#> hoopR-data/master/mbb/team_box/rds/team_box_2007.rds>
#> Warning in readRDS(con): cannot open URL 'https://raw.githubusercontent.com/
#> saiemgilani/hoopR-data/master/mbb/team_box/rds/team_box_2008.rds': HTTP status
#> was '404 Not Found'
#> Warning: Failed to readRDS from <https://raw.githubusercontent.com/saiemgilani/
#> hoopR-data/master/mbb/team_box/rds/team_box_2008.rds>
#> Warning in readRDS(con): cannot open URL 'https://raw.githubusercontent.com/
#> saiemgilani/hoopR-data/master/mbb/team_box/rds/team_box_2009.rds': HTTP status
#> was '404 Not Found'
#> Warning: Failed to readRDS from <https://raw.githubusercontent.com/saiemgilani/
#> hoopR-data/master/mbb/team_box/rds/team_box_2009.rds>
#> Warning in readRDS(con): cannot open URL 'https://raw.githubusercontent.com/
#> saiemgilani/hoopR-data/master/mbb/team_box/rds/team_box_2010.rds': HTTP status
#> was '404 Not Found'
#> Warning: Failed to readRDS from <https://raw.githubusercontent.com/saiemgilani/
#> hoopR-data/master/mbb/team_box/rds/team_box_2010.rds>
#> Warning in readRDS(con): cannot open URL 'https://raw.githubusercontent.com/
#> saiemgilani/hoopR-data/master/mbb/team_box/rds/team_box_2011.rds': HTTP status
#> was '404 Not Found'
#> Warning: Failed to readRDS from <https://raw.githubusercontent.com/saiemgilani/
#> hoopR-data/master/mbb/team_box/rds/team_box_2011.rds>
#> # A tibble: 121,440 x 74
#>    team.shortDisplayName team.uid  team.alternateCo~ team.color team.displayName
#>    <chr>                 <chr>     <chr>             <chr>      <chr>           
#>  1 Saint Mary's          s:40~l:4~ d02f35            003768     Saint Mary's Ga~
#>  2 Gonzaga               s:40~l:4~ cfd4d8            002967     Gonzaga Bulldogs
#>  3 N Michigan            s:40~l:4~ <NA>              000000     Northern Michig~
#>  4 Marquette             s:40~l:4~ ffffff            083963     Marquette Golde~
#>  5 Colorado              s:40~l:4~ ffd200            d1c57e     Colorado Buffal~
#>  6 Cal Poly              s:40~l:4~ eed897            1E4D2B     Cal Poly Mustan~
#>  7 Nicholls              s:40~l:4~ f0f0f0            C41230     Nicholls Colone~
#>  8 Florida State         s:40~l:4~ ceb888            782F40     Florida State S~
#>  9 Troy                  s:40~l:4~ 88898c            AE0210     Troy Trojans    
#> 10 LSU                   s:40~l:4~ fdd023            2B0D57     LSU Tigers      
#> # ... with 121,430 more rows, and 69 more variables: team.name <chr>,
#> #   team.logo <chr>, team.location <chr>, team.id <chr>,
#> #   team.abbreviation <chr>, team.slug <chr>,
#> #   fieldGoalsMade-fieldGoalsAttempted <chr>, fieldGoalPct <chr>,
#> #   threePointFieldGoalsMade-threePointFieldGoalsAttempted <chr>,
#> #   threePointFieldGoalPct <chr>, freeThrowsMade-freeThrowsAttempted <chr>,
#> #   freeThrowPct <chr>, totalRebounds <chr>, offensiveRebounds <chr>, ...

Fortunately, these appear to be the only missing files: I poked around the hoopR-data repository and found no other MBB files missing.
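Until the files are restored, one way to avoid the 404 warnings is to drop the missing seasons before loading. The sketch below is only an illustration: available_seasons() is a hypothetical helper, not part of hoopR, and the list of missing seasons is taken from the report above.

```r
# Seasons reported as missing in the issue above (team box scores, 2006-2011).
missing_seasons <- 2006:2011

# Hypothetical helper: drop seasons known to be unavailable before loading.
available_seasons <- function(requested, missing = missing_seasons) {
  setdiff(requested, missing)
}

seasons <- available_seasons(2003:2021)
print(seasons)

# Hypothetical usage (requires hoopR and network access, so left commented out):
# team_box <- hoopR::load_mbb_team_box(seasons)
```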

espn_mbb_scoreboard() does not work for 2022 season

Describe the bug
I am only able to load pending 2022 games by passing "2021" as the season. Using 2022 or most_recent_mbb_season() results in the error: season must be between 2001 and 2021. Passing an individual date as the season for 2022 season games (e.g. season = "20211202") works as expected.
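The date-based workaround described above could be sketched as follows. This is a minimal illustration in base R: season_dates() is a hypothetical helper, the date range is arbitrary, and the commented-out call assumes espn_mbb_scoreboard() accepts a "YYYYMMDD" string in its season argument, as the report states.

```r
# Build "YYYYMMDD" strings for a range of game dates (base R only).
# The date range below is an arbitrary illustration, not taken from the issue.
season_dates <- function(from, to) {
  days <- seq(as.Date(from), as.Date(to), by = "day")
  format(days, "%Y%m%d")
}

dates <- season_dates("2021-12-01", "2021-12-03")
print(dates)

# Hypothetical usage (requires hoopR and network access, so left commented out):
# boards <- lapply(dates, function(d) hoopR::espn_mbb_scoreboard(season = d))
```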

hoopR 1.5

joining data frames with different player ID conventions

It seems like some data frames use ESPN player IDs, and some have NBA player IDs. Is there a table in the package that has both IDs and helps with joining these data frames? Here is an example:

#### Player list has PERSON_ID ####

p = nba_commonallplayers(
  is_only_current_season = 0,
  league_id = "00")
p=p$CommonAllPlayers
p %>% 
  filter(DISPLAY_FIRST_LAST=='LeBron James') %>% 
  as.data.frame() ## PERSON_ID 2544, no other ID columns


#### Player bio info has PERSON_ID ####

dd = nba_commonplayerinfo(league='00',player_id = 2544)
dd$CommonPlayerInfo %>% as.data.frame() ## PERSON_ID 2544, no other ID columns


#### Player box score has athlete_id ####

pbox = load_nba_player_box()
pbox %>% 
  filter(athlete_display_name=='LeBron James') %>%
  head(1) %>% 
  as.data.frame() ## athlete_id 1966, no PERSON_ID or other ID columns
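In the absence of a built-in crosswalk, one stopgap is to join the two ID systems on the player's display name. The sketch below uses toy one-row stand-ins for the tables above (with the ID values quoted in this issue) rather than real hoopR output, and a name-based join is an assumption that breaks when names differ between sources.

```r
# Toy stand-ins for the two tables described above; the real tables have many
# more columns. The ID values are the ones quoted in the issue.
nba_players <- data.frame(
  PERSON_ID = 2544,
  DISPLAY_FIRST_LAST = "LeBron James",
  stringsAsFactors = FALSE
)
espn_box <- data.frame(
  athlete_id = 1966,
  athlete_display_name = "LeBron James",
  stringsAsFactors = FALSE
)

# Name-based crosswalk: fragile when names differ between sources
# (suffixes, accents, duplicate names), so review unmatched rows by hand.
crosswalk <- merge(
  espn_box, nba_players,
  by.x = "athlete_display_name", by.y = "DISPLAY_FIRST_LAST"
)
print(crosswalk)
```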

Object 'kenpom' not found

I have a KenPom account and I have followed the instructions for how to log in via R. It even recognizes that I do have an account. However, when I try to load any of the KenPom packages I get an error along the lines of:

"object 'kenpom' not found"

Is there a reason why I can't load any of the KenPom information?

gameplan function error

Describe the bug
get_gameplan function kicks an error:

get_gameplan(browser, "Iowa", year = 2020)
Error in manip_at(.tbl, .vars, .funs, enquo(.funs), caller_env(), .include_group_vars = TRUE, :
argument ".funs" is missing, with no default

To Reproduce
Steps to reproduce the behavior:

  1. Run get_gameplan with any team in 2020 or 2021; it returns the same error

Expected behavior
Expect function to return data table

kp_box 'not found' error

Very similar to the scheduling bug that I reported a few days ago:

When running the kp_box function for last season, I run into the following error with numerous games:

Error in kp_box(game_id = game_id, year = 2021) : object 'kenpom' not found

Again, I'm not sure what is causing this error as the url that the function is calling does redirect to a legitimate box score on KenPom. I filtered out games (working with a long list of game IDs for last season) against non-D1 opponents, too, thinking that might solve the issue, but it didn't.

The function calls the following url:

url <- paste0("https://kenpom.com/box.php?", "g=", game_id,
"&y=", year)

Running an example game, game_id=6, through the function yields the error mentioned above (kp_box(game_id=6, year=2021)). However, the URL that the function should be requesting does exist: https://kenpom.com/box.php?g=6&y=2021.

The function is working for some games (e.g. kp_box(game_id=1, year=2021)), but I cannot figure out why it is returning an error for others. The missing games do not go into OT, nor are their box scores formatted differently. It's not an issue with specific teams, either. I'm not sure how many box scores this affects, but the issue is certainly not isolated.

Unable to access KenPom data in R

Describe the bug
When using 'login' function, it returns:

Error in read_html.response(x$response, ..., base_url = x$url) :
Forbidden (HTTP 403).
In addition: Warning message:
In session_set_response(x, resp) : Forbidden (HTTP 403).

To Reproduce
Steps to reproduce the behavior:

  1. Go to R
  2. load hoopR
  3. try 'login' function with Kenpom credential


Desktop (please complete the following information):

  • OS: Microsoft Windows 10
  • R version: 4.2.2

object 'kenpom' not found

Describe the bug
It seems like something is broken with KenPom functions in hoopR (or I'm doing something wrong). I'm also noticing that documentation on the hoopR website is displaying the error too. Screenshot below.

To Reproduce
Steps to reproduce the behavior:

> library(hoopR)
> hoopR::kp_hca()
Error in hoopR::kp_hca() : object 'kenpom' not found
> hoopR::kp_conf(2021, conf = "B10")
Error in hoopR::kp_conf(2021, conf = "B10") : object 'kenpom' not found

Expected behavior
It does look like values do exist - https://kenpom.com/hca.php - and I would expect that those values would be returned.


nba_schedule() issue with game_date field

Describe the bug
nba_schedule() does not always return a schedule due to parsing of game_date field with lubridate::ymd()

To Reproduce

library(hoopR)
nba_schedule(1984)
#> Request failed [500]. Retrying in 1.5 seconds...
#> Request failed [500]. Retrying in 1.1 seconds...
#> Error: The API returned an error
nba_schedule(1985)
#> # A tibble: 1,014 × 18
#>    game_id    season_t…¹ seaso…² visit…³ visit…⁴ visit…⁵ visit…⁶ visit…⁷ visit…⁸
#>    <chr>      <chr>      <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 0028500004 2          Regula… 161061… Boston  Celtics Boston  BOS     Boston…
#>  2 0028500005 2          Regula… 161061… Washin… Bullets Washin… WAS     Washin…
#>  3 0028500006 2          Regula… 161061… Los An… Clippe… L.A. C… LAC     Los An…
#>  4 0028500007 2          Regula… 161061… Clevel… Cavali… Clevel… CLE     Clevel…
#>  5 0028500008 2          Regula… 161061… Denver  Nuggets Denver  DEN     Denver…
#>  6 0028500009 2          Regula… 161061… Phoenix Suns    Phoenix PHX     Phoeni…
#>  7 0028500010 2          Regula… 161061… Milwau… Bucks   Milwau… MIL     Milwau…
#>  8 0028500011 2          Regula… 161061… Houston Rockets Houston HOU     Housto…
#>  9 0028500012 2          Regula… 161061… Philad… 76ers   Philad… PHL     Philad…
#> 10 0028500013 2          Regula… 161061… Atlanta Hawks   Atlanta ATL     Atlant…
#> # … with 1,004 more rows, 9 more variables: home_team_id <chr>,
#> #   home_city <chr>, home_nickname <chr>, home_name_short <chr>,
#> #   home_abbr <chr>, home_team_name_full <chr>, game_date <date>,
#> #   game_start_time <chr>, day <chr>, and abbreviated variable names
#> #   ¹​season_type_id, ²​season_type_description, ³​visitor_team_id, ⁴​visitor_city,
#> #   ⁵​visitor_nickname, ⁶​visitor_name_short, ⁷​visitor_abbr,
#> #   ⁸​visitor_team_name_full

Created on 2023-08-24 by the reprex package (v2.0.1)

Expected behavior

> nba_schedule(1984)
# A tibble: 1,025 × 47
   game_date    game_id game_…¹ game_…² game_…³ game_…⁴ game_…⁵ game_…⁶ game_…⁷ game_…⁸ game_…⁹ game_…˟ away_…˟ home_…˟ day  
   <chr>        <chr>   <chr>     <int> <chr>     <int> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>
#>  1 09/03/1984 … 001840… 198409…       1 7:00 p…       1 1984-0… 1900-0… 1984-0… 1984-0… 1900-0… 1984-0… 0001-0… 0001-0… Mon  
#>  2 09/03/1984 … 001840… 198409…       1 7:00 p…       2 1984-0… 1900-0… 1984-0… 1984-0… 1900-0… 1984-0… 0001-0… 0001-0… Mon  
#>  3 09/05/1984 … 001840… 198409…       1 7:00 p…       1 1984-0… 1900-0… 1984-0… 1984-0… 1900-0… 1984-0… 0001-0… 0001-0… Wed  
#>  4 09/05/1984 … 001840… 198409…       1 7:00 p…       2 1984-0… 1900-0… 1984-0… 1984-0… 1900-0… 1984-0… 0001-0… 0001-0… Wed  
#>  5 09/07/1984 … 001840… 198409…       1 7:00 p…       1 1984-0… 1900-0… 1984-0… 1984-0… 1900-0… 1984-0… 0001-0… 0001-0… Fri  
#>  6 09/07/1984 … 001840… 198409…       1 7:00 p…       2 1984-0… 1900-0… 1984-0… 1984-0… 1900-0… 1984-0… 0001-0… 0001-0… Fri  
#>  7 09/09/1984 … 001840… 198409…       1 7:00 p…       1 1984-0… 1900-0… 1984-0… 1984-0… 1900-0… 1984-0… 0001-0… 0001-0… Sun  
#>  8 09/09/1984 … 001840… 198409…       1 7:00 p…       2 1984-0… 1900-0… 1984-0… 1984-0… 1900-0… 1984-0… 0001-0… 0001-0… Sun  
#>  9 09/11/1984 … 001840… 198409…       1 7:00 p…       1 1984-0… 1900-0… 1984-0… 1984-0… 1900-0… 1984-0… 0001-0… 0001-0… Tue  
#> 10 09/11/1984 … 001840… 198409…       1 7:00 p…       2 1984-0… 1900-0… 1984-0… 1984-0… 1900-0… 1984-0… 0001-0… 0001-0… Tue  
# … with 1,015 more rows, 32 more variables: month_num <int>, week_number <int>, week_name <chr>, if_necessary <chr>,
#   series_game_number <chr>, series_text <chr>, arena_name <chr>, arena_state <chr>, arena_city <chr>,
#   postponed_status <chr>, branch_link <chr>, game_subtype <chr>, home_team_id <int>, home_team_name <chr>,
#   home_team_city <chr>, home_team_tricode <chr>, home_team_slug <chr>, home_team_wins <int>, home_team_losses <int>,
#   home_team_score <int>, home_team_seed <int>, away_team_id <int>, away_team_name <chr>, away_team_city <chr>,
#   away_team_tricode <chr>, away_team_slug <chr>, away_team_wins <int>, away_team_losses <int>, away_team_score <int>,
#   away_team_seed <int>, season <chr>, league_id <chr>, and abbreviated variable names ¹​game_code, ²​game_status, …
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Screenshots
N/A

Desktop (please complete the following information):

  • OS: macOS catalina
  • Browser: Chrome
  • Version 116.0.5845.96

Additional context
N/A
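The expected output above shows game_date values that start with a month/day/year date ("09/03/1984 ..."), which a year-month-day parser would not handle. A minimal base R sketch of parsing such values is below; the time suffix in the sample strings is an assumption, since the output above truncates it, and this is not the package's actual fix.

```r
# Example values modeled on the truncated output above ("09/03/1984 ...").
# The time suffix is an assumption; only the leading date is used.
game_date_raw <- c("09/03/1984 00:00:00", "09/05/1984 00:00:00")

# Keep the first 10 characters and parse them as month/day/year.
game_date <- as.Date(substr(game_date_raw, 1, 10), format = "%m/%d/%Y")
print(game_date)
```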

Not able to use the nba_boxscoresummaryv2() function

When I attempt to use the nba_boxscoresummaryv2() function I get an error that says the object 'df_list' is not found. I've entered multiple games ids from each of the past two seasons and it's the same error for all of them. I am working on a capstone thesis and need the officials for each game, so help here would be much appreciated!

Problem with matchups data, it returns me this error


KenPom 2022 Scheduling Error

When running the kp_team_schedule function for the upcoming season (year=2022), I run into the following error:

Error in (page %>% xml2::read_html() %>% rvest::html_elements(css = "#schedule-table") %>% : subscript out of bounds

I'm not quite sure what's causing this error. I assume that the year variable is throwing something off, but the url that the function is using does redirect to the correct page:

url <- paste0("https://kenpom.com/team.php?", "team=", team_name,
"&y=", year)

Throwing in year=2022 results in the following url, https://kenpom.com/team.php?team=Duke&y=2022, which does work. Nothing in the KenPom scheduling page has changed, as far as I know, so I'm not sure why this isn't working.

nba_leaguegamelog function 400 error when running on Mac

Describe the bug
When running the nba_leaguegamelog function on a Mac using version 1.7.0 of hoopR, I get an error when using "Regular Season" in the season_type argument.

This function works when I run it on Windows using either "Regular Season" or "Playoffs" in the season_type argument.

It also works on the Mac when I pass in "Playoffs" in the season_type argument (but doesn't work when using "Regular Season".)

I've tried running the function using chrome and then safari as default browser. I've also tried clearing browser cache and cookies. I'm running the script in RStudio.

This is the error shown when running the function on Mac:

Request failed [400]. Retrying in 1.9 seconds...
Request failed [400]. Retrying in 2.1 seconds...
2022-07-30 14:11:14: Invalid arguments or no league game log data for 2021 available!
Error in nba_leaguegamelog(season = 2021, season_type = "Regular Season", :
object 'df_list' not found

To Reproduce

library(hoopR)

test <- nba_leaguegamelog(season = 2021, season_type = "Regular Season", 
                          player_or_team = "T") 

Thanks,

Bill

kp_teamstats Defensive Stats Not Populating Correctly

Describe the bug
When using the kp_teamstats function, the offensive data is correct, but the defensive data is pulled for the wrong team.
For example, for Boston College the function returns Wyoming's defensive stats.

[Screenshots: KP Offense, KP Defense, R Offense, R Defense]


nba_playerindex()

https://stats.nba.com/stats/playerindex?College=&Country=&DraftPick=&DraftRound=&DraftYear=&Height=&Historical=1&LeagueID=00&Season=2021-22&SeasonType=Regular%20Season&TeamID=0&Weight=

headers:

GET /stats/playerindex?College=&Country=&DraftPick=&DraftRound=&DraftYear=&Height=&Historical=1&LeagueID=00&Season=2021-22&SeasonType=Regular%20Season&TeamID=0&Weight= HTTP/1.1
Host: stats.nba.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="98", "Google Chrome";v="98"
sec-ch-ua-mobile: ?0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36
sec-ch-ua-platform: "Windows"
Accept: */*
Origin: https://www.nba.com
Sec-Fetch-Site: same-site
Sec-Fetch-Mode: cors
Sec-Fetch-Dest: empty
Referer: https://www.nba.com/
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9

Would love possession_number column in nba pbp

Could we get a possession_number column in the NBA pbp data? I'm currently seeking to answer the question: "How many points is a Personal Foul worth?" and I'm hoping to measure average points scored on possessions where there are:

  1. An offensive foul
  2. A defensive foul
  3. No fouls

And being able to group by possession seems the easiest way to do this. Am I missing an easy way to morph the data for this?
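In the meantime, a rough possession counter can be derived by numbering each change of the team with the ball. The sketch below runs on toy data: offense_team and points are hypothetical column names, not confirmed hoopR columns, and real play-by-play would need extra handling for period breaks, jump balls, and similar events.

```r
# Toy play-by-play rows; `offense_team` and `points` are hypothetical columns
# used for illustration, not confirmed hoopR column names.
pbp <- data.frame(
  offense_team = c("BOS", "BOS", "LAL", "LAL", "BOS", "LAL"),
  points       = c(0, 2, 0, 3, 0, 2),
  stringsAsFactors = FALSE
)

# A new possession starts whenever the team with the ball changes.
changed <- c(TRUE, pbp$offense_team[-1] != pbp$offense_team[-nrow(pbp)])
pbp$possession_number <- cumsum(changed)

# Points per possession, which could then be split by foul type.
points_by_possession <- aggregate(points ~ possession_number, data = pbp, FUN = sum)
print(points_by_possession)
```

Grouping by possession_number then makes it possible to compare average points on possessions with an offensive foul, a defensive foul, or no foul, provided a foul indicator is added to the rows.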

a few duplicated/incorrect mbb box scores

Describe the bug
Hi there! There appears to be a bug that reads in some incorrect/duplicated mbb box scores when using the load_mbb_player_box() function. Thankfully, these incorrect/duplicated box scores all appear to be within a single game (game id = 401253901).

To Reproduce
library(hoopR)
xx <- load_mbb_player_box(2021)
View(xx[which(xx$game_id == 401253901), ])

You can see that Cade Cunningham, Rondel Walker, and Avery Anderson III have their box scores duplicated, with the duplicated rows listing their opponent as their team:


Is there a fix on my end for this? Thanks in advance for the help!
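A user-side stopgap could be to flag rows where the same player appears more than once in a game and review them before dropping any. The sketch below uses toy rows with placeholder team labels and assumed column names (athlete_name, team). Which of the two rows is correct still has to be checked against the real data, so keeping the first occurrence is an arbitrary choice.

```r
# Toy rows mimicking the report: one player appears twice in the same game,
# once under each team. Column names and team labels are placeholders.
box <- data.frame(
  game_id      = c(401253901, 401253901, 401253901),
  athlete_name = c("Cade Cunningham", "Cade Cunningham", "Rondel Walker"),
  team         = c("Team A", "Team B", "Team A"),
  stringsAsFactors = FALSE
)

# Identify player-game combinations that occur more than once.
key <- paste(box$game_id, box$athlete_name)
dup_keys <- key[duplicated(key)]

flagged <- box[key %in% dup_keys, ]  # rows to inspect by hand
deduped <- box[!duplicated(key), ]   # keeps the first row per player-game
print(flagged)
print(deduped)
```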

nba_homepageleaders not working for defense

Describe the bug
When changing the stat_category to 'Defense', we no longer get a response. I don't see this issue for any of the other stat_category as they all get a response.


Inconsistency in Game Spreads

Hi there - I have noticed there are some inconsistencies with the game spreads.

Here is an example:

In the first 3 rows, powerhouses like Gonzaga, Kansas, and Baylor are the home team and heavily favored. The home_team_spread column is negative.

In rows 4-6, Michigan, Duke, and Illinois are the home team and heavy favorites, but this time the home_team_spread column is positive.

All the home teams in this subset were favored but are being labeled differently in the spread columns.


Is there an easy fix for this?

Inconsistent NBA Game IDs in Play by Play Data

I am using the function load_nba_pbp to get several seasons' worth of play-by-play data. However, when I tried to merge certain subsets of this data with other tables by "game_id", I realized that the game_id values in your play-by-play data are not consistent with the game_id format that the NBA Stats API uses. Is there a reason for this, or documentation of how these IDs are constructed?

Additionally, would you be able to add an extra variable indicating the NBA Stats API game ID, if game_id in your data is a sportsdataverse ID?

Schedule not updating

Game results in the schedule returned by the load_nba_schedule function have not updated since April 2, and the schedule is not being updated with the NBA playoff and play-in games.

Was working smoothly up until 4/2.

`nba_commonallplayers` no longer gives `BIRTH_DATE`

Previously nba_commonallplayers would return BIRTH_DATE for each player. Now it doesn't. It might be because the NBA API changed slightly. This site https://www.nba.com/stats/players/bio does not have date of birth (I think it used to, but don't quote me on that).

Individual player pages do have DOB (e.g. https://www.nba.com/stats/player/201935). I tried using nba_leaguedashplayerbiostats but that does not have DOB either. Perhaps it could be added there if it can't be pulled with the bio info?

`load_mbb_schedule` does not contain March Madness games

For the postseason, the schedule appears to contain only NIT games. The reprex uses a sample year, but this seems to occur in every year.

hoopR::load_mbb_schedule(2019) |>
    dplyr::filter(date >= "2019-03-19") |>
    dplyr::pull(notes) |>
    unique()
#> [1] "[{'type': 'event', 'headline': 'NIT - CHAMPIONSHIP'}]"
#> [2] "[{'type': 'event', 'headline': 'NIT - SEMIFINAL'}]"   
#> [3] "[{'type': 'event', 'headline': 'NIT - QUARTERFINAL'}]"
#> [4] "[{'type': 'event', 'headline': 'NIT - 2ND ROUND'}]"   
#> [5] "[{'type': 'event', 'headline': 'NIT - 1ST ROUND'}]"
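To check which postseason events are present, the headline can be pulled out of the notes strings shown above. The sketch below works on two strings copied from that output and uses a plain regular expression; it assumes every notes value follows the same single-headline pattern.

```r
# Two notes values copied from the output above.
notes <- c(
  "[{'type': 'event', 'headline': 'NIT - CHAMPIONSHIP'}]",
  "[{'type': 'event', 'headline': 'NIT - 1ST ROUND'}]"
)

# Extract the text between the quotes that follow 'headline':
headline <- sub(".*'headline': '([^']*)'.*", "\\1", notes)
print(headline)

# Flag NIT games so that any other tournament headlines stand out.
is_nit <- startsWith(headline, "NIT")
print(table(is_nit))
```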

NBA gamelog functions not pulling in regular season data

When trying to load regular season data, the nba_leaguegamelog function returns an error. For example, calling either
nba_leaguegamelog(season = '2021-22',
player_or_team = "T")

or
nba_leaguegamelog(season = '2021-22',
player_or_team = "T",
season_type = "Regular Season"
)

return the following error messages: "Request failed [400]. Retrying in 1.4 seconds...
Request failed [400]. Retrying in 2.9 seconds...
2023-01-10 14:44:55: Invalid arguments or no league game log data for 2021-22 available!
Error in nba_leaguegamelog(season = "2021-22", player_or_team = "T", ) :
object 'df_list' not found"

This error does not occur when using the same function to pull in playoff data, as the following runs just fine:

nba_leaguegamelog(season = '2021-22',
player_or_team = "T",
season_type = "Playoffs"
)

This error with getting regular season data seems to also exist for other NBA log functions such as nba_playergamelog which also returns an error for regular season data but works for playoff data.

Thanks

Trouble loading KenPom player page

I get the following error running kp_team_players(tm_name, year = season):

Error in kp_team_players(tm_name, year = season) :
object 'kenpom' not found

Digging in a little further, it looks like when I try to pull roster pages, I get the following:

Warning message:
In session_set_response(x, resp) : Too Many Requests (RFC 6585) (HTTP 429).
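An HTTP 429 response usually means requests are arriving too quickly, so pausing between attempts may help. The sketch below is a generic retry loop in base R: with_backoff() is a hypothetical helper, not a hoopR function, and the demo uses a stand-in function instead of a real KenPom request.

```r
# Hypothetical helper: retry a function with exponentially growing pauses.
with_backoff <- function(f, max_tries = 4, base_wait = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < max_tries) Sys.sleep(base_wait * 2^(attempt - 1))
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Demo with a stand-in function that fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Too Many Requests (HTTP 429)")
  "ok"
}
out <- with_backoff(flaky, base_wait = 0.01)
print(out)
```

A real call would replace flaky with something like function() kp_team_players(tm_name, year = season), with a base_wait of a few seconds.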

Error in "update_nba_db": could not find function "my_time"

library(hoopR)
future::plan("multisession")
update_nba_db(dbname = 'nba.sqlite',
              tblname = 'pbp',
              force_rebuild = TRUE)

Results in:

Error in my_time() : could not find function "my_time"

> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] hoopR_1.0.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        compiler_4.0.5    pillar_1.6.1      tools_4.0.5       bit_4.0.4        
 [6] digest_0.6.27     memoise_2.0.0     RSQLite_2.2.7     jsonlite_1.7.2    lubridate_1.7.10 
[11] lifecycle_1.0.0   tibble_3.1.1      pkgconfig_2.0.3   rlang_0.4.11      DBI_1.1.1        
[16] parallel_4.0.5    xfun_0.22         fastmap_1.1.0     furrr_0.2.2       httr_1.4.2       
[21] stringr_1.4.0     dplyr_1.0.6       xml2_1.3.2        fs_1.5.0          generics_0.1.0   
[26] vctrs_0.3.8       tictoc_1.0.1      globals_0.14.0    bit64_4.0.5       tidyselect_1.1.1 
[31] glue_1.4.2        data.table_1.14.0 listenv_0.8.0     R6_2.5.0          fansi_0.4.2      
[36] parallelly_1.25.0 blob_1.2.1        tidyr_1.1.3       purrr_0.3.4       magrittr_2.0.1   
[41] usethis_2.0.1     codetools_0.2-18  ellipsis_0.3.2    assertthat_0.2.1  rvest_1.0.0      
[46] future_1.21.0     tinytex_0.31      utf8_1.2.1        stringi_1.5.3     cachem_1.0.4     
[51] crayon_1.4.1 

BUG: espn pbp, team, and player box scores are not working

Describe the bug
When I try to access the ESPN box scores for a game, which has worked all season so far, it now returns an error that the games don't exist, even for the example game_id 401256760 used in the documentation.

To Reproduce

hoopR::espn_mbb_game_all(401256760)
2022-11-17 19:10:15: Invalid arguments or no play-by-play data for 401256760 available!
2022-11-17 19:10:15: Invalid arguments or no team box score data for 401256760 available!
2022-11-17 19:10:15: Invalid arguments or no player box score data for 401256760 available!
Error in hoopR::espn_mbb_game_all(401256760) :
object 'plays_df' not found

Expected behavior
returns pbp, team, and player box score for a game

Desktop (please complete the following information):

  • OS: Windows
  • IDE: RStudio (and also google colab through r2py)

espn_nba_teams function error

Describe the bug
Using version 1.5.0 of hoopR, running espn_nba_teams() function in RStudio, I get following error msg:

2022-05-21 17:35:04: Invalid arguments or no teams data available!
Error in espn_nba_teams() : object 'teams' not found

To Reproduce
Steps to reproduce the behavior:
See above.

Expected behavior
See above.

Screenshots
See above.

Desktop (please complete the following information):
error on windows and mac

Smartphone (please complete the following information):
na

Additional context
Great package and appreciate any feedback.

ESPN mbb game data (pbp, team & player box) returning error

Describe the bug
hoopR::espn_mbb_team_box & hoopR::espn_mbb_game_all are returning errors

To Reproduce

# Example Game ID
hoopR::espn_mbb_game_all(game_id = 401256760)
2023-02-22 14:22:11: Invalid arguments or no play-by-play data for 401256760 available!
2023-02-22 14:22:11: Invalid arguments or no team box score data for 401256760 available!
2023-02-22 14:22:11: Invalid arguments or no player box score data for 401256760 available!
Error in hoopR::espn_mbb_game_all(game_id = 401256760) : 
  object 'plays_df' not found
hoopR::espn_mbb_team_box(game_id = 401025888)
2023-02-22 14:22:20: Invalid arguments or no team box score data for 401025888 available!
Error in hoopR::espn_mbb_team_box(game_id = 401025888) : 
  object 'team_box_score' not found

Desktop (please complete the following information):

  • OS: Windows 10
  • R: v4.2.2
  • RStudio: v2022.12.0+353

load_nba_schedule() returns a broken data.table object

Describe the bug
load_nba_schedule() returns a data.table object that appears to be broken: the object throws an error on tail().

To Reproduce
load_nba_schedule(2019:2022) %>% tail()

Screenshots
image

There, as_tibble() removes data.table from the object's classes.
Update: the same problem occurs with load_nba_team_box() and load_nba_player_box().

Cannot load KenPom data with subscription.

Describe the bug
I am trying to use the login function for KenPom through R, and it always outputs "Error in .kp_headers() : could not find function ".kp_headers"". I set the environment variables to KP_USER = [email protected] and KP_PW = XXX-YOUR-PASSWORD-XXX, but I don't know why it gives me this error message. Any help would be greatly appreciated!

To Reproduce
Steps to reproduce the behavior:

  1. Go to R
  2. load hoopR
  3. try 'login' function with Kenpom login info
  4. Error outputs

Desktop (please complete the following information):
OS: Microsoft Windows 11 parallels
R version: 4.2.2

Append empty DF to existing data when requesting future seasons data?

With the load_xxx family of functions when requesting a future season with no data yet, would it be useful to append an empty dataframe for the missing season instead of erroring? As an example, when I use load_nba_pbp and most_recent_nba_season in conjunction, attempting to load a future season that doesn't exist causes an error:

hoopR:::most_recent_nba_season()
#> [1] 2022
hoopR::load_nba_pbp(2021:hoopR:::most_recent_nba_season())
#> i It is recommended to use parallel processing when trying to load multiple seasons.
#>   Please consider running `future::plan("multisession")`!
#>   Will go on sequentially...
#> Warning in readRDS(con): cannot open URL 'https://raw.githubusercontent.com/
#> saiemgilani/hoopR-data/master/nba/pbp/rds/play_by_play_2022.rds': HTTP status
#> was '404 Not Found'
#> Error in readRDS(con): cannot open the connection

I think there are some benefits to this behavior:

  • Ease of usage -- existing scripts relying on most_recent_nba_season do not error out when the season flips over to a new year.
  • With a warning, we are informing the user that the data they are requesting does not exist
  • Though we are also not erroring out before we can return the data that they requested that does exist

Even if we still want to raise an error when a future season is requested, I think a check that the requested season has existing data, run before fetching, would be an improvement over the existing behavior.
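
The behavior proposed above could look roughly like the following sketch. It is not hoopR code: `load_existing_seasons()`, its `loader` argument, and the `fake_loader()` stand-in are hypothetical names used for illustration, and the stand-in replaces the real download so the example runs offline.

```r
# Hypothetical helper: load each requested season separately, warn about the
# ones that fail (for example, a future season with no data yet), and
# row-bind whatever did load.
load_existing_seasons <- function(seasons, loader) {
  loaded <- lapply(seasons, function(season) {
    tryCatch(
      loader(season),
      error = function(e) {
        warning("No data available for season ", season, "; skipping.",
                call. = FALSE)
        NULL
      }
    )
  })
  # Drop the seasons that failed, then combine the rest.
  loaded <- Filter(Negate(is.null), loaded)
  if (length(loaded) == 0) {
    return(data.frame())
  }
  do.call(rbind, loaded)
}

# Stand-in for a real loader (an assumption, so the sketch runs offline):
# pretend that data only exists through the 2021 season.
fake_loader <- function(season) {
  if (season > 2021) stop("cannot open the connection")
  data.frame(season = season, game_id = c(1L, 2L))
}

pbp <- suppressWarnings(load_existing_seasons(2021:2022, fake_loader))
```

With the real package, the same wrapper could presumably be called with `hoopR::load_nba_pbp` as the `loader`, so that a script built on `most_recent_nba_season()` still returns the seasons that do exist.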

ESPN WP

  • write espn_nba_wp()
  • write espn_mbb_wp()
    ya bum

`kp_fanmatch()` future dates issue

Describe the bug
During the ETL process, the following filter is applied:

      suppressWarnings(
        x <- x %>%
          dplyr::filter(!is.na(as.numeric(.data$Poss)))
      )

This is super problematic for any future dates, since there are obviously no possessions yet. We should work to eliminate or at least check the filters of this type that I am applying.

To Reproduce
Current Date = '2022-01-06'
kp_fanmatch(date='2022-01-07')
returns zero rows
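
One way to avoid dropping future games is to apply the possessions filter only to dates that have already passed. The sketch below is a hedged illustration on a toy data frame, not the package's ETL code: `keep_fanmatch_rows()` and the toy values are hypothetical, and only the `Poss` column name comes from the snippet above.

```r
# Hypothetical helper: skip the Poss filter when the requested date is in
# the future relative to `today`.
keep_fanmatch_rows <- function(x, date, today = Sys.Date()) {
  # Future dates have no possessions yet, so return the rows untouched.
  if (as.Date(date) > as.Date(today)) {
    return(x)
  }
  # For past dates, keep only rows whose Poss parses as a number.
  poss <- suppressWarnings(as.numeric(x$Poss))
  x[!is.na(poss), , drop = FALSE]
}

# Toy table with empty possessions, as one would expect for unplayed games.
toy <- data.frame(Game = c("A vs B", "C vs D"), Poss = c("", ""),
                  stringsAsFactors = FALSE)

future <- keep_fanmatch_rows(toy, date = "2022-01-07", today = "2022-01-06")
past   <- keep_fanmatch_rows(toy, date = "2022-01-05", today = "2022-01-06")
```

In this sketch the future date keeps both rows while the past date drops them. Games scheduled for the current day are treated like past dates, which may or may not be the desired behavior.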

Approaches for games with missing box scores/pbp

So something I noticed: several games that are missing team box scores and player box scores do have PBP data available. For example:

hoopR::load_mbb_schedule(2003:2021) |>
    dplyr::filter(game_id == '290722440') |>
    dplyr::select(PBP, team_box, player_box)
#> # A tibble: 1 x 3
#>   PBP   team_box player_box
#>   <lgl> <lgl>    <lgl>     
#> 1 TRUE  FALSE    FALSE

Additionally, these games do have box scores available, just not on ESPN. For the above game, for example, one can find the box score on Louisiana Tech's website here.

Some games are altogether missing PBP data as well, which makes incorporating them into hoopR difficult.

I am curious if there would be a good way to either parse the existing data or hunt down missing play by play data and incorporate it into hoopR. This is obviously both a long-term and ongoing endeavor, so I am raising it as a suggestion for additional work to improve the scope of hoopR.

Three approaches seem wise:

  1. Write a function to parse existing hoopR PBP data into box-score format. Either do this on an ad-hoc basis or incorporate it into the existing data repositories.
  2. Come up with a guide for contributors to add PBP/box score data, and write tests that the data must pass before it is merged into the hoopR data repository (ex. must be in .json format, must contain these values, and so on). Contributors can add games to the hoopR repository as they come across them.
  3. Explore alternate routes for acquiring PBP/box score data. Statbroadcast may be a good place to start, as well as team websites, though scraping individual team websites is a fairly daunting task beyond the scope of the sportsdataverse.
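
As a rough illustration of the first approach, the sketch below totals points per team from play-by-play rows. It runs on a toy data frame, and the column names (`game_id`, `team_id`, `scoring_play`, `score_value`) are assumptions about the PBP layout rather than a confirmed schema; a real box score would need many more statistics than points.

```r
# Toy play-by-play rows; the column names are assumptions for illustration.
pbp <- data.frame(
  game_id      = c(1, 1, 1, 1, 1),
  team_id      = c("home", "away", "home", "home", "away"),
  scoring_play = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  score_value  = c(2, 3, 0, 1, 2)
)

# Keep scoring plays only, then total the points per game and team.
scoring <- pbp[pbp$scoring_play, ]
team_points <- aggregate(score_value ~ game_id + team_id,
                         data = scoring, FUN = sum)
names(team_points)[names(team_points) == "score_value"] <- "points"
```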

I have no strong thoughts on this, simply vomiting thoughts on hoopR out into the void. Additionally, this is by no means a pressing issue or anything that takes a quick fix. Would simply be curious as to how valuable acquiring even more data would be for hoopR, or how much we can squeeze out of leveraging the data we already have access to -- just hoping to spark some discussion on this topic!

nba_commonallplayers() is not working

Problem
The valid URL for the nba_commonallplayers() function does not seem to load for me on the NBA Stats API.

To Reproduce
Steps to reproduce the behavior:

  1. Try loading the URL used by nba_commonallplayers(). The NBA Stats API seems to hang (at least for me) when querying it.
  2. See error

Desktop (please complete the following information):

  • OS: [OS X]
  • Browser [Safari]
  • Version [14.1.2]

Incorrect game_date when using hoopR::load_mbb_player_box

Describe the bug
The game_date seems to be one day off in some cases when I use the hoopR::load_mbb_player_box function.

To Reproduce

> library(dplyr)
> library(hoopR)
> 
> player.df <- hoopR::load_mbb_player_box(seasons = 2021:2022)
> 
> player.df %>%
+   filter(team_abbreviation == "FIU", game_id == 401373765) %>%
+   select(season, game_date, athlete_display_name, min, game_id) %>%
+   arrange(season, game_date, athlete_display_name)
   season  game_date athlete_display_name min   game_id
1    2022 2022-03-04          Aquan Smart  14 401373765
2    2022 2022-03-04         Clevon Brown  24 401373765
3    2022 2022-03-04       Daniel Parrish   2 401373765
4    2022 2022-03-04         Dante Wilcox   5 401373765
5    2022 2022-03-04         Denver Jones  28 401373765
6    2022 2022-03-04          Eric Lovett  28 401373765
7    2022 2022-03-04         Isaiah Banks  25 401373765
8    2022 2022-03-04     Javaunte Hawkins  23 401373765
9    2022 2022-03-04       Mohamed Sanogo  12 401373765
10   2022 2022-03-04     Petar Krivokapic  14 401373765
11   2022 2022-03-04         Seth Pinkney  17 401373765
12   2022 2022-03-04          Victor Hart   8 401373765

Expected behavior

Based on what I see in the KenPom boxscore, I expect the game_date to be 2022-03-03.

Screenshots

Screenshot from 2022-03-04 14-20-03

Additional context

Another example is game_id == 401372788 where I'd expect the game date to be 2022-02-26. Could it be that I'm loading the data and it defaults to a timezone that happens to move the game into a new day? I briefly played with lubridate but couldn't tell if this was the case. Thanks!
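
The timezone guess can be checked with a small base R sketch. The tip-off time below is hypothetical (the real one is not given in this report); the point is only that an evening game in US Eastern time falls on the next calendar day when the timestamp is read as UTC.

```r
# Hypothetical tip-off: 7 PM US Eastern on 2022-03-03, stored as UTC.
tipoff_utc <- as.POSIXct("2022-03-04 00:00:00", tz = "UTC")

# The same instant rendered as a calendar date in two time zones.
date_utc     <- format(tipoff_utc, "%Y-%m-%d", tz = "UTC")
date_eastern <- format(tipoff_utc, "%Y-%m-%d", tz = "America/New_York")
```

Here `date_utc` is "2022-03-04" while `date_eastern` is "2022-03-03", which matches the one-day shift described above if game_date is derived from a UTC timestamp.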

nba_boxscorematchups(game_id = "0042200401") gives an error

nba_boxscorematchups(game_id = "0042200401")
Request failed [500]. Retrying in 1.1 seconds...
Request failed [500]. Retrying in 3.3 seconds...
2023-06-04 17:54:20: Invalid arguments or no matchups boxscore data for 0042200401 available!
Error in nba_boxscorematchups(game_id = "0042200401") : object 'df_list' not found

