jthomasmock / espnscraper


Scrapes Or Collects NFL Data From ESPN

Home Page: https://jthomasmock.github.io/espnscrapeR/

License: Other

R 100.00%

espnscraper's Issues

keep venue in get_schedule_data

In this line of get_schedule_data

espnscrapeR/R/get_nfl_schedule.R

Line 46 in 754b6b6

 select(!any_of(c("timeValid", "neutralSite", "conferenceCompetition","recent", "venue", "type"))) %>% 

you remove venue. Can you keep that instead? Or is venue info pulled by any other functions in the package?
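A minimal sketch of the requested change, assuming the schedule data is a data frame that contains some of the columns named in the line above. The toy table `raw_games` and its values are hypothetical; only the `select(!any_of(...))` pattern mirrors the package line, with "venue" left out of the exclusion vector.

```r
library(dplyr)

# Hypothetical stand-in for the parsed schedule data (values are made up).
raw_games <- tibble(
  game_id     = c("1", "2"),
  timeValid   = c(TRUE, TRUE),
  neutralSite = c(FALSE, FALSE),
  venue       = c("Stadium A", "Stadium B"),
  type        = c("regular", "regular")
)

# Same exclusion list as the package line, minus "venue", so that column survives.
# any_of() silently skips names that are not present in the data.
drop_cols <- c("timeValid", "neutralSite", "conferenceCompetition", "recent", "type")

kept <- raw_games %>%
  select(!any_of(drop_cols))

names(kept)
```

In the real API response `venue` may be a nested column that needs further unnesting; that step is not shown here.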

get_college_qbr() only pulling 2020 data

Hi Tom,

Thanks for updating the package, and it's working great for 2020, but the 2020 season is the only data I can pull. If I'm reading it right, it looks like 2020 is hardcoded into the function here:

Thanks!

Unable to pull 2020 College QBR

When I try to pull the 2020 College QBR numbers I get this error:

As you can see from the image, I'm able to get the 2019 numbers fine, it just seems to be an issue with 2020.

Thanks!

Wrong team in ESPN API?

There are some team errors in the 2019 data. qbr in the below code looks like this:

This example uses week 4, but the teams are wrong for the whole season. I don't know whether these are all of the errors or whether there are more.

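library(dplyr)
library(espnscrapeR)

# Keep only team abbreviations that appear more than once in week 4
qbr <- get_nfl_qbr("2019", season_type = "Regular", week = 4) %>%
  group_by(team_short_name) %>% 
  filter(n() > 1) %>%
  select(short_name, team_short_name) %>%
  arrange(team_short_name)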

Defense type in scrape_espn_stats()

Any chance we could get a Defense stat type in the scrape_espn_stats() function? This would be particularly helpful to me because ESPN is somewhat proprietary in how they define "Stuffs" compared to NFL standard Tackle for Loss, for example. I'm interested in correlations. Thanks!

Installation Assistance

Hello, this may be a bad question, but how exactly do I install this? I'm having trouble understanding how to install this program on a machine (Debian VM).

Could someone explain in more detail how to install it? I tried to run the remote.... command in the terminal, and I get this output: syntax error near unexpected token `('

What am I doing wrong?

Thanks!

Two stray commas causing get_nfl_schedule to fail

These lines have commas at the end with nothing after them, and it was causing an error in get_nfl_schedule:

espnscrapeR/R/get_nfl_schedule.R

Line 72 in 025a743

away_record = list(2, "records", 1, "summary"),

espnscrapeR/R/get_nfl_schedule.R

Line 105 in 025a743

rec_leader_pos = list(3, "leaders", 1, "athlete", "position", "abbreviation"),

I updated purrr, dplyr, and tidyr and it now works, but thought I'd mention it anyway, in case others don't realize a package update will take care of it.
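The behaviour described above can be reproduced in isolation. This sketch assumes the affected arguments are collected through rlang's dynamic dots, which recent tidyverse releases use; `rlang::list2()` is a stand-in here, not the call the package actually makes.

```r
# Base R rejects an empty argument after a trailing comma.
base_result <- tryCatch(
  list(away_record = 1, ),
  error = function(e) "error"
)

# rlang's dynamic dots ignore a single trailing comma.
dots_result <- rlang::list2(away_record = 1, )

base_result
length(dots_result)
```

Removing the trailing commas in the source would make the call independent of the installed package versions.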

get_nfl_teams() Error 403

Hi Tom,
I'm trying to use the function get_nfl_teams(), but it is not working and gives me the following error:

ESPN's API is probably being blocked.
Thank you

Some boxscores do not appear

I have tried to pull boxscore data and have been successful for the vast majority of games, but some games, such as
get_nfl_boxscore(game_id = "301223023")

do not work in RStudio.

Connection Error

Hi Tom,

I keep getting a connection error whenever I try get_college_qbr. Example below:

It happens for any combination of year or week I run. I tried updating espnscrapeR to see if that'd help but the same error keeps popping up.

Thanks,
Jerrick

2022 Pass Win Rates Update

Currently the 2022 season does not work with the latest version of the package. I added it here:

scrape_espn_win_rate <- function(season = 2022) {
  if (!(as.numeric(season) %in% c(2019:2022))) {
    stop("Data available for 2019-22")
  }
  pbwr_url <- "https://www.espn.com.au/nfl/story/_/id/34536376/2022-nfl-pass-rushing-run-stopping-blocking-leaderboard-win-rate-rankings"
  pbwr_2021 <- "https://www.espn.com/nfl/story/_/id/32176833/2021-nfl-pass-rushing-run-stopping-blocking-leaderboard-win-rate-rankings"
  pbwr_2020 <- "https://www.espn.com/nfl/story/_/id/29939464/2020-nfl-pass-rushing-run-stopping-blocking-leaderboard-win-rate-rankings"
  pbwr_2019 <- "https://www.espn.com/nfl/story/_/id/27584726/nfl-pass-blocking-pass-rushing-rankings-2019-pbwr-prwr-leaderboard#prwrteam"
  pbwr_2018 <- "https://www.espn.com/nfl/story/_/id/25074144/nfl-pass-blocking-pass-rushing-stats-final-leaderboard-pass-block-win-rate-pass-rush-win-rate"
  stats_in <- c(
    "Pass Rush Win Rate", "Run Stop Win Rate",
    "Pass Block Win Rate", "Run Block Win Rate"
  )
  stat_2019 <- c("Pass Rush Win Rate", "Pass Block Win Rate")
  raw_html <- rvest::read_html(case_when(
    season == 2019 ~ pbwr_2019,
    season == 2020 ~ pbwr_2020,
    season == 2021 ~ pbwr_2021,
    season == 2022 ~ pbwr_url
  ))
  date_updated <- raw_html %>%
    rvest::html_node("#article-feed > article:nth-child(1) > div > div.article-body > div.article-meta > span > span") %>%
    rvest::html_text()
  raw_text <- raw_html %>%
    rvest::html_nodes("#article-feed > article:nth-child(1) > div > div.article-body > p") %>%
    rvest::html_text()
  tibble::enframe(raw_text) %>%
    filter(str_detect(value, "1. ")) %>%
    mutate(name = if_else(season == 2019, list(stat_2019),
      list(stats_in)
    )[[1]]) %>%
    mutate(value = str_split(
      value,
      "\n"
    )) %>%
    unnest_longer(value) %>%
    separate(value, into = c(
      "rank",
      "team", "win_pct"
    ), sep = "\\. |, ") %>%
    mutate(
      rank = as.integer(rank),
      win_pct = str_remove(win_pct, "%"), win_pct = as.double(win_pct),
      date_updated = date_updated, season = season
    ) %>%
    rename(
      stat = name,
      stat_rank = rank
    )
}

`scrape_team_stats_nfl` gives error when `role="defense"`

When I try this

scrape_team_stats_nfl(season = 2022, stats = "passing", role = "defense")

I get this error

Error in `purrr::set_names()`:
! The size of `nm` (15) must be compatible with the size of `x` (11).

A similar error occurs when stats is changed to rushing, scoring, or downs. When stats = "receiving", there is a different error. In particular, this code

scrape_team_stats_nfl(season = 2022, stats = "receiving", role = "defense")

gives this error

Error in rvest::html_table(raw_html, fill = TRUE)[[1]] : 
  subscript out of bounds

The code works well for all 5 choices of stats when role="offense". The errors only happen when role="defense".
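Both error messages can be reproduced on toy data, which may help narrow down the cause. This is a sketch under the assumption that the function applies a fixed vector of column names to the scraped table and takes the first table on the page; the objects `scraped` and `expected_names` are hypothetical.

```r
# A table with 11 columns, as reported for the defensive passing page.
scraped <- as.data.frame(matrix(0, nrow = 1, ncol = 11))

# Hypothetical hard-coded vector of 15 names written for the offensive layout.
expected_names <- paste0("col_", 1:15)

# Applying 15 names to 11 columns fails in set_names().
name_check <- tryCatch(
  {
    purrr::set_names(scraped, expected_names)
    "no error"
  },
  error = function(e) "size mismatch"
)

# Taking element 1 of an empty list fails with "subscript out of bounds",
# which is what happens if no table is found on the page.
table_check <- tryCatch(
  {
    list()[[1]]
    "no error"
  },
  error = function(e) "out of bounds"
)

c(name_check, table_check)
```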

get_nfl_boxscore_players(game_id = "401220131")

Hi,

I'm trying to use the function get_nfl_boxscore_players with game_id = "401220131", but it's not working. This is the 9/13/2020 MIA vs. NE game. Other functions recognize this game_id (e.g. get_nfl_pbp(game_id="401220131")) works.

Here's the error that I am getting:
player_game = get_nfl_boxscore_players(game_id = "401220131");
Error: Problem with mutate() input ..1.
x 'list' object cannot be coerced to type 'double'
i Input ..1 is across(c(pass_yds:punt_long), ~suppressWarnings(as.double(.x))).
Run rlang::last_error() to see where the error occurred.
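The coercion error itself is easy to reproduce in base R: `as.double()` accepts a list only when every element has length one. Whether that is what happens for this game is an assumption; the `cells` list below is made-up data, and the fallback to `NA` is one possible workaround rather than the package's fix.

```r
# A list whose elements all have length 1 converts without complaint.
ok <- as.double(list(10, 20))

# Hypothetical list column with a multi-value cell and an empty cell.
cells <- list(10, c(20, 30), NULL)

# Direct coercion fails with "'list' object cannot be coerced to type 'double'".
direct <- tryCatch(as.double(cells), error = function(e) "error")

# Defensive alternative: keep length-1 cells and record everything else as NA.
flat <- vapply(
  cells,
  function(cell) if (length(cell) == 1) as.double(cell) else NA_real_,
  numeric(1)
)

flat
```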

get_athlete() Error: Not Found (HTTP 404)

get_athlete() function fails and returns the following: "Error: Not Found (HTTP 404)"

After browsing the source code the error is thrown after running this section of code:

  raw_get  <- base_url %>%
    glue::glue() %>%
    httr::GET()
  
  httr::stop_for_status(raw_get)

#Error: Not Found (HTTP 404).

It appears that the link for the API request is invalid, perhaps due to a change on ESPN's end. This function was working properly for me about two weeks ago, but has since stopped working.

Error in player name for players on multiple teams throughout the year

When running the scrape_espn_stats() function, I noticed that players who were on multiple teams have an error. For example, Kenyan Drake's name shows up as "Kenyan DrakeMIA/" with "ARI" showing up in the "team" field. Same thing for Josh Gordon's name showing up as "Josh GordonNE/" with "SEA" appearing in the "team" field.
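Until this is fixed in the package, the stray suffix can be stripped after scraping. A minimal sketch, assuming the unwanted part is always two or three capital letters followed by a slash at the end of the name; a player whose real name ends in capitals plus a slash would be clipped too.

```r
# Two names from the report above plus one unaffected name.
player <- c("Kenyan DrakeMIA/", "Josh GordonNE/", "Tom Brady")

# Remove a trailing team abbreviation (2-3 capitals) and the slash after it.
clean <- sub("[A-Z]{2,3}/$", "", player)

clean
```

The earlier team is discarded by this cleanup; keeping it would require capturing the matched abbreviation into its own column.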

ESPN Data inconsistent

Note: This Issue isn't a code problem! It is just for information to the users and to make the developer aware of it.

ESPN states on its Total QBR website:

To qualify, a player must play a minimum of 20 action plays

which always was my explanation when a player was missing in the data. But it gets very confusing now. I am doing this example for the 2018 playoffs and didn't check it for other years.

2018 Wildcard weekend had the following games (winners bold):

IND @ HOU
SEA @ DAL
LAC @ BAL
PHI @ CHI

Running

qbr_week <- get_nfl_qbr("2018", season_type = "Playoffs", week = 1) %>%
  select(short_name, team_short_name, qbr_total, qb_plays)

leads to 3 entries

But running

qbr_all <- get_nfl_qbr("2018", season_type = "Playoffs", week = NA) %>%
  select(short_name, team_short_name, qbr_total, qb_plays)

leads to this

The total data not only contains more QBs from the wildcard weekend (Watson, Wilson, Trubisky), it also gives a different Total QBR for Lamar Jackson...
It is unclear which dataset to trust. The problem is that the two can only be combined for QBs who lost, because the overall dataset aggregates the games of QBs who played more than one game.

List of Player IDs

Hi! Is there a way to get a list of player IDs for one season or multiple seasons (i.e. all player IDs)? I would like to use get_athlete() to get player info for all athletes from a season. Thank you!

Add 2023 win rates?

espnscrapeR/R/scrape_espn_win_rate.R

Line 19 in fb98b55

if(!(as.numeric(season) %in% c(2019:2022))) stop("Data available for 2019-22")

ESPN released 2023 win rates.

Would be awesome to have them added to espnscrapeR
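One way to make a new season a one-line change is to keep the article URLs in a named lookup and derive the validity check from it. This is only a sketch: `win_rate_urls` and `win_rate_url()` are hypothetical names, the URLs are the ones quoted in the 2022 issue on this page, and no 2023 URL is included because none appears here.

```r
# Season -> article URL, copied from the 2022 issue on this page.
win_rate_urls <- c(
  "2019" = "https://www.espn.com/nfl/story/_/id/27584726/nfl-pass-blocking-pass-rushing-rankings-2019-pbwr-prwr-leaderboard#prwrteam",
  "2020" = "https://www.espn.com/nfl/story/_/id/29939464/2020-nfl-pass-rushing-run-stopping-blocking-leaderboard-win-rate-rankings",
  "2021" = "https://www.espn.com/nfl/story/_/id/32176833/2021-nfl-pass-rushing-run-stopping-blocking-leaderboard-win-rate-rankings",
  "2022" = "https://www.espn.com.au/nfl/story/_/id/34536376/2022-nfl-pass-rushing-run-stopping-blocking-leaderboard-win-rate-rankings"
)

# Hypothetical helper: validate a single season against the lookup and return its URL.
win_rate_url <- function(season) {
  key <- as.character(season)
  if (!key %in% names(win_rate_urls)) {
    stop("Data available for: ", paste(names(win_rate_urls), collapse = ", "))
  }
  unname(win_rate_urls[key])
}

win_rate_url(2021)
```

With this layout, supporting 2023 would mean adding one entry to the vector, and the error message would update itself.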

Intragame win probability

ESPN posts win probabilities that are updated live with each play or clock tick during games. Have you looked at scraping this, and/or is there a repo with anything interesting anywhere, e.g. time-stamped probability data?

College QBR Data missing

Hi Tom,

I wanted to scrape some college QBR data to analyse the 2021 prospects, so I was looking for QBR since 2017.
Unfortunately, the QBR data from get_college_qbr() seems to be missing for all years before 2020 (except week 1). I attached the code I used to test weeks 1 and 2.
I cross-checked the ESPN website, and it seems to be an issue on ESPN's side.

Any chance we can solve this problem?

Best regards,
Christian

get_qbr <- function(weeks, years, silent = FALSE){
  qbr_raw <- NULL
  # try() keeps the loop going when a season/week combination fails
  try(qbr_raw <- espnscrapeR::get_college_qbr(season = years, week = weeks), silent = silent)
  if (is.null(qbr_raw)) return(tibble::tibble())
  qbr_raw
}

all_qbr <- purrr::pmap_dfr(purrr::transpose(
  purrr::cross2(1:2,2017:2020)), get_qbr)
#> Scraping QBR for week 1 of 2017!
#> Scraping QBR for week 2 of 2017!
#> Error : Can't subset columns that don't exist.
#> x Column `firstName` doesn't exist.
#> Scraping QBR for week 1 of 2018!
#> Scraping QBR for week 2 of 2018!
#> Error : Can't subset columns that don't exist.
#> x Column `firstName` doesn't exist.
#> Scraping QBR for week 1 of 2019!
#> Scraping QBR for week 2 of 2019!
#> Error : Can't subset columns that don't exist.
#> x Column `firstName` doesn't exist.
#> Scraping QBR for week 1 of 2020!
#> Scraping QBR for week 2 of 2020!
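To see at a glance which season and week combinations fail, the same loop can record failures instead of only printing them. The sketch below uses base R and a hypothetical stand-in, `fetch_qbr()`, which simply imitates the failure pattern in the output above; it does not call ESPN, and its `qbr` values are placeholders.

```r
# Hypothetical stand-in for the scraper: fails after week 1 for seasons before 2020.
fetch_qbr <- function(season, week) {
  if (season < 2020 && week > 1) stop("Column `firstName` doesn't exist.")
  data.frame(season = season, week = week, qbr = NA_real_)
}

# Every week/season combination, in the same order as the run above.
grid <- expand.grid(week = 1:2, season = 2017:2020)

# Return NULL for a failed combination instead of stopping the whole loop.
results <- Map(
  function(season, week) tryCatch(fetch_qbr(season, week), error = function(e) NULL),
  grid$season, grid$week
)

# Combine the successful pulls and list the combinations that failed.
all_qbr <- do.call(rbind, Filter(Negate(is.null), results))
failed  <- grid[vapply(results, is.null, logical(1)), ]

failed
```

Swapping `fetch_qbr()` for the real scraper call would give a table of missing combinations to compare against the ESPN website.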
