jthomasmock / espnscraper
Scrapes Or Collects NFL Data From ESPN
Home Page: https://jthomasmock.github.io/espnscrapeR/
License: Other
In this line of get_schedule_data (espnscrapeR/R/get_nfl_schedule.R, line 46 in 754b6b6) you remove venue. Can you keep that instead? Or is venue info pulled by any other functions in the package?
There are some team errors in the 2019 data. qbr in the below code looks like this:
In this example it's week 4, but the teams are wrong for the whole season. I don't know whether these are all of the errors or whether there are more.
library(espnscrapeR)
library(dplyr)
# Keep only teams that appear more than once in the week 4 QBR data
qbr <- get_nfl_qbr("2019", season_type = "Regular", week = 4) %>%
  group_by(team_short_name) %>%
  filter(n() > 1) %>%
  select(short_name, team_short_name) %>%
  arrange(team_short_name)
Any chance we could get a Defense stat type in the scrape_espn_stats() function? This would be particularly helpful to me because ESPN is somewhat proprietary in how they define "Stuffs" compared to the NFL standard Tackle for Loss, for example. I'm interested in correlations. Thanks!
Hello, this may be a bad question, but how exactly do I install this? I'm having trouble understanding how to install this program on a machine (Debian VM).
Could someone explain in more detail how to install it? I tried to run the remote.... command in the terminal, and I got this output: syntax error near unexpected token `('
What am I doing wrong?
Thanks!
These lines have commas at the end with nothing after them, and it was causing an error in get_nfl_schedule:
espnscrapeR/R/get_nfl_schedule.R, line 72 in 025a743
espnscrapeR/R/get_nfl_schedule.R, line 105 in 025a743
I updated purrr, dplyr, and tidyr and it now works, but thought I'd mention it anyway, in case others don't realize a package update will take care of it.
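A minimal sketch of why a trailing comma can matter, for anyone hitting the same error. It assumes the failure came from an empty final argument; the issue does not confirm which function raised it. Base R rejects an empty argument, while functions built on rlang's dynamic dots (rlang::list2() is used here purely as an illustration) are documented to ignore a trailing empty argument.

```r
# Base R: the trailing comma creates an empty third argument, which is an error.
base_result <- tryCatch(
  c(1, 2, ),
  error = function(e) conditionMessage(e)
)
print(base_result)

# rlang's dynamic dots should drop a trailing empty argument.
# Only runs if rlang is installed; this is an illustration, not the package's code.
if (requireNamespace("rlang", quietly = TRUE)) {
  tidy_result <- rlang::list2(1, 2, )
  print(length(tidy_result))
}
```

If this reading is right, removing the trailing commas in the source would make the function independent of the installed package versions.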
I have tried to pull boxscore data and have been successful for the vast majority of games, but calls like
get_nfl_boxscore(game_id = "301223023")
do not work in RStudio.
Currently the 2022 season does not work with the latest version of the package. I added it here:
# The unqualified calls below need dplyr, tidyr, and stringr attached
library(dplyr)
library(tidyr)
library(stringr)

scrape_espn_win_rate <- function(season = 2022) {
  # Only the 2019 through 2022 articles are wired up below
  if (!(as.numeric(season) %in% c(2019:2022))) {
    stop("Data available for 2019-2022")
  }

  # One ESPN article per season (pbwr_2018 is kept for reference but unused)
  pbwr_url <- "https://www.espn.com.au/nfl/story/_/id/34536376/2022-nfl-pass-rushing-run-stopping-blocking-leaderboard-win-rate-rankings"
  pbwr_2021 <- "https://www.espn.com/nfl/story/_/id/32176833/2021-nfl-pass-rushing-run-stopping-blocking-leaderboard-win-rate-rankings"
  pbwr_2020 <- "https://www.espn.com/nfl/story/_/id/29939464/2020-nfl-pass-rushing-run-stopping-blocking-leaderboard-win-rate-rankings"
  pbwr_2019 <- "https://www.espn.com/nfl/story/_/id/27584726/nfl-pass-blocking-pass-rushing-rankings-2019-pbwr-prwr-leaderboard#prwrteam"
  pbwr_2018 <- "https://www.espn.com/nfl/story/_/id/25074144/nfl-pass-blocking-pass-rushing-stats-final-leaderboard-pass-block-win-rate-pass-rush-win-rate"

  # The 2019 article only lists the two pass-related stats
  stats_in <- c(
    "Pass Rush Win Rate", "Run Stop Win Rate",
    "Pass Block Win Rate", "Run Block Win Rate"
  )
  stat_2019 <- c("Pass Rush Win Rate", "Pass Block Win Rate")

  raw_html <- rvest::read_html(case_when(
    season == 2019 ~ pbwr_2019,
    season == 2020 ~ pbwr_2020,
    season == 2021 ~ pbwr_2021,
    season == 2022 ~ pbwr_url
  ))

  date_updated <- raw_html %>%
    rvest::html_node("#article-feed > article:nth-child(1) > div > div.article-body > div.article-meta > span > span") %>%
    rvest::html_text()

  raw_text <- raw_html %>%
    rvest::html_nodes("#article-feed > article:nth-child(1) > div > div.article-body > p") %>%
    rvest::html_text()

  tibble::enframe(raw_text) %>%
    # Keep the ranking paragraphs; fixed() stops "." from matching any character
    filter(str_detect(value, fixed("1. "))) %>%
    mutate(name = if_else(season == 2019, list(stat_2019),
      list(stats_in)
    )[[1]]) %>%
    # One row per "rank. team, pct%" line
    mutate(value = str_split(value, "\n")) %>%
    unnest_longer(value) %>%
    separate(value, into = c("rank", "team", "win_pct"), sep = "\\. |, ") %>%
    mutate(
      rank = as.integer(rank),
      win_pct = str_remove(win_pct, "%"), win_pct = as.double(win_pct),
      date_updated = date_updated, season = season
    ) %>%
    rename(
      stat = name,
      stat_rank = rank
    )
}
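The parsing step at the end of the function can be tried offline. The sketch below reproduces it in base R on a made-up paragraph; the team names and percentages are placeholders, not ESPN data, and parse_win_rate() is a hypothetical helper that is not part of espnscrapeR. It uses the same separator pattern ("\\. |, ") as the separate() call above.

```r
# Placeholder text shaped like one ranking paragraph ("rank. team, pct%").
raw_paragraph <- "1. Team A, 60.5%\n2. Team B, 58%\n3. Team C, 55.25%"

# Hypothetical helper: split one paragraph into rank, team, and win percentage.
parse_win_rate <- function(paragraph, stat) {
  lines <- strsplit(paragraph, "\n", fixed = TRUE)[[1]]
  # Split on ". " (after the rank) or ", " (before the percentage)
  parts <- strsplit(lines, "\\. |, ")
  pct_text <- vapply(parts, `[`, character(1), 3)
  data.frame(
    stat = stat,
    stat_rank = as.integer(vapply(parts, `[`, character(1), 1)),
    team = vapply(parts, `[`, character(1), 2),
    win_pct = as.double(sub("%", "", pct_text, fixed = TRUE)),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_win_rate(raw_paragraph, "Pass Rush Win Rate")
print(parsed)
```

Note that a team name containing ". " or ", " would be split in the wrong place by this pattern, so the approach depends on ESPN keeping the article format stable.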
When I try this
scrape_team_stats_nfl(season = 2022, stats = "passing", role = "defense")
I get this error
Error in `purrr::set_names()`:
! The size of `nm` (15) must be compatible with the size of `x` (11).
A similar error occurs when stats is changed to rushing, scoring, or downs. When stats = "receiving", there is a different error. In particular, this code
scrape_team_stats_nfl(season = season, stats = "receiving", role = "defense")
gives this error
Error in rvest::html_table(raw_html, fill = TRUE)[[1]] :
subscript out of bounds
The code works well for all 5 choices of stats when role="offense". The errors only happen when role="defense".
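The first error message suggests that a fixed vector of 15 column names was applied to a table that arrived with 11 columns. That is an inference from the message, not something confirmed against the package source. A minimal sketch of a guard that would turn this into a clearer error is below; set_names_checked() and the column names are hypothetical.

```r
# Hypothetical expected column names (illustrative only).
expected_names <- c("team", "games", "yards", "touchdowns")

# A stand-in for a scraped table that has fewer columns than expected.
scraped <- data.frame(a = "Team A", b = 16, c = 4000)

# Hypothetical helper: rename columns only when the counts agree.
set_names_checked <- function(df, nm) {
  if (length(nm) != ncol(df)) {
    stop(
      sprintf(
        "Expected %d columns but the scraped table has %d.",
        length(nm), ncol(df)
      ),
      call. = FALSE
    )
  }
  names(df) <- nm
  df
}

result <- tryCatch(
  set_names_checked(scraped, expected_names),
  error = function(e) conditionMessage(e)
)
print(result)
```

A check like this does not fix the defense tables, but it would point directly at the layout difference between the offense and defense pages.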
Hi,
I'm trying to use the function get_nfl_boxscore_players with game_id = "401220131" but it's not working. This is the 9/13/2020 MIA vs. NE game. Other functions recognize this game_id (e.g. get_nfl_pbp(game_id="401220131") works).
Here's the error that I am getting:
player_game = get_nfl_boxscore_players(game_id = "401220131");
Error: Problem with mutate() input ..1.
x 'list' object cannot be coerced to type 'double'
i Input ..1 is across(c(pass_yds:punt_long), ~suppressWarnings(as.double(.x))).
Run rlang::last_error() to see where the error occurred.
The get_athlete() function fails and returns the following: "Error: Not Found (HTTP 404)"
After browsing the source code, I found that the error is thrown by this section of code:
raw_get <- base_url %>%
glue::glue() %>%
httr::GET()
httr::stop_for_status(raw_get)
#Error: Not Found (HTTP 404).
It appears that the link for the API request is invalid, perhaps due to a change on ESPN's end. This function was working properly for me about two weeks ago, but has since stopped working.
When running the scrape_espn_stats() function, I noticed that players who were on multiple teams have an error. For example, Kenyan Drake's name shows up as "Kenyan DrakeMIA/" with "ARI" showing up in the "team" field. Same thing for Josh Gordon's name showing up as "Josh GordonNE/" with "SEA" appearing in the "team" field.
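Until this is handled in the package, the two fields can be repaired after scraping. The sketch below assumes the pattern is always as in the examples above: the earlier team abbreviation(s), each followed by a slash, are appended to the player name, and the most recent team sits in the team field. split_multi_team() is a hypothetical helper, not part of espnscrapeR.

```r
# Hypothetical helper: move trailing "ABC/" fragments from the name to the team.
split_multi_team <- function(name, team) {
  # Two or three capital letters followed by "/", repeated, at the end of the name
  pattern <- "([A-Z]{2,3}/)+$"
  m <- regexpr(pattern, name)
  prefix <- rep("", length(name))
  prefix[m > 0] <- regmatches(name, m)
  data.frame(
    name = sub(pattern, "", name),
    team = paste0(prefix, team),
    stringsAsFactors = FALSE
  )
}

fixed <- split_multi_team(
  name = c("Kenyan DrakeMIA/", "Josh GordonNE/", "Example Player"),
  team = c("ARI", "SEA", "SEA")
)
print(fixed)
```

Names without a trailing abbreviation pass through unchanged, so the helper can be applied to the whole column.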
Note: This issue isn't a code problem! It is just for the information of users and to make the developer aware of it.
ESPN writes on its Total QBR website:
To qualify, a player must play a minimum of 20 action plays
This was always my explanation when a player was missing from the data, but it is getting very confusing now. I am using the 2018 playoffs for this example and didn't check other years.
2018 Wildcard weekend had the following games (winners bold):
Running
qbr_week <- get_nfl_qbr("2018", season_type = "Playoffs", week = 1) %>%
select(short_name, team_short_name, qbr_total, qb_plays)
But running
qbr_all <- get_nfl_qbr("2018", season_type = "Playoffs", week = NA)%>%
select(short_name, team_short_name, qbr_total, qb_plays)
In the full-playoff data there are not only more QBs from the wildcard weekend (Watson, Wilson, Trubisky), there is also a different total QBR given for Lamar Jackson...
It is unclear which dataset to trust, and the problem is that we can only combine the two for QBs who lost, because the overall dataset mixes the games of QBs who played more than one game.
Hi! Is there a way to get a list of player IDs from a season or multiple seasons (i.e. all player IDs)? I would like to use get_athlete() to get player info for all athletes from a season. Thank you!
espnscrapeR/R/scrape_espn_win_rate.R
Line 19 in fb98b55
ESPN released 2023 win rates.
Would be awesome to have them added to espnscrapeR
ESPN posts win probabilities that are updated live with each play or clock tick during games. Have you looked at scraping this, and/or is there a repo with anything interesting, e.g. time-stamped probability data, anywhere?
Hi Tom,
I wanted to scrape some College QBR to analyse the 2021 prospects, so I was looking for QBR since 2017.
Unfortunately, the QBR data from get_college_qbr() seems to be missing for all years before 2020 (except week 1). I attached my code, where I tested with weeks 1 and 2.
I cross checked the ESPN website and it seems to be an issue with ESPN.
Any chance we can solve this problem?
Best regards,
Christian
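# Wrap get_college_qbr() so a failed week returns an empty tibble instead of stopping
get_qbr <- function(weeks, years, silent = FALSE) {
  qbr_raw <- NULL
  try(
    qbr_raw <- espnscrapeR::get_college_qbr(season = years, week = weeks),
    silent = silent
  )
  if (is.null(qbr_raw)) return(tibble::tibble())
  qbr_raw
}
# Run every week/season combination and row-bind the results
all_qbr <- purrr::pmap_dfr(
  purrr::transpose(purrr::cross2(1:2, 2017:2020)),
  get_qbr
)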
#> Scraping QBR for week 1 of 2017!
#> Scraping QBR for week 2 of 2017!
#> Error : Can't subset columns that don't exist.
#> x Column `firstName` doesn't exist.
#> Scraping QBR for week 1 of 2018!
#> Scraping QBR for week 2 of 2018!
#> Error : Can't subset columns that don't exist.
#> x Column `firstName` doesn't exist.
#> Scraping QBR for week 1 of 2019!
#> Scraping QBR for week 2 of 2019!
#> Error : Can't subset columns that don't exist.
#> x Column `firstName` doesn't exist.
#> Scraping QBR for week 1 of 2020!
#> Scraping QBR for week 2 of 2020!