jalapic / engsoccerdata

English and European soccer results 1871-2022

engsoccerdata's Issues

Would love to see the scraping code

code available on request: Consider this an official request! Is there a reason it couldn't be included in this repo?

Love the work; amazingly thorough! I had a somewhat similar project which scraped several leagues through soccernet before they changed their format, and I haven't updated it since. I used that data to make a start at a little club ratings thing.

Anyway! Would love to see your scraping code.

Create a team names dataframe to standardize variations

Hi @jalapic ,

I wanted to start an issue to track the team names dataframe work since it has come up a couple times. For example, you have it listed in the README:

Team Names. Consistency in team names is very hard. A dataframe showing the various variations of team names for each team would be great (this is a particular problem in the French league).

and it's also been mentioned in your discussion with @JoGall in issue #16.

Before any work is done, it'd be great to clarify what your ideal outcome is. For example, are you looking for a dataframe like this --

team_name	standardized_team_name	most_recent
st. louis rams	los angeles rams	false
los angeles rams	los angeles rams	true
oakland raiders	oakland raiders	true

Yes, I apologize for using NFL teams here but it was the quickest example I could think of.

By having these three columns, folks will only have to check a subset of team names each year (e.g. only where most_recent == true) and can create a new row if something has changed.

What do you think?
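A minimal sketch of how such a lookup table might be used, written in base R. The data frame `team_lookup` and the helper `standardize_team()` are hypothetical names made up for illustration; they are not part of the package.

```r
# Hypothetical lookup table with the three proposed columns
team_lookup <- data.frame(
  team_name = c("st. louis rams", "los angeles rams", "oakland raiders"),
  standardized_team_name = c("los angeles rams", "los angeles rams", "oakland raiders"),
  most_recent = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Map raw names to their standardized form; names not in the table come back as NA
standardize_team <- function(x, lookup = team_lookup) {
  lookup$standardized_team_name[match(tolower(x), lookup$team_name)]
}

standardize_team(c("St. Louis Rams", "Oakland Raiders", "Unknown FC"))
```

Returning NA for unknown names makes it easy to spot variants that still need a row in the table.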

2021 england_current()

need to update :(

england_current <- function(){
  
  
  home<-visitor<-hgoal<-vgoal<-goaldif<-FT<-Season<-division<-result<-NULL
  
  url1 <- "https://www.11v11.com/competitions/premier-league/2022/matches/"
  url2 <- "https://www.11v11.com/competitions/league-championship/2022/matches/"
  url3 <- "https://www.11v11.com/competitions/league-one/2022/matches/"
  url4 <- "https://www.11v11.com/competitions/league-two/2022/matches/"
  
  x1 <- xml2::read_html(url1) %>% rvest::html_table(fill = TRUE)
  x2 <- xml2::read_html(url2) %>% rvest::html_table(fill = TRUE)
  x3 <- xml2::read_html(url3) %>% rvest::html_table(fill = TRUE)
  x4 <- xml2::read_html(url4) %>% rvest::html_table(fill = TRUE)
  
  make_data <- function(x){
    x <- x[[1]][,1:4]
    x <- as.data.frame(x)
    x <-x[grepl("([0-9]+).*$", x[,1]),]#get rid of months text
    x <-x[grepl("([0-9]+).*$", x[,3]),]#get rid of blank scores
    colnames(x)<-c("Date","home","FT","visitor")
    x$Date <- as.character(as.Date(x$Date, format="%d %b %Y"))
    x$Season <- 2021
    x$FT <- gsub(":", "-", x$FT)
    x <- x[nchar(x$FT)>1,]
    hgvg <- matrix(unlist(strsplit(x$FT, "-")), ncol=2, byrow = T)
    x$hgoal <- as.numeric(hgvg[,1])
    x$vgoal <- as.numeric(hgvg[,2])
    x$totgoal <- x$hgoal+x$vgoal
    x$goaldif <- x$hgoal-x$vgoal
    x$result <- ifelse(x$hgoal>x$vgoal, "H", ifelse(x$hgoal<x$vgoal, "A", "D"))
    return(x)
  }
  
  x1d <- make_data(x1)
  x2d <- make_data(x2)
  x3d <- make_data(x3)
  x4d <- make_data(x4)
  
  x1d$division <- 1
  x1d$tier <- 1
  x2d$division <- 2
  x2d$tier <- 2
  x3d$division <- 3
  x3d$tier <- 3
  x4d$division <- 4
  x4d$tier <- 4
  
  xd <- rbind(x1d,x2d,x3d,x4d)
  xd <- xd[colnames(engsoccerdata::england)]
  
  xd %>%
    dplyr::mutate(home = dplyr::case_when(
      grepl("Brighton and Hove", home) ~ "Brighton & Hove Albion",
      grepl("Cheltenham Town", home) ~ "Cheltenham",
      grepl("Stevenage", home) ~ "Stevenage Borough",
      grepl("Harrogate Town", home) ~ "Harrogate Town A.F.C.",
      grepl("Macclesfield Town", home) ~ "Macclesfield",
      grepl("Yeovil", home) ~ "Yeovil",
      TRUE ~ home
    )) %>%
    dplyr::mutate(visitor = dplyr::case_when(
      grepl("Brighton and Hove", visitor) ~ "Brighton & Hove Albion",
      grepl("Cheltenham Town", visitor) ~ "Cheltenham",
      grepl("Stevenage", visitor) ~ "Stevenage Borough",
      grepl("Macclesfield Town", visitor) ~ "Macclesfield",
      grepl("Harrogate Town", visitor) ~ "Harrogate Town A.F.C.",
      grepl("Yeovil", visitor) ~ "Yeovil",
      TRUE ~ visitor
    )) -> xd
  
  return(xd)
  
}

results by team function

Add a function to return a df of all results by team, home and away:

Example:

library(tidyverse)
df <- rbind(england,england_current())

homeaway <- function(df){
  rbind(df %>% select(Date,Season,team=home,opp=visitor,gf=hgoal,ga=vgoal,division,tier) %>% mutate(venue="home"),
        df %>% select(Date,Season,team=visitor,opp=home,gf=vgoal,ga=hgoal,division,tier) %>% mutate(venue="away")
  ) %>% arrange(team,Date)
}

homeaway(df)

england_current() - Lincoln/Forest Green

Lincoln City and Forest Green Rovers appear as NA in the output of england_current(). This is due to something going awry with the match to teamnames. I cannot figure out why it's not finding the correct team names, as both appear in teamnames.

League position

Hi, I would be interested in adding league positions and team form to an analysis. It would be useful to get information on each team's league position, shots, shots on target, etc. for both teams in a specific match. It has probably been done already, but I want to weight a match prediction by recent league position and by how the team has performed against other teams close to or further away from them in the league. League table data and other team stats with a timestamp or period would be useful. Is there any way I can get week-by-week league table data?

thanks

Garry
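One possible way to approach the week-by-week table question is to rebuild the table from all results up to a given date. The sketch below uses base R and a made-up toy data frame in the package's column layout (Date, home, visitor, hgoal, vgoal); `table_as_of()` is a hypothetical helper, not a package function, and its tie-breakers (points, goal difference, goals for) are an assumption.

```r
# Toy results in the package's column layout (values are made up)
res <- data.frame(
  Date = as.Date(c("2020-09-12", "2020-09-12", "2020-09-19", "2020-09-19")),
  home = c("A", "C", "B", "D"),
  visitor = c("B", "D", "A", "C"),
  hgoal = c(2, 1, 0, 3),
  vgoal = c(0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: league table using only games played on or before `date`
table_as_of <- function(df, date, pts = 3) {
  df <- df[df$Date <= as.Date(date), ]
  # One row per team per game, seen from that team's perspective
  long <- rbind(
    data.frame(team = df$home, gf = df$hgoal, ga = df$vgoal, stringsAsFactors = FALSE),
    data.frame(team = df$visitor, gf = df$vgoal, ga = df$hgoal, stringsAsFactors = FALSE)
  )
  long$Pts <- ifelse(long$gf > long$ga, pts, ifelse(long$gf == long$ga, 1, 0))
  long$gd <- long$gf - long$ga
  tab <- aggregate(cbind(Pts, gd, gf) ~ team, data = long, FUN = sum)
  tab <- tab[order(-tab$Pts, -tab$gd, -tab$gf), ]
  tab$Pos <- seq_len(nrow(tab))
  rownames(tab) <- NULL
  tab
}

table_as_of(res, "2020-09-12")
table_as_of(res, "2020-09-19")
```

Calling the helper once per match date gives a position history that could be joined back onto the fixtures. Shots and shots on target are not in this layout and would need another source.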

Standardize team names in champions league dataset to league names

Teamnames for leagues portugal, belgium, scotland, greece, turkey are not necessarily consistent with teamnames in champions league dataset - this needs to be fixed.

Similarly, check facup teamnames dataset and check consistency of non-league teams with england5

Points deductions

The option to apply point deductions to relevant seasons in maketable would be useful so I grabbed a hopefully comprehensive and accurate list of historical points deductions in the top 4 leagues from this website and sent you a pull request for a .csv here (although I know you might prefer this into a .rda to bundle with the package).

games_between_sum

give option of returning home results, away results or all results combined. Currently not obvious that it's only returning home results.

Ensure Date in all dataframes is a Date variable

Each dataframe's "Date" variable is inconsistent: sometimes character, sometimes factor, sometimes Date. It needs to be a Date in all of them, otherwise it's annoying and can adversely affect other functions.

Champions League Dataset typos

Aberdeen is mentioned as Abderdeen in some lines, and the country is NA instead of SCO:
champs[is.na(champs$hcountry) | is.na(champs$vcountry),
c("home","hcountry", "visitor", "vcountry")]
ASK Vorwarts Berlin is mentioned also as Vorwarts Berlin from mid '66 season onwards:
champs[champs$home == 'Vorwarts Berlin' | champs$visitor == 'Vorwarts Berlin' |
champs$home == 'ASK Vorwarts Berlin' | champs$visitor == 'ASK Vorwarts Berlin',
c("Season", "home", "visitor")]

add second tier data for major European leagues

Have German data - need to add for France, Holland, Italy, Spain - even if not all years just yet.

england 2019/2020 double entries ?

Current Season Data Function for each league

england_current() gets data for the top 4 tiers of England for each season. We need a function to get current season data in the same format as each domestic league's dataframe for the following:

League	Tiers
Spain	1
Germany	1,2
Holland	1
France	1
Italy	1
Turkey	1
Greece	1
Belgium	1
Portugal	1
Scotland	1,2,3,4

I am happy for england to have its own separate function, as most people use the package for that dataset and it is the most complete. However, it's probably not good to have lots of x_current() functions from a package management perspective. Therefore I'd like one function with an argument of country= to get the data.

Importantly, as this function will rely on external data sources that might change (and often do), it will need error handling that returns an informative error message when a source breaks. The function will then need to be changed or updated, which I can foresee happening fairly regularly, unfortunately.
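A rough sketch of what a single dispatching function with a country= argument and error handling could look like. Everything here is hypothetical: `league_current()`, the `fetchers` list and the stub functions inside it are placeholders, and real fetchers would download and reshape data from an external source.

```r
# Hypothetical per-country fetchers (stubs standing in for real scrapers)
fetchers <- list(
  spain = function() {
    data.frame(home = "Real Madrid", visitor = "Barcelona", stringsAsFactors = FALSE)
  },
  germany = function() stop("source page layout changed")
)

# Hypothetical single entry point with a country= argument
league_current <- function(country) {
  country <- tolower(country)
  if (!country %in% names(fetchers)) {
    stop("No current-season source defined for country = '", country, "'", call. = FALSE)
  }
  tryCatch(
    fetchers[[country]](),
    error = function(e) {
      # Re-signal with a message that points at the likely cause
      stop("Could not fetch current data for '", country, "': ",
           conditionMessage(e), ". The source may have changed.", call. = FALSE)
    }
  )
}

league_current("Spain")
```

Keeping one fetcher per country in a list means a broken source only requires replacing one entry rather than adding or editing an exported function.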

England_current() returns same column names but different column types

Fantastic package!
england_current() returns the same column names as in data(england), but the column types are different. For example, division is numeric whereas it is a character in data(england).

This causes an error when trying to bind these rows with dplyr::bind_rows().

Would work much better if these were all the same. Either all characters or numeric.
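Until the types agree, one workaround is to coerce the new rows to the classes of the reference data before binding. The helper below is a hypothetical sketch in base R (`match_col_types()` is not part of the package), shown on toy data frames rather than the real datasets.

```r
# Hypothetical helper: coerce columns of `new` to the classes used in `ref`
match_col_types <- function(new, ref) {
  for (col in intersect(names(new), names(ref))) {
    target <- class(ref[[col]])[1]
    new[[col]] <- switch(target,
      character = as.character(new[[col]]),
      numeric   = as.numeric(new[[col]]),
      integer   = as.integer(new[[col]]),
      Date      = as.Date(new[[col]]),
      factor    = factor(new[[col]]),
      new[[col]]  # leave any other class untouched
    )
  }
  new
}

# Toy stand-ins: division is character in `ref` but numeric in `new`
ref <- data.frame(division = "1", tier = 1, stringsAsFactors = FALSE)
new <- data.frame(division = 1, tier = 1L)
str(match_col_types(new, ref))
```

Note that as.numeric() on a factor returns the level codes, so factor columns holding numbers should go through as.character() first.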

league cup

Used to be two leaguecup pens columns I think?

#' \item{pen}{Did game go to penalties?}
#' \item{pens}{Result of penalties}

One seems to be lost - currently only have pens and column appears empty

Make 'Season' and 'tier' arguments in `maketable()` function optional

Thought I'd make this a separate issue rather than clogging up the other threads!

Would the maketable() function still work as desired if you were to make the 'Season' and 'tier' arguments optional? At the moment, leaving them as NULL doesn't allow anything to be returned even if data are already subsetted (e.g. if 'df' contains data for only one season and one tier). Also, this would give the option of making tables for results across multiple seasons; for example, if I wanted to see the league table since Klopp took over as manager of Liverpool.

I've made this change to the maketable() function (not sure it's the most elegant solution):

maketable <- function (df = NULL, Season = NULL, tier = NULL, pts = 3) 
{
    GA <- GF <- ga <- gf <- gd <- GD <- D <- L <- W <- Pts <- . <- Date <- home <- team <- visitor <- hgoal <- opp <- vgoal <- goaldif <- FT <- division <- result <- maxgoal <- mingoal <- absgoaldif <- NULL
   
    # Subset by whichever of Season / tier was supplied; use everything if neither
    if (!is.null(Season) && is.null(tier)) {
        dfx <- df[(df$Season == Season), ]
    } else if (is.null(Season) && !is.null(tier)) {
        dfx <- df[(df$tier == tier), ]
    } else if (!is.null(Season) && !is.null(tier)) {
        dfx <- df[(df$Season == Season & df$tier == tier), ]
    } else {
        dfx <- df
    }

	temp <- rbind(dfx %>% dplyr::select(team = home, opp = visitor, 
        GF = hgoal, GA = vgoal), dfx %>% dplyr::select(team = visitor, 
        opp = home, GF = vgoal, GA = hgoal)) %>% dplyr::mutate(GD = GF - 
        GA) %>% dplyr::group_by(team) %>% dplyr::summarise(GP = sum(GD <= 
        100), W = sum(GD > 0), D = sum(GD == 0), L = sum(GD < 
        0), gf = sum(GF), ga = sum(GA), gd = sum(GD)) %>% dplyr::mutate(Pts = (W * 
        pts) + D) %>% dplyr::arrange(-Pts, -gd, -gf) %>% dplyr::mutate(Pos = rownames(.)) %>% 
        as.data.frame()

    return(temp)
}

This seems to work as desired for the above example (using my currentSeasonEng() function):

> EPL <- rbind(subset(england, Season %in% 1992:2015 & tier == 1), subset(currentSeasonEng(), tier==1) )

> klopp <- subset(england, as.Date(Date) >= "2015-10-08")

> maketable(klopp, tier = 1)

                   team GP  W  D  L  gf ga  gd Pts Pos
1               Arsenal 50 27 15  8  96 51  45  96   1
2             Liverpool 50 26 14 10 103 63  40  92   2
3               Chelsea 50 26 13 11  89 51  38  91   3
4           Southampton 50 21 12 17  65 56   9  75   4
5        Leicester City 30 19  9  2  51 21  30  66   5
...

And is there a reason for the 'Date' variable being of the class factor as opposed to date? I know we can use as.Date(df$Date) but it might make it easier if date was the default class?

Champions League Final 2016

Missing final in 'champs'. It is in data-raw csv but needs to be added to RData file.

Limited output

Just discovered your excellent engsoccerdata, but i seem to have some trouble with maketable.R

When following your example;

library(engsoccerdata)
library(dplyr)
df <- engsoccerdata2 %>% filter(tier==1 & Season==2013)
maketable(df)

i get correct output

Source: local data frame [20 x 9]

                   team    GP     W     D     L    gf    ga    gd   Pts
                  (chr) (int) (int) (int) (int) (int) (int) (int) (dbl)
1       Manchester City    38    27     5     6   102    37    65    86
2             Liverpool    38    26     6     6   101    50    51    84
3               Chelsea    38    25     7     6    71    27    44    82
4               Arsenal    38    24     7     7    68    41    27    79
5               Everton    38    21     9     8    61    39    22    72
6     Tottenham Hotspur    38    21     6    11    55    51     4    69
7     Manchester United    38    19     7    12    64    43    21    64
8           Southampton    38    15    11    12    54    46     8    56
9            Stoke City    38    13    11    14    45    52    -7    50
10     Newcastle United    38    15     4    19    43    59   -16    49
11       Crystal Palace    38    13     6    19    33    48   -15    45
12         Swansea City    38    11     9    18    54    54     0    42
13      West Ham United    38    11     7    20    40    51   -11    40
14           Sunderland    38    10     8    20    41    60   -19    38
15          Aston Villa    38    10     8    20    39    61   -22    38
16            Hull City    38    10     7    21    38    53   -15    37
17 West Bromwich Albion    38     7    15    16    43    59   -16    36
18         Norwich City    38     8     9    21    28    62   -34    33
19               Fulham    38     9     5    24    40    85   -45    32
20         Cardiff City    38     7     9    22    32    74   -42    30

But doing it for the second tier and the same season, it only shows the first 10 rows;

df <- engsoccerdata2 %>% filter(tier==2 & Season==2013)
maketable(df)

Source: local data frame [24 x 9]

                     team    GP     W     D     L    gf    ga    gd   Pts
                    (chr) (int) (int) (int) (int) (int) (int) (int) (dbl)
1          Leicester City    46    31     9     6    83    43    40   102
2                 Burnley    46    26    15     5    72    37    35    93
3            Derby County    46    25    10    11    84    52    32    85
4     Queens Park Rangers    46    23    11    12    60    44    16    80
5          Wigan Athletic    46    21    10    15    61    48    13    73
6  Brighton & Hove Albion    46    19    15    12    55    40    15    72
7                 Reading    46    19    14    13    70    56    14    71
8        Blackburn Rovers    46    18    16    12    70    62     8    70
9            Ipswich Town    46    18    14    14    60    54     6    68
10        AFC Bournemouth    46    18    12    16    67    66     1    66
..                    ...   ...   ...   ...   ...   ...   ...   ...   ...

So the question is, do you have an idea why it doesn't show the rest of the table ?
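A likely explanation, offered as an assumption rather than a confirmed diagnosis: the result is a dplyr/tibble-style data frame, and by default these print every row only when there are 20 or fewer, otherwise just the first 10. The header "[24 x 9]" suggests all 24 teams are present and only the display is truncated. The sketch below uses a made-up tibble and the current tibble package; the dplyr version in use at the time may have behaved slightly differently.

```r
library(tibble)

# Made-up 24-row table standing in for a second-tier league table
tb <- tibble(team = paste("Team", 1:24), Pts = 24:1)

print(tb)            # more than 20 rows: only the first 10 are displayed
print(tb, n = Inf)   # display every row
as.data.frame(tb)    # or convert to a plain data frame, which prints in full
```

The same two options should apply to the object returned by maketable(df).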

Add domestic cup data

The domestic cups could be added, especially for the bigger leagues.

The records should be relatively extensive, in most cases.

Helper function to manually add latest results to dataframe

Just posting a helper function for manually adding new results to the results dataframe in case you want to include it in the package. I often find myself wanting to analyse the latest results before the football-data.co.uk CSVs have been updated so this comes in handy. It uses the matchTeamnames() function described here.

addResult <- function(home, visitor, hgoal, vgoal, Date = Sys.Date(), division = 1, tier = 1, Season = 2017) {
  
  home <- matchTeamnames(home)
  visitor <- matchTeamnames(visitor)
  
  FT <- paste0(hgoal, "-", vgoal)
  totgoal <- hgoal + vgoal
  goaldif <- hgoal - vgoal
  result <- ifelse(hgoal > vgoal, "H", ifelse(hgoal < vgoal, "A", "D"))
  
  df <- data.frame(Date, Season, home, visitor, FT, hgoal, vgoal, division, tier, totgoal, goaldif, result)
  
  rbind(england, df) %>% 
    arrange(Date)
}

Example usage:

addResult("Man City", "Stoke", 7, 2, Sys.Date() - 1) %>% 
  tail(2)

#             Date Season             home    visitor  FT hgoal vgoal division tier totgoal goaldif result
#194110 2017-10-01   2017 Newcastle United  Liverpool 1-1     1     1        1    1       2       0      D
#194111 2017-10-14   2017  Manchester City Stoke City 7-2     7     2        1    1       9       5      H

Function to match team names from other datasets with those in 'teamnames' dataframe

Just a possible improvement that I've used in recent blog posts; a function which matches team names from another dataset onto engsoccerdata team names.

For each team name in a vector, it finds the highest similarity string in name_other variable of teamnames dataframe (using the levenshteinSim function from the RecordLinkage package).

Works well for me so far but untested with non-England teams.

#---------------------------------------------------------------------------
# matchTeamnames()
#---------------------------------------------------------------------------
# Matches a vector of team names with names used by 'teamnames' dataframe in
# engsoccerdata package
#---------------------------------------------------------------------------
# * Takes a vector of team names and returns a vector of the matched
#   names from the 'name' column of 'teamnames'
# * 'min_dist' specifies the lowest similarity threshold for a match; if all
#   possible matches for a team are below this value, returns 'NA'
# * Returns a vector by default; if 'checkResults' is TRUE, returns a
#   dataframe of old names, best matches and similarity scores for validation
#---------------------------------------------------------------------------
#---------------------------------------------------------------------------
matchTeamnames <- function(teams, min_dist = 0.1, checkResults = FALSE) {
  require(engsoccerdata)
  require(RecordLinkage)
  require(dplyr)
  
  teams <- as.character(teams)
  
  old_new_df <- lapply(unique(teams), function(x) {
    distance <- levenshteinSim(as.character(x), as.character(teamnames$name_other))
    # threshold on distance
    new_name <- ifelse(max(distance, na.rm=T) >= min_dist, as.character(teamnames[which.max(distance),]$name), "NA")
    
    old_new_df <- data.frame(old_name = x, new_name, distance = max(distance, na.rm=T), stringsAsFactors = FALSE)
  }) %>%
    plyr::rbind.fill()
  
  if(checkResults) {
    return(old_new_df)
  } else {
    teams <- old_new_df$new_name[match(teams, old_new_df$old_name)]
    return(teams)
  }
}

Points deductions - adjust `maketable` functions

Allow optional parameter in maketable functions to allow points deductions to be accounted for in league tables.
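A minimal sketch of how deductions might be applied to an already computed table. The helper `apply_deductions()` and the column names `team`, `Pts`, `gd` and `deduction` are assumptions for illustration, and the toy numbers are made up; this is not the package's implementation.

```r
# Made-up league table and deductions table
tab <- data.frame(team = c("A", "B", "C"), Pts = c(60, 58, 58), gd = c(10, 12, 5),
                  stringsAsFactors = FALSE)
deductions <- data.frame(team = "A", deduction = 3, stringsAsFactors = FALSE)

# Hypothetical helper: subtract deductions, then re-rank by points and goal difference
apply_deductions <- function(tab, deductions) {
  ded <- deductions$deduction[match(tab$team, deductions$team)]
  ded[is.na(ded)] <- 0  # teams without a deduction lose nothing
  tab$Pts <- tab$Pts - ded
  tab <- tab[order(-tab$Pts, -tab$gd), ]
  tab$Pos <- seq_len(nrow(tab))
  rownames(tab) <- NULL
  tab
}

apply_deductions(tab, deductions)
```

Because match() returns only the first hit, a team with several deductions in one season would need them summed beforehand, and the deductions table would have to be filtered to the relevant season and tier.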

MLS data checks

1. Need to resolve whether to call the pre-2005 team "San Jose Earthquakes" or "Houston Dynamo" - which franchise? I'm going with San Jose Earthquakes and treating the team established in 2008 as a continuation of that team.

2. Also, check MLS playoffs, especially in 2000 and 2001 - are sudden-death scores accurate?

For the 1996-1999 seasons, tied regular-season games were decided by shootouts. There are no ties in the data, so we need to work out which games went to shootouts and adjust hgoal/vgoal and hpen/vpen accordingly. Put "SOW" and "SOL" in hpen/vpen.

update `most_recent` variable in teamnames df

The most_recent variable in the teamnames df needs to be updated to indicate which version of each teamname is being used in the current season.

France raw data is the same as Germany

https://github.com/jalapic/engsoccerdata/blob/master/data-raw/france.csv is identical to https://github.com/jalapic/engsoccerdata/blob/master/data-raw/germany.csv

Points Deductions - other countries

Currently have points deductions for English leagues. Need to add for other countries.

safrica deductions

Hi James. I trust you are well. Here are the deductions for the safrica data...

https://gist.github.com/sibusiso16/113e3b773fdc6952708bba944381653e

homeaway() - with non-standard dataframes

homeaway() does not reliably work with non-standard dataframes such as the spain dataset. Needs to be updated.

updates for 2017/2018 season

- add 2016/17 data
- update current() functions
- add Spurs to teamnames.csv

Add South Africa League and other competitions

Hi James. I trust you are well. I would like to help and add the South Africa league to your package. I would like to know what’s the process of doing this.

maketable_eng

Need to modify tiebreaker for 1888/89 and change default for 3North/3South

Initial engsoccerdata install problem

I had an error when trying to install the engsoccerdata package on Rstudio 3.1

The command "install_github('engsoccerdata', username = "jalapic")" threw the error:
Warning message:
Username parameter is deprecated. Please use jalapic/engsoccerdata

Using this instead worked for me: install_github('jalapic/engsoccerdata', username = "jalapic")

I've added it in below. I'm new to R and github so I'm not sure what happened. Also, thanks for the awesome data package! Can't wait to poke around with it.

library(devtools)
install_github('jalapic/engsoccerdata', username = "jalapic")
library(engsoccerdata)

data(package="engsoccerdata") # lists datasets currently available
df = engsoccerdata2 # this is the main dataset.

alltimerecord(df, "Aston Villa") #see notes below about what fixtures are included in the data
n_offs(df, 1, 1) #return results to have only occurred once in top-tier

Updated england_current() function

Just a few changes to this function as NAs were being returned. Not long left for this season now but function can easily be updated next season (i.e. update .csv links and change 'Season' to 2017).

'Date' as date class instead of character; 'division' and 'tier' changed to extract numeric from string and prevent NAs being returned (e.g. E0 -> 1); call to teamnames dataframe to replace team name variants with main name used in england dataframe (e.g. "Man City" -> "Manchester City").

england_current <- function(){
  
  #*update each season*
  df1 <- rbind(read.csv("http://www.football-data.co.uk/mmz4281/1617/E0.csv"),
              read.csv("http://www.football-data.co.uk/mmz4281/1617/E1.csv"),
              read.csv("http://www.football-data.co.uk/mmz4281/1617/E2.csv"),
              read.csv("http://www.football-data.co.uk/mmz4281/1617/E3.csv")
  ) 
  
  df2 <- data.frame("Date" = as.Date(df1$Date, "%d/%m/%y"),
               "Season" = rep(2016, nrow(df1)), #*update each season*
               "home" = df1$HomeTeam,
               "visitor" = df1$AwayTeam,
               "FT" = paste0(df1$FTHG, "-", df1$FTAG),
               "hgoal" = df1$FTHG,
               "vgoal" = df1$FTAG,
               "division" = as.numeric(sub("^E", "", as.character(df1$Div))) + 1, #convert division codes to numeric (e.g. "E0" -> 1); as.character() guards against factor input
               "tier" = as.numeric(sub("^E", "", as.character(df1$Div))) + 1,
               "totgoal" = df1$FTHG + df1$FTAG,
               "goaldif" = df1$FTHG - df1$FTAG,
               "result" = df1$FTR
    )
  
    #replace any new team name variants with pre-existing names (e.g. "Man City" -> "Manchester City")
    df2$home <- teamnames$name[match(df2$home, teamnames$name_other)]
    df2$visitor <- teamnames$name[match(df2$visitor, teamnames$name_other)]

return(df2)
}

Current year's data

I seem to recall you recommending a source for current season info, at least for the English League. Can you remind me what it is? Of course, if you have written some scraping code, it would be a useful addition to the functions. If not, I will have a go and let you look at it.

Game-by-game Shiny App "greyed out"

One of the example apps you provided appears to have been terminated (could be because it exceeded its RAM allocation).

https://jalapic.shinyapps.io/gamebygame/

div 3N and div 3S are messed up in england

tried to fix the division column in england but caused more issues - need to fix asap

Add Brazilian league Data

to add - Brazil league data

Problem with dates for EPL 2014 season

I'm seeing a problem with the formatting of dates for the 2014-15 EPL season. The majority of the entries in the table engsoccerdata2 have dates in the format "YYYY-MM-DD", which is easy to convert to class Date in R. However, the 2036 records for which Season=="2014" are entered as 5-digit numbers (day counts, apparently since 1970-01-01 rather than 1990, since 16557 corresponds to 2015-05-02), which tends to happen in Excel.

data("engsoccerdata2")
dim(engsoccerdata2)[1]
[1] 190096

sum(grepl("-",engsoccerdata2$Date))
[1] 188060

sum(engsoccerdata2$Season=="2014")
[1] 2036

sum(engsoccerdata2$Season=="2014"&grepl("-",engsoccerdata2$Date))
[1] 0

head(engsoccerdata2$Date)
[1] "1888-12-15" "1889-01-19" "1889-03-23" "1888-12-01" "1888-10-13" "1888-12-29"

tail(engsoccerdata2$Date)
[1] "16557" "16557" "16557" "16557" "16557" "16557"

Discovered this issue when I was calling as.Date(engsoccerdata2$Date) and it was generating NAs.
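As a stopgap, the two formats can be converted separately. The sketch below assumes the 5-digit values are day counts from 1970-01-01 (on that assumption 16557 maps to 2015-05-02, which fits the end of the 2014-15 season); `fix_mixed_dates()` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: convert a mix of "YYYY-MM-DD" strings and day counts to Date
fix_mixed_dates <- function(x) {
  x <- as.character(x)
  out <- rep(as.Date(NA), length(x))
  iso <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)  # already ISO formatted
  num <- grepl("^[0-9]+$", x)                      # bare day counts
  out[iso] <- as.Date(x[iso])
  # Assumption: counts are days since 1970-01-01
  out[num] <- as.Date(as.numeric(x[num]), origin = "1970-01-01")
  out
}

fix_mixed_dates(c("1888-12-15", "16557"))
```

Anything matching neither pattern stays NA, so leftover problems remain visible.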

R version 3.50 required?

Hi,

I'm getting an error message when I try and install the package : ERROR: this R is version 3.6.2, package 'engsoccerdata' requires R >= 3.50
I assume the requirements should be >=3.5.0
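The comparison can be reproduced with base R's `numeric_version()`, which compares version strings component by component. This only illustrates why "3.50" is read as minor version 50; it does not show the package's actual DESCRIPTION file.

```r
installed <- numeric_version("3.6.2")

# "3.50" is parsed as major 3, minor 50, so 3.6.2 is considered older
meets_typo  <- installed >= numeric_version("3.50")
# "3.5.0" is parsed as major 3, minor 5, patch 0
meets_fixed <- installed >= numeric_version("3.5.0")

meets_typo   # FALSE
meets_fixed  # TRUE
```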

Add Conference (5th tier England) data

5th tier has been a national league since 1979/80 season - add data.

Note: check maketable_eng function for different tie-breaking and points procedures up to 1981/82 season.

Where does the data come from?

Hi!

Where does the data for the shot positions come from? Is it a reliable source that can be cited in a publication? Thanks!

Add playoff fixtures from other competitions

Need to add playoff fixtures from other competitions e.g. scotland

Add Major League Soccer (MLS) data to engsoccerdata

Consider using openfootball/major-league-soccer or another open source option.

Champs_Update Needed

All of the other datasets were updated with the exception of champs. Can you please post some code in champs.r that pulls in the most recent results and appends it to the older data?

England Tier 1 2016/2017 & 2017/2018 data missing?

Code I'm running:

print(england %>% 
  filter(Season %in% c(2016):c(2017),
         tier %in% c(1)) %>% 
  mutate(Date = as.Date(Date)))

Result:

# A tibble: 0 x 12
# ... with 12 variables: Date <date>, Season <dbl>, home <chr>, visitor <chr>,
#   FT <chr>, hgoal <dbl>, vgoal <dbl>, division <dbl>, tier <dbl>,
#   totgoal <dbl>, goaldif <dbl>, result <chr>

Edit: Problem seems to appear in tier 2 as well, predictably in all tiers of England?

Champion's League

Could you add this to the dataset requests? I'm guessing for many people this is a competition that they follow pretty closely

england_current() issue

it shows an issue as the following :(

Error:
! Assigned data `as.numeric(hgvg[, 1])` must be compatible with existing data.
✖ Existing data has 559 rows.
✖ Assigned data has 560 rows.
ℹ Only vectors of size 1 are recycled.
Run `rlang::last_error()` to see where the error occurred.
Warning messages:
1: In matrix(unlist(strsplit(x$FT, "-")), ncol = 2, byrow = T) :
  data length [1119] is not a sub-multiple or multiple of the number of rows [560]
2: In make_data(x2) : NAs introduced by coercion

I tried with R v4.1 and 4.2 but the same error
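The row counts in the message (1119 pieces for 559 rows) suggest that at least one score string did not split into exactly two parts, so the R version is unlikely to be the cause. One way to make the parsing tolerant is to validate each score first and return NA for anything malformed. `parse_scores()` below is a hypothetical helper sketched in base R, not the package's code, and the example strings are made up.

```r
# Hypothetical helper: parse "h-v" (or "h:v") score strings, NA for malformed ones
parse_scores <- function(ft) {
  ft <- gsub(":", "-", trimws(ft))
  ok <- grepl("^[0-9]+-[0-9]+$", ft)  # exactly two numbers separated by a dash
  parts <- strsplit(ft[ok], "-", fixed = TRUE)
  hgoal <- vgoal <- rep(NA_real_, length(ft))
  hgoal[ok] <- as.numeric(vapply(parts, `[`, character(1), 1))
  vgoal[ok] <- as.numeric(vapply(parts, `[`, character(1), 2))
  data.frame(hgoal = hgoal, vgoal = vgoal)
}

parse_scores(c("2-1", "0:0", "P-P", "1-1 (4-3)"))
```

The output always has one row per input, so it can be assigned back to the data frame without a length mismatch, and rows with NA goals can be dropped or inspected afterwards.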

MUFC-Bournemouth game 2016

The match Manchester United vs AFC Bournemouth on 17/05/2016 is listed incorrectly in the data as taking place on 16/05/2016.

While the original date was indeed supposed to be 16/05/2016, the match was delayed by a single day due to a bomb threat at Old Trafford (https://www.skysports.com/football/news/11661/10283952/manchester-united-v-bournemouth-rescheduled-for-tuesday-may-17)

Results from current season in `engsoccerdata`

Do you plan to add a function to obtain up-to-date results for the current season at any point?

It's something I regularly need so made this script to fetch results from the current season from http://football-data.co.uk and change the formatting to use with engsoccerdata. The website has freely-available CSVs of historical results that are updated twice weekly (and also results from several other European leagues if you ever want to expand the data included with the package).

Feel free to use / adapt this if you can find a way to implement it in the package.

Thanks for all your work!

URLs <- c("http://www.football-data.co.uk/mmz4281/1617/E0.csv",
	"http://www.football-data.co.uk/mmz4281/1617/E1.csv",
	"http://www.football-data.co.uk/mmz4281/1617/E2.csv",
	"http://www.football-data.co.uk/mmz4281/1617/E3.csv")

convertFromFD <- function(df = NULL, tier = NULL) {
	data.frame("Date" = as.factor(as.Date(df$Date, "%d/%m/%y")),
		"Season" = 2016,
		"home" = df$HomeTeam,
		"visitor" = df$AwayTeam,
		"FT" = paste0(df$FTHG, "-", df$FTAG),
		"hgoal" = df$FTHG,
		"vgoal" = df$FTAG,
		"division" = tier,
		"tier" = tier,
		"totgoal" = df$FTHG + df$FTAG,
		"goaldif" = df$FTHG - df$FTAG,
		"result" = ifelse(df$FTHG > df$FTAG, "H",
			ifelse(df$FTHG < df$FTAG, "A", "D"))
	)
}

currentSeasonEng <- function() {
	rbind(convertFromFD(df = read.csv(URLs[1]), tier = 1), 
		convertFromFD(df = read.csv(URLs[2]), tier = 2),
		convertFromFD(df = read.csv(URLs[3]), tier = 3),
		convertFromFD(df = read.csv(URLs[4]), tier = 4)
	)
}

## Example: All EPL data including current season
#require(engsoccerdata)
# EPL <- rbind(subset(england, Season %in% 1992:2015 & tier == 1),
# 	subset(currentSeasonEng(), tier==1)
# )