
Comments (9)

jalapic avatar jalapic commented on May 25, 2024

I will try to implement this; it's a good idea and something that I frequently have to do. First I have to make sure that the data are OK to take and include in a package. The big issue is team names: my package may use slightly different versions of names for two or three teams. I have a dataframe of all possible versions of teamnames for each team, e.g. "Man Utd", "Manchester Utd", "Manchester United", "Man United", "Newton Heath", "Newton H", etc. That should let us make sure the most recent season's data can be used with the historical data for each team.
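A minimal sketch of how such a lookup could be applied, assuming a two-column table of spelling variants and canonical names. The names `teamname_lookup` and `standardise_team` are hypothetical and are not the package's actual teamnames dataframe or API; the canonical name chosen here is only for illustration.

```r
# Hypothetical lookup table: one row per spelling variant, mapped to one canonical name.
teamname_lookup <- data.frame(
  variant = c("Man Utd", "Manchester Utd", "Manchester United",
              "Man United", "Newton Heath", "Newton H"),
  canonical = "Manchester United",
  stringsAsFactors = FALSE
)

# Replace every known variant with its canonical name; unknown names pass through unchanged.
standardise_team <- function(x, lookup = teamname_lookup) {
  x <- as.character(x)
  idx <- match(x, lookup$variant)
  ifelse(is.na(idx), x, lookup$canonical[idx])
}

standardise_team(c("Man Utd", "Newton Heath", "Arsenal"))
```

Names that are not in the table are returned as they came in, so new spellings show up as unmatched rather than being silently dropped.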

from engsoccerdata.

JoGall avatar JoGall commented on May 25, 2024

Football-Data.co.uk claim on their website that data are "FREE" and that "You are free experiment with the data yourselves", but might be worth checking with them just in case. Hadn't thought about teamname variations so a separate dataframe is a good idea. Good luck with the implementation!


jalapic avatar jalapic commented on May 25, 2024

I altered the function a bit - there was a typo and I've stripped it down slightly. I think the best thing is to have one function that brings all the data in from England and then puts it into engsoccerdata format. This is what I have:

england_current <- function(){
  # Read the 2016/17 CSVs for the top four English tiers (E0-E3) and stack them
  df <- rbind(read.csv("http://www.football-data.co.uk/mmz4281/1617/E0.csv"),
              read.csv("http://www.football-data.co.uk/mmz4281/1617/E1.csv"),
              read.csv("http://www.football-data.co.uk/mmz4281/1617/E2.csv"),
              read.csv("http://www.football-data.co.uk/mmz4281/1617/E3.csv")
  )
  # Div is "E0".."E3": keep the digit and add 1 to get tiers 1-4.
  # (as.numeric(df$Div) only works via factor codes and gives NA when Div is character.)
  tier <- as.numeric(gsub("[^0-9]", "", as.character(df$Div))) + 1
  return(
    data.frame("Date" = as.factor(as.Date(df$Date, "%d/%m/%y")),
               "Season" = 2016,
               "home" = df$HomeTeam,
               "visitor" = df$AwayTeam,
               "FT" = paste0(df$FTHG, "-", df$FTAG),
               "hgoal" = df$FTHG,
               "vgoal" = df$FTAG,
               "division" = tier,
               "tier" = tier,
               "totgoal" = df$FTHG + df$FTAG,
               "goaldif" = df$FTHG - df$FTAG,
               # H = home win, A = away win, D = draw
               "result" = ifelse(df$FTHG > df$FTAG, "H",
                                 ifelse(df$FTHG < df$FTAG, "A", "D"))
    )
  )
}



england_current()

I will add this as a function to the package and leave it on GitHub. If I have time, I'd like to add this for the other leagues too.

As a note - if you're interested in collating data / helping, I have other leagues and competitions going all the way back to their origins, e.g. League Cup, French League - I just haven't had time to check them and add them to the package yet.


jalapic avatar jalapic commented on May 25, 2024

oops.... forgot about the teamnames fix. ugh - that will take time. I notice a lot of inconsistencies with my data, e.g. Bristol Rvs - I'll have to manually add that to the teamname df. Might take me a day or two to get that together.


JoGall avatar JoGall commented on May 25, 2024

I had a bit of spare time this morning so wrote code to fetch data for the five other leagues available on Football-Data.co.uk. Their data only goes back to '94/'95 but better than nothing. I've left the 'division' variable as a factor for now rather than numeric, e.g. Scotland's divisions are defined as SC0, SC1, etc...

I'm happy to help collate more data whenever I get a chance if you add them to the repo. Where did you obtain your data from by the way? It would be great to have European Cup fixtures too for completeness but can't find an archive of them anywhere.

## FUNs
##-------
#make season codes for URLs
makeSeasons <- function(start = NULL) {
	paste0(substr(start:2016, 3, 4), substr((start+1):2017, 3, 4))
}

#get CSVs
getCSVs <- function(x) {
	df <- read.csv(x)
	Sys.sleep(sample(seq(1, 2, by=0.001), 1))
	df$Season <- format(as.Date(df$Date[1], format="%d/%m/%y"), "%Y") #extract Season as year of first fixture
	df
}

#reformat to engsoccerdata
convertEngSoccerData <- function(df){
	return(data.frame(
		"Date" = as.factor(as.Date(df$Date, "%d/%m/%y")),
		"home" = df$HomeTeam,
		"visitor" = df$AwayTeam,
		"FT" = paste0(df$FTHG, "-", df$FTAG),
		"hgoal" = df$FTHG,
		"vgoal" = df$FTAG,
		"division" = df$Div,
		"tier" = as.numeric(df$Div), #NB: returns factor codes (or NA if Div is character), not the real tier
		"totgoal" = df$FTHG + df$FTAG,
		"goaldif" = df$FTHG - df$FTAG,
		"result" = ifelse(df$FTHG > df$FTAG, "H",
		ifelse(df$FTHG < df$FTAG, "A", "D"))
	)
	)
}

## URLs
##------
#Scotland
sco_urls <- c(
	paste0("http://www.football-data.co.uk/mmz4281/", makeSeasons(1994), "/SC0.csv"),
	paste0("http://www.football-data.co.uk/mmz4281/", makeSeasons(1994), "/SC1.csv"),
	paste0("http://www.football-data.co.uk/mmz4281/", makeSeasons(1997), "/SC2.csv"),
	paste0("http://www.football-data.co.uk/mmz4281/", makeSeasons(1997), "/SC3.csv")
)

#Belgium
bel_urls <- paste0("http://www.football-data.co.uk/mmz4281/", makeSeasons(1995), "/B1.csv")

#Portugal
por_urls <- paste0("http://www.football-data.co.uk/mmz4281/", makeSeasons(1994), "/P1.csv")

#Turkey
turk_urls <- paste0("http://www.football-data.co.uk/mmz4281/", makeSeasons(1994), "/T1.csv")

#Greece
grc_urls <- paste0("http://www.football-data.co.uk/mmz4281/", makeSeasons(1994), "/G1.csv")


## Example
##---------
library(magrittr) #provides %>%
scotland <- sco_urls %>%
	lapply(getCSVs) %>%
	lapply(convertEngSoccerData) %>%
	do.call(rbind.data.frame, .)


jalapic avatar jalapic commented on May 25, 2024

@JoGall Thanks Joe. This is great. I'll take a closer look at it - happy to add more leagues. I do have European Cup / Champions League data - it's in the champs dataframe that comes in the engsoccerdata package from CRAN.

The data come from everywhere - all open source. I believe there are a bunch in the ReadMe. I did notice when I collated this a few years ago that a lot of the online websites with soccer data had copied each other and there were a few errors that they made. Only about 0.1% of the data, but annoying nonetheless.


jalapic avatar jalapic commented on May 25, 2024

@JoGall Hi Joe - I had a look at importing the other leagues. On my first pass, the CSVs imported for the Greek league would not all convert to tidy data using the convert function. Also, other leagues sometimes returned NAs in the CSVs. I think these will work, but we'd have to check each file in turn before adding to the package. Also, adding a "Season" variable to each CSV would be super useful - that would keep the data consistent with the other dataframes.
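A rough sketch of what a per-file check could look like, assuming each CSV has already been read into a dataframe. The function name `check_league_csv` and the list of required columns are assumptions for illustration, not something that exists in the package.

```r
# Hypothetical pre-flight check: report missing columns and count NA or empty
# values in the columns that the convert function relies on.
check_league_csv <- function(df,
                             required = c("Div", "Date", "HomeTeam", "AwayTeam",
                                          "FTHG", "FTAG", "FTR")) {
  missing_cols <- setdiff(required, names(df))
  present <- intersect(required, names(df))
  # Count values that are NA or empty strings, column by column
  na_counts <- vapply(df[present],
                      function(col) sum(is.na(col) | col == ""),
                      integer(1))
  list(missing_cols = missing_cols,
       na_counts = na_counts,
       ok = length(missing_cols) == 0 && all(na_counts == 0))
}

# Toy data with the kinds of problems described above: HT/AT headers and a blank row
toy <- data.frame(Div = c("G1", "G1"), Date = c("14/08/94", ""),
                  HT = c("AEK", "PAOK"), AT = c("Iraklis", "Aris"),
                  FTHG = c(1, NA), FTAG = c(0, NA), FTR = c("H", ""),
                  stringsAsFactors = FALSE)
check_league_csv(toy)
```

Running this over every file before conversion would flag which ones need attention without stopping the whole import.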


JoGall avatar JoGall commented on May 25, 2024

Ok I've updated this and tested it properly now. There were some inconsistencies in the CSVs (e.g. some files had 'HT' instead of 'HomeTeam' as a column name) and annoyingly the division names for some leagues are zero-indexed and some aren't, making it hard to parse 'tier' properly. The convert function seems to work now for all the leagues; I've added a boolean parameter to help create a tier number from the division data (e.g. when 'zeroIndexed' = TRUE, 'division' SC0 becomes 'tier' 1). Added a 'Season' variable too.

## FUNs
##-------
#make season codes for URLs
makeSeasons <- function(start = NULL) {
	paste0(substr(start:2016, 3, 4), substr((start+1):2017, 3, 4))
}

#get CSVs
getCSVs <- function(x) {
	#read CSV, treating empty strings as NA
	df <- read.csv(x, na.strings = c("NA", ""))
	#keep the first 7 columns (Div to FTR) and drop incomplete rows
	df <- na.omit(df[,1:7])
	#extract Season as year of first fixture
	df$Season <- format(as.Date(df$Date[1], format="%d/%m/%y"), "%Y")
	#change column names if required
	if(any(names(df)=="HT")) colnames(df)[which(names(df) == "HT")] <- "HomeTeam"
	if(any(names(df)=="AT")) colnames(df)[which(names(df) == "AT")] <- "AwayTeam"
	
	Sys.sleep(sample(seq(1, 2, by=0.001), 1))
	
	df
}

#reformat to engsoccerdata
convertToESD <- function(df, zeroIndexed = FALSE){

	dfx <- data.frame(
		"Date" = as.factor(as.Date(df$Date, "%d/%m/%y")),
		"Season" = as.numeric(as.character(df$Season)),
		"home" = df$HomeTeam,
		"visitor" = df$AwayTeam,
		"FT" = paste0(df$FTHG, "-", df$FTAG),
		"hgoal" = df$FTHG,
		"vgoal" = df$FTAG,
		"division" = df$Div,
		"tier" = as.numeric(gsub("[^0-9]", "", as.character(df$Div))), #digits of Div, e.g. "SC0" -> 0
		"totgoal" = df$FTHG + df$FTAG,
		"goaldif" = df$FTHG - df$FTAG,
		"result" = df$FTR
	)
	
	if(zeroIndexed) dfx$tier <- dfx$tier + 1
	
	dfx
}

## Example: Scotland
##-------
urls <- c(
	paste0("http://www.football-data.co.uk/mmz4281/", makeSeasons(1994), "/SC0.csv"),
	paste0("http://www.football-data.co.uk/mmz4281/", makeSeasons(1994), "/SC1.csv"),
	paste0("http://www.football-data.co.uk/mmz4281/", makeSeasons(1997), "/SC2.csv"),
	paste0("http://www.football-data.co.uk/mmz4281/", makeSeasons(1997), "/SC3.csv")
)

library(magrittr) #provides %>%
scotland <- urls %>%
	lapply(getCSVs) %>%
	lapply(function(x) convertToESD(x, zeroIndexed=TRUE)) %>%
	do.call(rbind.data.frame, .)


jalapic avatar jalapic commented on May 25, 2024

Sorry for slow action on this - when adding I was running checks to ensure CRAN compatibility etc. which always take longer.

All these data for these seasons have been added. Thanks for your help. I'd love to get more data going further back, but this is a great addition.

I noted one error with the function - it assigns tier = 2 for Belgium/Portugal/Turkey/Greece rather than tier = 1. It does give the correct tier for Scotland. I've corrected that.
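One way to avoid this kind of mismatch, sketched here as an assumption rather than as the fix that went into the package, is to infer zero-indexing from the division codes themselves. The helper `division_to_tier` is hypothetical, and it only works when it sees a whole league at once, including that league's top division.

```r
# Hypothetical helper: take the digit from each division code and shift by one
# only when the league's codes start at 0 (e.g. SC0), not when they start at 1 (e.g. B1).
division_to_tier <- function(div) {
  digit <- as.numeric(gsub("[^0-9]", "", as.character(div)))
  if (min(digit, na.rm = TRUE) == 0) digit + 1 else digit
}

division_to_tier(c("SC0", "SC1", "SC3"))  # zero-indexed league
division_to_tier(c("B1", "B1"))           # one-indexed league
```

The trade-off is that a subset of a zero-indexed league that lacks its top division (say, only SC1 rows) would be read as one-indexed, so an explicit flag per league is still the safer choice when files are converted one at a time.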

I like to release high quality proofed and checked data like I have for England, Germany, Spain etc. However, realistically I don't have time to do that level of checking for all leagues. Therefore, I've added these "as is" and hopefully if people find errors or additions they can file issues and/or pull-requests. Also, if these leagues had playoff games, those aren't included just yet. I've added that as a thing to do in the ReadMe.

Also, all teamnames are added to the teamnames dataframe. Going forward if other Seasons are added to each league, we should pick a teamname for each team to stay with.

Finally, I don't know if teams changed team names between 1994-2016 in each league. If they did, I've not noted that in the teamnames dataframe - I'm assuming unique teamnames in the data are unique teams, not teams that changed name. I'll let others find that out and let me know if I need to fix anything.
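For anyone who wants to look for renamed teams, a small sketch like the one below could help. It assumes a dataframe in the engsoccerdata layout with `home`, `visitor` and `Season` columns; the function name `team_season_range` is hypothetical. It lists the first and last season each name appears in, so a name that disappears in the season before another one first appears can be checked by hand.

```r
# Hypothetical helper: first and last season in which each team name appears.
team_season_range <- function(df) {
  teams <- data.frame(
    team = c(as.character(df$home), as.character(df$visitor)),
    Season = c(df$Season, df$Season),
    stringsAsFactors = FALSE
  )
  first <- tapply(teams$Season, teams$team, min)
  last <- tapply(teams$Season, teams$team, max)
  data.frame(team = names(first),
             first_season = as.vector(first),
             last_season = as.vector(last[names(first)]),
             stringsAsFactors = FALSE)
}

# Toy fixtures, not real results
toy_fixtures <- data.frame(home = c("A", "B", "A"),
                           visitor = c("B", "C", "C"),
                           Season = c(1994, 1994, 1995),
                           stringsAsFactors = FALSE)
team_season_range(toy_fixtures)
```

This only narrows down candidates; relegation and promotion produce the same pattern as a rename, so each case would still need checking against another source.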

