Comments (9)
I will try and implement this; it's a good idea and something that I frequently have to do. I have to make sure that the data are OK to take and include in a package. Also, the big issue is team names - my package may use slightly different versions of names for two or three teams. I have a dataframe of all possible versions of teamnames for each team, e.g. "Man Utd", "Manchester Utd", "Manchester United", "Man United", "Newton Heath", "Newton H", etc. That should allow us to ensure that the most recent season data can be used with historical data by team.
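A rough sketch of how such a lookup dataframe could be used to standardise names before merging with historical data. The table contents, the column names `name` and `name_other`, and the helper `standardise_team` are all assumptions made up for illustration, not the package's actual teamnames dataframe:

```r
# Hypothetical lookup table: one row per known spelling of a team name.
# Column names `name` (canonical) and `name_other` (variant) are assumptions.
team_lookup <- data.frame(
  name       = c("Manchester United", "Manchester United",
                 "Manchester United", "Bristol Rovers"),
  name_other = c("Man Utd", "Man United", "Newton Heath", "Bristol Rvs"),
  stringsAsFactors = FALSE
)

# Replace any known variant with its canonical name;
# names that are not in the lookup are returned unchanged.
standardise_team <- function(x, lookup = team_lookup) {
  x <- as.character(x)
  idx <- match(x, lookup$name_other)
  ifelse(is.na(idx), x, lookup$name[idx])
}

standardised <- standardise_team(c("Man Utd", "Arsenal", "Bristol Rvs"))
print(standardised)
```

Names missing from the lookup pass through untouched, so any new spelling shows up as an unmatched team rather than silently disappearing.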
from engsoccerdata.
Football-Data.co.uk claim on their website that data are "FREE" and that "You are free experiment with the data yourselves", but might be worth checking with them just in case. Hadn't thought about teamname variations so a separate dataframe is a good idea. Good luck with the implementation!
from engsoccerdata.
I altered the function a bit - there was a typo and I've stripped it down slightly. I think the best thing is to have one function that brings all the data in from England and then puts it into engsoccerdata format. This is what I have:
england_current <- function(){
  # download the four English divisions for 2016/17 and stack them
  df <- rbind(read.csv("http://www.football-data.co.uk/mmz4281/1617/E0.csv"),
              read.csv("http://www.football-data.co.uk/mmz4281/1617/E1.csv"),
              read.csv("http://www.football-data.co.uk/mmz4281/1617/E2.csv"),
              read.csv("http://www.football-data.co.uk/mmz4281/1617/E3.csv")
  )
  # E0..E3 correspond to tiers 1..4; parse the digit instead of relying on
  # factor codes, which are NA when Div is read as character
  tier <- as.numeric(sub("E", "", df$Div)) + 1
  return(
    data.frame("Date" = as.factor(as.Date(df$Date, "%d/%m/%y")),
               "Season" = 2016,
               "home" = df$HomeTeam,
               "visitor" = df$AwayTeam,
               "FT" = paste0(df$FTHG, "-", df$FTAG),
               "hgoal" = df$FTHG,
               "vgoal" = df$FTAG,
               "division" = tier,
               "tier" = tier,
               "totgoal" = df$FTHG + df$FTAG,
               "goaldif" = df$FTHG - df$FTAG,
               "result" = ifelse(df$FTHG > df$FTAG, "H",
                                 ifelse(df$FTHG < df$FTAG, "A", "D"))
    )
  )
}
england_current()
I will add as a function to the package and leave on GitHub. If I have time, I'd like to add this too for the other leagues.
As a note - if you're interested in collating data / helping, I have other leagues and competitions going all the way back to their origins e.g. League Cup, French League, - just haven't had time to check + add to package yet.
from engsoccerdata.
oops.... forgot about the teamnames fix. ugh - that will take time. I notice a lot of inconsistencies with my data, e.g. Bristol Rvs - I'll have to manually add that to the teamname df. Might take me a day or two to get that together.
from engsoccerdata.
I had a bit of spare time this morning so wrote code to fetch data for the five other leagues available on Football-Data.co.uk. Their data only goes back to '94/'95 but better than nothing. I've left the 'division' variable as a factor for now rather than numeric, e.g. Scotland's divisions are defined as SC0, SC1, etc...
I'm happy to help collate more data whenever I get a chance if you add them to the repo. Where did you obtain your data from by the way? It would be great to have European Cup fixtures too for completeness but can't find an archive of them anywhere.
## FUNs
##-------
#make season codes for URLs, e.g. 1994 gives "9495", "9596", ...
makeSeasons <- function(start = NULL) {
  paste0(substr(start:2016, 3, 4), substr((start+1):2017, 3, 4))
}
#get CSVs
getCSVs <- function(x) {
  df <- read.csv(x)
  #pause 1-2 seconds between requests
  Sys.sleep(sample(seq(1, 2, by=0.001), 1))
  #extract Season as year of first fixture
  df$Season <- format(as.Date(df$Date[1], format="%d/%m/%y"), "%Y")
  df
}
#reformat to engsoccerdata
convertEngSoccerData <- function(df){
  return(data.frame(
    "Date" = as.factor(as.Date(df$Date, "%d/%m/%y")),
    "home" = df$HomeTeam,
    "visitor" = df$AwayTeam,
    "FT" = paste0(df$FTHG, "-", df$FTAG),
    "hgoal" = df$FTHG,
    "vgoal" = df$FTAG,
    "division" = df$Div,
    #caution: as.numeric() on Div returns factor level codes (or NA for
    #character columns), not a tier parsed from the division code
    "tier" = as.numeric(df$Div),
    "totgoal" = df$FTHG + df$FTAG,
    "goaldif" = df$FTHG - df$FTAG,
    "result" = ifelse(df$FTHG > df$FTAG, "H",
                      ifelse(df$FTHG < df$FTAG, "A", "D"))
  )
  )
}
## URLs
##------
base_url <- "http://www.football-data.co.uk/mmz4281/"
#Scotland
sco_urls <- c(paste0(base_url, makeSeasons(1994), "/SC0.csv"),
              paste0(base_url, makeSeasons(1994), "/SC1.csv"),
              paste0(base_url, makeSeasons(1997), "/SC2.csv"),
              paste0(base_url, makeSeasons(1997), "/SC3.csv"))
#Belgium
bel_urls <- paste0(base_url, makeSeasons(1995), "/B1.csv")
#Portugal
por_urls <- paste0(base_url, makeSeasons(1994), "/P1.csv")
#Turkey
turk_urls <- paste0(base_url, makeSeasons(1994), "/T1.csv")
#Greece
grc_urls <- paste0(base_url, makeSeasons(1994), "/G1.csv")
## Example
##---------
library(magrittr) #provides %>%
scotland <- sco_urls %>%
  lapply(getCSVs) %>%
  lapply(convertEngSoccerData) %>%
  do.call(rbind.data.frame, .)
from engsoccerdata.
@JoGall Thanks Joe. This is great. Will take a closer look at it - happy to add more leagues. I do have European Cup / Champions League data - it's in the champs dataframe that comes in the engsoccerdata package from CRAN.
The data come from everywhere - all open source. I believe a bunch of the sources are listed in the ReadMe. I did notice when I collated this a few years ago that a lot of the online websites with soccer data had copied each other, errors included. The errors affect only about 0.1% of the data, but are annoying nonetheless.
from engsoccerdata.
@JoGall Hi Joe - I had a look at importing the other leagues. On my first pass, the CSVs imported for the Greek league would not all convert to tidy data using the convert function. Also, other leagues sometimes returned NAs in the CSVs. I think these will work, but we'd have to check each file in turn before adding to the package. Also, adding a "Season" variable would be super useful for each CSV - that would keep the data consistent with the other dataframes.
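Checking each file in turn could be partly automated. A minimal sketch, assuming the convert function needs the columns listed below (taken from the code in this thread); the helper name `check_csv` and the toy data are made up for illustration:

```r
# Columns the convert function relies on (as used in the code in this thread).
required_cols <- c("Div", "Date", "HomeTeam", "AwayTeam", "FTHG", "FTAG")

# Report missing columns and per-column counts of NA/empty values
# for one imported data frame.
check_csv <- function(df, required = required_cols) {
  missing_cols <- setdiff(required, names(df))
  present <- intersect(required, names(df))
  na_counts <- vapply(df[present],
                      function(col) sum(is.na(col) | col == ""),
                      integer(1))
  list(ok = length(missing_cols) == 0 && all(na_counts == 0),
       missing_cols = missing_cols,
       na_counts = na_counts)
}

# Toy data (not real fixtures): 'HT' used instead of 'HomeTeam', plus some NAs.
toy <- data.frame(
  Div = c("G1", "G1"), Date = c("20/08/16", NA),
  HT = c("Team A", "Team B"), AwayTeam = c("Team C", "Team D"),
  FTHG = c(1, NA), FTAG = c(0, 2),
  stringsAsFactors = FALSE
)
res <- check_csv(toy)
print(res$missing_cols)
print(res$na_counts)
```

Running this over the list returned by `lapply(urls, read.csv)` would flag which files need attention before they go anywhere near the package.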
from engsoccerdata.
Ok I've updated this and tested it properly now. There were some inconsistencies in the CSVs (e.g. some columns had 'HT' instead of 'HomeTeam') and annoyingly the division names for some leagues are zero indexed and some aren't, making it hard to parse 'tier' properly. The convert function seems to work now for all the leagues; I've added a boolean parameter to help create a tier number from the division data (e.g. when 'zeroIndexed' = TRUE, 'division' SC0 becomes 'tier' 1). Added a 'Season' variable too.
## FUNs
##-------
#make season codes for URLs
makeSeasons <- function(start = NULL) {
  paste0(substr(start:2016, 3, 4), substr((start+1):2017, 3, 4))
}
#get CSVs
getCSVs <- function(x) {
  #read, treating empty strings as NA, then drop incomplete rows
  #(only the first 7 columns are kept)
  df <- read.csv(x, na.strings = c("NA", ""))
  df <- na.omit(df[,1:7])
  #extract Season as year of first fixture
  df$Season <- format(as.Date(df$Date[1], format="%d/%m/%y"), "%Y")
  #change column names if required
  names(df)[names(df) == "HT"] <- "HomeTeam"
  names(df)[names(df) == "AT"] <- "AwayTeam"
  #pause 1-2 seconds between requests
  Sys.sleep(sample(seq(1, 2, by=0.001), 1))
  df
}
#reformat to engsoccerdata
convertToESD <- function(df, zeroIndexed = FALSE){
  dfx <- data.frame(
    "Date" = as.factor(as.Date(df$Date, "%d/%m/%y")),
    "Season" = as.numeric(as.character(df$Season)),
    "home" = df$HomeTeam,
    "visitor" = df$AwayTeam,
    "FT" = paste0(df$FTHG, "-", df$FTAG),
    "hgoal" = df$FTHG,
    "vgoal" = df$FTAG,
    "division" = df$Div,
    #keep only the digits of the division code, e.g. "SC0" gives 0
    "tier" = as.numeric(gsub("[^0-9]", "", as.character(df$Div))),
    "totgoal" = df$FTHG + df$FTAG,
    "goaldif" = df$FTHG - df$FTAG,
    "result" = df$FTR
  )
  #shift zero-indexed divisions so the top flight is tier 1
  if(zeroIndexed) dfx$tier <- dfx$tier + 1
  dfx
}
## Example: Scotland
##-------
library(magrittr) #provides %>%
base_url <- "http://www.football-data.co.uk/mmz4281/"
urls <- c(paste0(base_url, makeSeasons(1994), "/SC0.csv"),
          paste0(base_url, makeSeasons(1994), "/SC1.csv"),
          paste0(base_url, makeSeasons(1997), "/SC2.csv"),
          paste0(base_url, makeSeasons(1997), "/SC3.csv"))
scotland <- urls %>%
  lapply(getCSVs) %>%
  lapply(function(x) convertToESD(x, zeroIndexed=TRUE)) %>%
  do.call(rbind.data.frame, .)
from engsoccerdata.
Sorry for slow action on this - when adding I was running checks to ensure CRAN compatibility etc. which always take longer.
All these data for these seasons have been added. Thanks for your help. I'd love to get more data going further back, but this is a great addition.
I noted one error with the function - it assigns tier = 2 for Belgium/Portugal/Turkey/Greece rather than tier = 1. It works for Scotland to get the correct tier. I've corrected that.
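One possible way to avoid passing the flag by hand is to infer zero-indexing from the division codes themselves. This is only a sketch, not what was committed to the package: the helper `tier_from_div` is hypothetical, and it assumes a league is zero-indexed exactly when one of its codes ends in 0, so it has to be applied to a whole league's combined data rather than to a single CSV:

```r
# Derive a numeric tier from division codes such as "SC0" or "B1".
# Assumption: the league is zero-indexed if (and only if) some code has digit 0.
tier_from_div <- function(div) {
  n <- as.numeric(gsub("[^0-9]", "", as.character(div)))
  if (any(n == 0, na.rm = TRUE)) n + 1 else n
}

scot_tiers <- tier_from_div(c("SC0", "SC1", "SC3"))  # zero-indexed league
belg_tiers <- tier_from_div(c("B1", "B1"))           # one-indexed league
print(scot_tiers)
print(belg_tiers)
```

Applied to a single file such as SC1 on its own, this would wrongly return tier 1, which is why it needs the combined league data.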
I like to release high-quality, proofed and checked data like I have for England, Germany, Spain etc. However, realistically I don't have time to do that level of checking for all leagues. Therefore, I've added these "as is" and hopefully if people find errors or additions they can file issues and/or pull requests. Also, if these leagues had playoff games, those aren't included just yet. I've added that as a to-do item in the ReadMe.
Also, all teamnames are added to the teamnames dataframe. Going forward if other Seasons are added to each league, we should pick a teamname for each team to stay with.
Finally, I don't know if teams changed team names between 1994 and 2016 in each league. If they did, I've not noted that in the teamnames dataframe - I'm assuming that unique teamnames in the data are unique teams, not teams that changed name. I'll let others find that out and let me know if I need to fix it.
from engsoccerdata.