ahu0605 / football_analytics

This project forked from eddwebster/football_analytics

📊⚽ A collection of football analytics projects, data, and analysis by Edd Webster (@eddwebster), including a curated list of publicly available resources published by the football analytics community.

Home Page: https://www.eddwebster.com

Python 1.14% Makefile 0.01% Jupyter Notebook 98.86%

Edd Webster Football Analytics

A space for football analytics projects by Edd Webster, including a curated list of publicly available resources published by the football analytics community

-----------------------------------------------------

👋 About This Repository and Author

Edd Webster

The README of this repository is a guide to learning materials, data sources, libraries, papers, blogs, etc., created by all those who have contributed to the open-source football analytics community. This GitHub repository and resources list is always a work in progress, with new resources added semi-regularly. If you feel I've missed any resources, please feel free to create a pull request or send me a message via the links below and I'll get back to you as quickly as I can!

If you like the repo, please feel free to give it a ⭐ (top right). Cheers!

For more information about this repository and the author, see the following:

CV Badge Personal Website Badge Email Badge LinkedIn Badge Twitter Badge Mastadon Badge Linktree Badge GitHub Badge Tableau Badge Kofi Badge

-----------------------------------------------------

📝 Table of Contents

  1. 👋 About This Repository and Author
  2. 📝 Table of Contents
  3. 🚀 Getting Started
  4. 🌵 Repository Structure
  5. 📚 Source Code and Notebooks
  6. 📈 Data Visualisation and Tableau
  7. 📑 Resources
  8. 🗣️ Citations
  9. 🤝 Contributing
  10. ⭐ Star Tracker
  11. 👏 Acknowledgements

-----------------------------------------------------

🚀 Getting Started

✅ Dependencies

The code in this repository is written in a mix of Python and R. Before you begin, ensure that you have the following prerequisites installed:

  1. Python (ideally 3.6.1+ installed)
  2. R (ideally 4.0.4+ installed)
  3. The following Python and R libraries...

🐍 Python

General Python data science libraries:

Football analytics Python libraries:

®️ R

General R data science libraries:

  • tidyverse

Football analytics R libraries:

🔝 Return

-----------------------------------------------------

🌵 Repository Structure

The contents of this GitHub repository are organised as follows:

📂 eddwebster/football_analytics/ ➡️ central repository of code and analysis by Edd Webster 📁⚽
│
├── 📂 dashboards/ ➡️ store of Tableau dashboards used for analysis 📊🔍
│
├── 📂 data/ ➡️ a selection of raw and processed data extracts by various providers 💾🔍
│   ├── 📂 capology
│   ├── 📂 davies
│   ├── 📂 elo
│   ├── 📂 fbref
│   ├── 📂 fifa
│   ├── 📂 guardian
│   ├── 📂 metrica-sports
│   ├── 📂 opta
│   ├── 📂 reference
│   ├── 📂 sb
│   ├── 📂 shots
│   ├── 📂 stats-perform
│   ├── 📂 stratabet
│   ├── 📂 tm
│   ├── 📂 touchline-analytics
│   ├── 📂 twenty-first-group
│   ├── 📂 understat
│   └── 📂 wyscout
│
├── 📂 docs/ ➡️ store of documentation for different vendors 📄📚
│   ├── 📂 centre-circle
│   ├── 📂 metrica-sports
│   ├── 📂 opta
│   ├── 📂 sb
│   ├── 📂 shots
│   ├── 📂 stratabet
│   └── 📂 wyscout
│
├── 📂 fonts/ ➡️ store of custom and externally acquired fonts used for data visualisation ✍️📄
│
├── 📄 .gitignore ➡️ ignore unnecessary files for version control with Git 🚫📤
│
├── 📂 img/ ➡️ store of images used for analysis, including club badges, vendor logos and official media images 📷💾
│   ├── 📂 club_badges/              # badges for football clubs
│   ├── 📂 edd_webster/              # images related to Edd Webster
│   ├── 📂 fig/                      # generated figures derived from analysis and reports in this repository
│   ├── 📂 gif/                      # GIF images
│   ├── 📂 memes/                    # memes
│   ├── 📂 pitches/                  # images of football pitches and goals used mostly for Tableau visualisation
│   ├── 📂 players/                  # images of football players
│   ├── 📂 vendors/                  # logos for data vendors e.g. StatsBomb
│   ├── 📂 vizpiration/              # high-quality visualisations and analysis from renowned members of the football analytics community
│   └── 📂 websites-blogs/           # logos for data analysis websites and blogs e.g. Club Elo
│
├── 📂 scripts/ ➡️ store of Python libraries and open-source code 📙🛠
│
├── 📂 notebooks/ ➡️ Jupyter notebooks for exploration and visualisation
│   ├── 📂 1_data_scraping/          # notebooks with code to acquire data via web scraping
│   │   ├── Capology Player Salary Web Scraping.ipynb
│   │   ├── FBref Player Stats Web Scraping.ipynb
│   │   └── TransferMarkt Player Bio and Status Web Scraping.ipynb
│   │
│   ├── 📂 2_data_parsing/           # notebooks with code to acquire data via APIs
│   │   ├── Elo Team Ratings Data Parsing.ipynb
│   │   ├── StatsBomb Data Parsing.ipynb
│   │   └── Wyscout Data Parsing.ipynb
│   │
│   ├── 📂 3_data_engineering/       # notebooks with code to engineer raw, unprocessed data to processed data
│   │   ├── Capology Player Salary Data Engineering.ipynb
│   │   ├── Centre Circle Opta CPL Data Engineering.ipynb
│   │   ├── FBref Player Stats Data Engineering.ipynb
│   │   ├── Opta #mcfcanalytics PL 2011-2012.ipynb
│   │   ├── StatsBomb Data Engineering.ipynb
│   │   ├── The Guardian Player Recorded Transfer Fees Data Engineering.ipynb
│   │   ├── TransferMarkt Historical Market Value Data Engineering.ipynb
│   │   ├── TransferMarkt Player Bio and Status Data Engineering.ipynb
│   │   ├── TransferMarkt Player Recorded Transfer Fees Data Engineering.ipynb
│   │   ├── Understat Data Engineering.ipynb
│   │   └── Wyscout Data Engineering.ipynb
│   │
│   ├── 📂 4_data_unification/       # notebooks with code to unify disparate datasets
│   │   └── Unification of Aggregated Seasonal Football Datasets.ipynb
│   │
│   └── 📂 5_data_analysis_and_projects    # notebooks with code for example projects and analysis
│       ├── 📂 player_similarity_and_clustering
│       │   └── PCA and K-Means Clustering of 'Piqué-like' Defenders.ipynb
│       │
│       ├── 📂 tracking_data
│       │   ├── 📂 metrica_sports
│       │   │   └── Metrica Tracking Data EDA.ipynb
│       │   └── 📂 signality
│       │       ├── Signality Tracking Data Engineering.ipynb
│       │       └── Signality Tracking Data EDA.ipynb
│       │
│       └── 📂 xg_modeling
│           ├── 📂 shots_dataset
│           │   ├── Logistic Regression Expected Goals Model.ipynb
│           │   └── XGBoost Expected Goals Model.ipynb
│           └── 📂 opta_dataset
│               └── Training of an Expected Goals Model Using Opta Event Data.ipynb
│
├── 📄 README.md ➡️ project description and setup guide for better structure and collaboration 📖🤝
│
├── 📂 research/ ➡️ central repository of research and publicly available resources in football analytics 📙⚽
│   ├── 📂 documents/                # documents
│   ├── 📂 papers/                   # published academic papers and literature
│   └── 📂 slides/                   # PowerPoint slides for published research
│
└── 📂 video/ ➡️ store of videos used or generated for analysis 🎥💾

🔝 Return

-----------------------------------------------------

📚 Source Code and Notebooks

The code in this repository is mostly written in Jupyter notebooks or Python scripts, organised in the following workflow:

  1. Webscraping
  2. Data Parsing
  3. Data Engineering
  4. Data Unification
  5. Data Analysis - projects include working with Tracking data, constructing VAEP models (as introduced by SciSports), building xG models using Logistic Regression, Random Forests and Gradient Boosted Decision Tree algorithms such as XGBoost, and analysing player similarity using PCA and K-Means clustering.
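To illustrate the logistic-regression xG models mentioned above, the sketch below turns a shot's location into the two classic features, distance to goal and the angle subtended by the goalposts. It then maps those features to a probability with the logistic function. It assumes StatsBomb-style coordinates (a 120 x 80 pitch, goal centred at (120, 40), posts at y = 36 and y = 44) and shots taken in front of the goal line. The coefficients are invented placeholders, not fitted values. In practice they would be learned from a shots dataset, for example with scikit-learn's `LogisticRegression`.

```python
import math

# Assumed StatsBomb-style geometry: goal line at x = 120, posts at y = 36 and y = 44
GOAL_X, POST_LOW_Y, POST_HIGH_Y = 120.0, 36.0, 44.0

# Hypothetical coefficients for illustration only (NOT fitted to any data)
INTERCEPT, COEF_DISTANCE, COEF_ANGLE = -1.2, -0.08, 1.5


def shot_distance(x, y):
    """Distance (yards) from the shot location to the centre of the goal."""
    return math.hypot(GOAL_X - x, (POST_LOW_Y + POST_HIGH_Y) / 2 - y)


def shot_angle(x, y):
    """Angle (radians) between the lines from the shot location to each post."""
    dx = GOAL_X - x
    return abs(math.atan2(POST_HIGH_Y - y, dx) - math.atan2(POST_LOW_Y - y, dx))


def xg(x, y):
    """Logistic-regression style xG: sigmoid of a linear combination of features."""
    logit = (INTERCEPT
             + COEF_DISTANCE * shot_distance(x, y)
             + COEF_ANGLE * shot_angle(x, y))
    return 1.0 / (1.0 + math.exp(-logit))


print(round(xg(108, 40), 3))  # central shot from the penalty spot
print(round(xg(90, 10), 3))   # long-range shot from a wide position
```

With the same coefficients, the central close-range shot always scores higher than the wide long-range one. Fitting the model only changes how strongly each feature is weighted.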

🔝 Return

-----------------------------------------------------

📊 Data Visualisation and Tableau Dashboards

For Tableau dashboards produced using the data engineered in the notebooks in this repository, please see my Tableau Public profile: public.tableau.com/profile/edd.webster.

Example Tableau dashboards:

🔝 Return

-----------------------------------------------------

📑 Resources

🔖 Other Football Analytics Resources Guides

Credit to the following resources that were all used to plug gaps in this resources guide once it was published:

🔝 Return

-----------------------------------------------------

🏃 Getting Started with Football Analytics

Good resources for those new to the use of data in football:

🔝 Return

-----------------------------------------------------

💾 Data

ℹ️ Data Sources

Publicly available data sources and datasets relating to football, from Tracking data, Event data, aggregated player performance data, detailed match statistics, injury records and transfer values, and more.

Data sources used in the code and analysis in this repository can be found in the data subfolder or, for files exceeding GitHub's 100 MB file size limit, in Google Drive [link]. However, the code in this repository should enable you to scrape, parse, and engineer the datasets used for the analysis and visualisations featured.

To learn more about the different types of data available, such as Event and Tracking data, see the "Where can I get data?" section of Devin Pleuler's soccer_analytics_handbook [link].

For a quick primer of the free football data resources available, see the following Twitter thread by James Nalton [link].


Event data

Event Data is labelled data for each on-the-ball event that takes place during a game. The data is manually collected from television footage. To learn more about the data collection, see the following video [link].

Event data for a single match typically contains around 2,000-3,000 individual events (rows), depending on the provider.

The main providers of this data are StatsBomb, Stats Perform (formerly Opta), and Wyscout.

Name Comments Source / method(s) to get the data
StatsBomb Open Data StatsBomb Open Data GitHub Repo
StrataData by StrataBet Chance shooting data provided No longer made available (since 2018), however, it can be found in GitHub repos of old analysis (including this one) [link].
Soccer Video and Player Position Dataset Dataset of elite soccer player movements and corresponding videos, made available by the University of Oslo. See the accompanying paper [link] [Link] (appears to no longer be working)
Opta Event data for 20+ leagues including the 'Big 5' European leagues, some of which go back to the 09/10 season. Data available by scraping the WhoScored Match Centre through the following methods:
Opta (11/12 sample dataset) Match-by-match aggregated player performance data for the 11/12 season and F24 Event data for an 11/12 match of Manchester City vs. Bolton Wanderers as part of the #mcfcanalytics initiative No longer made available (since 2012), however, it can be found in GitHub repos of old analysis (including this one).
Understat Shooting data and metadata, including xG values, for the 'Big 5' European leagues and Russian Premier League This data can be accessed through the following:
Wyscout Event data for the 17/18 season for the 'Big 5' European leagues, Euro 2016 Championship, and 2018 World Cup made available by Luca Pappalardo, Alessio Rossi, and Paolo Cintia. See their paper A public data set of spatio-temporal match events in soccer competitions. Figshare
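Event data such as StatsBomb Open Data is distributed as JSON, with one array of events per match. The sketch below aggregates shots, xG and goals per team from a tiny hand-written sample. It assumes the StatsBomb field names used here (`type.name`, `team.name`, `shot.statsbomb_xg`, `shot.outcome.name`). Real files contain many more fields and events, and the sample values are made up. To load an actual match, replace the inline string with `json.load` on a downloaded events file.

```python
import json

# Hypothetical hand-written sample in the shape of StatsBomb event JSON
# (only the fields used below are included; values are made up)
SAMPLE = """
[
  {"type": {"name": "Pass"}, "team": {"name": "Team A"}, "location": [60.0, 40.0]},
  {"type": {"name": "Shot"}, "team": {"name": "Team A"}, "location": [108.0, 36.0],
   "shot": {"statsbomb_xg": 0.31, "outcome": {"name": "Goal"}}},
  {"type": {"name": "Shot"}, "team": {"name": "Team B"}, "location": [95.0, 50.0],
   "shot": {"statsbomb_xg": 0.04, "outcome": {"name": "Saved"}}}
]
"""


def shots_by_team(events):
    """Sum shots, xG and goals per team from a list of StatsBomb-style events."""
    summary = {}
    for event in events:
        if event["type"]["name"] != "Shot":
            continue  # skip passes, carries, pressures, etc.
        team = event["team"]["name"]
        row = summary.setdefault(team, {"shots": 0, "xg": 0.0, "goals": 0})
        row["shots"] += 1
        row["xg"] += event["shot"]["statsbomb_xg"]
        row["goals"] += int(event["shot"]["outcome"]["name"] == "Goal")
    return summary


events = json.loads(SAMPLE)
print(shots_by_team(events))
```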

Tracking data

Tracking Data records the x and y coordinates of every player on the field, as well as the ball, a number of times per second (usually 10-25). As a result, the dataset is much larger than event data, at around 2-3 million rows per game.

The data is collected by cameras installed in a stadium and is therefore not widely available, with teams usually only having access to the data in their own league.

The main providers of this data are Second Spectrum, Stats Perform, Metrica Sports, and Signality.
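As a rough check of the row counts quoted above, the sketch below multiplies frame rate, match length and the number of tracked objects. It assumes a "long" layout with one row per object per frame, 22 players plus the ball, and 90 minutes with no stoppage time. `tracking_rows` is a hypothetical helper. Providers that store one row per frame, with players as columns, will have far fewer rows.

```python
def tracking_rows(frame_rate_hz, minutes=90, objects=23):
    """Rows for one match when stored 'long': one row per tracked object per frame.

    objects=23 assumes 22 players plus the ball; stoppage time is ignored.
    """
    frames = frame_rate_hz * minutes * 60
    return frames * objects


for hz in (10, 25):
    print(f"{hz} Hz: {tracking_rows(hz):,} rows")
# 10 Hz: 1,242,000 rows
# 25 Hz: 3,105,000 rows
```

At 25 Hz this gives roughly 3.1 million rows, in line with the upper end of the figure above.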

Name Comments Source / method(s) to get the data
Last Row Tracking-like data by Ricardo Tavares Tracking-like data collected by Ricardo Tavares. See the Liverpool Analytics Challenge for which this data was used (winners discussed on Friends of Tracking [link]). GitHub repo
Metrica Sports Sample Tracking and corresponding Event data Three sample matches of synced event and tracking data. For code to work with this data, including Pitch Control modelling, see the LaurieOnTracking GitHub repo by Laurie Shaw and the corresponding Friends of Tracking tutorials. GitHub repo
Signality Tracking data Three matches of tracking data from the Allsvenskan: Hammarby vs. IF Elfsborg (22/07/2019), Hammarby 5-1 Örebro (30/09/2019), and Hammarby vs. Malmö FF (20/10/2019). This data was made available as part of the 2020 Mathematical Modelling of Football course. The password to download the data is not publicly available, but can be found in the Uppsala Mathematical Modelling of Football Slack group [link]. For access, contact Novosom Salvador (Twitter) and [email protected], or feel free to contact me. Note that the second half of the Hammarby-Örebro match is incomplete.
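A common first step with sample tracking data such as Metrica's is converting positions into player speeds. The sketch below makes several assumptions: positions are normalised to 0-1, the frame rate is 25 Hz, and the pitch is 106 x 68 m for rescaling. Speeds above 12 m/s are treated as tracking glitches. `player_speeds` is a hypothetical helper, not a function from the LaurieOnTracking repo. A fuller treatment would also smooth the noisy frame-to-frame estimates.

```python
import math

FPS = 25               # assumed frame rate of the tracking data (frames per second)
PITCH = (106.0, 68.0)  # assumed pitch length and width in metres


def player_speeds(xs, ys, fps=FPS, pitch=PITCH, max_speed=12.0):
    """Frame-to-frame speeds (m/s) for one player from normalised 0-1 positions.

    Speeds above max_speed are treated as tracking glitches and returned as None.
    """
    speeds = []
    for i in range(1, len(xs)):
        dx = (xs[i] - xs[i - 1]) * pitch[0]  # metres moved along the pitch length
        dy = (ys[i] - ys[i - 1]) * pitch[1]  # metres moved across the pitch width
        speed = math.hypot(dx, dy) * fps     # metres per frame x frames per second
        speeds.append(speed if speed <= max_speed else None)
    return speeds


# Hypothetical positions: 0.2 m per frame along the pitch length, i.e. about 5 m/s
xs = [0.5 + i * 0.2 / 106 for i in range(4)]
ys = [0.5] * 4
print(player_speeds(xs, ys))
```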

Broadcast Tracking data

Broadcast Tracking data is collected from broadcast footage using computer vision techniques. Unlike in-stadium tracking data, the dataset is incomplete, missing players who are out of shot of the broadcast footage. However, the great benefit is that the data is much cheaper to collect and covers far more leagues, which is extremely useful for tasks such as recruitment analysis.

The main providers of this data are SkillCorner and Sportlogiq.

Name Comments Source / method(s) to get the data
SkillCorner broadcast Tracking data 9 matches of broadcast tracking data, including matches from 2019/2020 for the league champions and runners-up in the English Premier League, French Ligue 1, Spanish LaLiga, Italian Serie A and German Bundesliga. To find out more about broadcast tracking data and its use cases, see the following Medium article [link]. GitHub repo

Aggregated Player/Team Performance data
Name Comments Source / method(s) to get the data
DAVIES modelling data Estimated player evaluation data by Sam Goldberg and Mike Imburgio for American Soccer Analysis. To learn more about DAVIES, see the following blog post [link]. Shiny App
FBref season-on-season aggregated player performance data provided by Stats Perform. Aggregated player performance data for the following competitions:
  • Men's competitions
    • English Premier League
    • Spanish La Liga
    • German Bundesliga
    • French Ligue 1
    • Italian Serie A
    • Dutch Eredivisie
    • Portuguese Primeira Liga
    • Brazilian Serie A
    • Mexican Liga MX
    • MLS
    • English Championship
    • Champions League
    • Europa League
    • Conmebol Copa Libertadores
    • World Cup
    • Euros
    • Copa America
  • Women's competitions
    • American NWSL
    • English Women's Super League
    • Australian A-League
    • French Division 1 Feminine
    • German Frauen-Bundesliga
    • Italian Serie A
    • Spanish Liga F
    • Women's Champions League
    • World Cup
    • Euros
Note: in October 2022, FBref changed the data provider for its statistics from StatsBomb to Stats Perform. Therefore, the following scraping code is split into current working solutions and archived solutions. Additional data sources:
  • Every FBref metric for every 2020-21 Big 5 European league player by Ronan, see [link], [link] and [Tweet]. A 'tidied' version has been made by goaltergeist, see [link]
  • 2,823 players in Europe's top 5 leagues on FBref, with their positions as listed on Transfermarkt by Rahul Iyer, see [link] and [Tweet]
Stats Perform and Centre Circle Canadian Premier League data Aggregated player performance data Google Drive

Team Rating data
Name Comments Source / method(s) to get the data
Elo club rankings Elo ratings for club football, based on past results, that estimate each club's strength and allow predictions of future results. Data available through:
Euro Club Index Ranking of the football teams in the highest division of all European countries, that shows their relative playing strengths at a given point in time, and the development of playing strengths in time. To see more about the methodology used to calculate these rankings, see the following page [link] Link
FiveThirtyEight Club Ranking Global Club Soccer Rankings. How 637 international club teams compare by Soccer Power Index Data available through:
Opta Power Rankings Opta Power Rankings Data available through:
UEFA Club Coefficients UEFA club coefficient rankings based on the results of all European clubs in UEFA club competition. Data available through:
World Football / Soccer Clubs Ranking Club ranking website Link
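The Elo idea behind ratings such as the Elo club rankings above can be sketched in a few lines: each result moves a team's rating in proportion to how surprising it was. This is the generic Elo update with an assumed K-factor of 20 and the conventional 400-point logistic scale. Rating sites add their own refinements, which are not reproduced here. The function names are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, score_a, k=20.0):
    """Return both teams' new ratings after a match; the exchange is zero-sum."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


print(update(1500, 1500, 1.0))  # → (1510.0, 1490.0)
print(update(1400, 1600, 1.0))  # an upset win moves the ratings much further
```

The K-factor controls how quickly ratings react to new results: a larger K tracks form more closely but is noisier.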

Physical data
Name Comments Source / method(s) to get the data
Bundesliga physical data Bundesliga player stats, powered by AWS Link (not scraped into a CSV)

Results and Match Sheet data
Name Comments Source / method(s) to get the data
2018 FIFA World Cup Rosters Goals, caps, club, and date of birth for players on 2018 FIFA World Cup rosters. Source: data.world Excel
engsoccerdata English and European soccer results 1871-2017 GitHub repo
FIFA World Cup Match Results Matchups and results of FIFA World Cup matches from 1930 - 2014. Source: data.world Excel
FotMob Team and player stats, including xG and post-shot xG. This data can be scraped using:
Football Lineups A database of team tactics and formations, crowdsourced by its users. Link
international_results Repository of 44,353 results of international football matches, from the very first official match in 1872 up to 2022. GitHub repo
smarterscout Scouting and player rating information platform for evaluating the performance of football players around the world. The platform was developed by Dan Altman at North Yard Analytics to assess players' contributions to winning, their playing style, and their skill level. Note: this is a subscription service. Link
SofaScore Live scores, lineups, standings, heatmaps, and basic teams, coaches and player data Link
Soccerway Match sheet data Link

Financial, Valuation, and Transfer data
Name Comments Source / method(s) to get the data
Capology Player salaries See the Capology Player Salary Web Scraping notebook for Python code to scrape Capology data or access saved CSV files in data subfolder
KPMG Football Benchmark player valuation data
The Price of Football Master Spreadsheet data from the finance/business aspect of football by Kieran Maguire Link
spotrac Player contracts, salaries, and transfer information for the Premier League, MLS, and NWSL
Transfermarkt Player bio, contractual, and estimated value data This data can be accessed through the following:
Guardian Player Transfer data Collated by Tom Worville (see Tweet [link]) GitHub

Odds, Betting, and Predictions data
Name Comments Source / method(s) to get the data
BetExplorer odds data Link
FiveThirtyEight Soccer Predictions database football prediction data Link
Football-Data.co.uk free bets and football betting, historical football results and a betting odds archive, live scores, odds comparison, betting advice and betting articles Link
International football results from 1872 to 2020 An up-to-date dataset of over 40,000 international football results by Mart Jürisoo Link

Plotting Tools

See Mark Wilkin's Twitter thread for more about how to plot your own event data [link]:


Reference data
Name Comments Source / method(s) to get the data
xT grid League-wide Expected Threat (xT) values from the 2017-18 Premier League season (12x8 grid) determined by Karun Singh. For more information about xT, see Karun's blog post [link] Link
EPV grid Grid of Expected Possession Values determined by Laurie Shaw. See the following lecture for more information [link] Link
Zones of a pitch Breakdown of a pitch into zones, for use with visualisation. Created by Rob Carroll Link
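Expected Threat values like the xT grid above can be computed by value iteration over pitch zones. Each zone's value is the chance of scoring from a shot there, plus the value of where a move from that zone is likely to end up: xT(z) = s(z)·g(z) + m(z)·Σ T(z→w)·xT(w). The sketch below applies that recursion to a hypothetical three-zone "pitch" with made-up probabilities, not to the 12x8 grid. Moves that lose the ball (rows of the transition matrix summing to less than 1) contribute nothing.

```python
def expected_threat(shoot, goal, move, transition, tol=1e-10, max_iter=10_000):
    """Iterate xT(z) = s(z)*g(z) + m(z) * sum_w T(z -> w) * xT(w) until it converges.

    shoot[z]: probability the team shoots when in zone z
    goal[z]: probability a shot from zone z is scored
    move[z]: probability the team moves the ball on from zone z
    transition[z][w]: probability a move from z ends successfully in zone w
    """
    n = len(shoot)
    xt = [0.0] * n
    for _ in range(max_iter):
        new = [
            shoot[z] * goal[z]
            + move[z] * sum(transition[z][w] * xt[w] for w in range(n))
            for z in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, xt)) < tol:
            return new
        xt = new
    return xt


# Hypothetical zones: 0 = own half, 1 = middle third, 2 = penalty area
shoot = [0.0, 0.05, 0.4]
goal = [0.0, 0.05, 0.15]
move = [1 - s for s in shoot]
transition = [
    [0.5, 0.4, 0.0],  # rows may sum to < 1: the remainder is lost possession
    [0.2, 0.5, 0.2],
    [0.0, 0.3, 0.4],
]
xt = expected_threat(shoot, goal, move, transition)
print([round(v, 4) for v in xt])  # → [0.0427, 0.0534, 0.0916]
```

The threat added by a successful pass or carry is then the difference between the xT of the end zone and the xT of the start zone.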

Miscellaneous Data
Name Comments Source / method(s) to get the data
awesome-football ⭐ by football.db (Gerald Bauer) A collection of awesome football (national teams, clubs, match schedules, players, stadiums, etc.) datasets GitHub repo
Data Hub Football data Link
European Soccer Database 25k+ matches, players & teams attributes for European Professional Football Link
FIFA 15-22 player rating data Scraped from SoFIFA by Stefano Leone Link
FIFA 18 Player Ratings 17k+ players, 70+ attributes extracted from FIFA 18, provided by sofifa Link
FootballData "A hodgepodge of JSON and CSV Football data" GitHub
footballcsv Historical soccer results in CSV format Link
football.db A free and open public domain football database & schema for use in any (programming) language (e.g. uses plain datasets) Link
Football xG Link
Guide to Football/Soccer data and APIs by Joe Kampschmid Link
My Football Facts Link
Physio Room Link
PlusMinusData play-by-play data from espn.com Link
Rec.Sport.Soccer Statistics Foundation Historical league tables and football results Link
RoboCup Soccer Simulator RoboCup Soccer Simulator Data Link
Squawka Link
Stat Bunker Link
Tableau data resources including sports data Link
Transfer League Link
Twelve Football Link
wosostats Women's soccer data from around the world Link

📄 Documentation

All documentation is saved locally in the docs subfolder, including:


💼 Data Types and Companies

Data Providers
Tracking
Videos / Performances Analysis
Consultancy / Service Providers

🔝 Return

-----------------------------------------------------

🧑‍🎓 Tutorials

Python

R

Tableau

Check out the Tableau for Sports Discord server organised by Ninad Barbadikar to interact with a community of Tableau developers.

For a YouTube playlist of Tableau-football videos and tutorials that I have collated from various sources including the Tableau Football User Group, Rob Carroll, Tom Goodall, and Ninad Barbadikar, see the following [link].

PowerBI

For a YouTube playlist of Power BI-football videos and tutorials that I have collated from various sources including Futbol AnalysR and PowerBI for Sports, see the following [link].

SQL

Excel

PowerPoint

🔝 Return

-----------------------------------------------------

🏛️ Libraries

GitHub libraries considered to be 'Top Rated' are those with 50 or more stars (at the time of writing) and are indicated with a star emoji (⭐).

For a full list of Football Analytics GitHub repositories and libraries, see the following list on GitHub [link].

Python

R

  • ggsoccer ⭐ by Ben Torvaney - a soccer visualisation library in R
  • ggshakeR ⭐ by Abhishek Mishra - an analysis and visualisation R package that works with publicly available soccer data. See the following documentation [link]
  • StatsBombR ⭐ - an R package to easily stream StatsBomb data into R, either from the API using your login credentials or free of charge from the Open Data GitHub repository
  • soccerAnimate ⭐ - an R package to create 2D animations of soccer tracking data
  • soccermatics ⭐ by Joe Gallagher - an R package for the visualisation and analysis of soccer tracking and event data
  • worldfootballR ⭐ by Jason Zivkovic - an R package for extracting world football (soccer) data from FBref, Transfermarkt, Understat and FotMob (see guide on how to use this package [link])
  • understatr ⭐ by ewenme - an R package to scrape Understat shooting and player metadata.

🔝 Return

-----------------------------------------------------

📁 GitHub Repositories

The following GitHub repositories are either ones that I have found and recommend, or publicly available football analytics work with at least 5 stars on GitHub (at the time of writing).

GitHub repositories considered to be 'Top Rated' are those with 50 or more stars (again, at the time of writing) and are indicated with a star emoji (⭐).

For a full list of Football Analytics GitHub repositories and libraries, see the following list on GitHub [link].

Python

R

Other Languages

No Language Specified

🔝 Return

-----------------------------------------------------

📱 Apps

🔝 Return

-----------------------------------------------------

📊 Data Visualisation Resources and Tools

Resources to aid data visualisation:

Vizpiration

Check out the vizpiration subfolder in the img folder for examples of visualisations created by analysts in the community.

Tutorials

Repos and libraries

Resources

Tweets

🔝 Return

-----------------------------------------------------

✒️ Written Pieces

✍️ Blogs

Many of these blog posts are recommended in Sam Gregory's Best Football Analytics Pieces article and Tom Worville's “What’s the best Football Analytics piece you’ve ever read?”, both now a few years old. This section is very subjective, so apologies if I've missed anything obvious.

Blogs and Data Analytics Websites

The following list contains those blogs that are still maintained, as well as the original blogs from the OGs of football analytics.

For football analytics blogs from 2009 and earlier, see the following Twitter thread from Tiotal Football [link].

📃 Papers

See the following subfolder of this GitHub repo for PDF copies of the papers listed below [link].

Many of the papers in this list were added after reading Jan Van Haaren's Soccer Analytics Reviews (2020 [https://janvanhaaren.be/posts/soccer-analytics-review-2020/], 2021, 2022, and 2023). All credit to him for reading a paper a week and making his reviews publicly available; give them a read if you haven't already done so!

The following Shiny App from Lars Maurath is a great tool for looking up publications [link].

See the following webpages of conference papers per year (where available):

2022
2021
2020
2019
2018
2017
2016
2015
2014
2011
2002
1997
1971

Newsletters

News Articles

📚 Books

The books listed below cover not only football but sports analytics in general.

See the following reading lists for book recommendations from other sports data scientists:

The following use Amazon UK links where available and are not affiliate links.

Magazines

🔝 Return

-----------------------------------------------------

📼 Video

YouTube Playlists

Custom Playlists Curated by Myself

The following is a series of playlists that I originally collated for my own personal viewing, but they may be useful to you:

Public Playlists

Playlists created by others

YouTube Channels

Video Analysis

Webinars and Lectures

Ted Talks

Documentaries

Match Highlights

Other

🔝 Return

-----------------------------------------------------

🔊 Podcasts

Below I've tried to include both dedicated sports/football analytics podcasts and notable episodes of other podcasts with analytical content or interviews. Spotify and YouTube links are used where available. All episodes mentioned below that are available on Spotify can be found in the following playlist (updated periodically): [link].

Football Analytics Podcasts

Notable Episodes (including non-football-data-specific podcasts)

🔝 Return

-----------------------------------------------------

👨‍💻 Notable Figures and Twitter Accounts

🔝 Return

-----------------------------------------------------

🗓️ Events and Conferences

🔝 Return

-----------------------------------------------------

🏆 Competitions

The following includes non-football competitions.

🔝 Return

-----------------------------------------------------

🧑‍🏫 Courses

🔝 Return

-----------------------------------------------------

💼 Jobs

For live job postings tracked by the community, check the Jobs channel of the Football in Numbers Discord server.

Clubs

The list of clubs is quite UK-centric. I would like to add more clubs but it takes a bit of time.

Premier League
Championship
League One

League Two

Scottish Premier League

Analytics Companies and Consultancies

Associations and Organisations

Betting Companies

Media

Job Boards

Other Website Lists

🔝 Return

-----------------------------------------------------

Discord/Slack groups

🔝 Return

-----------------------------------------------------

🔑 Key Concepts

A focus on some of the key topics in football analytics. Most of the following resources feature above, but are reorganised here by topic. This section is still very much a work in progress and may be missing resources mentioned above.

History of Football Analytics

Expected Goals (xG) Modeling

Videos

For a playlist of Expected Goals related videos available on YouTube, see the following playlist I have created [link].

Webinars and Lectures
Tutorials
Notable Models
Written Pieces

For a collated list of Expected Goals literature collated by Keith Lyons, see the following [link]

Libraries
GitHub Repositories
Podcasts
Tweets

Web Scraping Football Data

Written Pieces
Videos
Libraries

Tracking Data

Pitch Control Modeling

Tutorials

Pitch Control modelling and Valuing Actions tutorials by Laurie Shaw as part of his Metrica Sports Tracking data series for Friends of Tracking. See the following for code [link]

GitHub Repositories
Written Pieces
Video
Podcasts

Passing Networks

Written Pieces
Blogs
Papers
Tutorials
Videos
Tweets

Possession Value (PV) Frameworks

General
Expected Threat (xT)
Valuing Actions by Estimating Probabilities (VAEP)
Goals Added (g+)
On-Ball Value (OBV)

Dixon Coles Modeling

Player Similarity and Style Analysis

Written Pieces
Videos
Tutorials
GitHub Repositories
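The player similarity work mentioned earlier (PCA followed by K-Means clustering) boils down to grouping players whose statistical profiles sit close together. The sketch below is a bare-bones K-Means (Lloyd's algorithm) in plain Python. It runs on hypothetical per-90 numbers for six made-up defenders and skips the PCA step. For simplicity it uses the first k players as starting centroids. Library implementations such as scikit-learn's `KMeans` use smarter initialisation (k-means++) and several restarts. Real features would normally be standardised first.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    centroids = [list(p) for p in points[:k]]  # naive initialisation: first k points
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c, p=p: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids


# Hypothetical defenders: (aerial duels won per 90, progressive passes per 90)
players = {
    "Defender A": (5.0, 2.0),
    "Defender B": (5.5, 1.8),
    "Defender C": (1.0, 6.0),
    "Defender D": (1.2, 6.5),
    "Defender E": (4.8, 2.2),
    "Defender F": (0.9, 5.8),
}
labels, centroids = kmeans(list(players.values()), k=2)
for name, label in zip(players, labels):
    print(name, label)
```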

Reinforcement Learning for Football Simulation

Player Rating Modelling

Written Pieces
Podcasts
Github Repos
Companies
  • Traits Insights

Team Playing Style Analysis

Written Pieces
Papers
Blogs
Videos
GitHub Repositories

Set Pieces

Section created after seeing the following tweets and threads by Ashwin Raman ([link]) and Stuart Reid ([link])

Radars

Recruitment Analysis

Quantifying Relative Club and League Strength

Models
Financial
Historical Match Results
Historical Statistical Player Performance
Articles
Papers
Videos
Data
Miscellaneous
  • Tweets by AI Abucus [link] and [link]. They use a simple Dixon-Coles method, focusing on historic results going back 15 years, to build a hierarchy of teams across leagues whose teams might never have played each other.
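The Dixon-Coles model referred to above extends a double-Poisson model of home and away goals. It adds a correction factor τ for the low-scoring results 0-0, 1-0, 0-1 and 1-1, controlled by a dependence parameter ρ. The sketch below computes scoreline and match-outcome probabilities from given expected goals. The λ, μ and ρ values are illustrative assumptions. Fitting each team's attack and defence parameters, and any time-decay weighting of older matches, is left out.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def tau(home_goals, away_goals, lam, mu, rho):
    """Dixon-Coles low-score adjustment; equals 1 for all other scorelines."""
    if home_goals == 0 and away_goals == 0:
        return 1 - lam * mu * rho
    if home_goals == 0 and away_goals == 1:
        return 1 + lam * rho
    if home_goals == 1 and away_goals == 0:
        return 1 + mu * rho
    if home_goals == 1 and away_goals == 1:
        return 1 - rho
    return 1.0


def score_prob(home_goals, away_goals, lam, mu, rho):
    """P(scoreline) given expected goals lam (home) and mu (away)."""
    return (tau(home_goals, away_goals, lam, mu, rho)
            * poisson_pmf(home_goals, lam) * poisson_pmf(away_goals, mu))


def outcome_probs(lam, mu, rho, max_goals=15):
    """Home win, draw and away win probabilities, truncating at max_goals per side."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = score_prob(h, a, lam, mu, rho)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


print(outcome_probs(lam=1.5, mu=1.1, rho=-0.1))
```

With ρ = 0 the model reduces to two independent Poisson distributions. The four τ corrections cancel out overall, so the scoreline probabilities still sum to 1.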

Tactics

Counter Attacking
Articles
Papers
Videos
Podcasts
Pressing
Articles
Videos
Counter Pressing
Articles
Papers
Videos

Player Valuation Modeling

Example Models
Example Methodologies
Written Pieces Regarding the Topic of Player Valuation
Articles
Blogs
Papers
Code/Notebooks
Slides
Tweets
Financial Data
Player Values
Recorded Transfers
Other
Relevant Packages/Repos
Miscellaneous

Game Win Probability Modeling

Goalkeeper Analysis

๐Ÿ” Return

-----------------------------------------------------

๐Ÿ—ฃ๏ธ Citations

Thanks to all those who have kindly written about or promoted this GitHub repository. See:

๐Ÿ” Return

-----------------------------------------------------

๐Ÿค Contributing

This GitHub repository and resources list is always a work in progress, with new resources added semi-regularly. If you feel there's any resource(s) that I've missed, I'm always open to contributions! Please feel free to create a pull request or send me a message at [email protected] or @eddwebster, and I'll get back to you as quickly as I can!

If you're new to creating a pull request, please follow these steps (based on this):

  1. Create an account on GitHub if you do not already have one.

  2. Fork the project repository: click on the 'Fork' button near the top of the page. This creates a copy of the code under your GitHub user account. For more details on how to fork a repository, see this guide.

  3. Clone your fork of the football_analytics repo from your GitHub account to your local disk:

    git clone https://github.com/<github username>/football_analytics.git
    cd football_analytics
  4. Create an environment with:
    $ python3 -m venv my_env
    or:
    $ python -m venv my_env
    or with conda:
    $ conda create -n my_env python=3

  5. Activate the environment:
    $ source my_env/bin/activate
    or with conda:
    $ conda activate my_env

  6. Add the upstream remote. This saves a reference to the main football_analytics repository, which you can use to keep your repository synchronised with the latest changes:

    $ git remote add upstream https://github.com/eddwebster/football_analytics.git

    You should now have a copy of the football_analytics repository and a properly configured Git repository. The next steps describe the process of modifying code and submitting a pull request:

  7. Synchronize your master branch with the upstream master branch:

    git checkout master
    git pull upstream master
  8. Create a feature branch to hold your development changes:

    $ git checkout -b my_change

    and start making changes. Always use a feature branch: it's good practice never to work on the master branch!

  9. To make sure Git hooks run when you commit (PyCharm, for example, has an option to skip them), install them using pre-commit, as follows:

    pre-commit install
  10. Develop the feature on your feature branch on your computer, using Git for version control. When you're done editing, add the changed files using git add and then git commit:

    git add modified_files
    git commit -m "my first football_analytics commit"
  11. Record your changes in Git, then push the changes to your GitHub account with:

    git push -u origin my_change

๐Ÿ” Return

-----------------------------------------------------

โญ Star History

Star history for the football_analytics repository.

Football Analytics GitHub Stars History

๐Ÿ” Return

-----------------------------------------------------

๐Ÿ‘ Acknowledgements

๐Ÿ” Return
