nlarki / fantasy-league-pipeline
Creation of a Fantasy Premier League data pipeline for analysis of both team & player performance. Technologies include dbt, Prefect, Terraform & Docker.

License: MIT License

Fantasy Premier League data ingestion and analysis ⚽

Overview

The core premise of this project is to showcase what I have learned whilst partaking in the Data Talks Club Data Engineering course. I will be utilising multiple tools to create an effective pipeline that ingests the sourced FPL data, transforms it, and surfaces it in a finalised visual dashboard, which you can view here!


What is Fantasy Premier League?

Fantasy Premier League is an online game that casts you in the role of a fantasy manager of Premier League players. You must pick a squad of 15 players from the current Premier League season, who score points for your team based on their performances for their clubs in PL matches.

Problem description


The project aims to extract multiple years of FPL data for analysis so that we can take a deeper look into the individual stats of players and teams across the 2016 to 2023 seasons. A small sketch of what fetching this data can look like follows the list of insights below.

Key insights to be extracted:

  • Who are the most in-form goal scorers?
  • Who are the most in-form assist providers?
  • Which players influence their teams the most?
  • Which players have the highest points totals?
  • Who are the most expensive players?
  • How many goals are scored per season?
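
For illustration, here is a minimal sketch of what fetching FPL player data can look like, assuming the public bootstrap-static API endpoint (whether this repo pulls from the live API or from an archived historical dataset is not confirmed here):

# Sketch only: fetch current-season player data from the public FPL API.
# The repo's 2016-2023 historical data may come from an archived dataset instead.
import requests
import pandas as pd

URL = "https://fantasy.premierleague.com/api/bootstrap-static/"

payload = requests.get(URL, timeout=30).json()

# "elements" holds one row per player: goals, assists, points, cost, etc.
players = pd.DataFrame(payload["elements"])
print(players[["web_name", "goals_scored", "assists", "total_points", "now_cost"]].head())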

Technologies

I will use the technologies below to help with the creation of the project:

  • Cloud: GCP
    • Data Lake: GCS
    • Data warehouse: BigQuery
  • Terraform: Infrastructure as Code (IaC) - creates the project configuration for GCP, bypassing the cloud GUI.
  • Workflow orchestration: Prefect (Docker)
  • Transforming data: dbt
  • Data Visualisation: SAS Visual Analytics

Architecture visualised:

Dashboard examples

The dashboard gives the user a high-level analysis of both players and teams across several Premier League seasons. You can view the dashboard here

Home page for visualisation:


Overview analysis of all seasons:


Individual team analysis:


How to run the project

  1. Clone the repo and install the necessary packages:
pip install -r requirements.txt
  2. Next, you will want to set up your Google Cloud environment:
export GOOGLE_APPLICATION_CREDENTIALS=<path_to_your_credentials>.json
gcloud auth activate-service-account --key-file $GOOGLE_APPLICATION_CREDENTIALS
gcloud auth application-default login
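
As a quick sanity check that the credentials are picked up, you can list your buckets with the Python client (a sketch assuming the google-cloud-storage package is installed, e.g. via requirements.txt):

# Verify that application-default credentials are working by listing buckets.
from google.cloud import storage

client = storage.Client()  # picks up GOOGLE_APPLICATION_CREDENTIALS / ADC
print([bucket.name for bucket in client.list_buckets()])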
  3. Set up the infrastructure of the project using Terraform
  • If you do not have Terraform installed, you can install it here and then add it to your PATH
  • Once downloaded, run the following commands:
cd terraform/
terraform init
terraform plan -var="project=<your-gcp-project-id>"
terraform apply -var="project=<your-gcp-project-id>"
  4. Run the Python code in the Prefect folder
  • After installing the required Python packages, Prefect should be installed
  • You can start the Prefect server and access its UI using the command below:
prefect orion start
  • Access the UI at: http://127.0.0.1:4200/
  • You will then want to swap out the blocks so that they are registered to your credentials for GCS and BigQuery. This can be done in the Blocks options
  • You can keep the blocks under the same names as in the code or change them. If you do change them, make sure to update the code to reference the new block names
  • Go back to the terminal and run:
cd flows/
python etl_gcs_player.py
  • The data will then be stored both in your GCS bucket and in BigQuery
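
For orientation, here is a stripped-down sketch of the general shape such a flow takes in Prefect 2.x; the block names "gcs-bucket" and "gcp-credentials" and the target table are illustrative assumptions, not necessarily what etl_gcs_player.py registers:

# Sketch of a Prefect 2.x ETL flow; block and table names are placeholders,
# not necessarily those used in this repo's flows.
import pandas as pd
from prefect import flow, task
from prefect_gcp import GcpCredentials
from prefect_gcp.cloud_storage import GcsBucket

@task(retries=3)
def fetch(url: str) -> pd.DataFrame:
    # Pull the raw season data into a DataFrame
    return pd.read_csv(url)

@task
def write_gcs(df: pd.DataFrame, path: str) -> None:
    # Write locally as parquet, then upload via the GCS bucket block
    df.to_parquet(path)
    GcsBucket.load("gcs-bucket").upload_from_path(from_path=path, to_path=path)

@task
def write_bq(df: pd.DataFrame) -> None:
    # Append into BigQuery using the registered credentials block
    creds = GcpCredentials.load("gcp-credentials")
    df.to_gbq(
        destination_table="fpl.players",
        credentials=creds.get_credentials_from_service_account(),
        if_exists="append",
    )

@flow
def etl_player_flow(url: str) -> None:
    df = fetch(url)
    write_gcs(df, "players.parquet")
    write_bq(df)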
  • If you want to run the process in Docker, you can run the commands below:
cd Prefect/
docker image build -t <docker-username>/fantasy:fpl .
docker image push <docker-username>/fantasy:fpl
  • docker_deploy.py will load the flows into the deployment area of Prefect so that they can then be run directly from your container:
cd flows/
python docker_deploy.py
  • The following command will start an agent that listens for flow runs:
prefect agent start
  • Run the containerized flow from the CLI:
prefect deployment run etl-parent-flow/docker_player_flow --param yr=[16,17,18,19,20,21,22] --param yrs=[17,18,19,20,21,22,23]
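
For reference, a docker_deploy.py along these lines matches the Prefect 2.x deployment API; the DockerContainer block name and the flow import are assumptions inferred from the commands above:

# Sketch of a Docker-based Prefect 2.x deployment script; the block name
# and the flow import are assumptions, not taken from the repo.
from prefect.deployments import Deployment
from prefect.infrastructure.docker import DockerContainer
from etl_gcs_player import etl_parent_flow  # assumed flow entry point

docker_block = DockerContainer.load("fpl-docker")  # a pre-registered block

deployment = Deployment.build_from_flow(
    flow=etl_parent_flow,
    name="docker_player_flow",
    infrastructure=docker_block,
)

if __name__ == "__main__":
    deployment.apply()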
  5. Running the dbt flow
  • Create a dbt account and log in to dbt Cloud here
  • Once logged in, clone the repo for use
  • In the CLI at the bottom, run the following command:
dbt run
  • This will run all the models and create our final dataset, "final_players"
  • final_players will then be placed within the schema chosen when setting up the project in dbt.
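
If you only want to build final_players and the models it depends on, dbt's standard node selection syntax can narrow the run (generic dbt CLI usage, not specific to this repo):

dbt run --select +final_players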
  6. How the lineage should look once run: (lineage graph image)

  7. Visualisation choices

  • You can now take the final_players dataset and use it within Looker or another data visualisation tool, such as SAS VA, which I used.
