Git Product home page Git Product logo

sql-football-match-analysis's Introduction

European Football League Data Analysis using PostgreSQL

In this exciting project, I explored the European Soccer Database available on Kaggle, originally in SQLite format. I transformed the data into a PostgreSQL database and conducted an in-depth Exploratory Data Analysis (EDA) to uncover intriguing insights about the world of football.

Project Steps:

  1. Data Download: I downloaded the European Soccer Database in SQLite format from Kaggle using this link.

  2. CSV Creation: Utilizing the csv_maker.py script, I extracted data from 7 tables in the SQLite file and uploaded CSV files for 5 of them. So,you have to use this file to create "match.csv" and "player_attributes.csv" file for the next process.

  3. PostgreSQL Database Creation: I created a PostgreSQL database using my username and other relevant details. No table creation was required at this stage, but I ensured that all datasets were available in the Datasets folder. By running the table_creation.sql file, I efficiently created and exported the necessary tables into my PostgreSQL database.

  4. Explanatory Data Analysis: The fascinating football_project_database.sql file contains answers to several intriguing questions:

    • What are the top 30 teams with the highest average overall rating?

      SELECT team_long_name,
             ROUND((buildUpPlaySpeed +
                    CASE WHEN buildUpPlayDribbling IS NULL 
                         THEN (SELECT ROUND(AVG(buildUpPlayDribbling), 2) FROM team_attributes)
                         ELSE buildUpPlayDribbling END +
                    buildUpPlayPassing +
                    chanceCreationPassing +
                    chanceCreationCrossing +
                    chanceCreationShooting +
                    defencePressure +
                    defenceAggression +
                    defenceTeamWidth) / 9, 2) AS overall_rating
      FROM team_attributes
      LEFT JOIN team USING (team_api_id)
      ORDER BY overall_rating DESC
      LIMIT 30;

      Top 30 Teams

    • Which team has the highest goal-scoring rate (goals scored per match)?

      WITH goal_by_team AS (
          SELECT match_api_id, home_team_api_id AS team_api_id, home_team_goal AS goal
          FROM match
          UNION
          SELECT match_api_id, away_team_api_id AS team_api_id, away_team_goal AS goal
          FROM match
      )
      SELECT team_long_name, goal_per_match
      FROM (SELECT team_api_id, SUM(goal)::FLOAT / COUNT(team_api_id) AS goal_per_match
            FROM goal_by_team
            GROUP BY team_api_id) AS t1
      LEFT JOIN team USING (team_api_id)
      ORDER BY goal_per_match DESC
      LIMIT 10;

      Highest Goal-Scoring Team

    • Who is the best player from 2008 to 2016?

      SELECT player_name, overall_rating,
             EXTRACT(YEAR FROM now()) - EXTRACT(YEAR FROM birthday) AS age,
             height, weight
      FROM (SELECT player_api_id, ROUND(AVG(overall_rating), 2) AS overall_rating
            FROM player_attributes
            WHERE overall_rating IS NOT NULL
            GROUP BY player_api_id) AS t1
      LEFT JOIN player USING (player_api_id)
      ORDER BY overall_rating DESC
      LIMIT 20;

      Best Player

    • Analyzing the performance of Real Madrid (Team ID: 8633) over the years.

      WITH team_result AS (
          SELECT home_team_api_id, away_team_api_id, home_team_goal, away_team_goal,
                 CASE WHEN (home_team_api_id = 8633 AND home_team_goal > away_team_goal) OR
                           (away_team_api_id = 8633 AND home_team_goal < away_team_goal)
                      THEN 'Won'
                      WHEN (home_team_api_id = 8633 AND home_team_goal < away_team_goal) OR
                           (away_team_api_id = 8633 AND home_team_goal > away_team_goal)
                      THEN 'Lost'
                      WHEN home_team_goal = away_team_goal
                      THEN 'Draw' END AS result
          FROM match
          WHERE home_team_api_id = 8633 OR away_team_api_id = 8633
      )
      SELECT result, COUNT(*)
      FROM team_result
      GROUP BY result;

      Real Madrid Performance

    • Determining the team with the most wins.

      WITH winner_teams AS (
          SELECT winner, COUNT(*) AS wins
          FROM (SELECT match_api_id, home_team_api_id, home_team_goal, away_team_goal, away_team_api_id,
                       CASE WHEN home_team_goal > away_team_goal THEN home_team_api_id
                            WHEN home_team_goal < away_team_goal THEN away_team_api_id
                            ELSE NULL END AS winner
                FROM match) AS t1
          GROUP BY winner
          HAVING winner IS NOT NULL
          ORDER BY wins DESC
      )
      SELECT team_long_name, wins
      FROM winner_teams
      LEFT JOIN team ON winner_teams.winner = team.team_api_id
      LIMIT 20;

      Most Wins

    • Identifying the best teams in the German, English, French, Spanish, and Italian leagues.

      WITH winner_teams AS (
          SELECT winner, COUNT(*) AS wins
          FROM (SELECT match_api_id, home_team_api_id, home_team_goal, away_team_goal, away_team_api_id,
                       CASE WHEN home_team_goal > away_team_goal THEN home_team_api_id
                            WHEN home_team_goal < away_team_goal THEN away_team_api_id
                            ELSE NULL END AS winner
                FROM match
                WHERE league_id IN (1729, 4769, 7809, 10257, 21518)) AS t1
          GROUP BY winner
      ),
      loser_teams AS (
          SELECT loser, COUNT(*) AS losses
          FROM (SELECT match_api_id, home_team_api_id, home_team_goal, away_team_goal, away_team_api_id,
                       CASE WHEN home_team_goal < away_team_goal THEN home_team_api_id
                            WHEN home_team_goal > away_team_goal THEN away_team_api_id
                            ELSE NULL END AS loser
                FROM match
                WHERE league_id IN (1729, 4769, 7809, 10257, 21518)) AS t2
          GROUP BY loser
      ),
      draw_teams AS (
          SELECT draw_team, COUNT(*) AS draws
          FROM (SELECT match_api_id,
                       CASE WHEN home_team_goal = away_team_goal THEN home_team_api_id END AS draw_team
                FROM match
                UNION
                SELECT match_api_id,
                       CASE WHEN home_team_goal = away_team_goal THEN away_team_api_id END AS draw_team
                FROM match
                WHERE league_id IN (1729, 4769, 7809, 10257, 21518)) AS t3
          GROUP BY draw_team
      )
      SELECT team_long_name, wins, losses, draws, (wins * 3 + draws) AS points
      FROM winner_teams AS w
      LEFT JOIN loser_teams AS l ON w.winner = l.loser
      LEFT JOIN draw_teams AS d ON l.loser = d.draw_team
      LEFT JOIN team ON d.draw_team = team.team_api_id
      WHERE team_long_name IS NOT NULL
      ORDER BY points DESC
      LIMIT 50;

      Best Teams

Feel free to run this code on different files each time to obtain insightful results.

For a detailed exploration of the European Football League data analysis and to witness the thrilling SQL queries, check the respective scripts and SQL files in the repository. Experience the excitement of football data come alive through the power of PostgreSQL!

sql-football-match-analysis's People

Contributors

zehan-alam avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.