This project analyzes a soccer database covering matches, players, and teams from several European countries between 2008 and 2016. The dataset comprises seven tables (Country, League, Match, Player, Player Attribute, Team, and Team Attribute). The data was first explored in DB Browser and then exported to CSV files for analysis.
-
Team Performance: Home Wins
- Identifying the team with the highest number of home wins.
-
Team Performance: Away Wins
- Determining the team with the highest number of away wins.
-
Performance Metrics Relationship
- Exploring the relationship between key performance metrics (Buildup Speed, Defence Pressure, Chance Creation) and winning percentages.
- Read the CSV files with the pandas library.
- Eliminated unnecessary columns to focus on relevant information.
- Removed missing values and handled duplicates.
- Created a 'result' column to categorize match outcomes.
- Visualized the distribution of home wins.
- Examined the team with the highest number of away wins.
- Analyzed the relationship between metrics (Buildup Speed, Defence Pressure, Chance Creation) and winning percentages.
- Generated scatter plots for visual representation.
- Used correlation coefficients to quantify the relationships between variables.
- FC Barcelona was the dominant team, recording both the most home wins and the most away wins between 2008 and 2016.
- No significant correlation was found between buildup speed or chance creation and winning percentage, while defence pressure showed a slight correlation with winning percentage.
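
The steps listed above (cleaning, deriving a 'result' column, counting home and away wins, and correlating team metrics with winning percentage) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the tiny in-memory tables stand in for the exported CSV files, and every column name (`home_team`, `away_team`, `home_goals`, `away_goals`, `buildup_speed`, `defence_pressure`, `chance_creation`) as well as all values are made-up assumptions chosen for readability.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the exported Match CSV (hypothetical columns and values).
matches = pd.DataFrame({
    "home_team": ["A", "A", "B", "C", "B", "C"],
    "away_team": ["B", "C", "A", "A", "C", "B"],
    "home_goals": [2, 2, 0, 1, 3, 0],
    "away_goals": [0, 1, 1, 2, 1, 0],
})

# Cleaning: drop rows with missing values and exact duplicate rows.
matches = matches.dropna().drop_duplicates()

# Derive a 'result' column that categorizes each match outcome.
matches["result"] = np.select(
    [matches["home_goals"] > matches["away_goals"],
     matches["home_goals"] < matches["away_goals"]],
    ["home_win", "away_win"],
    default="draw",
)

# Count wins per team, separately for home and away fixtures.
home_wins = matches.loc[matches["result"] == "home_win", "home_team"].value_counts()
away_wins = matches.loc[matches["result"] == "away_win", "away_team"].value_counts()
top_home_team = home_wins.idxmax()
top_away_team = away_wins.idxmax()

# Winning percentage = total wins / total games played, per team.
games = matches["home_team"].value_counts().add(
    matches["away_team"].value_counts(), fill_value=0
)
wins = home_wins.add(away_wins, fill_value=0).reindex(games.index, fill_value=0)
win_pct = wins / games * 100

# Toy stand-in for the Team Attribute CSV (hypothetical columns and values).
attrs = pd.DataFrame({
    "team": ["A", "B", "C"],
    "buildup_speed": [50, 60, 40],
    "defence_pressure": [70, 50, 40],
    "chance_creation": [55, 45, 65],
}).set_index("team")

# Join metrics with winning percentage and compute Pearson correlations.
summary = attrs.join(win_pct.rename("win_pct"))
correlations = summary.corr()["win_pct"].drop("win_pct")
print(top_home_team, top_away_team)
print(correlations)
```

With real data, the two toy frames would be replaced by `pd.read_csv` calls on the exported files, and a scatter plot per metric could be drawn with `summary.plot.scatter(x="defence_pressure", y="win_pct")` (this requires matplotlib). Correlations computed on three toy teams are not meaningful; they only show the mechanics.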