sabermetric_networks's Introduction

MLB Pitcher-Batter Network Analysis

Project Overview

This project uses Statcast pitch data to perform a network analysis of MLB pitchers and batters. Win Probability Added (WPA) and Run Expectancy (RE) serve as the performance metrics that weight the network, and the PageRank algorithm ranks players within it. The goal is to identify key players and understand the dynamics of player interactions throughout the season.

Features

  • Data Import: Fetch and preprocess Statcast event data.
  • Statistical Analysis: Calculate cumulative stats like WPA and RE for each player and team.
  • Network Analysis: Construct a network graph where nodes represent players and edges represent performance metrics.
  • PageRank Calculation: Apply PageRank to assess the importance of players in the network based on their performance metrics.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Before running the project, ensure you have the following installed:

  • Python 3.8+
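  • pandas
  • NumPy
  • NetworkX
  • PyBaseball
  • Matplotlib
  • Unidecode (used by Import.py to normalize player names)
  • Pickle (part of the Python standard library, so no separate installation is needed)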

Files Description

Payroll.csv

  • Manually sourced payroll data for the 2023 MLB season.

Import.py

This script is responsible for fetching and preprocessing Statcast pitch data from the specified MLB season. Key operations include:

  • Data Fetching: Using the pybaseball library, it retrieves event data between specified dates.
  • Preprocessing: Filters out incomplete records, normalizes player names using unidecode, assigns players to teams, and prepares several datasets for further analysis. Pitches that do not produce an event outcome, such as ordinary strikes and balls, are also filtered out.
  • Gamescoring: Goes through the pitch data and assigns the WPA and RE scores from each event to the corresponding players.
  • Data Export: Outputs a filtered event dataset to 2023_mlb_event_data.csv and performs initial calculations of Win Probability Added (WPA) and Run Expectancy (RE) for further use. Batter and pitcher CSV files are created to store stats for the respective players, and a game results CSV file is created to store game results.
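
The filtering and gamescoring steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code in Import.py. The field names mirror Statcast columns (events, inning_topbot, delta_home_win_exp, delta_run_exp), but the sample rows, the function name score_events, and the zero-sum crediting rule (the pitcher receives the negative of the batter's value) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical pitch rows. Field names mirror Statcast columns, values are invented.
pitches = [
    {"batter": "Batter A", "pitcher": "Pitcher X", "events": "single",
     "inning_topbot": "Bot", "delta_home_win_exp": 0.04, "delta_run_exp": 0.45},
    # A called ball: no event, so it should be filtered out.
    {"batter": "Batter A", "pitcher": "Pitcher X", "events": None,
     "inning_topbot": "Bot", "delta_home_win_exp": 0.01, "delta_run_exp": 0.03},
    {"batter": "Batter B", "pitcher": "Pitcher Y", "events": "strikeout",
     "inning_topbot": "Top", "delta_home_win_exp": 0.03, "delta_run_exp": -0.25},
]


def score_events(rows):
    """Accumulate WPA and RE per player from pitches that produced an event."""
    totals = defaultdict(lambda: {"wpa": 0.0, "re": 0.0})
    for row in rows:
        if not row["events"]:
            continue  # skip pitches without an event outcome
        # Assumption: win expectancy is reported from the home team's side,
        # and the batter is on the home team in the bottom of an inning.
        sign = 1.0 if row["inning_topbot"] == "Bot" else -1.0
        batter_wpa = sign * row["delta_home_win_exp"]
        # Assumption: run expectancy change is already from the batter's side.
        batter_re = row["delta_run_exp"]
        totals[("batter", row["batter"])]["wpa"] += batter_wpa
        totals[("batter", row["batter"])]["re"] += batter_re
        # Assumption: zero-sum crediting, the pitcher gets the opposite value.
        totals[("pitcher", row["pitcher"])]["wpa"] -= batter_wpa
        totals[("pitcher", row["pitcher"])]["re"] -= batter_re
    return dict(totals)


for player, stats in sorted(score_events(pitches).items()):
    print(player, stats)
```

Keying the totals by (role, name) keeps a player's batting and pitching lines separate, which matters for players who appear in both roles.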

Create_Graph.py

Builds a directed multigraph representing the interactions between pitchers and batters:

  • Data Loading: Reads the preprocessed CSV files to retrieve player statistics.
  • Graph Construction: Uses NetworkX to create nodes for each player (pitchers and batters) and edges that represent game events. There are at most two edges between each pitcher and batter, one for RE and one for WPA. Each edge represents the cumulative stat across all matchups between the two players and is directed toward the player with the positive RE or WPA.
  • Node and Edge Attributes: Each node stores player stats, and edges are weighted by performance metrics (WPA, RE, scores from events).
  • Graph Serialization: Saves the graph to a file player_network.pickle for use in subsequent analysis.
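
The edge rule described above (one cumulative edge per metric, pointing at the player who came out ahead) can be sketched as below. This is an illustration in plain Python using a dict of edges rather than the project's NetworkX graph. The function name build_edges, the tuple layout of the input, and the choice to drop matchups that net to exactly zero are assumptions. With NetworkX, each resulting entry would correspond to a call such as MultiDiGraph.add_edge(source, target, key=metric, weight=value).

```python
from collections import defaultdict


def build_edges(events):
    """Collapse matchup events into at most one directed edge per metric.

    events: iterable of (pitcher, batter, metric, batter_value), where
    batter_value is the change in the metric from the batter's perspective.
    Returns a dict mapping (source, target, metric) -> positive weight,
    with the edge pointing at the player who holds the positive total.
    """
    totals = defaultdict(float)
    for pitcher, batter, metric, value in events:
        totals[(pitcher, batter, metric)] += value

    edges = {}
    for (pitcher, batter, metric), total in totals.items():
        if total > 0:
            edges[(pitcher, batter, metric)] = total   # batter came out ahead
        elif total < 0:
            edges[(batter, pitcher, metric)] = -total  # pitcher came out ahead
        # Assumption: a total of exactly zero produces no edge.
    return edges


# Hypothetical matchup history between one pitcher and one batter.
events = [
    ("Pitcher X", "Batter A", "WPA", 0.05),
    ("Pitcher X", "Batter A", "WPA", -0.02),
    ("Pitcher X", "Batter A", "RE", -0.3),
    ("Pitcher X", "Batter A", "RE", 0.1),
]
for edge, weight in build_edges(events).items():
    print(edge, round(weight, 3))
```

In this example the batter wins the WPA matchup and the pitcher wins the RE matchup, so the two edges between the same pair point in opposite directions.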

PageRank.py

Applies the PageRank algorithm to the network to identify influential players:

  • Graph Loading: Loads the network graph from the pickle file.
  • PageRank Calculation: Computes PageRank scores for each player using their on-field interactions and statistics like WPA and RE.
  • Normalization: Adjusts the PageRank scores and recalculates them to ensure consistency across different performance metrics.
  • Results Saving: Updates the graph with new PageRank scores and saves the updated graph back to a pickle file.
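
To make the PageRank step concrete, here is a minimal weighted PageRank written as a plain-Python power iteration. It is a sketch for illustration only: the project uses NetworkX (which provides networkx.pagerank with alpha and weight parameters), and the normalization step described above is not reproduced here. The function name pagerank, the damping value of 0.85, and the sample edges are assumptions.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration.

    edges: dict mapping (source, target) -> positive weight.
    Returns a dict mapping node -> score, with scores summing to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)

    # Total outgoing weight per node, used to split a node's rank among targets.
    out_weight = {node: 0.0 for node in nodes}
    for (source, _target), weight in edges.items():
        out_weight[source] += weight

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Nodes with no outgoing edges spread their rank evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for (source, target), weight in edges.items():
            new_rank[target] += damping * rank[source] * weight / out_weight[source]
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical WPA edges, each pointing at the player who won the matchup.
wpa_edges = {
    ("Pitcher X", "Batter A"): 0.3,
    ("Pitcher Y", "Batter A"): 0.2,
    ("Batter B", "Pitcher X"): 0.1,
}
scores = pagerank(wpa_edges)
for player in sorted(scores, key=scores.get, reverse=True):
    print(player, round(scores[player], 4))
```

Because edges point toward the winner of each matchup, rank flows to players who beat opponents who themselves beat others. Here Batter A ranks first, having beaten both pitchers.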

Network_Visualization.py

Handles the visualization of the network graph created and analyzed in previous steps:

  • Graph Loading: Retrieves the graph from the pickle file.
  • Visualization Setup: Configures node sizes and colors based on player roles and performance metrics.
  • Drawing the Graph: Uses matplotlib to draw nodes, edges, and labels.
  • Display Output: Renders the graph on screen to help interpret the relationships between players and their impact.

sabermetric_networks's People

Contributors

matta-kelly

sabermetric_networks's Issues

Payroll Implementation

Currently requires manual creation of payroll.csv. It would be best to find a way to gather each player's AAV (average annual value) reliably and automatically.

Processing Names: Payroll

Names with accents are not matched against the payroll CSV when the data is imported and processed, so those players are automatically assigned the league minimum. It is unclear whether the issue is in the code or in the CSV.
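
One possible way to approach this, sketched here as an untested suggestion rather than a confirmed fix, is to build the same accent-free lookup key for both the Statcast names and the payroll names before matching. The example uses the standard-library unicodedata module instead of unidecode so that it runs without extra dependencies. The function names name_key and lookup_salary, the sample payroll, and the default value are all hypothetical. If the keys still fail to match, the CSV encoding would be another thing to check when the file is read.

```python
import unicodedata


def name_key(name):
    """Return a lookup key with accents, case, and extra spaces removed."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def lookup_salary(player, payroll, default):
    """Match a player against payroll names using the normalized key."""
    keyed = {name_key(name): salary for name, salary in payroll.items()}
    return keyed.get(name_key(player), default)


# Hypothetical payroll entries and a placeholder default, not real figures.
payroll = {"José Ramírez": 100, "Some Player": 50}
print(name_key("José  Ramírez"))
print(lookup_salary("Jose Ramirez", payroll, default=1))
print(lookup_salary("Unknown Player", payroll, default=1))
```

Applying the same key function to both sides means it does not matter which source carries the accents.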
