smt-data-challenge's Introduction

Vacuums and Stone Hands: An Exploration of First Baseman Receiving

See full paper here

Abstract

In an era where the game is increasingly quantified, the receiving aspect of first baseman defense has been largely ignored. While particularly skilled (or lacking) play contributes to a fielder’s reputation, even advanced defensive metrics fail to explicitly consider the role of the first baseman in the assist. In this analysis, I describe what makes an out at first from the perspective of the first baseman. I find that bounced and offline throws are less likely to be outs, and that some players seem to be better at fielding these than others, lending credence to the conventional wisdom that some possess this skill. While the data are insufficient to compute full player-level first baseman receiving rankings, I demonstrate a viable framework for its evaluation. Finally, I discuss the role that first baseman receiving can take in player development and valuation, and suggest future extensions of this approach.

An Overview of this Repo

Foundational Files

The files that power all analysis are:

  • src/game.py contains the Game class, which does all of the preprocessing and imputation.
  • src/plotting.py contains the Baseball_Field class, which does all of the plotting and GIF making.
  • src/utils.py contains assorted functions used in both classes and throughout the analysis.

Analysis Files

Final

Explorations

Outputs

Media From Paper

Aligning Ball and Player Data

a. A misaligned play

  • A misaligned play (Play ID 124, from Game 1903_01_TeamNE_TeamA2). Notice that the second baseman is not near the ball or the bag when the turn is made.

b. A realigned play

  • The same play with the ball and player timestamps aligned as described in Section 3.1 of the paper.

Example Scenarios

a. A likely out that was converted for an out

  • A likely out that was converted for an out. These plays are considered routine. (Play ID 150, from Game 1902_26_TeamMH_TeamA3)

b. An unlikely out that was not converted for an out

  • An unlikely out that was not converted for an out. These plays are long shots. (Play ID 132, from Game 1903_32_TeamNB_TeamA1)

c. An unlikely out that was converted for an out

  • An unlikely out that was converted for an out. This is a great play by the first baseman. (Play ID 106, from Game 1903_30_TeamNF_TeamA2)

d. A likely out that was not converted for an out

  • A likely out that was not converted. (Play ID 193, from Game 1903_05_TeamND_TeamA2) These plays are hard to interpret without additional context. It could be a blunder by the first baseman, or a throw bad enough that keeping the ball from getting away is more important than the out.

Preventing a bad play from getting worse

a.

b. A bad throw pulls a player off the bag

  • A similarly bad throw pulls the first baseman off the bag in Play ID 25 of Game 1902_05_TeamML_TeamB. While there is value in stopping a bad play from getting worse, the data anonymization hampered my ability to explore that in this analysis.

smt-data-challenge's People

Contributors

nicholson2208

smt-data-challenge's Issues

Fill in outs with rules and recursion

My inspiration is the famous recursive sudoku solver. Here is some pseudocode:

solve_outs_sequence(df, seq):
    if no more cells, return True

    for this_play_outs in range(3):
        if valid_assignment:
            fill in seq here

            if solve_outs_sequence(df, seq), return True

            # that assignment led to a dead end, so undo it
            remove the values you just added to seq

    return False
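
The pseudocode above could be fleshed out roughly as follows. This is only a minimal sketch under stated assumptions, not the repo's implementation: it works on a plain list of per-play outs for one half inning (None marks a missing value) rather than on df, and the validity rule (the running total never exceeds three, and a complete half inning sums to exactly three) is a hypothetical stand-in for valid_assignment.

```python
from typing import List, Optional

OUTS_PER_HALF = 3  # assumption: the half inning being filled in was played to completion


def solve_outs_sequence(seq: List[Optional[int]], idx: int = 0) -> bool:
    """Fill the None entries of seq in place via backtracking.

    Returns True if a consistent assignment exists, False otherwise.
    """
    if idx == len(seq):
        # No more cells: accept only if the half inning adds up to three outs.
        return sum(seq) == OUTS_PER_HALF

    outs_so_far = sum(seq[:idx])  # every entry before idx is already filled

    if seq[idx] is not None:
        # A known value is kept as-is, but it can still make the sequence invalid.
        if outs_so_far + seq[idx] > OUTS_PER_HALF:
            return False
        return solve_outs_sequence(seq, idx + 1)

    for this_play_outs in range(3):
        if outs_so_far + this_play_outs <= OUTS_PER_HALF:  # hypothetical validity rule
            seq[idx] = this_play_outs
            if solve_outs_sequence(seq, idx + 1):
                return True
            seq[idx] = None  # dead end, undo the guess

    return False


half_inning = [0, None, 1, None, 1]
solved = solve_outs_sequence(half_inning)
print(solved, half_inning)
```

Like the sudoku solver, this returns the first consistent assignment it finds, so additional rules (for example, from the base runner columns) would be needed to single out the correct one when several assignments are valid.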

Add more angles into the data, refine distance measures

Should come up with a way to say whether distances/angles are positive or negative

  • like for batter_dist, it could be that they overran the bag!

Compute how far "offline" a throw ends up, so I can map this to the same perspective
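
One possible way to get a signed "offline" distance is the 2D cross product between the intended throw line and the ball's position. The sketch below is an illustration only: the function name, the argument layout, and the sign convention (positive means the ball is to the left of the throw direction) are assumptions, not something defined in the repo.

```python
import math


def signed_offline_distance(origin, target, ball):
    """Signed perpendicular distance of `ball` from the line origin -> target.

    Positive values are to the left of the throw direction, negative to the right.
    All arguments are (x, y) tuples in the same units.
    """
    dx, dy = target[0] - origin[0], target[1] - origin[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("origin and target must be different points")
    bx, by = ball[0] - origin[0], ball[1] - origin[1]
    # The cross product of the throw direction and the ball offset,
    # divided by the throw length, is the signed perpendicular distance.
    return (dx * by - dy * bx) / length


print(signed_offline_distance((0, 0), (10, 0), (10, 2)))
print(signed_offline_distance((0, 0), (10, 0), (10, -3)))
```

Because the distance is measured relative to the throw line, throws from different fielders end up in the same frame of reference.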

New data fields that might be useful to me as I move forward

What I notice isn't there that might be nice to have:

  • the at_bat field is not there for older games
  • I don't think I have any outs or count
    • Not sure I can get counts
  • Player speed at a given time
  • Player orientation?
    • I think I could noisily infer this
  • I don't think that "ball bounce" is in the older data -- how is this defined?
    • I could maybe fill this one in if z_coord is < 0.5 or something arbitrarily small?
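
Two of the missing fields above could perhaps be approximated from the tracking data itself. The sketch below is hedged accordingly: the function names are hypothetical, the 0.5 threshold is the arbitrary value floated above, and treating a bounce as a local minimum of the ball height is an assumption about how "ball bounce" might be defined.

```python
import math


def infer_bounce_indices(z_coords, threshold=0.5):
    """Indices where the ball height is a local minimum below `threshold`."""
    bounces = []
    for i in range(1, len(z_coords) - 1):
        z = z_coords[i]
        # A bounce candidate: low, no higher than the previous sample,
        # and strictly lower than the next one (the ball comes back up).
        if z < threshold and z <= z_coords[i - 1] and z < z_coords[i + 1]:
            bounces.append(i)
    return bounces


def finite_difference_speeds(timestamps, xs, ys):
    """Speed between consecutive samples, in distance units per time unit.

    Returns None for a pair of samples whose timestamps do not increase.
    """
    speeds = []
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        if dt <= 0:
            speeds.append(None)
            continue
        speeds.append(math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]) / dt)
    return speeds


print(infer_bounce_indices([5.0, 3.0, 1.0, 0.2, 1.5, 2.0, 0.9, 0.3, 0.1, 0.1]))
print(finite_difference_speeds([0, 1, 2], [0, 3, 3], [0, 4, 4]))
```

A ball that stays low and rolls is deliberately not flagged, since its height never comes back up after the minimum.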

Fielder's choice outs at second are not working right now

game_info_1903_19.loc[(game_info_1903_19["inning"] == 7) & (game_info_1903_19["top_bottom_inning"] == "Top")]

There should be an out at play 209.

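| game_str | home_team | away_team | at_bat | play_per_game | inning | top_bottom_inning | pitcher | catcher | first_base | second_base | third_base | shortstop | left_field | center_field | right_field | batter | first_baserunner | second_baserunner | third_baserunner | n_br | prev_outs | this_play_outs | trust_this_play | Batter | Runner 1st | Runner 2nd | Runner 3rd | player_pos_and_info_agree | valid_half | trust_this_half |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1903_19_TeamNL_TeamB | TeamB | TeamNL | 55 | 208 | 7 | Top | 1117 | 1279 | 2488 | 2804 | 1080 | 2235 | 1124 | 1144 | 2029 | 3283 | 9785 | 0 | 0 | 1 | 0 | 0 | 1 | 1.0 | 1.0 | 0.0 | 0.0 | 1 | 1 | 1 |
| 1903_19_TeamNL_TeamB | TeamB | TeamNL | 55 | 209 | 7 | Top | 1117 | 1279 | 2488 | 2804 | 1080 | 2235 | 1124 | 1144 | 2029 | 3283 | 9785 | 0 | 0 | 1 | 0 | 0 | 1 | 1.0 | 1.0 | 0.0 | 0.0 | 1 | 1 | 1 |
| 1903_19_TeamNL_TeamB | TeamB | TeamNL | 56 | 210 | 7 | Top | 1117 | 1279 | 2488 | 2804 | 1080 | 2235 | 1124 | 1144 | 2029 | 3236 | 3283 | 0 | 0 | 1 | 0 | 0 | 1 | 1.0 | 1.0 | 0.0 | 0.0 | 1 | 1 | 1 |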

I need to collect several types of play for this analysis: successful throws to first, unsuccessful throws to first that are caught, and unsuccessful throws to first that are not caught

maybe also add the "trust this half" filter on all of these!
bf_1902_24 is a good one though

Successful looks like

Still need to filter though, see #10 and #6

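g_1902_24_events = bf_1902_24.game_obj.game_events_df.copy()

# TODO: use the helper function I wrote for this filter.

# Plays in this game on which at least one out was recorded.
plays_with_outs = game_info_1902_24.loc[game_info_1902_24["this_play_outs"] >= 1, "play_per_game"]

# Ball acquired by the first baseman (position 3), excluding pickoff throws,
# restricted to plays that produced an out.
g_1902_24_events.loc[
    (g_1902_24_events["event"] == "ball acquired")
    & (g_1902_24_events["player_position"] == 3)
    & (g_1902_24_events["prev_event"] != "pickoff throw")
    & g_1902_24_events["play_per_game"].isin(plays_with_outs)
]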

Unsuccessful and caught looks like

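g_1902_24_events = bf_1902_24.game_obj.game_events_df.copy()

# TODO: use the helper function I wrote for this filter.

# Plays in this game on which no out was recorded.
plays_without_outs = game_info_1902_24.loc[game_info_1902_24["this_play_outs"] == 0, "play_per_game"]

# Ball acquired by the first baseman (position 3), excluding pickoff throws,
# restricted to plays that did not produce an out.
g_1902_24_events.loc[
    (g_1902_24_events["event"] == "ball acquired")
    & (g_1902_24_events["player_position"] == 3)
    & (g_1902_24_events["prev_event"] != "pickoff throw")
    & g_1902_24_events["play_per_game"].isin(plays_without_outs)
]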

I am not sure what Unsuccessful and not caught would look like??

  • Maybe look at the angle of the throw?
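
If the throw angle turns out to be a usable signal, one way to compute it is the angle between the ball's initial direction of travel and the direction from the release point to first base. This is a rough sketch, not the repo's method: the function name is hypothetical, and the first base coordinates assume a field frame in feet with home plate at the origin and the y axis running through second base, which may not match the data.

```python
import math

# Assumed frame: home plate at (0, 0), y axis through second base, units in feet.
FIRST_BASE = (63.64, 63.64)


def throw_angle_to_first(release, later, first_base=FIRST_BASE):
    """Angle in degrees between the ball's travel direction and the line to first base.

    `release` is the ball position at the throw and `later` is a ball position
    a few frames afterwards, both as (x, y) tuples.
    """
    vx, vy = later[0] - release[0], later[1] - release[1]
    tx, ty = first_base[0] - release[0], first_base[1] - release[1]
    norm_v, norm_t = math.hypot(vx, vy), math.hypot(tx, ty)
    if norm_v == 0 or norm_t == 0:
        raise ValueError("need two distinct ball positions away from first base")
    cos_angle = (vx * tx + vy * ty) / (norm_v * norm_t)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


print(throw_angle_to_first((0.0, 0.0), (1.0, 0.0), first_base=(0.0, 10.0)))
```

A small angle on a play with no "ball acquired" event for the first baseman might then flag a throw that was aimed at first but not caught, though the cutoff would have to be tuned against plays like the ones above.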

Make a function to snap events together, or make the plotting more object oriented?

Right now, not everything is recorded at the same cadence or frequency. I imagine this will make analysis hard because I will have to do some fuzzy joins or something.

Make a module that lines up the data by taking one series as ground truth (probably player pos?) and "snapping" the other events to the nearest timestamp? Or maybe line up the data with left joins into a single table, so I don't have to deal with these conditionals and the weird strobe effect that comes from the catcher being on a different phase than everyone else.
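
The "snapping" idea could look something like the sketch below, which uses only the standard library. It is an illustration under assumptions: the function name is hypothetical, the reference timestamps (for example, the player position series) are assumed to be sorted, and ties are resolved toward the earlier timestamp.

```python
from bisect import bisect_left


def snap_to_nearest(reference, timestamps):
    """Map each timestamp to the closest value in the sorted list `reference`."""
    if not reference:
        raise ValueError("reference must contain at least one timestamp")
    snapped = []
    for t in timestamps:
        i = bisect_left(reference, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = reference[max(i - 1, 0): i + 1]
        snapped.append(min(candidates, key=lambda r: abs(r - t)))
    return snapped


player_pos_times = [0, 50, 100, 150]
ball_times = [0, 60, 130, 200]
print(snap_to_nearest(player_pos_times, ball_times))
```

In pandas, `merge_asof` with `direction="nearest"` does a similar nearest-key join on sorted frames, which may be more convenient once everything lives in one table.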

In 1903_30, play 198 looks just like a pitch catch but the third br disappears?

| game_str | home_team | away_team | at_bat | play_per_game | inning | top_bottom_inning | pitcher | catcher | first_base | second_base | third_base | shortstop | left_field | center_field | right_field | batter | first_baserunner | second_baserunner | third_baserunner | prev_outs | this_play_outs | n_br |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1903_30_TeamNB_TeamA1 | TeamA1 | TeamNB | 59 | 196 | 6 | Bottom | 7819 | 3718 | 6789 | 6189 | 3363 | 8079 | 8638 | 5291 | 3291 | 2386 | 1945 | 1185 | 0 | 0.0 | 0.0 | 2 |
| 1903_30_TeamNB_TeamA1 | TeamA1 | TeamNB | 59 | 197 | 6 | Bottom | 7819 | 3718 | 6789 | 6189 | 3363 | 8079 | 8638 | 5291 | 3291 | 2386 | 1945 | 1185 | 0 | 0.0 | 0.0 | 2 |
| 1903_30_TeamNB_TeamA1 | TeamA1 | TeamNB | 59 | 198 | 6 | Bottom | 7819 | 3718 | 6789 | 6189 | 3363 | 8079 | 8638 | 5291 | 3291 | 2386 | 1945 | 1185 | 0 | 0.0 | 1.0 | 2 |
| 1903_30_TeamNB_TeamA1 | TeamA1 | TeamNB | 61 | 199 | 6 | Bottom | 7819 | 3718 | 6789 | 6189 | 3363 | 8079 | 8638 | 5291 | 3291 | 1286 | 2386 | 1945 | 0 | 1.0 | 0.0 | 2 |

bf_1903_30.clear_plot()

play_id = bf_1903_30.game_obj.get_pid_from_ppg(198)

bf_1903_30.plot_all_components(play_id=play_id)
bf_1903_30.fig

Fill in missing base runners with player_pos data

Another data issue in 1902_24_TeamMA_TeamA1

  • there are sometimes no BR listed in the game_info, even though there is definitely a person on base
    • examples:
      • 383 had someone get on but they aren't recorded
      • 386 shows a person on third in the data, but they aren't there in game info
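
One way to fill in these gaps might be to check which tracked runners are standing close to a bag at the start of a play. The sketch below is only a guess at how that could work: the base coordinates assume a frame in feet with home plate at the origin and the y axis through second base, and the function name, input layout, and 5-foot tolerance are all hypothetical.

```python
import math

# Assumed frame: home plate at (0, 0), y axis through second base, units in feet.
BASES = {
    "first": (63.64, 63.64),
    "second": (0.0, 127.28),
    "third": (-63.64, 63.64),
}


def occupied_bases(runner_positions, tolerance=5.0):
    """Return {base name: runner id} for runners within `tolerance` of a bag.

    `runner_positions` maps a runner id to that runner's (x, y) position.
    """
    occupied = {}
    for runner_id, (x, y) in runner_positions.items():
        for base, (bx, by) in BASES.items():
            if math.hypot(x - bx, y - by) <= tolerance:
                occupied[base] = runner_id
    return occupied


# Made-up positions: one runner near third base, one player near home plate.
print(occupied_bases({101: (-62.0, 64.0), 202: (10.0, 5.0)}))
```

The result could then be compared against the base runner columns in game_info to flag plays like 383 and 386, where the two sources disagree.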
