nfl-betting-data's Introduction

Regression Model to Predict Over/Under Outcomes for the Upcoming NFL Season

Goal

Develop a regression model that recommends to bettors whether a game is likely to go over or under the set line.

ETL

  • To train our regression model, we used a dataset with every NFL game since 1979, with features including betting lines, game outcomes, weather conditions, and more.
  • We needed to engineer many new features to augment the data we started with. First, since our data did not include the result of any betting line (e.g. whether the home team pushed), we added new columns to identify whether the game's total score finished over the over/under line, pushed, or finished under it. To add more context on the difference in quality between teams, we went through the dataset and calculated each team's current record and point differential at the time of each game.
  • Since a 16-game season is an awfully small sample size, a team's record can be misleading. Using the point differential we obtained, we calculated each team's Adjusted Pythagorean Expected Win Percentage at the time of each matchup, using Football Outsiders' formula. This gave a more accurate look at the true strength of each team, since it uses point differential to estimate what the team's win percentage should be.
  • Since total points scored in a game is less reliant on the relative strength of each team and more dependent solely on offensive and defensive performance, we calculated for each team their average points scored and allowed per game at the time of each matchup.
  • While we did not have injury data, which would have clued our model in on when a team's actual performance would be worse than its expected performance, we were able to calculate a rolling win percentage over each team's last four games, which gave a more accurate glimpse of how good or bad the team currently was than its season-to-date performance.
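The engineered features above can be sketched in a few small functions. This is a minimal illustration, not the project's actual code: the function names are hypothetical, the Pythagorean exponent of 2.37 is an assumed default (the exact "adjusted" formula used is not given here), and results are encoded as 1 for a win, 0 for a loss, and 0.5 for a tie.

```python
def over_under_result(home_score, away_score, line):
    """Label a game 'over', 'under', or 'push' relative to the total line."""
    total = home_score + away_score
    if total > line:
        return "over"
    if total < line:
        return "under"
    return "push"


def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Pythagorean expected win percentage from season-to-date points.

    The exponent is an assumption; an 'adjusted' variant may use a
    different or variable exponent.
    """
    if points_for == 0 and points_against == 0:
        return 0.5  # no scoring data yet: treat as a coin flip (assumption)
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def rolling_win_pct(results, window=4):
    """Win percentage over the `window` games played before each game.

    `results` is a chronological list of 1 (win), 0 (loss), 0.5 (tie).
    Only prior games are used, so the current game never leaks into
    its own feature. Returns None where there is no prior game.
    """
    out = []
    for i in range(len(results)):
        prior = results[max(0, i - window):i]
        out.append(sum(prior) / len(prior) if prior else None)
    return out
```

Computing every feature from games played before the matchup, as `rolling_win_pct` does, keeps the training data consistent with what a bettor would know at kickoff.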

Model and Feature Selection

To narrow down our features, we explored the correlation each had with the Over/Under line, as well as with each other, so that we could limit multicollinearity. We ultimately decided on using 7 features: points per game, points allowed per game, temperature, wind, dome (binary 1 or 0), and season (to account for changes between eras).
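As a rough sketch of the correlation check described above, the following computes pairwise Pearson correlations between columns. The column names and values in the usage lines are made up for illustration, and the helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # undefined for a constant column
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a {name: values} mapping."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy data, invented for this example only.
toy = {
    "wind": [0, 5, 10, 15],
    "over_under_line": [48, 46, 44, 42],
}
matrix = correlation_matrix(toy)
```

A pair of candidate features with a correlation near 1 or -1 carries largely the same information, which is the multicollinearity the selection step tries to limit.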

Correlation heatmap

Regression for Points per Game

For our model, we tested linear, log-linear, and log-log regression models, settling on a linear regression, which fit our data best. Due to the odd distribution of NFL scores (most scoring plays are worth either 3 or 7 points), we applied the Box-Cox power transformation to each of our variables to bring them closer to a normal distribution. Our final regression model had an Adjusted R^2 of 0.697.
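For reference, the Box-Cox transform and the adjusted R^2 statistic mentioned above can be written out directly. This is a sketch with hypothetical function names. It takes the power parameter `lam` as given, whereas in practice it is usually estimated from the data (for example, `scipy.stats.boxcox` fits it by maximum likelihood when no value is supplied). Box-Cox requires strictly positive inputs, so a variable that can be zero, such as a 0/1 dome flag, would need shifting or to be left untransformed. How that was handled in this project is not stated.

```python
import math


def box_cox(values, lam):
    """Box-Cox power transform for strictly positive values.

    lam == 0 gives the natural log; otherwise (x**lam - 1) / lam.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def adjusted_r2(r2, n_obs, n_features):
    """Adjusted R^2: penalizes R^2 for the number of predictors used."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_features - 1)
```

Because adjusted R^2 shrinks as predictors are added without improving fit, it is a fairer basis than plain R^2 for comparing models with different feature counts.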

Box-Cox Transformation for Points per Game

Clustering the Data

In order to train a more accurate regression model, we experimented with clustering our data, classifying games as being one of four types: good offense vs. good defense, good offense vs. bad defense, bad offense vs. bad defense, and bad offense vs. good defense. Our hypothesis was that by clustering the games into these four categories and regressing on each individually, we'd be able to predict Over/Under lines with even higher accuracy. However, after testing this on 4 separate regression models, we found that the regression on each cluster was, in fact, less accurate than the overall regression model, since having a good offense far outweighed having a strong defense.
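The clustering algorithm is not named here, so the sketch below does not reproduce it. As a stand-in, it assigns the four game-type labels with a plain threshold rule, comparing an offense's points per game and the opposing defense's points allowed per game against a league average. The function name, the parameters, and the threshold rule itself are all assumptions made for illustration.

```python
def game_type(offense_ppg, defense_papg, league_avg_ppg):
    """Label a matchup as one of four offense-vs-defense game types.

    Threshold rule (assumption, not the project's clustering method):
    an offense is 'good' if it scores more than the league average, and
    a defense is 'good' if it allows fewer points than the league average.
    """
    offense = "good offense" if offense_ppg > league_avg_ppg else "bad offense"
    defense = "good defense" if defense_papg < league_avg_ppg else "bad defense"
    return f"{offense} vs. {defense}"
```

With labels like these, a separate regression could be fit on each group and its accuracy compared against the single overall model, which is the comparison the paragraph above reports.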

Data Clustered into 4 Game Types

Week One Predictions

After running our regressions, we plugged in the data from Week 1 of the 2018 season and predicted what the Over/Under line should be for each game. If our line was higher than the actual line, we recommended betting the over, and if it was lower, the under. Using Naive Bayes, we then calculated the probability of a game going over or under, given our predictions and the past history of games with the same line.
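The recommendation rule and the probability step can be sketched as follows. This is a minimal categorical Naive Bayes with Laplace smoothing, written from scratch. The choice of features (the posted line and our recommendation), the function names, and the toy rows are assumptions for illustration, not the project's actual inputs.

```python
from collections import Counter, defaultdict


def recommend(predicted_line, actual_line):
    """Bet the over if our predicted line is higher, the under if lower."""
    if predicted_line > actual_line:
        return "over"
    if predicted_line < actual_line:
        return "under"
    return "no bet"


def fit_naive_bayes(rows):
    """Count classes and per-class feature values from (features, label) rows."""
    class_counts = Counter(label for _, label in rows)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    feature_values = defaultdict(set)      # feature index -> distinct values seen
    for features, label in rows:
        for i, value in enumerate(features):
            feature_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values


def predict_proba(model, features, alpha=1.0):
    """Posterior probability of each label, with Laplace smoothing `alpha`."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # class prior
        for i, value in enumerate(features):
            n_values = len(feature_values[i] | {value})
            seen = feature_counts[(i, label)][value]
            score *= (seen + alpha) / (count + alpha * n_values)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Toy history, invented for this example: ((line, recommendation), result).
history = [
    ((44.5, "over"), "over"),
    ((44.5, "over"), "over"),
    ((44.5, "under"), "under"),
    ((47.0, "under"), "under"),
]
model = fit_naive_bayes(history)
proba = predict_proba(model, (44.5, recommend(46.0, 44.5)))
```

Smoothing keeps a line value that never appeared with a given outcome from forcing that outcome's probability to zero.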

Week 1 Predictions

For Week 1, we correctly predicted 9 of 16 games. We also ran every prior game through our model to see how accurate it would have been had we bet on every single game since 1979 (Weeks 5-16). Our model gave good predictions 54.79% of the time, where a good prediction is one that guided the bettor to a win or a push.
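The back-test metric described above, the share of bets that ended in a win or a push, can be sketched as below. The function name and the sample lists are hypothetical.

```python
def hit_rate(recommendations, outcomes):
    """Share of bets that won or pushed.

    `recommendations` holds 'over' or 'under' for each game, and
    `outcomes` holds the matching result: 'over', 'under', or 'push'.
    """
    if not recommendations:
        raise ValueError("no bets to evaluate")
    good = sum(
        1
        for rec, out in zip(recommendations, outcomes)
        if out == "push" or rec == out
    )
    return good / len(recommendations)


# Made-up example: two wins, one push, one loss.
sample_rate = hit_rate(
    ["over", "under", "over", "under"],
    ["over", "over", "push", "under"],
)
```

For scale, the Week 1 result of 9 correct out of 16 works out to 56.25%.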

Back-tested Results

nfl-betting-data's People

Contributors

mrethana, slieb74
