oddsscraper's Introduction

Scraping odds for NBA, MLB, NHL - actnetscrape module

This Python module can be used to compile odds for the major sports leagues from a public API. It can output the data to a dataframe for immediate analysis or send it to a MySQL database for storage via pymysql.

The API stores a decent amount of historical data, but I have not tested the limits of the historical cutoff dates. All the leagues have data going back to at least the 2020-2021 seasons, although some of the specific props are less frequent.

The scraping uses Selenium and takes a few minutes to run through all the props/dates with the API when doing a small number of days. Full seasons or months will take a couple of hours so as not to overload the public-facing server. It has only been tested using a Firefox driver. The page source data is also stored in the scraper object in case a different extraction is required. The page source includes team and player IDs.

A Pipfile lists the required Python packages.

Examples

** loadDb note: I use one table for player names and IDs and one table for props that includes playerId. The loadDb function is set up to accommodate this layout.

Initiate the class, then follow option one or option two.

import actNetScrape as ans

# assign class
odds = ans.actNetScraper()

Option one - single league scrape, db save optional

# league and date list
dates = ['2023-05-24', '2023-05-22', '2023-05-21', '2023-05-20']
league = 'nhl'

conn_details = <insert pymysql connection string>
browser_path = <insert browser file path>

# scrape the api
odds.scrape(league=league, 
            dates=dates, 
            selenium_browser_path= browser_path, 
            sleep_secs = 2
)

# process page_source to JSON to df
df = odds.processScrapes(league = league, 
                        dates = dates
)

# if desired, save to existing database
odds.loadDb(df_props=df, 
            pymysql_conn_str=conn_details, 
            oddsTableName=<INSERT ODDS TABLE NAME>, 
            dbAction='append',
            update_players=True, 
            playerTableName=<INSERT PLAYER TABLE NAME>
)
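
For longer ranges such as a full month, typing the `dates` list by hand gets tedious. The following is a minimal sketch using only the standard library, assuming the scraper expects `YYYY-MM-DD` strings as in the example above. `date_range_strings` is a hypothetical helper for illustration and is not part of actNetScrape.

```python
from datetime import date, timedelta


def date_range_strings(start, end):
    """Return 'YYYY-MM-DD' strings from start to end inclusive, newest first."""
    n_days = (end - start).days
    if n_days < 0:
        raise ValueError("end must not be before start")
    # Walk backwards from the end date so the newest date comes first.
    return [(end - timedelta(days=i)).isoformat() for i in range(n_days + 1)]


dates = date_range_strings(date(2023, 5, 20), date(2023, 5, 24))
print(dates)
# ['2023-05-24', '2023-05-23', '2023-05-22', '2023-05-21', '2023-05-20']
```

The resulting list can be passed as the `dates` argument in the same way as the hand-written list.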

Option two - multiple league scrape AND save to database

# league and date list
leagues = ['mlb', 'nhl']
dates = ['2023-05-24', '2023-05-22', '2023-05-21', '2023-05-20']

# store dataframes
df_odds = []

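# connection details and browser path (same placeholders as option one)
conn_details = <pymysql connection string>
browser_path = <insert browser file path>

# looping through the leagues and scraping the data
for league in leagues: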

    # scrape the api
    odds.scrape(league=league, 
                dates=dates, 
                selenium_browser_path= browser_path, 
                sleep_secs = 2
    )

    # process page_source to JSON to df
    df = odds.processScrapes(league=league, 
                            dates=dates
    )
    df_odds.append(df)

    # store in database
    odds.loadDb(df_props=df, 
                pymysql_conn_str=conn_details, 
                oddsTableName=<INSERT ODDS TABLE NAME>, 
                dbAction='append',
                update_players=True, 
                playerTableName=<INSERT PLAYER TABLE NAME>
    )
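
The loop above collects one dataframe per league in `df_odds`. If a single combined frame is wanted for analysis, something like the sketch below may work. It assumes the per-league frames share compatible columns, which I have not verified against the module; the small frames and column names here are made-up stand-ins, not real scraper output.

```python
import pandas as pd

# Stand-in frames; in practice these are the frames appended to df_odds above.
# The column names are invented for illustration only.
df_odds = [
    pd.DataFrame({"propId": [1, 2], "line": [0.5, 1.5]}),
    pd.DataFrame({"propId": [3], "line": [2.5]}),
]
leagues = ["mlb", "nhl"]

# Tag each frame with its league, then stack them under a fresh index.
tagged = [df.assign(league=lg) for df, lg in zip(df_odds, leagues)]
all_odds = pd.concat(tagged, ignore_index=True)

print(all_odds.shape)
# (3, 3)
```

Adding the `league` column before concatenating keeps rows distinguishable if the same propId shows up in more than one league.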

Scraping current-date WNBA odds - fdscrape module

Note: FD stopped me from using this after about a month.

This module will scrape odds for the WNBA that are currently posted on the website. There is no historical function and only the current odds can be retrieved.

Example

import fdScrape as fds

# assign class
odds = fds.fdScraper()

# selenium browser path
browser_path = <insert browser file path>

# call method to scrape site into dataframe
df = odds.scrapeToDf('wnba', browser_path)

#### IF LOADING DB IS DESIRED #####
conn_details = <insert pymysql connection string>

# call method to load df to database
# append or replace can be used. 
odds.loadDb(df, conn_details, "append")

oddsscraper's People

Contributors

iksnizez

oddsscraper's Issues

NHL propId duplicates

A lot of the NHL props have duplicate propIds. Probably need to move to a composite (two-column) primary key for NHL. playerId + propId?
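
A rough sketch of the composite-key idea follows. It uses the standard library's `sqlite3` with an in-memory database purely as a stand-in for MySQL so that it runs anywhere. The table name `props` and the `line` column are assumptions made for illustration, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite primary key: a row is unique per (playerId, propId) pair,
# so the same propId may appear for different players.
conn.execute(
    """
    CREATE TABLE props (
        playerId INTEGER NOT NULL,
        propId   INTEGER NOT NULL,
        line     REAL,
        PRIMARY KEY (playerId, propId)
    )
    """
)

# Same propId for two different players: both rows are accepted.
rows = [(101, 555, 0.5), (102, 555, 1.5)]
conn.executemany("INSERT INTO props VALUES (?, ?, ?)", rows)

# Repeating an existing (playerId, propId) pair is rejected.
try:
    conn.execute("INSERT INTO props VALUES (?, ?, ?)", (101, 555, 2.5))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM props").fetchone()[0]
print(row_count, duplicate_rejected)
# 2 True
```

The `PRIMARY KEY (playerId, propId)` clause is written the same way in MySQL, though whether playerId + propId is actually unique across the NHL data is still an open question.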

actnet module - duplicate propIds

MLB also has duplicate propIds. They have not appeared on the same date yet, but propId 59020366 is duplicated within the 2023 season (6/6/23 and 6/12/23).

Need to generate custom propIds for all sports.
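
One possible way to build a custom ID is to hash the fields that together should identify a prop. This is only a sketch under assumptions: the choice of fields (league, date, player ID, source propId), the helper name `make_prop_key`, and the player ID 1234 are all hypothetical. Only propId 59020366 and the two dates come from the issue above.

```python
import hashlib


def make_prop_key(league, game_date, player_id, prop_id):
    """Build a deterministic 16-character key from the identifying fields."""
    raw = f"{league}|{game_date}|{player_id}|{prop_id}"
    # Truncating the SHA-1 hex digest keeps the key short enough for a column.
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()[:16]


# The same source propId on two different dates yields two different keys.
key_a = make_prop_key("mlb", "2023-06-06", 1234, 59020366)
key_b = make_prop_key("mlb", "2023-06-12", 1234, 59020366)
print(key_a != key_b)
# True
```

Because the key is derived from the inputs rather than a counter, re-scraping the same prop produces the same key, which should make repeated loads easier to de-duplicate.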
