
nba_scraper's Introduction


Hi there 👋

I'm a Python software engineer focused on building systems in the fields of data science and quantitative analysis. Much of my public work and writing is aimed at helping newcomers in tech learn the skills they want to learn. My strongest areas of expertise are Python and SQL, but I can branch out into front-end development using JS/React. I've been known to dabble in Java, but luckily it was a passing phase.

If I've helped you or you find my packages useful, please feel free to send me a tip, or let me know on Twitter @matthew_barlowe. I post infrequently on my blog, but there are some tutorials there for beginners.


nba_scraper's Issues

'HOME' KeyError on Windows

When running the scraper in the Windows Command Prompt, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\nba_scraper\nba_scraper.py", line 40, in <module>
    def scrape_date_range(date_from, date_to, data_format='pandas', data_dir=f"{os.environ['HOME']}/nbadata.csv"):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1520.0_x64__qbz5n2kfra8p0\lib\os.py", line 679, in __getitem__
    raise KeyError(key) from None
KeyError: 'HOME'

Windows doesn't have the HOME environment variable; it uses USERPROFILE instead.
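
A minimal sketch of a cross-platform default, assuming only the standard library (this is not necessarily the package's actual fix), would avoid os.environ['HOME'] entirely:

    from pathlib import Path

    # Path.home() resolves the home directory on Windows (USERPROFILE) as
    # well as on Unix-like systems (HOME), so the default path no longer
    # depends on a platform-specific environment variable.
    DEFAULT_DATA_DIR = str(Path.home() / "nbadata.csv")

    def scrape_date_range(date_from, date_to, data_format='pandas',
                          data_dir=DEFAULT_DATA_DIR):
        ...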

Issues with 19/20 season, game id 0020200577

The scraper seems to be having issues with the current season, specifically with the game with id 0020200577.

Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    ns.scrape_date_range('2019-10-22', '2020-02-10', data_format='csv')
  File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\nba_scraper\nba_scraper.py", line 78, in scrape_date_range
    scraped_games.append(sf.main_scrape(game))
  File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\nba_scraper\scrape_functions.py", line 688, in main_scrape
    game_df = scrape_pbp(v2_dict)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\nba_scraper\scrape_functions.py", line 112, in scrape_pbp
    if pbp_v2_df.game_id.unique()[0] == "0020200577":

Thanks

`nba_scraper` hangs at scrapetime

The nba_scraper package hangs while scraping the first game passed to it and never returns any data. This is due to the NBA API now requiring extra headers in the API call; it will be corrected in the next version and pushed in a git commit today.
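
A rough sketch of that workaround using requests is below; the exact header set the stats API expects changes over time, so the values here are assumptions rather than the package's final fix:

    import requests

    # Browser-like headers; stats.nba.com tends to hang or reject requests
    # that don't at least carry a User-Agent and Referer. The specific
    # values and extra x-nba-stats-* headers below are assumptions.
    USER_AGENT = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0 Safari/537.36"
        ),
        "Referer": "https://www.nba.com/",
        "Accept-Language": "en-US,en;q=0.9",
        "x-nba-stats-origin": "stats",
        "x-nba-stats-token": "true",
    }

    # Example endpoint only; every API call in the scraper would pass the
    # same headers.
    url = "https://stats.nba.com/stats/playbyplayv2?GameID=0021800053&StartPeriod=0&EndPeriod=10"
    response = requests.get(url, headers=USER_AGENT, timeout=30)
    response.raise_for_status()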

get_date_games function not pulling games before game id 021800110

The get_date_games function in the scrape_functions module is not pulling the early-season game ids for the 2019 season. I discovered this while writing integration tests for the function. I'm assigning this to myself, but if someone wants to jump on it I'd be happy to accept a pull request. @HarryShomer, any advice would help, but if you're busy don't worry, I'll handle it. All tests in the test_integration.py file will need to pass before merging into master.
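
A regression test along these lines could pin the bug down; the function signature, date format, and returned id format are assumptions based on how scrape_date_range is called elsewhere in this page:

    # Sketch of an integration test for get_date_games. The opening week of
    # the 2018-19 season falls before game id 021800110 mentioned above.
    import nba_scraper.scrape_functions as sf

    def test_get_date_games_includes_early_season_games():
        game_ids = sf.get_date_games("2018-10-16", "2018-10-20")
        assert game_ids, "expected early-season games to be returned"
        # Every id in this window should sort before 021800110.
        assert any(int(gid) < 21800110 for gid in game_ids)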

Refactor Code

The source code needs to be refactored to allow proper testing under continuous integration, as the NBA API blacklists many of the IP addresses those services run on.
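
One way to get there, sketched below, is to keep the HTTP call behind a thin helper and stub it out with saved JSON fixtures in tests so CI machines never touch the NBA API. The helper name, fixture path, and the assumption that main_scrape would call it are all illustrative, not the package's actual design:

    import json
    from unittest import mock

    import nba_scraper.scrape_functions as sf

    def test_main_scrape_offline():
        # Canned play-by-play response saved from a real request (path is
        # hypothetical).
        with open("tests/fixtures/pbp_0021800053.json") as fixture:
            canned_pbp = json.load(fixture)
        # Hypothetical helper the refactor would introduce to own requests.get.
        with mock.patch.object(sf, "fetch_pbp_json", return_value=canned_pbp):
            game_df = sf.main_scrape("0021800053")
            assert not game_df.empty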

Fix bugs in WNBA Scraper

The WNBA scraper works, but not for all games; the bugs that keep it from working for every game need to be fixed.

Add functionality for all NBA seasons on API

The NBA API goes back to the 1999 season. The main issue with pulling those seasons in is that the data.nba.com API, which has the x/y locations for events, only goes back four years. Work on removing all calls to that API for seasons older than 2016.
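
A minimal sketch of the guard this implies, with the cutoff year and helper names as assumptions:

    # The x/y-location feed only covers recent seasons, so older games must
    # skip that call entirely. The cutoff is an assumption per the issue text.
    EARLIEST_XY_SEASON = 2016

    def needs_xy_locations(season: int) -> bool:
        """Return True if the data.nba.com location API should be called."""
        return season >= EARLIEST_XY_SEASON

    # Usage inside the scraper would look roughly like:
    #   game_df = scrape_pbp(v2_dict)
    #   if needs_xy_locations(season):
    #       game_df = merge_xy_locations(game_df, game_id)  # hypothetical helper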

Unable to scrape game 0021600559

Probably similar to issue #7, where a player did nothing in the game so their name can't be pulled.

Traceback (most recent call last):
  File "get_season.py", line 8, in <module>
    ns.scrape_game([season], data_format='csv', data_dir=f'~/nbafiles/{season}nbapbp.csv')
  File "/Users/MattBarlowe/.virtualenvs/dataenv/lib/python3.6/site-packages/nba_scraper/nba_scraper.py", line 98, in scrape_game
    scraped_games.append(sf.main_scrape(f"00{game}"))
  File "/Users/MattBarlowe/.virtualenvs/dataenv/lib/python3.6/site-packages/nba_scraper/scrape_functions.py", line 1135, in main_scrape
    game_df))
  File "/Users/MattBarlowe/.virtualenvs/dataenv/lib/python3.6/site-packages/nba_scraper/scrape_functions.py", line 998, in get_lineup
    ['player1_name'].unique()[0]) for x in home_lineups[0]]
IndexError: list index out of range
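
A guarded lookup along these lines (a sketch, not the package's actual fix) would avoid the IndexError when a player was in the lineup but recorded no events to pull a name from:

    import pandas as pd

    # When a player on the floor never shows up as player1 in the play-by-play,
    # .unique() on the filtered frame is empty and [0] raises IndexError, so
    # fall back to a placeholder instead.
    def player_name_for_id(dataframe: pd.DataFrame, player_id) -> str:
        names = dataframe.loc[
            dataframe["player1_id"] == player_id, "player1_name"
        ].unique()
        return names[0] if len(names) else f"unknown_{player_id}"

    # Roughly replacing the failing list comprehension in get_lineup:
    #   home_ids_names = [(x, player_name_for_id(dataframe, x))
    #                     for x in home_lineups[0]]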

Getting a JSONDecodeError

I'm receiving a JSONDecodeError after game ID 0021800254 when scraping the 2019 season...

JSONDecodeError                           Traceback (most recent call last)
<ipython-input-5-fa487e61689b> in <module>
----> 1 nba_df = ns.scrape_season(2019)

/opt/anaconda3/lib/python3.8/site-packages/nba_scraper/nba_scraper.py in scrape_season(season, data_format, data_dir)
    189         else:
    190             print(f"Scraping game id: 00{game}")
--> 191             scraped_games.append(sf.main_scrape(f"00{game}"))
    192 
    193     if len(scraped_games) == 0:

/opt/anaconda3/lib/python3.8/site-packages/nba_scraper/scrape_functions.py in main_scrape(game_id)
    705         game_df = game_df[game_df["period"] < 5]
    706     for period in range(1, game_df["period"].max() + 1):
--> 707         lineups = get_lineup_api(game_id, period)
    708         periods.append(
    709             get_lineup(game_df[game_df["period"] == period].copy(), lineups, game_df,)

/opt/anaconda3/lib/python3.8/site-packages/nba_scraper/scrape_functions.py in get_lineup_api(game_id, period)
    373 
    374     lineups_req = requests.get(url, headers=USER_AGENT)
--> 375     lineup_req_dict = json.loads(lineups_req.text)
    376 
    377     return lineup_req_dict

/opt/anaconda3/lib/python3.8/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant,     object_pairs_hook, **kw)
    355             parse_int is None and parse_float is None and
    356             parse_constant is None and object_pairs_hook is None and not kw):
--> 357         return _default_decoder.decode(s)
    358     if cls is None:
    359         cls = JSONDecoder

/opt/anaconda3/lib/python3.8/json/decoder.py in decode(self, s, _w)
    335 
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()
    339         if end != len(s):

/opt/anaconda3/lib/python3.8/json/decoder.py in raw_decode(self, s, idx)
    353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
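
The failing call is json.loads on the lineup response in get_lineup_api, which means the API returned something other than JSON (an empty body or an HTML error page). A defensive wrapper like this sketch would surface the real problem, or retry, instead of crashing; the retry policy and function name are assumptions:

    import time
    import requests

    # Retry a flaky stats endpoint a few times and fail loudly with the
    # actual response text if the body still isn't JSON.
    def get_json_with_retries(url, headers, attempts=3, backoff=2.0):
        for attempt in range(1, attempts + 1):
            resp = requests.get(url, headers=headers, timeout=30)
            try:
                resp.raise_for_status()
                return resp.json()
            except (requests.HTTPError, ValueError):
                if attempt == attempts:
                    raise RuntimeError(
                        f"Non-JSON response from {url} "
                        f"(status {resp.status_code}): {resp.text[:200]!r}"
                    )
                time.sleep(backoff * attempt)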

Error scraping game 21800053

Looks like, for some reason, a player is not found in the dataframe or the lineups are returning an empty list; this needs to be looked at.

>>> test = ns.scrape_game([21800053])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mbarlowe/code/python/nba_scraper/nba_scraper/nba_scraper.py", line 98, in scrape_game
    scraped_games.append(sf.main_scrape(f"00{game}"))
  File "/Users/mbarlowe/code/python/nba_scraper/nba_scraper/scrape_functions.py", line 1138, in main_scrape
    game_df))
  File "/Users/mbarlowe/code/python/nba_scraper/nba_scraper/scrape_functions.py", line 959, in get_lineup
    ['player1_name'].unique()[0]) for x in away_lineups[0]]
  File "/Users/mbarlowe/code/python/nba_scraper/nba_scraper/scrape_functions.py", line 959, in <listcomp>
    ['player1_name'].unique()[0]) for x in away_lineups[0]]
IndexError: index 0 is out of bounds for axis 0 with size 0

Issue scraping game 0029900026

Issue when scraping game 0029900026: there aren't enough players to fill out who is on the court, so a fix is incoming. It is odd, though, that the API would report fewer than five players having played in the period.

Traceback (most recent call last):
  File "scrape_seasons.py", line 8, in <module>
    ns.scrape_game([game], data_format="csv", data_dir="seasons/19992000")
  File "/Users/MattBarlowe/.virtualenvs/historical_scrape/lib/python3.6/site-packages/nba_scraper/nba_scraper.py", line 109, in scrape_game
    scraped_games.append(sf.main_scrape(f"00{game}"))
  File "/Users/MattBarlowe/.virtualenvs/historical_scrape/lib/python3.6/site-packages/nba_scraper/scrape_functions.py", line 646, in main_scrape
    get_lineup(game_df[game_df["period"] == period].copy(), lineups, game_df,)
  File "/Users/MattBarlowe/.virtualenvs/historical_scrape/lib/python3.6/site-packages/nba_scraper/scrape_functions.py", line 621, in get_lineup
    period_df.iat[i, 75] = away_ids_names[4][0]
IndexError: list index out of range

Invalid literal in get_lineup

I've been pulling data on a daily basis, but today I seem to be getting this error from the get_lineup function:

nba_df = ns.scrape_season(2019)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-fa487e61689b> in <module>
----> 1 nba_df = ns.scrape_season(2019)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nba_scraper/nba_scraper.py in scrape_season(season, data_format, data_dir)
    132     for game in game_ids:
    133         print(f"Scraping game id: 00{game}")
--> 134         scraped_games.append(sf.main_scrape(f"00{game}"))
    135 
    136     nba_df = pd.concat(scraped_games)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nba_scraper/scrape_functions.py in main_scrape(game_id)
   1177         periods.append(get_lineup(game_df[game_df['period'] == period].copy(),
   1178                                   home_lineup_dict, away_lineup_dict,
-> 1179                                   game_df))
   1180     game_df = pd.concat(periods).reset_index(drop=True)
   1181 

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nba_scraper/scrape_functions.py in get_lineup(period_df, home_lineup_dict, away_lineup_dict, dataframe)
   1121             print('home_ids:', home_ids_names[0][1])
   1122             period_df.iat[i, 62] = home_ids_names[0][0]
-> 1123             period_df.iat[i, 61] = home_ids_names[0][1]
   1124             period_df.iat[i, 64] = home_ids_names[1][0]
   1125             period_df.iat[i, 63] = home_ids_names[1][1]

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
   2285         key = list(self._convert_key(key, is_setter=True))
   2286         key.append(value)
-> 2287         self.obj._set_value(*key, takeable=self._takeable)
   2288 
   2289 

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in _set_value(self, index, col, value, takeable)
   2809             if takeable is True:
   2810                 series = self._iget_item_cache(col)
-> 2811                 return series._set_value(index, value, takeable=True)
   2812 
   2813             series = self._get_item_cache(col)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/series.py in _set_value(self, label, value, takeable)
   1221         try:
   1222             if takeable:
-> 1223                 self._values[label] = value
   1224             else:
   1225                 self.index._engine.set_value(self._values, label, value)

ValueError: invalid literal for int() with base 10: 'Al Horford'

Has the data changed on the NBA's side? I've made a few changes on my end, but I don't think the error comes from my code additions. I'll push those changes once I feel they're 100% necessary and correct.

Playoff Games: List Index Out of Range

I have been trying to get the play-by-play for playoff games and am running into the following error:

nba_df = ns.scrape_game([41700151])
Scraping game id: 0041700151
Traceback (most recent call last):

  File "<ipython-input-28-ffa52b1d949b>", line 1, in <module>
    nba_df = ns.scrape_game([41700151])

  File "/anaconda3/lib/python3.7/site-packages/nba_scraper/nba_scraper.py", line 33, in scrape_game
    scraped_games.append(sf.scrape_pbp(f"00{game}"))

  File "/anaconda3/lib/python3.7/site-packages/nba_scraper/scrape_functions.py", line 635, in scrape_pbp
    clean_df = get_lineups(clean_df)

  File "/anaconda3/lib/python3.7/site-packages/nba_scraper/scrape_functions.py", line 211, in get_lineups
    away_ids_names = [(x, dataframe[dataframe['player1_id'] == x]['player1_name'].unique()[0]) for x in away_lineups[0]]

IndexError: list index out of range

From some very initial checking, it looks like regular-season game ids start with a 2 while playoff game ids start with a 4.
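
A small sketch under that assumption, so id handling accepts both season types rather than assuming every id starts with a 2:

    # The leading digit of the unpadded id encodes the season type: 2 for
    # regular season, 4 for playoffs, per the behaviour described above.
    SEASON_TYPES = {"2": "regular season", "4": "playoffs"}

    def season_type(game_id) -> str:
        padded = f"{int(game_id):08d}"     # e.g. 41700151 -> "41700151"
        return SEASON_TYPES.get(padded[0], "unknown")

    print(season_type(21800053))   # regular season
    print(season_type(41700151))   # playoffs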
