mcbarlowe / nba_scraper
A scraper to scrape the NBA API and compile a play by play file
License: GNU General Public License v3.0
I have been trying to get the play by play for playoff games and running into the following error:
nba_df = ns.scrape_game([41700151])
Scraping game id: 0041700151
Traceback (most recent call last):
File "<ipython-input-28-ffa52b1d949b>", line 1, in <module>
nba_df = ns.scrape_game([41700151])
File "/anaconda3/lib/python3.7/site-packages/nba_scraper/nba_scraper.py", line 33, in scrape_game
scraped_games.append(sf.scrape_pbp(f"00{game}"))
File "/anaconda3/lib/python3.7/site-packages/nba_scraper/scrape_functions.py", line 635, in scrape_pbp
clean_df = get_lineups(clean_df)
File "/anaconda3/lib/python3.7/site-packages/nba_scraper/scrape_functions.py", line 211, in get_lineups
away_ids_names = [(x, dataframe[dataframe['player1_id'] == x]['player1_name'].unique()[0]) for x in away_lineups[0]]
IndexError: list index out of range
From some initial checking, it looks like regular season game ids start with a 2, while playoff game ids start with a 4.
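That matches the NBA stats game-id format: ids are 10 characters, and the digit after the "00" prefix encodes the season type (2 = regular season, 4 = playoffs). A minimal sketch of explicit zero-padding, assuming the API wants the full 10-character form (`format_game_id` is a hypothetical helper, not part of the package):

```python
def format_game_id(game):
    """Zero-pad a numeric game id to the 10-character form the NBA
    stats API expects (e.g. 41700151 -> '0041700151')."""
    return str(game).zfill(10)

print(format_game_id(41700151))  # 0041700151 (third digit 4: playoffs)
print(format_game_id(21800053))  # 0021800053 (third digit 2: regular season)
```

Unlike a fixed `f"00{game}"` prefix, `zfill` also handles ids passed in with a different number of digits.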
I've been pulling data on a daily basis, but today I seem to be getting this error from the get_lineup function:
nba_df = ns.scrape_season(2019)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-fa487e61689b> in <module>
----> 1 nba_df = ns.scrape_season(2019)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nba_scraper/nba_scraper.py in scrape_season(season, data_format, data_dir)
132 for game in game_ids:
133 print(f"Scraping game id: 00{game}")
--> 134 scraped_games.append(sf.main_scrape(f"00{game}"))
135
136 nba_df = pd.concat(scraped_games)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nba_scraper/scrape_functions.py in main_scrape(game_id)
1177 periods.append(get_lineup(game_df[game_df['period'] == period].copy(),
1178 home_lineup_dict, away_lineup_dict,
-> 1179 game_df))
1180 game_df = pd.concat(periods).reset_index(drop=True)
1181
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/nba_scraper/scrape_functions.py in get_lineup(period_df, home_lineup_dict, away_lineup_dict, dataframe)
1121 print('home_ids:', home_ids_names[0][1])
1122 period_df.iat[i, 62] = home_ids_names[0][0]
-> 1123 period_df.iat[i, 61] = home_ids_names[0][1]
1124 period_df.iat[i, 64] = home_ids_names[1][0]
1125 period_df.iat[i, 63] = home_ids_names[1][1]
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
2285 key = list(self._convert_key(key, is_setter=True))
2286 key.append(value)
-> 2287 self.obj._set_value(*key, takeable=self._takeable)
2288
2289
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in _set_value(self, index, col, value, takeable)
2809 if takeable is True:
2810 series = self._iget_item_cache(col)
-> 2811 return series._set_value(index, value, takeable=True)
2812
2813 series = self._get_item_cache(col)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/series.py in _set_value(self, label, value, takeable)
1221 try:
1222 if takeable:
-> 1223 self._values[label] = value
1224 else:
1225 self.index._engine.set_value(self._values, label, value)
ValueError: invalid literal for int() with base 10: 'Al Horford'
Has the data changed on the NBA side? I've made a few changes on my end, but I don't think the error comes from my additions. I'll push those changes once I feel they're 100% necessary and correct.
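For what it's worth, the traceback shows a player name being written into a column pandas treats as integer, which is the kind of failure positional writes with `.iat` produce once the column layout shifts. A small sketch of the label-based alternative (the frame and ids here are illustrative, not the package's actual schema):

```python
import pandas as pd

# Illustrative frame; the real play-by-play frame has many more columns.
df = pd.DataFrame({"player1_id": [0], "player1_name": [""]})

# Writing by label with .at always hits the intended column, whereas
# .iat with a hard-coded position (e.g. column 61) silently targets
# whatever column happens to sit at that position after a layout change.
df.at[0, "player1_name"] = "Al Horford"
df.at[0, "player1_id"] = 201143  # illustrative player id
```

If the NBA added or reordered columns in the feed, every hard-coded `.iat` position after the change would be off by one or more.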
Issue when scraping game 0029900026: there aren't enough players listed to fill out who is on the court. A fix is incoming, but it's odd that the API would report fewer than five players for a period.
Traceback (most recent call last):
File "scrape_seasons.py", line 8, in
ns.scrape_game([game], data_format="csv", data_dir="seasons/19992000")
File "/Users/MattBarlowe/.virtualenvs/historical_scrape/lib/python3.6/site-packages/nba_scraper/nba_scraper.py", line 109, in scrape_game
scraped_games.append(sf.main_scrape(f"00{game}"))
File "/Users/MattBarlowe/.virtualenvs/historical_scrape/lib/python3.6/site-packages/nba_scraper/scrape_functions.py", line 646, in main_scrape
get_lineup(game_df[game_df["period"] == period].copy(), lineups, game_df,)
File "/Users/MattBarlowe/.virtualenvs/historical_scrape/lib/python3.6/site-packages/nba_scraper/scrape_functions.py", line 621, in get_lineup
period_df.iat[i, 75] = away_ids_names[4][0]
IndexError: list index out of range
Probably similar to issue #7, where a player did nothing in the game, so his name can't be pulled:
Traceback (most recent call last):
File "get_season.py", line 8, in <module>
ns.scrape_game([season], data_format='csv', data_dir=f'~/nbafiles/{season}nbapbp.csv')
File "/Users/MattBarlowe/.virtualenvs/dataenv/lib/python3.6/site-packages/nba_scraper/nba_scraper.py", line 98, in scrape_game
scraped_games.append(sf.main_scrape(f"00{game}"))
File "/Users/MattBarlowe/.virtualenvs/dataenv/lib/python3.6/site-packages/nba_scraper/scrape_functions.py", line 1135, in main_scrape
game_df))
File "/Users/MattBarlowe/.virtualenvs/dataenv/lib/python3.6/site-packages/nba_scraper/scrape_functions.py", line 998, in get_lineup
['player1_name'].unique()[0]) for x in home_lineups[0]]
IndexError: list index out of range
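Both tracebacks die on `['player1_name'].unique()[0]`, which assumes every player id in the lineup appears at least once as player1 in the play-by-play. A guarded lookup could degrade to `None` instead of raising (`lookup_names` is a hypothetical helper, not the package's code):

```python
import pandas as pd

def lookup_names(dataframe, lineup_ids):
    """Resolve player ids to names, tolerating players who recorded no
    events in the play-by-play (they come back as None)."""
    names = []
    for pid in lineup_ids:
        matches = dataframe.loc[
            dataframe["player1_id"] == pid, "player1_name"
        ].unique()
        names.append((pid, matches[0] if len(matches) else None))
    return names

# Toy play-by-play with one active player; 999999 never touched the ball.
pbp = pd.DataFrame({"player1_id": [201143], "player1_name": ["Al Horford"]})
print(lookup_names(pbp, [201143, 999999]))
# [(201143, 'Al Horford'), (999999, None)]
```

Downstream code would then need a fallback source (e.g. the box score) for the `None` entries rather than indexing blindly.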
The WNBA scraper works, but not for all games; need to fix the bugs that keep it from working for every game.
I had an issue with the scraper for game 0022000278. I think it might be an issue with other games in Jan of this year as well.
It looks like, for some reason, either a player is not found in the dataframe or the lineups are returning an empty list; this needs to be looked at.
>>> test = ns.scrape_game([21800053])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mbarlowe/code/python/nba_scraper/nba_scraper/nba_scraper.py", line 98, in scrape_game
scraped_games.append(sf.main_scrape(f"00{game}"))
File "/Users/mbarlowe/code/python/nba_scraper/nba_scraper/scrape_functions.py", line 1138, in main_scrape
game_df))
File "/Users/mbarlowe/code/python/nba_scraper/nba_scraper/scrape_functions.py", line 959, in get_lineup
['player1_name'].unique()[0]) for x in away_lineups[0]]
File "/Users/mbarlowe/code/python/nba_scraper/nba_scraper/scrape_functions.py", line 959, in <listcomp>
['player1_name'].unique()[0]) for x in away_lineups[0]]
IndexError: index 0 is out of bounds for axis 0 with size 0
I tried scraping the 19-20 season, but the program got stuck after game 0021900005. Is it because my IP got blocked by NBA Stats?
I'm receiving a JSONDecodeError after game ID 0021800254 when scraping the 2019 season...
JSONDecodeError Traceback (most recent call last)
<ipython-input-5-fa487e61689b> in <module>
----> 1 nba_df = ns.scrape_season(2019)
/opt/anaconda3/lib/python3.8/site-packages/nba_scraper/nba_scraper.py in scrape_season(season, data_format, data_dir)
189 else:
190 print(f"Scraping game id: 00{game}")
--> 191 scraped_games.append(sf.main_scrape(f"00{game}"))
192
193 if len(scraped_games) == 0:
/opt/anaconda3/lib/python3.8/site-packages/nba_scraper/scrape_functions.py in main_scrape(game_id)
705 game_df = game_df[game_df["period"] < 5]
706 for period in range(1, game_df["period"].max() + 1):
--> 707 lineups = get_lineup_api(game_id, period)
708 periods.append(
709 get_lineup(game_df[game_df["period"] == period].copy(), lineups, game_df,)
/opt/anaconda3/lib/python3.8/site-packages/nba_scraper/scrape_functions.py in get_lineup_api(game_id, period)
373
374 lineups_req = requests.get(url, headers=USER_AGENT)
--> 375 lineup_req_dict = json.loads(lineups_req.text)
376
377 return lineup_req_dict
/opt/anaconda3/lib/python3.8/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
355 parse_int is None and parse_float is None and
356 parse_constant is None and object_pairs_hook is None and not kw):
--> 357 return _default_decoder.decode(s)
358 if cls is None:
359 cls = JSONDecoder
/opt/anaconda3/lib/python3.8/json/decoder.py in decode(self, s, _w)
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
339 if end != len(s):
/opt/anaconda3/lib/python3.8/json/decoder.py in raw_decode(self, s, idx)
353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
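The `Expecting value: line 1 column 1 (char 0)` message means the response body isn't JSON at all, typically an empty or HTML error page once the stats API rate-limits or blocks the client. A defensive sketch around the decode step (`parse_lineup_response` is a hypothetical wrapper, not the package's function):

```python
import json

def parse_lineup_response(text):
    """Return the decoded payload, or None when the API returns a
    non-JSON body (e.g. an HTML error page after rate limiting)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

print(parse_lineup_response('{"resultSets": []}'))  # {'resultSets': []}
print(parse_lineup_response("<html>blocked</html>"))  # None
```

The caller can then retry with a backoff, or skip the game, instead of the whole season scrape dying mid-run.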
The get_date_games function in the scrape_functions module is not pulling the early-season game_ids for the 2019 season. I discovered this while writing integration tests for the function. I'm assigning this to myself, but if someone wants to jump on it I'd be happy to accept a pull request. @HarryShomer, any advice would help, but if you're busy don't worry, I'll handle it. All tests in the test_integration.py file will need to pass before merging into master.
The nba_scraper package hangs while scraping the first game passed to it and never returns any data. This is because the NBA API now requires extra headers in the API call; it will be corrected in the next version, with a commit pushed today.
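For reference, a sketch of the kind of extra header set stats.nba.com started requiring around this time; the exact headers the package ships may differ, and the user-agent string below is just an example:

```python
# Assumed header set for stats.nba.com requests; without a browser-like
# User-Agent and the stats-specific headers, the API tends to hang or
# time out rather than return an error.
USER_AGENT = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0 Safari/537.36"
    ),
    "Referer": "https://stats.nba.com/",
    "x-nba-stats-origin": "stats",
    "x-nba-stats-token": "true",
}
```

These would be passed as the `headers=` argument on each `requests.get` call against the stats endpoints.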
The scraper seems to be having issues with the current season, specifically with game id 0020200577.
Traceback (most recent call last):
File "<pyshell#9>", line 1, in <module>
ns.scrape_date_range('2019-10-22', '2020-02-10', data_format='csv')
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\nba_scraper\nba_scraper.py", line 78, in scrape_date_range
scraped_games.append(sf.main_scrape(game))
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\nba_scraper\scrape_functions.py", line 688, in main_scrape
game_df = scrape_pbp(v2_dict)
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\nba_scraper\scrape_functions.py", line 112, in scrape_pbp
if pbp_v2_df.game_id.unique()[0] == "0020200577":
Thanks
Write tests for the WNBA scraper functions and stat calculations, and set up CI for them.
The source code needs to be refactored to allow proper testing under continuous integration, as the NBA API blacklists many of the IP addresses those services run on.
When running the scraper in the Windows Command Prompt, I get the following error:
Traceback (most recent call last):
File "", line 1, in
File "C:\Users*****\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\nba_scraper\nba_scraper.py", line 40, in
def scrape_date_range(date_from, date_to, data_format='pandas', data_dir=f"{os.environ['HOME']}/nbadata.csv"):
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1520.0_x64__qbz5n2kfra8p0\lib\os.py", line 679, in getitem
raise KeyError(key) from None
KeyError: 'HOME'
Windows doesn't have the HOME environment variable; it uses USERPROFILE instead.
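A portable default sidesteps the problem: `os.path.expanduser("~")` resolves to `USERPROFILE` on Windows and `HOME` on Unix, so neither platform raises a `KeyError`. A minimal sketch of the default-path expression (the filename mirrors the one in the traceback):

```python
import os

# expanduser("~") works on both Windows (USERPROFILE) and Unix (HOME),
# unlike os.environ['HOME'], which raises KeyError on Windows.
DEFAULT_DATA_DIR = os.path.join(os.path.expanduser("~"), "nbadata.csv")
print(DEFAULT_DATA_DIR)
```

Note that because the original expression sits in a default argument, it is evaluated at import time, which is why the crash happens on `import` before any function is called.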
The NBA API goes back to the 1999 season. The main issue with pulling those seasons in is that the data.nba.com API, which has the x/y locations for events, only goes back four years. Work on removing all calls to that API for seasons older than 2016.
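A simple season gate would cover this; the 2016 cutoff below follows the issue's estimate and may need adjusting, and `use_locations_api` is a hypothetical helper, not the package's code:

```python
LOCATIONS_API_FIRST_SEASON = 2016  # rough cutoff per the issue; verify

def use_locations_api(season):
    """Only call the data.nba.com x/y-locations endpoint for seasons it
    covers; older seasons fall back to stats.nba.com data alone."""
    return season >= LOCATIONS_API_FIRST_SEASON
```

The main scrape path would branch on this flag and skip the locations merge (leaving the x/y columns empty) for historical seasons.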