Comments (5)
I opened a new branch to work on this issue. This commit switches from Python's native CSV operations to Pandas dataframes. Besides being more performant, this also cleaned up my messy processing code.
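As a rough illustration of the switch, the sketch below parses the same CSV text both ways. The sample row and the helper names are hypothetical; only the column list is taken from parse_tick_list() as quoted later in this thread. One visible difference is that the csv module yields every field as a string, while pandas infers numeric column types.

```python
import csv
import io

import pandas as pd

# Column list copied from parse_tick_list() in this thread.
COLUMNS = ["Date", "Route", "Pitches", "Style",
           "Lead Style", "Route Type", "Length", "Rating Code"]

# Hypothetical tick-list text; "Notes" is an extra column that should be dropped.
SAMPLE = (
    "Date,Route,Notes,Pitches,Style,Lead Style,Route Type,Length,Rating Code\n"
    "2020-01-04,Example Route,fun,1,Lead,Onsight,Sport,80,2600\n"
)


def rows_with_csv(text: str) -> list:
    """Native csv module: columns picked by hand, every value stays a str."""
    reader = csv.DictReader(io.StringIO(text))
    return [[row[name] for name in COLUMNS] for row in reader]


def rows_with_pandas(text: str) -> list:
    """pandas: usecols drops unwanted columns and numeric types are inferred."""
    df = pd.read_csv(io.StringIO(text), usecols=COLUMNS, na_filter=False)
    # usecols keeps the file's column order, so reindex to guarantee COLUMNS order.
    return df[COLUMNS].values.tolist()


if __name__ == "__main__":
    print(rows_with_csv(SAMPLE))
    print(rows_with_pandas(SAMPLE))
```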
Unfortunately, this change breaks some of the testing suite. I updated one of the variables in mp_api_response.py to make one of the tests pass, but I'm still getting the following errors when running pytest:
============================================= test session starts ==============================================
platform linux -- Python 3.7.6, pytest-5.3.2, py-1.8.1, pluggy-0.13.1 --
collected 13 items
app/tests/test_endpoints.py::test_index PASSED [ 7%]
app/tests/test_errors.py::TestErrorHandlers::test_404 PASSED [ 15%]
app/tests/test_errors.py::TestErrorHandlers::test_mountain_project_api_exception PASSED [ 23%]
app/tests/test_errors.py::TestErrorHandlers::test_request_exception PASSED [ 30%]
app/tests/test_errors.py::TestErrorHandlers::test_database_exception FAILED [ 38%]
app/tests/test_mpv_helpers.py::TestDatabaseHelpers::test_connect PASSED [ 46%]
app/tests/test_mpv_helpers.py::TestDatabaseHelpers::test_failed_db_connection PASSED [ 53%]
app/tests/test_mpv_helpers.py::TestMountainProjectHandler::test_mp_api_user_data PASSED [ 61%]
app/tests/test_mpv_helpers.py::TestMountainProjectHandler::test_mp_api_user_ticks PASSED [ 69%]
app/tests/test_mpv_helpers.py::TestMountainProjectHandler::test_mp_dev_env_user_data PASSED [ 76%]
app/tests/test_mpv_helpers.py::TestMountainProjectHandler::test_mp_dev_env_parse_user_data PASSED [ 84%]
app/tests/test_mpv_helpers.py::TestMountainProjectHandler::test_mp_dev_env_ticks PASSED [ 92%]
app/tests/test_mpv_helpers.py::TestMountainProjectHandler::test_mp_dev_env_parse_ticks FAILED [100%]
=================================================== FAILURES ===================================================
__________________________________ TestErrorHandlers.test_database_exception ___________________________________
self = <app.tests.test_errors.TestErrorHandlers object at 0x7f70d9460250>, app = <Flask 'app'>
    def test_database_exception(self, app: pytest.fixture) -> None:
        """
        Confirms the correct response status and safe error message of DatabaseException. The test config settings
        do not have database credentials, therefore an error will be raised when the attempting to process test user
        data.
        """
        client = app.test_client()
>       response = client.post("/data", data={'test': 'yes'})
app/tests/test_errors.py:70:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/werkzeug/test.py:1039: in post
    return self.open(*args, **kw)
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/flask/testing.py:227: in open
    follow_redirects=follow_redirects,
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/werkzeug/test.py:993: in open
    response = self.run_wsgi_app(environ.copy(), buffered=buffered)
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/werkzeug/test.py:884: in run_wsgi_app
    rv = run_wsgi_app(self.application, environ, buffered=buffered)
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/werkzeug/test.py:1119: in run_wsgi_app
    app_rv = app(environ, start_response)
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/flask/app.py:2463: in __call__
    return self.wsgi_app(environ, start_response)
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/flask/app.py:2449: in wsgi_app
    response = self.handle_exception(e)
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/flask/app.py:1866: in handle_exception
    reraise(exc_type, exc_value, tb)
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/flask/_compat.py:39: in reraise
    raise value
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/flask/app.py:2446: in wsgi_app
    response = self.full_dispatch_request()
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/flask/app.py:1951: in full_dispatch_request
    rv = self.handle_user_exception(e)
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/flask/app.py:1820: in handle_user_exception
    reraise(exc_type, exc_value, tb)
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/flask/_compat.py:39: in reraise
    raise value
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/flask/app.py:1949: in full_dispatch_request
    rv = self.dispatch_request()
../../../anaconda3/envs/mpv/lib/python3.7/site-packages/flask/app.py:1935: in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
app/__init__.py:64: in data
    csv = api.parse_tick_list(dev_env=dev_env)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <app.helpers.mountain_project.MountainProjectHandler object at 0x7f70d94d4f50>, dev_env = True
    def parse_tick_list(self, dev_env: bool = False) -> Dict:
        """Parse the request data into a Pandas dataframe to clean."""
        if dev_env:
            with open(_DEV_TEST_TICKS) as ticklist:
                tick_list_file = list(csv.reader(ticklist, delimiter=','))
        else:
            try:
                tick_list_file = self.api_data.get("tick_list").content.decode("utf-8")
            except (AttributeError, UnicodeDecodeError) as e:
                raise MPAPIException
            columns = ["Date", "Route", "Pitches", "Style",
                       "Lead Style", "Route Type", "Length", "Rating Code"]
            df = pd.read_csv(io.StringIO(tick_list_file),
                             usecols=columns, na_filter=False)
>       return {"status": 0, "data": df.values.tolist()}
E       UnboundLocalError: local variable 'df' referenced before assignment
app/helpers/mountain_project.py:54: UnboundLocalError
____________________________ TestMountainProjectHandler.test_mp_dev_env_parse_ticks ____________________________
self = <app.tests.test_mpv_helpers.TestMountainProjectHandler object at 0x7f70d914acd0>
    def test_mp_dev_env_parse_ticks(self) -> None:
        """Ensure when dev_env=True that the processed test_ticks.csv file is the output of parse_tick_list()"""
>       data = self.api_dev.parse_tick_list(dev_env=True)
app/tests/test_mpv_helpers.py:102:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <app.helpers.mountain_project.MountainProjectHandler object at 0x7f70d914aed0>, dev_env = True
    def parse_tick_list(self, dev_env: bool = False) -> Dict:
        """Parse the request data into a Pandas dataframe to clean."""
        if dev_env:
            with open(_DEV_TEST_TICKS) as ticklist:
                tick_list_file = list(csv.reader(ticklist, delimiter=','))
        else:
            try:
                tick_list_file = self.api_data.get("tick_list").content.decode("utf-8")
            except (AttributeError, UnicodeDecodeError) as e:
                raise MPAPIException
            columns = ["Date", "Route", "Pitches", "Style",
                       "Lead Style", "Route Type", "Length", "Rating Code"]
            df = pd.read_csv(io.StringIO(tick_list_file),
                             usecols=columns, na_filter=False)
>       return {"status": 0, "data": df.values.tolist()}
E       UnboundLocalError: local variable 'df' referenced before assignment
app/helpers/mountain_project.py:54: UnboundLocalError
=============================================== warnings summary ===============================================
/home/zachtheclimber/anaconda3/envs/mpv/lib/python3.7/site-packages/flask_wtf/recaptcha/widgets.py:5
/home/zachtheclimber/anaconda3/envs/mpv/lib/python3.7/site-packages/flask_wtf/recaptcha/widgets.py:5: DeprecationWarning: The import 'werkzeug.url_encode' is deprecated and will be removed in Werkzeug 1.0. Use 'from werkzeug.urls import url_encode' instead.
from werkzeug import url_encode
-- Docs: https://docs.pytest.org/en/latest/warnings.html
=================================== 2 failed, 11 passed, 1 warning in 1.37s ====================================
I find the UnboundLocalError: local variable 'df' referenced before assignment line to be odd, as this error doesn't throw when running MPV normally.
@benjpalmer : Any ideas on why this is happening?
The next slowest aspect of parse_user_data() is tick_list_file = self.api_data.get("tick_list").content.decode("utf-8"), as evidenced by:
Get ticklist Took: 3.5762786865234375e-06
Decode Took: 5.98429274559021
Dataframe Took: 0.024965286254882812
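Per-stage numbers like these can be gathered with a small timing context manager. This is a hypothetical sketch, not the instrumentation actually used here: the stage names and the synthetic payload are placeholders, and the values are seconds from time.perf_counter(). Keep in mind that each stage is charged for any lazy work it triggers, so the stage that first touches some data can absorb time that really belongs to an earlier step.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, results: Dict[str, float]) -> Iterator[None]:
    """Store the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def profile_stages() -> Dict[str, float]:
    """Time two placeholder stages on synthetic bytes standing in for a response."""
    results: Dict[str, float] = {}
    payload = ("2020-01-04,Example Route,1\n" * 50_000).encode("utf-8")
    with timed("Decode", results):
        text = payload.decode("utf-8")
    with timed("Split", results):
        text.splitlines()
    for label, seconds in results.items():
        print(f"{label} Took: {seconds}")
    return results


if __name__ == "__main__":
    profile_stages()
```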
I spent some time trying to decrease the decoding time using several different methods (including io.BytesIO() instead of io.StringIO(), as well as seeing whether the Requests module could give us the tick list content without encoding/decoding). Everything produced times similar to the existing code.
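For reference, the two input paths mentioned above look roughly like this. pd.read_csv() accepts either a text buffer or a bytes buffer plus an encoding, so the io.BytesIO() variant simply moves the UTF-8 decode inside pandas. Because the same bytes get decoded either way, similar timings are what one would expect if decoding is not the real bottleneck. The function names and sample bytes are hypothetical.

```python
import io

import pandas as pd

COLUMNS = ["Date", "Route", "Pitches", "Style",
           "Lead Style", "Route Type", "Length", "Rating Code"]


def frame_via_stringio(raw: bytes) -> pd.DataFrame:
    """Decode explicitly, then hand pandas a text buffer (the existing approach)."""
    return pd.read_csv(io.StringIO(raw.decode("utf-8")),
                       usecols=COLUMNS, na_filter=False)


def frame_via_bytesio(raw: bytes) -> pd.DataFrame:
    """Hand pandas the raw bytes and let it decode them internally."""
    return pd.read_csv(io.BytesIO(raw), encoding="utf-8",
                       usecols=COLUMNS, na_filter=False)


# Hypothetical response body, including a non-ASCII character.
RAW = ("Date,Route,Pitches,Style,Lead Style,Route Type,Length,Rating Code\n"
       "2020-01-04,Example Café Route,1,Lead,Onsight,Trad,80,2600\n").encode("utf-8")

if __name__ == "__main__":
    print(frame_via_stringio(RAW).equals(frame_via_bytesio(RAW)))
```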
I feel like multithreaded decoding (if such a thing exists) would provide better performance, but I'm unsure how to implement it, and preliminary Google searches came up empty.
If anyone has ideas/solutions, I'd be happy to hear them!
@zachtheclimber I think I see what is happening with this:
> I find the UnboundLocalError: local variable 'df' referenced before assignment line to be odd, as this error doesn't throw when running MPV normally.
I believe you are getting that error because parse_tick_list() always returns the dataframe object, but that object is only created when dev_env is False. I think you just have a small indentation error.
This might work for you:
def parse_tick_list(self, dev_env: bool = False) -> Dict:
    """Parse the request data into a Pandas dataframe to clean."""
    if dev_env:
        with open(_DEV_TEST_TICKS) as ticklist:
            # Read the raw text so io.StringIO() below receives a str, not a list of rows
            tick_list_file = ticklist.read()
    else:
        try:
            tick_list_file = self.api_data.get("tick_list").content.decode("utf-8")
        except (AttributeError, UnicodeDecodeError) as e:
            raise MPAPIException from e
    # Un-indented: these lines now run on both branches, so df is always assigned
    columns = ["Date", "Route", "Pitches", "Style",
               "Lead Style", "Route Type", "Length", "Rating Code"]
    df = pd.read_csv(io.StringIO(tick_list_file),
                     usecols=columns, na_filter=False)
    return {"status": 0, "data": df.values.tolist()}
Notice that at the bottom of the function, after the except block, the remaining lines have been un-indented, so the dataframe object is now created whether or not you're in a dev env.
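The failure can be reproduced without Flask or pandas. In this hypothetical reduction, result plays the role of df: it is only bound on the else branch, so a call with dev_env=True reaches the return with nothing assigned. That is also why normal MPV runs, which presumably take the dev_env=False path, never hit the error.

```python
def parse_broken(dev_env: bool) -> str:
    if dev_env:
        text = "rows from the local test file"
    else:
        text = "rows from the API"
        result = text.upper()  # only bound when dev_env is False
    return result  # raises UnboundLocalError when dev_env is True


def parse_fixed(dev_env: bool) -> str:
    if dev_env:
        text = "rows from the local test file"
    else:
        text = "rows from the API"
    result = text.upper()  # un-indented: runs on both branches
    return result


if __name__ == "__main__":
    print(parse_broken(False))
    try:
        parse_broken(True)
    except UnboundLocalError as exc:
        print("dev_env=True fails:", exc)
    print(parse_fixed(True))
```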
I'm admittedly not very familiar with Pandas, but I suspect you might also want to be careful to catch any errors the last few lines might raise.
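Following that suggestion, here is a minimal sketch of guarding the pandas step. pd.read_csv() raises pandas.errors.EmptyDataError for empty input, pandas.errors.ParserError for malformed rows, and a plain ValueError when usecols names a column the file lacks; the first two subclass ValueError, so all three can be translated into one application error. TickListError is a hypothetical stand-in for MPV's MPAPIException.

```python
import io

import pandas as pd

COLUMNS = ["Date", "Route", "Pitches", "Style",
           "Lead Style", "Route Type", "Length", "Rating Code"]


class TickListError(Exception):
    """Hypothetical stand-in for MPV's MPAPIException."""


def frame_from_text(text: str) -> pd.DataFrame:
    """Parse tick-list CSV text, converting pandas parse failures to TickListError."""
    try:
        return pd.read_csv(io.StringIO(text), usecols=COLUMNS, na_filter=False)
    except (pd.errors.EmptyDataError, pd.errors.ParserError, ValueError) as exc:
        # The first two are ValueError subclasses; listing them documents intent.
        raise TickListError(f"could not parse tick list: {exc}") from exc
```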
Sorry I haven't been contributing as much lately; I'm very much still interested, just busy at the moment. I'm happy to help out however I can! Nice work on identifying all of the performance issues!!
@benjpalmer Awesome! Thank you so much. I wasn't thinking about how the df = ... code wasn't running when in dev_env. I had it stuck in my head that it was something related to the testing, since the code worked, but I never tried it in dev mode.
I updated the dev_env block to use a dataframe as well, just to make it consistent with the rest of the function, since pandas can open CSV files directly. I also added relevant error catching in the except block.
All tests pass (after slightly modifying the input data, because the dataframe doesn't quote single-digit ints), and MPV visibly functions as expected.
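Since pd.read_csv() also accepts a file path, the dev_env branch can skip csv.reader entirely. The sketch below is a hypothetical free-function version (MPV's real method reads _DEV_TEST_TICKS and self.api_data rather than taking arguments). It also illustrates a possible reason the test data needed adjusting: pandas infers numeric columns, so a pitch count comes back as 1 rather than "1" unless dtype=str is passed.

```python
import io
from typing import Dict

import pandas as pd

COLUMNS = ["Date", "Route", "Pitches", "Style",
           "Lead Style", "Route Type", "Length", "Rating Code"]


def parse_tick_list(dev_env: bool, dev_path: str = "",
                    api_text: str = "", as_text: bool = False) -> Dict:
    """Hypothetical sketch: build the dataframe on both branches, then return it."""
    dtype = str if as_text else None  # dtype=str keeps every field as a string
    if dev_env:
        source = dev_path  # pandas opens the local CSV file itself
    else:
        source = io.StringIO(api_text)  # already-decoded API response text
    df = pd.read_csv(source, usecols=COLUMNS, na_filter=False, dtype=dtype)
    return {"status": 0, "data": df.values.tolist()}
```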
No worries about not having time to contribute; I just work on this when I'm able as well. I've been applying for SWE jobs while still studying, so I've been busy too. I appreciate all the work you've put into this so far! Thanks for pointing out that my changes should have error catching as well. That is something I need to get in the mindset of.
I'm going to leave the improve-ticklist-funcs branch open for a while to work on the slow decoding, but if I (or we) can't figure anything out within the next few weeks, I might just merge it and open a new one to work on the issue.
Dug into this a bit more today.
It seems that while adding stream=True has decreased the runtime of the _mp_generic_request() function, it still takes the same amount of time to download the file (as I understand it, the function just passes the mp_request variable on before the file is completely downloaded).
I thought the decode was causing the load time, but it actually appears to still just be the HTTP request resolving. So, back to square one. I'm going to close this issue and incorporate the previous improvements into master.
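The effect described above can be reproduced locally. This is a hypothetical sketch, not MPV's _mp_generic_request(): a throwaway http.server handler sends headers immediately and then stalls before sending the body. With stream=True, requests returns as soon as the headers arrive and downloads the body only when .content (or iter_content()) is read, so the total time is unchanged; it just moves from the request call to whichever line first reads the body, which would look like a slow decode.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

import requests

BODY = b"Date,Route\n2020-01-04,Example Route\n"  # hypothetical CSV payload
DELAY = 0.5  # seconds the fake server stalls between headers and body


class SlowBodyHandler(BaseHTTPRequestHandler):
    """Sends the status line and headers right away, then waits before the body."""

    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()  # flushes the status line and headers to the client
        time.sleep(DELAY)  # simulate a slow download
        self.wfile.write(BODY)

    def log_message(self, *args) -> None:
        pass  # silence the default per-request logging


def time_streamed_get(url: str) -> Tuple[float, float, bytes]:
    """Return (seconds until get() returns, seconds to read .content, body)."""
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy env vars for this local demo
        start = time.perf_counter()
        resp = session.get(url, stream=True)  # returns once headers are in
        headers_done = time.perf_counter()
        body = resp.content  # the body is actually downloaded here
        body_done = time.perf_counter()
    return headers_done - start, body_done - headers_done, body


def run_demo() -> Tuple[float, float, bytes]:
    """Serve SlowBodyHandler on a random local port and time one streamed GET."""
    server = HTTPServer(("127.0.0.1", 0), SlowBodyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return time_streamed_get(f"http://127.0.0.1:{port}/ticks.csv")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    get_s, content_s, _ = run_demo()
    print(f"get() took {get_s:.3f}s, .content took {content_s:.3f}s")
```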
Reopening issue 30 to address the file download speed.
Related Issues (20)
- Change how empty route heights are calculated
- Max Redpoint / Onsight / Flash indicators
- Convert YDS/V-Scale climbing grades to other common grading systems
- Implement test suite and configure testing environment
- Type hinting and function annotations
- Overall 'pythonic' refactoring
- Add test coverage for helpers
- Implement test for email regex
- Improve Handling of Application Errors
- Log errors to file
- Obsolete README line?
- Loading progress indicator not displaying after clicking "visualize"
- Reduce redundant database operations
- Speeding up CSV file download
- Data template script tag generating a 404.
- Dockerize MPV
- Broken redirect in proxied MPV Docker deployment
- Broken tests in test_errors.py
- Compare stats to that of a friend