Comments (5)
It would be good to support asyncio on Python 3.4 or later.
https://docs.python.org/3/library/asyncio.html
https://www.python.org/dev/peps/pep-0492/
https://github.com/aio-libs
from pyathena.
That would also be good. To be clear though, my original suggestion (with poor naming choice and description) would work with 2.7 and no other libraries required.
TL;DR; I want access to the query_id before we start polling for query results.
My understanding is that cursor.execute() currently does the following:
- Creates the query in Athena (returns a query_id)
- Polls the status of that query with the query_id
- When complete, returns the query's result in the right format
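The three steps above can be sketched roughly as follows. This is not PyAthena's actual implementation. It is a minimal illustration that assumes `client` behaves like a boto3 Athena client (`start_query_execution`, `get_query_execution`, `get_query_results`); the function name `execute_blocking`, its parameters, and the polling interval are hypothetical.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def execute_blocking(client, sql, output_location, poll_interval=1.0):
    """Hypothetical sketch of a blocking execute(): start, poll, fetch."""
    # Step 1: create the query in Athena; the response carries the query_id.
    response = client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # Step 2: poll the status of that query until it reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_interval)

    if state != "SUCCEEDED":
        raise RuntimeError("query %s ended in state %s" % (query_id, state))

    # Step 3: fetch the results (only the first page in this sketch).
    return client.get_query_results(QueryExecutionId=query_id)
```

In this shape the caller only gets control back once the query has finished, so the query_id is not available while the query is still running.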
I'm proposing two new methods for the Cursor API:
- cursor.create_query() (name TBD)
  - Does everything in the current cursor.execute() up to (but not including) cursor._poll()
- cursor.collect_query_results() (name TBD)
  - Does everything in the current cursor.execute() starting from cursor._poll()
It might also make sense to give Cursor.__init__() another kwarg, query_id, for rebuilding a cursor to the state after cursor.create_query(). Additionally, it would also be nice to implement cursor.cancel_query() in this library.
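To make the proposal concrete, here is a minimal sketch of what such a cursor could look like. Everything in it is hypothetical: the class name `SplitCursor`, the `query_state()` helper, and the constructor arguments are invented for illustration, and `client` is assumed to behave like a boto3 Athena client. It is not how PyAthena's Cursor is written.

```python
import time


class SplitCursor(object):
    """Hypothetical cursor that separates query creation from result collection."""

    def __init__(self, client, output_location, query_id=None):
        self._client = client  # assumed to behave like a boto3 Athena client
        self._output_location = output_location
        # Passing query_id rebuilds the state reached after create_query().
        self.query_id = query_id

    def create_query(self, sql):
        """Start the query and return its id without polling."""
        response = self._client.start_query_execution(
            QueryString=sql,
            ResultConfiguration={"OutputLocation": self._output_location},
        )
        self.query_id = response["QueryExecutionId"]
        return self.query_id

    def query_state(self):
        """Return the current Athena state string for this query."""
        execution = self._client.get_query_execution(QueryExecutionId=self.query_id)
        return execution["QueryExecution"]["Status"]["State"]

    def collect_query_results(self, poll_interval=1.0):
        """Poll until the query finishes, then return the first page of results."""
        state = self.query_state()
        while state not in ("SUCCEEDED", "FAILED", "CANCELLED"):
            time.sleep(poll_interval)
            state = self.query_state()
        if state != "SUCCEEDED":
            raise RuntimeError("query %s ended in state %s" % (self.query_id, state))
        return self._client.get_query_results(QueryExecutionId=self.query_id)

    def cancel_query(self):
        """Ask Athena to stop the query (the StopQueryExecution API)."""
        self._client.stop_query_execution(QueryExecutionId=self.query_id)
```

A second process could then rebuild the cursor with `SplitCursor(client, output_location, query_id=stored_id)` and call `collect_query_results()` or `cancel_query()` without having created the query itself.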
Here's a use case:
- User hits a webserver endpoint with a query
- Webserver calls cursor.create_query(), puts the query_id in some database/queue, returns 200 to the user
- Some background process gets the query_id from the queue and periodically checks on the status of that query, updating some database with the results when ready
- Alternatively, after the query execution starts but before it ends (it's a long Athena query), the user changes their mind and indicates this query should be cancelled.
  - A different webserver endpoint is hit and query_id is fetched from the database
  - The query_id is used to cancel the Athena query with the StopQueryExecution Athena API.
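The use case above might look roughly like this in code. This is only a sketch under stated assumptions: an in-process `queue.Queue` and a dict stand in for the real database/queue, the three function names are hypothetical, and `client` is assumed to behave like a boto3 Athena client.

```python
import queue  # on Python 2.7 this module is named Queue

pending = queue.Queue()  # stand-in for "some database/queue"
results = {}             # stand-in for the results database


def submit_endpoint(client, sql, output_location):
    """Start the query, enqueue its id, and return immediately."""
    response = client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]
    pending.put(query_id)
    return 200, query_id


def background_check_once(client):
    """One pass of the background process over the queued query ids."""
    still_running = []
    while not pending.empty():
        query_id = pending.get()
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            results[query_id] = client.get_query_results(QueryExecutionId=query_id)
        elif state in ("FAILED", "CANCELLED"):
            results[query_id] = state
        else:
            still_running.append(query_id)
    # Re-queue anything that has not finished yet for the next pass.
    for query_id in still_running:
        pending.put(query_id)


def cancel_endpoint(client, query_id):
    """Cancel a running query using the id fetched from the database."""
    client.stop_query_execution(QueryExecutionId=query_id)
    return 200
```

None of the three functions blocks on the query finishing, which is only possible because the query_id is available as soon as the query is created.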
If you need to support Python 2.7, it would be good to use concurrent.futures.
https://docs.python.org/3/library/concurrent.futures.html
It is backported to Python 2.7.
https://github.com/agronholm/pythonfutures
Since the design of the current cursor class is stateful, I think it would be better to create a new stateless cursor class (an asynchronous cursor).
e.g. https://github.com/PyMySQL/PyMySQL/blob/master/pymysql/cursors.py#L399
It would also be better to be able to select which cursor class the connection object uses.
The asynchronous cursor class's execute method would return the future object and the query id.
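A minimal sketch of that idea, using concurrent.futures, could look like the following. The class name `AsyncCursorSketch`, its constructor arguments, and the `_collect` helper are hypothetical, `client` is assumed to behave like a boto3 Athena client, and the pull request's actual implementation may differ.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class AsyncCursorSketch(object):
    """Hypothetical stateless cursor: execute() returns (query_id, future)."""

    def __init__(self, client, output_location, max_workers=4, poll_interval=1.0):
        self._client = client  # assumed to behave like a boto3 Athena client
        self._output_location = output_location
        self._poll_interval = poll_interval
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    def _collect(self, query_id):
        # Runs in a worker thread: poll until terminal, then fetch results.
        while True:
            execution = self._client.get_query_execution(QueryExecutionId=query_id)
            state = execution["QueryExecution"]["Status"]["State"]
            if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
                break
            time.sleep(self._poll_interval)
        if state != "SUCCEEDED":
            raise RuntimeError("query %s ended in state %s" % (query_id, state))
        return self._client.get_query_results(QueryExecutionId=query_id)

    def execute(self, sql):
        """Start the query and hand polling off to the thread pool."""
        response = self._client.start_query_execution(
            QueryString=sql,
            ResultConfiguration={"OutputLocation": self._output_location},
        )
        query_id = response["QueryExecutionId"]
        # No per-query state is kept on the cursor; the caller owns both values.
        return query_id, self._executor.submit(self._collect, query_id)

    def cancel(self, query_id):
        """Ask Athena to stop the query with the given id."""
        self._client.stop_query_execution(QueryExecutionId=query_id)
```

The caller can store the query id right away and later call `future.result()` to block for the results, or `cancel(query_id)` to stop the query.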
Ah okay, I see what you're suggesting. My apologies. I didn't realize there was already a cancel method implemented and the query_id can be accessed in a thread-safe way. No changes required for my above use case.
Thanks!
It does not comply with DB-API, but I tried implementing an asynchronous cursor class.
Please check the following pull request :)
/pull/21
TODO: tests