Comments (5)

laughingman7743 commented on May 24, 2024

It would be good to support asyncio on Python 3.4 or later.
https://docs.python.org/3/library/asyncio.html
https://www.python.org/dev/peps/pep-0492/
https://github.com/aio-libs

SamProtas commented on May 24, 2024

That would also be good. To be clear, though, my original suggestion (despite its poor naming and description) would work on Python 2.7 and require no additional libraries.

TL;DR: I want access to the query_id before polling for the query results starts.

My understanding is that cursor.execute() currently:

  1. Creates the query in Athena (which returns a query_id)

  2. Polls the status of that query using the query_id

  3. When the query completes, returns its results in the right format
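The three steps above can be sketched directly against the Athena API. This is a minimal illustration, not pyathena's actual code: `execute` and `TERMINAL_STATES` are hypothetical names, the client is assumed to be a boto3 Athena client (anything exposing `start_query_execution`, `get_query_execution`, and `get_query_results`), and only the first page of results is read.

```python
import time

# Athena states after which a query will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def execute(client, query, output_location, poll_interval=1.0):
    """Hypothetical helper: create, poll, then fetch, all in one blocking call.
    `client` is assumed to be a boto3 Athena client."""
    # 1. Create the query in Athena; the response carries the query_id.
    response = client.start_query_execution(
        QueryString=query,
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # 2. Poll the query's status until it reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_interval)

    if state != "SUCCEEDED":
        raise RuntimeError("Query {} finished in state {}".format(query_id, state))

    # 3. Fetch the results and turn each row into a tuple of strings.
    #    Only the first page is read; NextToken pagination is omitted.
    result = client.get_query_results(QueryExecutionId=query_id)
    rows = result["ResultSet"]["Rows"]
    return [tuple(col.get("VarCharValue") for col in row["Data"]) for row in rows]
```

The query_id only exists inside this call, which is why a caller cannot see it until polling has already finished.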

I'm proposing two new methods for the Cursor API:

  • cursor.create_query() (name TBD)

    • Does everything in the current cursor.execute() up to (but not including) cursor._poll()
  • cursor.collect_query_results() (name TBD)

    • Does everything in the current cursor.execute() starting from cursor._poll()

It might also make sense to give Cursor.__init__() an additional query_id kwarg, so that a cursor can be rebuilt in the state it would be in after cursor.create_query(). It would also be nice to implement cursor.cancel_query() in this library.
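A minimal sketch of the proposed split, assuming a boto3 Athena client. `SplitCursor`, `create_query`, `collect_query_results`, and `cancel_query` are all hypothetical (the proposal marks the names as TBD); `stop_query_execution` is boto3's wrapper around the StopQueryExecution API.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


class SplitCursor(object):
    """Hypothetical cursor illustrating the proposed split of execute()."""

    def __init__(self, client, output_location, query_id=None):
        self._client = client                # assumed: a boto3 Athena client
        self._output_location = output_location
        self.query_id = query_id             # lets a cursor be rebuilt from a stored id

    def create_query(self, query):
        """Everything execute() does before polling: start the query."""
        response = self._client.start_query_execution(
            QueryString=query,
            ResultConfiguration={"OutputLocation": self._output_location},
        )
        self.query_id = response["QueryExecutionId"]
        return self.query_id

    def collect_query_results(self, poll_interval=1.0):
        """Everything from polling onward: wait for completion, then fetch."""
        while True:
            execution = self._client.get_query_execution(QueryExecutionId=self.query_id)
            state = execution["QueryExecution"]["Status"]["State"]
            if state in TERMINAL_STATES:
                break
            time.sleep(poll_interval)
        if state != "SUCCEEDED":
            raise RuntimeError("Query {} finished in state {}".format(self.query_id, state))
        result = self._client.get_query_results(QueryExecutionId=self.query_id)
        return [tuple(col.get("VarCharValue") for col in row["Data"])
                for row in result["ResultSet"]["Rows"]]

    def cancel_query(self):
        """Cancel the running query via StopQueryExecution."""
        self._client.stop_query_execution(QueryExecutionId=self.query_id)
```

Rebuilding a cursor from a stored id would then be `SplitCursor(client, output_location, query_id=stored_id).collect_query_results()`.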

Here's a use case:

  • A user hits a webserver endpoint with a query

  • The webserver calls cursor.create_query(), puts the query_id in some database/queue, and returns 200 to the user

  • A background process takes the query_id from the queue, periodically checks the status of that query, and updates some database with the results when they are ready

  • Alternatively, after the query starts executing but before it finishes (it's a long Athena query), the user changes their mind and indicates that the query should be cancelled.

    • A different webserver endpoint is hit and the query_id is fetched from the database

    • The query_id is used to cancel the Athena query via the StopQueryExecution Athena API.
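The use case could look roughly like the following, assuming Python 3 and a boto3 Athena client. `handle_submit`, `worker_tick`, `handle_cancel`, `pending`, and `results_db` are hypothetical stand-ins for the web endpoints, the queue, and the database; the worker checks each query's status once per tick rather than blocking on it.

```python
import queue

pending = queue.Queue()   # stands in for "some database/queue"
results_db = {}           # stands in for the results database


def handle_submit(client, sql, output_location):
    """Web endpoint: start the query, enqueue its id, return immediately."""
    response = client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]
    pending.put(query_id)
    results_db[query_id] = {"state": "QUEUED"}
    return 200, query_id


def worker_tick(client):
    """Background process: check every pending query once, without waiting."""
    still_running = []
    while not pending.empty():
        query_id = pending.get()
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        results_db[query_id] = {"state": state}
        if state == "SUCCEEDED":
            result = client.get_query_results(QueryExecutionId=query_id)
            results_db[query_id]["rows"] = result["ResultSet"]["Rows"]
        elif state in ("QUEUED", "RUNNING"):
            still_running.append(query_id)
    for query_id in still_running:  # re-enqueue unfinished queries for the next tick
        pending.put(query_id)


def handle_cancel(client, query_id):
    """Separate web endpoint: look up the id and call StopQueryExecution."""
    if query_id not in results_db:
        return 404
    client.stop_query_execution(QueryExecutionId=query_id)
    return 200
```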

laughingman7743 commented on May 24, 2024

If you want to support Python 2.7, it would be good to use concurrent.futures.
https://docs.python.org/3/library/concurrent.futures.html
It has been backported to Python 2.7:
https://github.com/agronholm/pythonfutures

Since the current cursor class is stateful by design, I think it would be better to create a new, stateless cursor class (an asynchronous cursor).
e.g. https://github.com/PyMySQL/PyMySQL/blob/master/pymysql/cursors.py#L399

It would also be better to be able to choose which cursor class the connection object uses.
The asynchronous cursor class's execute method would return a future object and the query id.
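Following that suggestion, a stateless asynchronous cursor built on concurrent.futures might look like the sketch below. `SketchAsyncCursor` is a hypothetical name and not pyathena's class; the client is assumed to be a boto3 Athena client, and returning `(query_id, future)` in that order is a choice made here. Because each query's progress lives in the worker call rather than on the cursor, one cursor can have several queries in flight.

```python
import time
from concurrent.futures import ThreadPoolExecutor

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


class SketchAsyncCursor(object):
    """Hypothetical asynchronous cursor: execute() starts the query and
    returns (query_id, future) instead of blocking until it finishes."""

    def __init__(self, client, output_location, max_workers=4, poll_interval=1.0):
        self._client = client  # assumed: a boto3 Athena client
        self._output_location = output_location
        self._poll_interval = poll_interval
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    def _wait_and_fetch(self, query_id):
        # Runs on a worker thread and keeps no per-query state on the cursor.
        while True:
            execution = self._client.get_query_execution(QueryExecutionId=query_id)
            state = execution["QueryExecution"]["Status"]["State"]
            if state in TERMINAL_STATES:
                break
            time.sleep(self._poll_interval)
        if state != "SUCCEEDED":
            raise RuntimeError("Query {} finished in state {}".format(query_id, state))
        result = self._client.get_query_results(QueryExecutionId=query_id)
        return result["ResultSet"]["Rows"]

    def execute(self, query):
        response = self._client.start_query_execution(
            QueryString=query,
            ResultConfiguration={"OutputLocation": self._output_location},
        )
        query_id = response["QueryExecutionId"]
        # The query_id is available immediately; the future resolves later.
        return query_id, self._executor.submit(self._wait_and_fetch, query_id)

    def cancel(self, query_id):
        self._client.stop_query_execution(QueryExecutionId=query_id)

    def close(self):
        self._executor.shutdown(wait=True)
```

Since the query_id comes back before the future resolves, it can be stored right away and later passed to `cancel` from a different request.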

SamProtas commented on May 24, 2024

Ah, okay, I see what you're suggesting. My apologies; I didn't realize a cancel method was already implemented and that the query_id can be accessed in a thread-safe way. No changes are required for my use case above.

Thanks!

laughingman7743 commented on May 24, 2024

It does not comply with the DB-API, but I tried implementing an asynchronous cursor class.
Please check the following pull request :)
/pull/21

TODO: tests
