Git Product home page Git Product logo

Comments (6)

laughingman7743 avatar laughingman7743 commented on June 4, 2024

If the query contains % characters it will be doubly escaped. 😭
Your modification seems to be good. 👍

from pyathena.

laughingman7743 avatar laughingman7743 commented on June 4, 2024

If you set the _double_percents option of IdentifierPreparer to False, an error will occur if the % character and the query parameter are contained.

import sqlalchemy
from urllib.parse import quote_plus
from sqlalchemy.engine import create_engine


def main():
    conn_str = 'awsathena+rest://:@athena.{region_name}.amazonaws.com:443/' \
               '{schema_name}?s3_staging_dir={s3_staging_dir}'
    engine = create_engine(conn_str.format(
        region_name='us-west-2',
        schema_name='default',
        s3_staging_dir=quote_plus('s3://BUCKET/path/to/')))
    session = engine.connect()

    query = sqlalchemy.sql.text("select date_parse('20191030', '%Y%m%d')")
    print(session.execute(query).fetchall())  # return [(datetime.datetime(2019, 10, 30, 0, 0),)]

    query = sqlalchemy.sql.text("select date_parse('20191030', '%Y%m%d'), :word")
    print(session.execute(query, word='cat').fetchall())  # raise exception


if __name__ == '__main__':
    main()
Traceback (most recent call last):
  File "/Users/laughingman7743/github/PyAthena/sqla_query_with_parameter.py", line 28, in <module>
    main()
  File "/Users/laughingman7743/github/PyAthena/sqla_query_with_parameter.py", line 24, in main
    print(session.execute(query, word='cat').fetchall())
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1416, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 249, in reraise
    raise value
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/Users/laughingman7743/.virtualenvs/PyAthena-lwXSqVhO/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
    cursor.execute(statement, parameters)
  File "/Users/laughingman7743/github/PyAthena/pyathena/util.py", line 28, in _wrapper
    return wrapped(*args, **kwargs)
  File "/Users/laughingman7743/github/PyAthena/pyathena/cursor.py", line 38, in execute
    self._query_id = self._execute(operation, parameters)
  File "/Users/laughingman7743/github/PyAthena/pyathena/common.py", line 154, in _execute
    query = self._formatter.format(operation, parameters)
  File "/Users/laughingman7743/github/PyAthena/pyathena/formatter.py", line 112, in format
    return (operation % kwargs).strip() if kwargs else operation.strip()
ValueError: unsupported format character 'Y' (0x59) at index 32

If formatting is done regardless of the query parameters as follows, the _double_percents option seems to be OK with default True setting.
https://github.com/laughingman7743/PyAthena/blob/master/pyathena/formatter.py#L112

return (operation % kwargs).strip()

from pyathena.

mister-average avatar mister-average commented on June 4, 2024

Argh! Of course that was too easy to work!

My use case is that I want to be able to copy sql statements that work in the Athena console and paste them directly into a Jupyter notebook, where the %%sql magic can execute them against Athena unmodified. Users can't be bothered to reformat the text with double percents.

Could it be that the real problem is that the visit_textclause() function in the default SQLCompiler class from sqlalchemy/sql/compiler.py calls self.post_process_text(textclause.text) -- which doubles percents -- REGARDLESS of whether there are any bind parameters or not? This behavior can be overridden and corrected by your AthenaCompiler class in pyathena/sqlalchemy_athena.py.

With the following changes only (the previous change I suggested is not needed):

from sqlalchemy.sql.compiler import BIND_PARAMS_ESC, BIND_PARAMS

class AthenaCompiler(SQLCompiler):
    """PrestoCompiler

    https://github.com/dropbox/PyHive/blob/master/pyhive/sqlalchemy_presto.py"""
    def visit_char_length_func(self, fn, **kw):
        return 'length{0}'.format(self.function_argspec(fn, **kw))

    def visit_textclause(self, textclause, **kw):
        def do_bindparam(m):
            name = m.group(1)
            if name in textclause._bindparams:
                return self.process(textclause._bindparams[name], **kw)
            else:
                return self.bindparam_string(name, **kw)

        if not self.stack:
            self.isplaintext = True

        if len(textclause._bindparams) == 0:
            return textclause.text

        else:
            # un-escape any \:params
            return BIND_PARAMS_ESC.sub(
                lambda m: m.group(1),
                BIND_PARAMS.sub(
                    do_bindparam,
                    self.post_process_text(textclause.text))
            )

I now get this:

>>> import sqlalchemy
>>> engine = sqlalchemy.create_engine(....)
>>> session = engine.connect()
>>> session.execute(sqlalchemy.text("select date_parse('20191030', '%Y%m%d' )")).fetchall()
[(datetime.datetime(2019, 10, 30, 0, 0),)]
>>> session.execute(sqlalchemy.text("select date_parse('20191030', '%Y%m%d' ), 'cat'")).fetchall()
[(datetime.datetime(2019, 10, 30, 0, 0), 'cat')]
>>> session.execute(sqlalchemy.text("select date_parse('20191030', '%Y%m%d' ), :word"), {'word':"cat"}).fetchall()
[(datetime.datetime(2019, 10, 30, 0, 0), 'cat')]

Thank you for your time in looking at this.

from pyathena.

laughingman7743 avatar laughingman7743 commented on June 4, 2024

It is the following method to override.
https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/sql/compiler.py#L759

It seems just to add the following conditions.

if len(textclause._bindparams) == 0:
    return textclause.text

It feels good. 👍
I will add some test cases and check if there is no problem.

from pyathena.

laughingman7743 avatar laughingman7743 commented on June 4, 2024

I fixed it with the following pull request. And a few test cases are added.
#57

Please check it.

from pyathena.

mister-average avatar mister-average commented on June 4, 2024

Thank you so much! It works well for me!

from pyathena.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.