
python-phoenixdb's Introduction

Phoenix database adapter for Python

phoenixdb is a Python library for accessing the Phoenix SQL database using the remote query server. The library implements the standard DB API 2.0 interface, which should be familiar to most Python programmers.

Installation

The easiest way to install the library is using pip:

pip install phoenixdb

You can also download the source code from GitHub, extract the archive and then install it manually:

cd /path/to/python-phoenix-x.y.z/
python setup.py install

Usage

The library implements the standard DB API 2.0 interface, so it can be used the same way you would use any other SQL database from Python, for example:

import phoenixdb
import phoenixdb.cursor

database_url = 'http://localhost:8765/'
conn = phoenixdb.connect(database_url, autocommit=True)

cursor = conn.cursor()
cursor.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username VARCHAR)")
cursor.execute("UPSERT INTO users VALUES (?, ?)", (1, 'admin'))
cursor.execute("SELECT * FROM users")
print(cursor.fetchall())

cursor = conn.cursor(cursor_factory=phoenixdb.cursor.DictCursor)
cursor.execute("SELECT * FROM users WHERE id=1")
print(cursor.fetchone()['USERNAME'])
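
Parameter binding uses the DB API 2.0 question-mark style shown above. Below is a minimal sketch of inserting several rows at once, assuming the cursor also exposes the standard DB API executemany method (the users table comes from the example above):

import phoenixdb

conn = phoenixdb.connect('http://localhost:8765/', autocommit=True)
cursor = conn.cursor()

# With autocommit enabled, each UPSERT is applied as soon as it is executed.
rows = [(2, 'alice'), (3, 'bob')]
cursor.executemany("UPSERT INTO users VALUES (?, ?)", rows)

cursor.execute("SELECT COUNT(*) FROM users")
print(cursor.fetchone()[0])

conn.close()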

Setting up a development environment

If you want to quickly try out the included examples, you can set up a local virtualenv with all the necessary requirements:

virtualenv e
source e/bin/activate
pip install -r requirements.txt
python setup.py develop

To create or update the Avatica protobuf classes, change the tag in gen-protobuf.sh and run the script.

If you need a Phoenix query server for experimenting, you can get one running quickly using Docker:

docker-compose up

Or if you need an older version of Phoenix:

PHOENIX_VERSION=4.9 docker-compose up

Interactive SQL shell

There is a Python-based interactive shell included in the examples folder, which can be used to connect to Phoenix and execute queries:

./examples/shell.py http://localhost:8765/
db=> CREATE TABLE test (id INTEGER PRIMARY KEY, name VARCHAR);
no rows affected (1.363 seconds)
db=> UPSERT INTO test (id, name) VALUES (1, 'Lukas');
1 row affected (0.004 seconds)
db=> SELECT * FROM test;
+------+-------+
|   ID | NAME  |
+======+=======+
|    1 | Lukas |
+------+-------+
1 row selected (0.019 seconds)

Running the test suite

The library comes with a test suite for testing Python DB API 2.0 compliance and various Phoenix-specific features. In order to run the test suite, you need a working Phoenix database and you need to set the PHOENIXDB_TEST_DB_URL environment variable:

export PHOENIXDB_TEST_DB_URL='http://localhost:8765/'
nosetests
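
As a quick sanity check that the URL in the environment variable points at a working query server, something like the following can be run by hand first; this is only a sketch, and it assumes the SYSTEM.CATALOG table that Phoenix creates by default:

import os
import phoenixdb

# The test suite reads the same environment variable.
url = os.environ['PHOENIXDB_TEST_DB_URL']

conn = phoenixdb.connect(url, autocommit=True)
cursor = conn.cursor()
cursor.execute("SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")
print(cursor.fetchall())
conn.close()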

Commits to the master branch are automatically tested against all supported versions of Phoenix. You can see the results here.

Known issues

  • You can only use the library in autocommit mode. The native Java Phoenix library also implements batched upserts, which can be committed at once, but this is not exposed over the remote server. (CALCITE-767)
  • TIME and DATE columns in Phoenix are stored as full timestamps with a millisecond accuracy, but the remote protocol only exposes the time (hour/minute/second) or date (year/month/day) parts of the columns. (CALCITE-797, CALCITE-798)
  • TIMESTAMP columns in Phoenix are stored with a nanosecond accuracy, but the remote protocol truncates them to milliseconds. (CALCITE-796)
  • ARRAY columns are not supported. (CALCITE-1050, PHOENIX-2585)

python-phoenixdb's People

Contributors

ankitsinghal, joshelser, lalinsky, mheppner


python-phoenixdb's Issues

Apache Phoenix database adapter for Python authentication issue

I just started with Phoenix and found this adapter for Python.

import phoenixdb
import phoenixdb.cursor

database_url = 'http://localhost:8765/'
conn = phoenixdb.connect(database_url, autocommit=True)

The above is the connection string used to connect, but it doesn't include any authentication such as a user ID or password.

I could not find any leads on the documentation page either: https://python-phoenixdb.readthedocs.io/en/latest/

My question: is only the database URL required to connect, or is there a way to use authentication?

phoenixdb can't connect to HBase

Hi all!
I have a problem: I encountered an error when I used phoenixdb. I have started the query server service.
My code:
import phoenixdb
database_url = 'http://3.9.8.130:8765/'
conn = phoenixdb.connect(database_url, autocommit=True)
cursor = conn.cursor()

cursor.execute('''select * from SYSTEM."TABLE"''')
print cursor.fetchall()
the error:
phoenixdb.errors.InternalError: (u'RuntimeException: java.sql.SQLException: No suitable driver found for start -> SQLException: No suitable driver found for start', None, None, None)

Error Message on Python Shutdown

If you open a connection and then call sys.exit() or press Ctrl+D, you will always get this message, because the connection cannot be closed from the __del__ method in connection.py.

Exception ignored in: <function Connection.__del__ at 0x102eeb280>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/phoenixdb/connection.py", line 60, in __del__
  File "/usr/local/lib/python3.9/site-packages/phoenixdb/connection.py", line 114, in close
  File "/usr/local/lib/python3.9/site-packages/phoenixdb/avatica/client.py", line 397, in close_connection
  File "/usr/local/lib/python3.9/site-packages/phoenixdb/avatica/client.py", line 218, in _apply
  File "/usr/local/lib/python3.9/site-packages/phoenixdb/avatica/client.py", line 188, in _post_request
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 590, in post
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 532, in request
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 711, in merge_environment_settings
  File "/usr/local/lib/python3.9/site-packages/requests/utils.py", line 797, in get_environ_proxies
  File "/usr/local/lib/python3.9/site-packages/requests/utils.py", line 781, in should_bypass_proxies
  File "/usr/local/Cellar/[email protected]/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 2647, in proxy_bypass
  File "/usr/local/Cellar/[email protected]/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 2624, in proxy_bypass_macosx_sysconf
  File "/usr/local/Cellar/[email protected]/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 2566, in _proxy_bypass_macosx_sysconf
ImportError: sys.meta_path is None, Python is likely shutting down
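
Until this is fixed, the traceback can usually be avoided by closing the connection explicitly before the interpreter starts shutting down, rather than relying on __del__. A minimal sketch, assuming a script that keeps one connection open for its lifetime:

import atexit
import phoenixdb

conn = phoenixdb.connect('http://localhost:8765/', autocommit=True)

# Close the connection while the interpreter is still fully alive, so __del__
# never has to issue the close RPC during shutdown.
atexit.register(conn.close)

cursor = conn.cursor()
cursor.execute("SELECT * FROM users")
print(cursor.fetchall())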

python-phoenixdb version for Phoenix 4.13

Hello, I'm trying to connect to a Phoenix server; the Phoenix version is 4.13 (Spark 2.3.0 with PySpark and HBase 1.2 are also used, and the Python version is 2.7.5). With the newest version (v0.7) of python-phoenixdb, I get an "RPC request failed" error (the server is working, however).

According to the changelog, v0.7 covers Phoenix 4.8 to 4.11, but I'm wondering whether there is any compatibility solution for Phoenix 4.13.

Thanks,

Below are my code and the entire error message.

import phoenixdb
import phoenixdb.cursor

database_url = 'http://localhost:12345/xxxx'
conn = phoenixdb.connect(database_url, autocommit=True)
---------------------------------------------------------------------------
InterfaceError                            Traceback (most recent call last)
<ipython-input-1-abddd0c12576> in <module>()
      3 
      4 database_url = 'http://localhost:12345:/xxxx'
----> 5 conn = phoenixdb.connect(database_url, autocommit=True)

/usr/lib/python2.7/site-packages/phoenixdb/__init__.pyc in connect(url, max_retries, **kwargs)
     65     client = AvaticaClient(url, max_retries=max_retries)
     66     client.connect()
---> 67     return Connection(client, **kwargs)

/usr/lib/python2.7/site-packages/phoenixdb/connection.pyc in __init__(self, client, cursor_factory, **kwargs)
     54             else:
     55                 self._filtered_args[k] = kwargs[k]
---> 56         self.open()
     57         self.set_session(**self._filtered_args)
     58 

/usr/lib/python2.7/site-packages/phoenixdb/connection.pyc in open(self)
     71         """Opens the connection."""
     72         self._id = str(uuid.uuid4())
---> 73         self._client.open_connection(self._id, info=self._connection_args)
     74 
     75     def close(self):

/usr/lib/python2.7/site-packages/phoenixdb/avatica.pyc in open_connection(self, connection_id, info)
    327                 request.info[k] = v
    328 
--> 329         response_data = self._apply(request)
    330         response = responses_pb2.OpenConnectionResponse()
    331         response.ParseFromString(response_data)

/usr/lib/python2.7/site-packages/phoenixdb/avatica.pyc in _apply(self, request_data, expected_response_type)
    205         headers = {'content-type': 'application/x-google-protobuf'}
    206 
--> 207         response = self._post_request(body, headers)
    208         response_body = response.read()
    209 

/usr/lib/python2.7/site-packages/phoenixdb/avatica.pyc in _post_request(self, body, headers)
    184                     retry_count -= 1
    185                     continue
--> 186                 raise errors.InterfaceError('RPC request failed', cause=e)
    187             else:
    188                 if response.status == httplib.SERVICE_UNAVAILABLE:

InterfaceError: ('RPC request failed', None, None, BadStatusLine("''",))

In avatica/client.py, response_body is of type bytes, but JettyErrorPageParser().feed(html) expects type string.

In avatica/client.py, line 509, we find:

response_body = response.read()

if response.status != httplib.OK:
    logger.debug("Received response\n%s", response_body)
    if b'<html>' in response_body:
        parse_error_page(response_body)
    else:
        # assume the response is in protobuf format
        parse_error_protobuf(response_body)
    raise errors.InterfaceError('RPC request returned invalid status code', response.status)

The problem is that response_body is of type bytes, while parse_error_page()'s call to parser.feed(html) expects a string (parser being an instance of JettyErrorPageParser).

Below is a full stack trace showing the problem. Locally, I modified the file avatica.py (not sure why my dependency downloaded from pip is not client.py, but the files are similar and the bug is in both) and changed the line to:

response_body = response.read().decode('utf-8')

...and it worked; I finally got the error message showing up. I'm not sure that this simple fix would work with all uses of response_body; further code review and testing would be required. Also, this assumes (probably correctly) that the Phoenix server always returns UTF-8 documents, but that should be validated.

Traceback (most recent call last):
  File "./pythonscript", line 224, in mainwrapper
    main()
  File "./pythonscript", line 210, in main
    phoenix_cursor = phoenix_connect().cursor()
  File "./pythonscript", line 51, in phoenix_connect
    return phoenixdb.connect(phoenix_url, autocommit=True)
  File "/home/myuser/.local/lib/python3.6/site-packages/phoenixdb/__init__.py", line 67, in connect
    return Connection(client, **kwargs)
  File "/home/myuser/.local/lib/python3.6/site-packages/phoenixdb/connection.py", line 56, in __init__
    self.open()
  File "/home/myuser/.local/lib/python3.6/site-packages/phoenixdb/connection.py", line 73, in open
    self._client.open_connection(self._id, info=self._connection_args)
  File "/home/myuser/.local/lib/python3.6/site-packages/phoenixdb/avatica.py", line 329, in open_connection
    response_data = self._apply(request)
  File "/home/myuser/.local/lib/python3.6/site-packages/phoenixdb/avatica.py", line 212, in _apply
    if b'<html>' in response_body:
TypeError: 'in <string>' requires string as left operand, not bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./pythonscript", line 229, in <module>
    mainwrapper()
  File "./pythonscript", line 227, in mainwrapper
    logger.error("Exception lancée:  %s", exception.message)
AttributeError: 'TypeError' object has no attribute 'message'
Exception ignored in: <bound method Connection.__del__ of <phoenixdb.connection.Connection object at 0x7fa90be2ce80>>
Traceback (most recent call last):
  File "/home/myuser/.local/lib/python3.6/site-packages/phoenixdb/connection.py", line 61, in __del__
    self.close()
  File "/home/myuser/.local/lib/python3.6/site-packages/phoenixdb/connection.py", line 89, in close
    self._client.close_connection(self._id)
  File "/home/myuser/.local/lib/python3.6/site-packages/phoenixdb/avatica.py", line 341, in close_connection
    self._apply(request)
  File "/home/myuser/.local/lib/python3.6/site-packages/phoenixdb/avatica.py", line 212, in _apply
    if b'<html>' in response_body:
TypeError: 'in <string>' requires string as left operand, not bytes
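
The underlying mismatch is easy to reproduce on its own, assuming JettyErrorPageParser is built on the standard library HTMLParser, whose feed() accepts only str on Python 3. A minimal sketch with a made-up response body:

from html.parser import HTMLParser

# What response.read() returns on Python 3: bytes, not str.
body = b"<html><body><h2>HTTP ERROR 500</h2></body></html>"

parser = HTMLParser()
# parser.feed(body)               # would raise TypeError, since feed() expects str
parser.feed(body.decode('utf-8'))  # decoding first is what the reporter's fix amounts to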

Got `ResultSet is closed` error while iterating over cursor to fetch large result after 10 min

My Phoenix version is 4.14.0-cdh5.14.2.

I know that the default phoenix.query.timeoutMs is 10 minutes, so I added the following configuration to hbase-site.xml, but this didn't work.

property                               value
phoenix.query.timeoutMs                3600000 (1 hour)
hbase.client.scanner.timeout.period    1200000 (20 minutes)
hbase.rpc.timeout                      1200000 (20 minutes)

Is there any way to set the query timeout through the Python driver?

No records written, no error thrown

I am having a strange issue. I have a table with a VARCHAR primary key and some TIMESTAMP columns:

CREATE TABLE IF NOT EXISTS public.OBJECT_TRACKER (
                object_path      VARCHAR,
                ready_to_process TIMESTAMP, 
                processed        TIMESTAMP, 
                error            TIMESTAMP, 
                error_message    VARCHAR, 
                s3_last_modified TIMESTAMP,
                CONSTRAINT object_tracker_pk PRIMARY KEY(object_path)
                )

When using the Phoenix client (sqlline.py), everything works as expected, but when using phoenixdb I see strange behavior:

  1. When calling "UPSERT INTO public.object_tracker(object_path) VALUES (?)" and passing some string, only a key containing the first character of the string ends up in the database: "evolution" stores "e" in HBase.
    No error is thrown.

  2. It's even more tricky. When I try to insert a VARCHAR (PK) and a TIMESTAMP via
    statement = "UPSERT INTO public.object_tracker(object_path,ready_to_process) VALUES (?,?)"
    and do:
    cursor.execute(statement, (obj.object_full_path, datetime.now())) it doesn't write anything into the database.

Finally, I ran "DELETE FROM table" in the Phoenix CLI and tried the same upserts via phoenixdb, and now it works :-). I'm not sure whether the scenario is reproducible, or whether the issue is in Phoenix or elsewhere in the Hadoop infrastructure.
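
One common way to get the "only the first character is stored" symptom with DB API drivers is to pass a bare string as the parameters argument, so that the string itself is treated as the sequence of parameters. Whether that is what happened here is only a guess, but below is a minimal sketch of the difference, with table and column names taken from the issue:

import phoenixdb
from datetime import datetime

conn = phoenixdb.connect('http://localhost:8765/', autocommit=True)
cursor = conn.cursor()

statement = "UPSERT INTO public.object_tracker (object_path) VALUES (?)"

# Risky: a bare string is itself a sequence of characters, so a driver may end
# up binding only 'e' to the single placeholder.
# cursor.execute(statement, 'evolution')

# Correct: wrap the value in a tuple (or list) so the whole string is one parameter.
cursor.execute(statement, ('evolution',))

statement = "UPSERT INTO public.object_tracker (object_path, ready_to_process) VALUES (?, ?)"
cursor.execute(statement, ('evolution', datetime.now()))

conn.close()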

Add support for HA failover

https://community.hortonworks.com/articles/9377/deploying-the-phoenix-query-server-in-production-e.html#comment-9387

Lukas Lalinsky · Jan 11, 2016 at 12:49 PM
I have a question about the stateless load balancing. In Avatica 1.5, there is an explicit request for opening a connection. I'm assuming the connection only exists in memory on one server. Even before 1.5, the connection had some parameters associated with it and they also only live on one server. How can it resume the state on another server?

https://calcite.apache.org/docs/avatica_protobuf_reference.html#openconnectionrequest

https://calcite.apache.org/docs/avatica_protobuf_reference.html#connectionsyncrequest

Josh Elser Lukas Lalinsky · Jan 11, 2016 at 02:42 PM
The big pieces needed to recover the state for a query automatically are the connection and statement missing in the server's memory (as you point out). In 1.5 (I believe), I added the ability to pass back specific context on the relevant RPC messages stating when this necessary state was missing in the server. This allows the client to send the necessary RPC to recreate the state and try to fetch the results again.

Josh Elser Lukas Lalinsky · Jan 11, 2016 at 02:43 PM
For example, take a look at the recently updated FetchResponse docs. The missingStatement attribute signifies that the Statement in the server's memory is missing. Likewise, the missingResults attribute signifies that the ResultSet in the server's memory is missing. It is up to the client to hold on to the necessary information and recover from these cases.

support protobuf c++ implementation?

I pull a large amount of data (700,000+ rows) and found it pretty slow when it gets to client.py:506 (1-2 seconds handling one bucket of data per fetch()):

        response_data = self._apply(request)
        response = responses_pb2.FetchResponse()
        response.ParseFromString(response_data)

I guess the pure-Python protobuf implementation makes it slow; the protobuf C++ implementation for Python may help.
Would you share the .proto files so that I can run a test myself?

Regards,
Rafer
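
On the protobuf side, the Python package can be switched to its C++-backed implementation through an environment variable, which may speed up ParseFromString considerably. A minimal sketch, assuming a protobuf installation that ships the compiled extension:

import os

# Must be set before google.protobuf is imported anywhere in the process.
os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'cpp'

import phoenixdb  # pulls in google.protobuf internally

from google.protobuf.internal import api_implementation
print(api_implementation.Type())  # 'cpp' if the extension was picked up, 'python' otherwise

conn = phoenixdb.connect('http://localhost:8765/', autocommit=True)

The .proto files themselves come from the Avatica release referenced by gen-protobuf.sh, mentioned in the README above.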

Add array support

If I create a table 'USER', and its schema is defined as 'PERSONA'.

database_url = "http://master:8765/"
conn = phoenixdb.connect(database_url, autocommit=True)
cursor = conn.cursor()
cursor.execute("SELECT * FROM USER")

It gives an error: Table undefined, tableName='USER'.
But when I use the query below:

SELECT * FROM PERSON.USER

this gives an ERROR: UnsupportedOperationException: Currently not supported.

Looking at the source code, I could not find anywhere to specify the schema parameter.

How can I solve this? Any ideas?
