
pyexasol's Introduction

PyEXASOL


PyEXASOL is the officially supported Python connector for Exasol. It helps you handle the massive volumes of data commonly associated with this DBMS.

You may expect a significant performance improvement over ODBC in single-process scenarios involving pandas.

PyEXASOL provides an API to read and write multiple data streams in parallel using separate processes, which is necessary to fully utilize the hardware and achieve linear scalability. With PyEXASOL you are no longer limited to a single CPU core.

PyEXASOL main concepts

  • Based on WebSocket protocol;
  • Optimized for minimum overhead;
  • Easy integration with pandas via HTTP transport;
  • Compression to reduce network bottleneck;

System requirements

  • Exasol >= 7.1
  • Python >= 3.8

Getting started

Install PyEXASOL:

pip install pyexasol[pandas]

Run basic query:

import pyexasol

C = pyexasol.connect(dsn='<host:port>', user='sys', password='exasol')

stmt = C.execute("SELECT * FROM EXA_ALL_USERS")

for row in stmt:
    print(row)
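
Instead of iterating, you may fetch rows explicitly. A minimal sketch using the fetch methods that also appear in other examples in this document (fetchone, fetchmany, fetchall):

stmt = C.execute("SELECT * FROM EXA_ALL_USERS")

first_row = stmt.fetchone()     # a single row, or None when the result is exhausted
next_rows = stmt.fetchmany(10)  # a list of up to 10 further rows
remaining = stmt.fetchall()     # all remaining rows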

Load data into pandas.DataFrame:

import pyexasol

C = pyexasol.connect(dsn='<host:port>', user='sys', password='exasol', compression=True)

df = C.export_to_pandas("SELECT * FROM EXA_ALL_USERS")
print(df.head())

You may set up a local config to store your personal Exasol credentials and connection options:

import pyexasol

C = pyexasol.connect_local_config('my_config')

stmt = C.execute("SELECT CURRENT_TIMESTAMP")
print(stmt.fetchone())
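
The local config is a plain INI file (by default ~/.pyexasol.ini, see /docs/LOCAL_CONFIG.md). A minimal sketch of a section matching the example above; all values are placeholders:

[my_config]
dsn = <host:port>
user = sys
password = exasol
schema = my_schema
compression = True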

Connect to Exasol SaaS using an OpenID token for authentication:

import pyexasol

C = pyexasol.connect(dsn='<host:port>', user='sys', refresh_token='<token>')

stmt = C.execute("SELECT * FROM EXA_ALL_USERS")

for row in stmt:
    print(row)

Created by

Vitaly Markov, 2018 — 2022

Enjoy!

Maintained by

Exasol 2023 — Today

pyexasol's People

Contributors

allipatev, ben-cohen-xs, dbast, expobrain, hadargreinsmark, jwarlander, kaklakariada, littlek0i, nicoretti, s3bw, smaspe

pyexasol's Issues

Import from iterable swallows ETL exception

I cannot exactly pinpoint where the exception is happening.
When importing from an iterable and a string truncation occurs (ETL-3003) or an integer is out of range (ETL-3050), pyexasol does not throw the corresponding exception.

Instead the user is notified that the server closed the connection (Broken Pipe Error) and the program terminates. I suspect an unhandled exception somewhere in the connection.py module?

import_from_pandas Couldn't connect to server error

I'm trying to import a Pandas df to a table on a Windows computer:

conn.import_from_pandas(
    df,
    ("MY_SCHEMA", "MY_TABLE"),
    import_params={
        "column_separator": ",",
        "line_separator": "\n",
    },
)
But for some reason this doesn't work.
The message of the ExaQueryError (ip and sessionid replaced) is:

ETL-5105: Following error occured while reading data from external connection [http://(myip):42100/000.csv failed after 0 bytes. [Failed to connect to (myip) port 42100: Connection refused],[7],[Couldn't connect to server]] (Session: (sessionid))

Could this be a bug in pyexasol?

DELETE and UPDATE Statements throw an error

Hey,

Is there a way to execute DELETE and UPDATE statements?
Executing a query like "UPDATE SCHEMA.TABLE SET SUBS = 1 WHERE USER_ID = 2"
throws the error: unexpected UPDATE.

Thanks in advance,
Luca

SSL connection with certificate verification

Hi @wildraid,

We had a look into SSL certificate verification with the WebSocket API, and we saw that pyexasol uses ssl.CERT_NONE in case of encryption, with no way to change it.

https://github.com/badoo/pyexasol/blob/fbb1949f0ff613cae709b3f8d9e0d0d599fedc85/pyexasol/connection.py#L676

We tested if certificate verification works in general with websockets and it seems to work. To test it, we created our own private key and certificates (Server certificate testExasolChain.pem which was signed by the Root CA testRootCA.pem) and specified them as follows.

In the Exasol Database we need to add the following parameter:

-tlsPrivateKeyPath=/tmp/certs/testExasol.key -tlsCertificatePath=/tmp/certs/testExasolChain.pem

Websocket client:

ws = websocket.create_connection(host, sslopt={"cert_reqs": ssl.CERT_REQUIRED, "ca_certs": "/tmp/certs/testRootCA.pem"})

Can we add an option to the connection which allows the user to specify a RootCA?
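
In current pyexasol versions there appears to be a websocket_sslopt connection option (it is used in a later issue in this list) that passes an sslopt dictionary through to the WebSocket client. Assuming it behaves like the sslopt argument of websocket.create_connection shown above, a root CA could be supplied like this; the DSN, credentials, and path are placeholders:

import ssl
import pyexasol

C = pyexasol.connect(
    dsn='<host:port>',
    user='sys',
    password='exasol',
    encryption=True,
    websocket_sslopt={
        'cert_reqs': ssl.CERT_REQUIRED,      # verify the server certificate
        'ca_certs': '/tmp/certs/testRootCA.pem',  # root CA used to sign the server cert
    },
)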

Insert statement for meta data ok?

I understand that when working with lots of data it makes sense to not use single insert statements but import_/export_ statements.
But is it OK to execute single INSERT statements for managing master data with pyexasol, since import_from_iterable doesn't seem to be flexible enough?
Or should a different library like pyodbc be used in this case?

Query text before execution

We can get the query text after the query has executed.

q = "select name, val from {sn!i}.{tn!i}"
p = { "sn": "ETL", "tn": "METADATA" }
c.execute(q, p).fetchall()
c.last_statement().query

Is there a way to get the query text with the parameter values substituted before the query is executed?
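
A hedged sketch: the substitution is done by pyexasol's ExaFormatter, which subclasses string.Formatter and, as far as I know, is exposed on the connection as C.format, so the rendered text can presumably be obtained without executing:

q = "select name, val from {sn!i}.{tn!i}"
p = {"sn": "ETL", "tn": "METADATA"}

rendered = C.format.format(q, **p)  # assumption: this applies the same substitution execute() does
print(rendered)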

Import Parameters

I am trying to use the import_from_iterable functionality but am getting an error for the timestamp format. Is there a way to specify the timestamp format when using the import_from_iterable function? I searched through the docs but didn't find any example of that.
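
As far as I know, import_from_iterable has no timestamp format option of its own, but Exasol parses timestamps according to the session NLS settings, so a hedged workaround is to set the format on the session before importing (the format string and table name are examples):

C.execute("ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF6'")
C.import_from_iterable(rows, 'MY_TABLE')  # rows is the iterable being loaded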

Performance of import_from_pandas for small data frames

Hello @wildraid,

I got a report that the function import_from_pandas is quite slow for small data frames (fewer than 100 rows) compared with larger data frames.

I wrote a small test script and measured the performance for different data frame sizes; for very small data frames the overhead is quite large. You can find the script and results here.

In this specific use case the data frame size is given by external factors, can't be changed, and may fluctuate between very small and very large. So the question is whether we might want to optimize the import_from_pandas function for different sizes. For large data frames the import via CSV is the best way; for smaller data frames, insert statements or prepared statements might offer better performance. I checked the documentation of the WebSocket API of the Exasol DB, and it provides an API for prepared statements where you add the tuples to the JSON of the request.

What do you think about this? Do you have other ideas?

Thanks,
tkilias

Undetermined behavior of SCRIPT_OUTPUT

The following Python3 UDF script looks for the shortest string in a given input set.

CREATE OR REPLACE PYTHON3 SET SCRIPT shortest_string(input_string VARCHAR(2000000)) RETURNS VARCHAR(2000000) AS
def run(ctx):
    short = ""
    l = 2e6

    print('outer')
    while True:
        _l = len(ctx.input_string)
        print('inner')
        if _l < l:
            l = _l
            short = ctx.input_string
            print('if_inner')

        if not ctx.next():
            break
    return short
/

When pyexasol's script_output is listening, it prints:

'outer' once
'inner' x times
'if_inner' y times, where y < x

Here x depends on the number of rows in the set, as it should.

However, if print('inner') is commented out, nothing is printed to the listener at all, neither 'outer' nor 'if_inner'. I can't explain this behaviour.

How to handle identity/ autoincrement columns when writing to Exasol DB?

Hi guys,
Stack Overflow couldn't help me with the problem, so I'm hoping you can give me some advice.

I have an Exasol table with an autoincrement primary key and want to fill this table using import_from_pandas from pyexasol.

When importing into the Exasol table I have to leave RKEY out, since Exasol auto-increments it. But neither leaving rkey empty nor omitting the rkey column from the pandas data frame works for me.

I went through pandas.to_csv and it doesn't seem to have an option to leave out columns when writing to CSV.
A minimal example is below.

import pandas as pd
import pyexasol

score = pd.DataFrame(columns = ['rkey','score'])
score['score'] = list(range(10))
conn = pyexasol.connect(dsn=dsn,user=user, password=pw,compression=True)
conn.import_from_pandas(score,(schema, table))
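
A hedged workaround, assuming the import_params 'columns' option (a list of the table columns the CSV data maps to) works as documented: drop rkey from the data frame and name only the remaining columns, so Exasol can fill the identity column itself:

score = pd.DataFrame({'score': list(range(10))})
conn.import_from_pandas(score, (schema, table),
                        import_params={'columns': ['score']})  # assumption: the identity column RKEY is then auto-filled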

export to pandas with chunksize

I found an issue when passing chunksize as a callback parameter to export_to_pandas.

import pyexasol
import pandas
Output of pandas.show_versions()

INSTALLED VERSIONS


commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-42-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.7.1
pip: 19.0.1
setuptools: 39.0.1
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

conn = pyexasol.connect_local_config('my_exasol')
# create dummy data on DB
conn.open_schema('abt_ds')
conn.execute("delete from test")
conn.import_from_file('dummy.csv', 'TEST')

Exporting to pandas without chunking works as expected:

query = "select * from test order by id"
df = conn.export_to_pandas(query)
print(df)
   ID  START_DATE
0   1  2018-11-19
1   2  2018-01-10
2   3  2017-07-26
3   4  2018-08-26

With chunking, pandas returns a TextFileReader (as expected)

cb_params = {'chunksize': 2}
df_chunks = conn.export_to_pandas(query, callback_params=cb_params)
type(df_chunks)
pandas.io.parsers.TextFileReader

However, iterating results in an error:

for chunk in df_chunks:
    print(chunk)
Output

   ID  START_DATE
0   1  2018-11-19
1   2  2018-01-10
   ID  START_DATE
2   3  2017-07-26
3   4  2018-08-26
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-c7f26f9d1458> in <module>()
----> 1 for chunk in df_chunks:
      2     print(chunk)

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __next__(self)
   1040     def __next__(self):
   1041         try:
-> 1042             return self.get_chunk()
   1043         except StopIteration:
   1044             self.close()

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in get_chunk(self, size)
   1104                 raise StopIteration
   1105             size = min(size, self.nrows - self._currow)
-> 1106         return self.read(nrows=size)
   1107 
   1108 

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in read(self, nrows)
   1067                 raise ValueError('skipfooter not supported for iteration')
   1068 
-> 1069         ret = self._engine.read(nrows)
   1070 
   1071         if self.options.get('as_recarray'):

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in read(self, nrows)
   1837     def read(self, nrows=None):
   1838         try:
-> 1839             data = self._reader.read(nrows)
   1840         except StopIteration:
   1841             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ValueError: I/O operation on closed file

My impression is that under the hood pandas.read_csv receives an open pipe; however, this pipe is closed before the output of pandas.read_csv is returned to the caller of ExaConnection.export_to_pandas.

Actually, you can reproduce this type of error with pandas alone:

# good
with open('dummy2.csv') as file:
    my_df = pandas.read_csv(file, chunksize=2)
    for chunk in my_df:
        print(chunk)
   ID        DATE
0   1  2018-11-19
1   2  2018-01-10
   ID        DATE
2   3  2017-07-26
3   4  2018-08-26
# bad
with open('dummy2.csv') as file:
    my_df = pandas.read_csv(file, chunksize=2)
for chunk in my_df:
    print(chunk)
Output

   ID        DATE
0   1  2018-11-19
1   2  2018-01-10
   ID        DATE
2   3  2017-07-26
3   4  2018-08-26
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-57-ce497a3f68b5> in <module>()
      2 with open('dummy2.csv') as file:
      3     my_df = pandas.read_csv(file, chunksize=2)
----> 4 for chunk in my_df:
      5     print(chunk)

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __next__(self)
   1040     def __next__(self):
   1041         try:
-> 1042             return self.get_chunk()
   1043         except StopIteration:
   1044             self.close()

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in get_chunk(self, size)
   1104                 raise StopIteration
   1105             size = min(size, self.nrows - self._currow)
-> 1106         return self.read(nrows=size)
   1107 
   1108 

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in read(self, nrows)
   1067                 raise ValueError('skipfooter not supported for iteration')
   1068 
-> 1069         ret = self._engine.read(nrows)
   1070 
   1071         if self.options.get('as_recarray'):

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in read(self, nrows)
   1837     def read(self, nrows=None):
   1838         try:
-> 1839             data = self._reader.read(nrows)
   1840         except StopIteration:
   1841             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ValueError: I/O operation on closed file.

All forms of integers map to str when using fetch_dict=True

When we use the fetch_dict=True flag, all integers turn into strings.
Decimal/float/double seem to work as intended, but not INTEGER, SMALLINT, TINYINT, BIGINT.

From what I can see in the mapper.py module, nothing is specified for any form of integer.
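
A hedged workaround until this is changed: pass the bundled mapper when connecting; assuming pyexasol.exasol_mapper behaves as documented, DECIMAL columns with scale 0 then come back as int:

import pyexasol

C = pyexasol.connect(dsn='<host:port>', user='sys', password='exasol',
                     fetch_dict=True, fetch_mapper=pyexasol.exasol_mapper)

stmt = C.execute("SELECT 1 AS v")
print(stmt.fetchone())  # the value should now be an int, not a str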

Executing multiple statements

How about supporting the execution of multiple statements with one call of execute?
The statements could be separated by semicolons.
This is e.g. supported by Pentaho Data Integration.

Broken pipe exception not caught

Hi,

Thanks for this amazing tool! We've been using it for all of our Python and Exasol related projects.
A while ago we moved to encrypted connections with Exasol, and we sometimes see the following exception:

  File "/opt/conda/envs/pdash/lib/python3.7/site-packages/pyexasol/connection.py", line 179, in execute
    return self.cls_statement(self, query, query_params)
  File "/opt/conda/envs/pdash/lib/python3.7/site-packages/pyexasol/statement.py", line 51, in __init__
    self._execute()
  File "/opt/conda/envs/pdash/lib/python3.7/site-packages/pyexasol/statement.py", line 155, in _execute
    'sqlText': self.query,
  File "/opt/conda/envs/pdash/lib/python3.7/site-packages/pyexasol/connection.py", line 516, in req
    self._ws_send(send_data)
  File "/opt/conda/envs/pdash/lib/python3.7/site-packages/pyexasol/connection.py", line 606, in <lambda>
    self._ws_send = lambda x: self._ws.send_binary(zlib.compress(x.encode(), 1))
  File "/opt/conda/envs/pdash/lib/python3.7/site-packages/websocket/_core.py", line 285, in send_binary
    return self.send(payload, ABNF.OPCODE_BINARY)
  File "/opt/conda/envs/pdash/lib/python3.7/site-packages/websocket/_core.py", line 253, in send
    return self.send_frame(frame)
  File "/opt/conda/envs/pdash/lib/python3.7/site-packages/websocket/_core.py", line 279, in send_frame
    l = self._send(data)
  File "/opt/conda/envs/pdash/lib/python3.7/site-packages/websocket/_core.py", line 449, in _send
    return send(self.sock, data)
  File "/opt/conda/envs/pdash/lib/python3.7/site-packages/websocket/_socket.py", line 157, in send
    return _send()
  File "/opt/conda/envs/pdash/lib/python3.7/site-packages/websocket/_socket.py", line 139, in _send
    return sock.send(data)
  File "/opt/conda/envs/pdash/lib/python3.7/ssl.py", line 1003, in send
    return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe

This is using pyexasol version 0.12.0.
I was wondering if this type of exception could be transformed into an ExaCommunicationError, as other exceptions related to failing WebSocket connections are?

WebSocketConnectionClosedException: Connection is already closed.

Hi,

I am trying to make a connection to Exasol to pull some data into a data frame; however, when I attempt to establish the connection I get an error:

WebSocketConnectionClosedException: Connection is already closed.

Which then is followed by the error ExaConnectionFailedError.

The connection information provided is correct, as it works when logging in to Exasol directly. Has anyone had any similar issues?

Thanks,
Ryan

Broken pipe in import_from_iterable

Hi Vitaly,

I have run into this error several times when importing a large dataset using import_from_iterable.

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pyexasol/connection.py", line 247, in import_from_iterable
    return self.import_from_callback(cb.import_from_iterable, src, table, None, import_params)
  File "/usr/local/lib/python3.7/site-packages/pyexasol/connection.py", line 339, in import_from_callback
    raise e
  File "/usr/local/lib/python3.7/site-packages/pyexasol/connection.py", line 323, in import_from_callback
    result = callback(http_proc.write_pipe, src, **callback_params)
  File "/usr/local/lib/python3.7/site-packages/pyexasol/callback.py", line 66, in import_from_iterable
    writer.writerow(row)
BrokenPipeError: [Errno 32] Broken pipe

I am using pyexasol 0.6.4.

Should this be caught by the library?

Thanks.

potential bug with split_part function

I have encountered an error with pyexasol when using split_part in a query. Here is the log message:

(
    message     =>  function or script SPLIT_PART not found [line 1, column 8] (Session: 1637404284587589169)
    dsn         =>  localhost:8564
    user        =>  canhduong
    schema      =>
    code        =>  42000
    session_id  =>  1637404284587589169
    query       =>  select split_part(id, "-", 1) from users limit 10
)

I am using pyexasol version: 0.6.3

Please expose resultType property for responseData

The Exasol WebSocket API returns a property called resultType in response to the Execute command. Please expose this property in pyexasol. It will help in determining whether the query returned a result set (which can be iterated on) or a rowCount.

results (object[]) => array of result objects

  • resultType (string) => type of result: "resultSet" or "rowCount"
  • rowCount (number, optional) => present if resultType is "rowCount", number of rows
  • resultSet (object, optional) => present if resultType is "resultSet", result set

Openshift getpwuid(): uid not found

Hello everybody,
when I try to start my application with PyExasol on OpenShift, it always crashes. The reason is that connection.py tries to resolve 'clientOsUsername' via getpass.getuser(). Since the UID is random in OpenShift, this is not possible. It would be nice if clientOsUsername could be parameterized for OpenShift, or some other alternative offered.

Thank you and hope for a solution or implementation

Add comments to queries

For purposes of auditing and profiling, I would like to add comments to some queries.

(For clarity, this type of comments: https://docs.exasol.com/sql_references/basiclanguageelements.htm#Comments_in_SQL)

For cases where the query is directly executed, like when running connection.execute("SELECT * FROM FOO"), this is easy.

However, for the import_* and export_* functions, as well as for the metadata access functions, the actual query is built inside pyexasol, without any possibility for us to add comments to it.

Is that something that you'd be willing to integrate in pyexasol? If so, I'd be willing to help

Question: Regarding using pyexasol in a docker environment

Hi there,

I don't know if this is the right place for asking questions, but I have a potentially pretty stupid question, so here it is:

If I wanted to use pyexasol in a script that runs in some Docker container built from an image like python:3.9-slim-buster (plus installing pyexasol via pip), would it be possible to connect to an Exasol instance?

Or do I have to make sure to install an ODBC driver inside the container?

Cheers for your help

Pandas import error

I am using the below script to move data from a local file into Exasol.
The error appears in the import part.

import boto3
import botocore
import pyexasol
import pandas as pd
import exa_cred as exa
import Bucket_info as Bk_info
import datetime
import base64

#reading credential info and bucket info from local file
exuser = exa.cred['exuser']
expwd = exa.cred['expwd']
exdsn = exa.cred['exdsn']
schema=exa.cred['schema']

C = pyexasol.connect(dsn=exdsn, user=exuser, password=expwd,schema=schema)

#Bucket information
Bucket = Bucket_name
Key = "file.csv"
outPutName = "file.csv"

#Handling exception of not found and downloading the file locally
s3 = boto3.resource('s3')
try:
    s3.Bucket(Bucket).download_file(Key, outPutName)
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise

#reading the data with pandas
print(str(datetime.datetime.now()) + " - Start script ...\n")

df = pd.read_csv(outPutName, delimiter=';', skipinitialspace=True, engine='python', quotechar='"')

print("Testing that pandas is working")
print(df.head(10))

#Import part of pandas DataFrame into Exasol table
print(df.iloc[[0,1], [0,1,2]])
C.import_from_pandas(df.iloc[[0], [0,1,2]], 'file')

Import Error which is

Traceback (most recent call last):
  File "C:/...../script.py", line 55, in <module>
    C.import_from_pandas(df.iloc[[0], [0,1,2]], 'script')
  File "C:....\lib\site-packages\pyexasol\connection.py", line 213, in import_from_pandas
    return self.import_from_callback(cb.import_from_pandas, src, table, callback_params)
  File "C:\lib\site-packages\pyexasol\connection.py", line 300, in import_from_callback
    raise sql_thread.exc
  File "C:...\site-packages\pyexasol\connection.py", line 290, in import_from_callback
    sql_thread.join()
  File "C:...\site-packages\pyexasol\http_transport.py", line 53, in join
    raise self.exc
  File "C:...\site-packages\pyexasol\http_transport.py", line 37, in run
    self.run_sql()
  File "C:...\site-packages\pyexasol\http_transport.py", line 165, in run_sql
    self.connection.execute(query)
  File "C:...\site-packages\pyexasol\connection.py", line 140, in execute
    self.last_stmt = self.cls_statement(self, query, query_params)
  File "C:...\site-packages\pyexasol\statement.py", line 47, in __init__
    self._execute()
  File "C:.....\statement.py", line 141, in _execute
    'sqlText': self.query,
  File "C:...\site-packages\pyexasol\connection.py", line 442, in req
    raise cls_err(self, req['sqlText'], ret['exception']['sqlCode'], ret['exception']['text'])
pyexasol.exceptions.ExaQueryError:
(
    '] (Session: xxxx)
    dsn => xxxx
    user => xxxx
    schema => xxxx
    code => xxxx
    session_id => xxxx
    query => xxxx
    AT 'Server_IP:port' FILE '000.csv'
)

Process finished with exit code 1

Following versions are used:
Exasol = 6.0.8
Python = Python 3.7.2
rsa = 3.4.2
pandas = 0.24.1
websocket-client = 0.55.0
pyexasol = 0.5.2
OS = Windows

import_from_file destination formatting

Hello, I am working on an ETL process to import csv and xlsx data into our Exasol data warehouse. I am able to successfully load the data using pyexasol, but I'm having trouble attempting to get it into a destination table that is not all caps (our database nomenclature uses camel case).

I have tried the quote_ident function, but it returns the same 'Value ["TestTable"] is not a safe identifier' exception that I get when just passing '"TestTable"' as the destination table argument to the import_from_file function.

Am I misunderstanding the purpose of this function? Or am I not using it properly? I've been through all of the documentation, but haven't found a way to import with this package into a destination whose name contains lower-case characters.

cnxn.import_from_file(open(importFile, 'rb'), '"TestTable"')
dest = pyexasol.ExaFormatter.quote_ident('TestTable')
cnxn.import_from_file(open(importFile, 'rb'), dest)

Many thanks for any advice you can give.

Using the export_to_pandas function of pyexasol in uwsgi results in a pyexasol.exceptions.ExaQueryError

Hello, I have found that the usage of export_to_pandas (or any export_to_*, which in turn calls export_to_callback) results in an ExaQueryError when used in a uwsgi-based web application. I suspect this could be due to the multi-threading happening within the function. Following are the stack trace and repro steps:

  1. Start a sample uwsgi server using pyex_dummy.py (attached as pyex_dummy_no_creds.py.zip) with the following command:

     uwsgi --http :9090 --wsgi-file pyex_dummy.py

  2. Hit the newly set up endpoint with a curl call in a separate terminal window:

     curl http://0.0.0.0:9090

  3. The following stacktrace is observed within the server window:
Traceback (most recent call last):
  File "/opt/conda3/lib/python3.7/site-packages/pyexasol/connection.py", line 313, in export_to_callback
    result = callback(http_proc.read_pipe, dst, **callback_params)
  File "/opt/conda3/lib/python3.7/site-packages/pyexasol/callback.py", line 42, in export_to_pandas
    return pandas.read_csv(pipe, skip_blank_lines=False, **kwargs)
  File "/opt/conda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/opt/conda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/opt/conda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/opt/conda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/opt/conda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1917, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 14, in application
    df = connection.export_to_pandas(query)
  File "/opt/conda3/lib/python3.7/site-packages/pyexasol/connection.py", line 271, in export_to_pandas
    return self.export_to_callback(cb.export_to_pandas, None, query_or_table, query_params, callback_params, export_params)
  File "/opt/conda3/lib/python3.7/site-packages/pyexasol/connection.py", line 335, in export_to_callback
    raise sql_thread.exc
  File "/opt/conda3/lib/python3.7/site-packages/pyexasol/http_transport.py", line 34, in run
    self.run_sql()
  File "/opt/conda3/lib/python3.7/site-packages/pyexasol/http_transport.py", line 153, in run_sql
    self.connection.execute("\n".join(parts))
  File "/opt/conda3/lib/python3.7/site-packages/pyexasol/connection.py", line 186, in execute
    return self.cls_statement(self, query, query_params)
  File "/opt/conda3/lib/python3.7/site-packages/pyexasol/statement.py", line 55, in __init__
    self._execute()
  File "/opt/conda3/lib/python3.7/site-packages/pyexasol/statement.py", line 159, in _execute
    'sqlText': self.query,
  File "/opt/conda3/lib/python3.7/site-packages/pyexasol/connection.py", line 572, in req
    raise cls_err(self, req['sqlText'], ret['exception']['sqlCode'], ret['exception']['text'])
pyexasol.exceptions.ExaQueryError: 
(
    message     =>  ETL-5106: Following error occured while writing data to external connection [http://000.gz/ failed after 0 bytes. [Could not resolve host: 000.gz],[6],[Couldn't resolve host name]] (Session: 1698365725588979714)
    dsn         =>  DEMODB.EXASOL.COM
    user        =>  PUB3511
    schema      =>  
    session_id  =>  1698365725588979714
    code        =>  42636
    query       =>  EXPORT (
SELECT * FROM EXA_SYSCAT
) INTO CSV
AT 'http://' FILE '000.gz'
WITH COLUMN NAMES
)

We are running this in a container that runs a RHEL release version 7.9 OS

pyexasol.ExaConnection.export_to_pandas(), .export_to_list(), .export_to_file() intermittent failures

pyexasol.ExaConnection.export_to_pandas(), .export_to_list(), and .export_to_file() may fail for long-running queries when used on setups with stateful firewalls and other functionality that depends on connection tracking.

As far as I can tell, when running an export, pyexasol starts a socket server in a separate process, which first sends out a special TCP packet to which the Exasol server responds with a proxy address and port (https://github.com/badoo/pyexasol/blob/master/pyexasol_utils/http_transport.py#L32). This address and port on one of the Exasol nodes is then used in the EXPORT query (e.g. EXPORT <SELECT * FROM TABLE> INTO CSV AT 'http://10.0.0.1:45223' FILE '000.gz') and the result is later HTTP-streamed to that socket server, initialised with a PUT request that has the Transfer-Encoding: chunked header set. That way pyexasol manages to circumvent the firewall on the client side. Exasol does its magic on its side by proxying data from that node to the socket on the client side (I don't know for sure, just guessing this part).

Tracing the network frames, we can see that after establishing TCP connection and getting the proxy address/port the connection is quiet until the query is executed and that PUT is sent (frame 773):
10.54.12.1 - client
10.0.0.1 - exasol

No. Time  Source  Destination Protocol  Length  Info
53  29.671637 10.54.12.1  10.0.0.1  TCP 74  33514 → 8888 [SYN] Seq=0 Win=65320 Len=0 MSS=1420 SACK_PERM=1 TSval=4107909024 TSecr=0 WS=128
54  29.672398 10.0.0.1  10.54.12.1  TCP 74  8888 → 33514 [SYN, ACK] Seq=0 Ack=1 Win=28160 Len=0 MSS=1420 SACK_PERM=1 TSval=3286892237 TSecr=4107909024 WS=128
55  29.672427 10.54.12.1  10.0.0.1  TCP 66  33514 → 8888 [ACK] Seq=1 Ack=1 Win=65408 Len=0 TSval=4107909025 TSecr=3286892237
56  29.672465 10.54.12.1  10.0.0.1  TCP 78  33514 → 8888 [PSH, ACK] Seq=1 Ack=1 Win=65408 Len=12 TSval=4107909025 TSecr=3286892237
57  29.672599 10.0.0.1  10.54.12.1  TCP 66  8888 → 33514 [ACK] Seq=1 Ack=13 Win=28160 Len=0 TSval=3286892237 TSecr=4107909025
58  29.673306 10.0.0.1  10.54.12.1  TCP 90  8888 → 33514 [PSH, ACK] Seq=1 Ack=13 Win=28160 Len=24 TSval=3286892237 TSecr=4107909025
59  29.673331 10.54.12.1  10.0.0.1  TCP 66  33514 → 8888 [ACK] Seq=13 Ack=25 Win=65408 Len=0 TSval=4107909025 TSecr=3286892237
773 288.024444  10.0.0.1  10.54.12.1  TCP 194 8888 → 33514 [PSH, ACK] Seq=25 Ack=13 Win=28160 Len=128 TSval=3286956825 TSecr=4107909025 [TCP segment of a reassembled PDU]

This TCP connection will be idle for as long as the query executes.
Now this may cause issues for stateful firewalls somewhere in the middle that use some form of connection tracking, when the query runs longer than the tracking timeout.

In our particular case, we faced this issue with the GCP VPC Firewall, which implicitly allows all egress by default and denies ingress. Returning traffic is allowed for established connections, but after 600 seconds of idle time the connection information is dropped and the implicit deny kicks in (https://cloud.google.com/compute/docs/troubleshooting/general-tips#communicatewithinternet). Since GCP firewalls are implemented on the node via a virtual switch, we have no ability to control timeouts on that side. Currently, we circumvent this by adding an allow-ingress rule for the TCP protocol on high ports.

This issue may also arise in a manual setup using Open vSwitch or netfilter/conntrack to provide firewalling, NAT, etc. E.g. SNAT-ing Kubernetes pods may be stateful.

It would be nice to set the following options on the socket:

socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_timeout)

Here idle_timeout is some value under 600, since that is the default value for nf_conntrack_generic_timeout in the Linux kernel. The default interval and number of probes actually work fine with default conntrack values, and I suppose with other connection-tracking implementations too.

It could also be an optional setting at the ExaConnection level to enable keepalive on the export socket and set the idle timeout, interval, etc.
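
For illustration, a minimal sketch of the proposed keepalive options on a plain Linux socket (TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are platform-dependent constants; the values are examples only):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)     # enable keepalive probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 300)  # idle seconds before the first probe, below the 600 s conntrack timeout
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)  # seconds between probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # failed probes before the connection is dropped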

Thank you very much!

Limit of 30 websocket connections?

Thank you very much for your awesome work on pyexasol.

There is a WebSocket connection limit of 30 in Exasol. If there are 30 open WebSocket connections, you can't open another one. Do you know how to increase the Exasol WebSocket connection limit on the server side? If yes, an update of the docs would be helpful for everybody else running into this limit.

Thanks a lot!

How to import only selected columns of csv file

Is there a way to import only selected columns of a CSV file into Exasol?
If I understand correctly, Exasol supports this using the csv_cols block in the IMPORT statement, as described in the documentation.
But is it possible to use this feature with pyexasol?
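
For reference, pyexasol's import_params appears to accept a csv_cols list that is passed through to the generated IMPORT statement, so a sketch (the column specs and table name are placeholders) might look like:

C.import_from_file(
    open('data.csv', 'rb'),
    'MY_TABLE',
    import_params={'csv_cols': ['1..3', "4 FORMAT='DD-MM-YYYY'", '6']},  # assumption: forwarded as the csv_cols block
)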

connection pool support

Hello,

I am developing a web application in which I use pyexasol to query for data from Exasol.

Is it possible to use a connection pool in pyexasol?

Having a connection pool would allow the web server to maintain long-running connections in memory for efficient re-use.

Thanks

stderr does not work with Jupyter Notebooks

Hi,

Since the most recent update of pyexasol (0.3.26), the use of stderr means pyexasol is no longer able to import data into a Jupyter notebook.

Please see below the full error readout:

~/venvp3/lib/python3.6/site-packages/pyexasol/connection.py in export_to_pandas(self, query_or_table, query_params, callback_params)
    211 
    212     def export_to_pandas(self, query_or_table, query_params=None, callback_params=None):
--> 213         return self.export_to_callback(cb.export_to_pandas, None, query_or_table, query_params, callback_params, {'with_column_names': True})
    214 
    215     def import_from_file(self, src, table, import_params=None):

~/venvp3/lib/python3.6/site-packages/pyexasol/connection.py in export_to_callback(self, callback, dst, query_or_table, query_params, callback_params, export_params)
    266                 raise sql_thread.exc
    267 
--> 268             raise e
    269 
    270     def import_from_callback(self, callback, src, table, callback_params=None, import_params=None):

~/venvp3/lib/python3.6/site-packages/pyexasol/connection.py in export_to_callback(self, callback, dst, query_or_table, query_params, callback_params, export_params)
    244         try:
    245             http_proc = ExaHTTPProcess(self.ws_host, self.ws_port, compression, self.encryption, HTTP_EXPORT)
--> 246             http_proc.start()
    247 
    248             sql_thread = ExaSQLExportThread(self, http_proc.get_proxy(), compression, query_or_table, export_params)

~/venvp3/lib/python3.6/site-packages/pyexasol/http_transport.py in start(self)
    199             args.append('--encryption')
    200 
--> 201         self.proc = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=sys.stderr)
    202         self.proxy = self.proc.stdout.readline().decode().rstrip('\n')
    203 

/usr/local/Cellar/python/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
    665         (p2cread, p2cwrite,
    666          c2pread, c2pwrite,
--> 667          errread, errwrite) = self._get_handles(stdin, stdout, stderr)
    668 
    669         # We wrap OS handles *before* launching the child, otherwise a

/usr/local/Cellar/python/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py in _get_handles(self, stdin, stdout, stderr)
   1199             else:
   1200                 # Assuming file-like object
-> 1201                 errwrite = stderr.fileno()
   1202 
   1203             return (p2cread, p2cwrite,

UnsupportedOperation: fileno

I encountered this error when trying to use the 'export_to_pandas' function.

This error should be reproducible with a simple select statement, i.e.

select 1 from dual 

Thanks!

Python Version - 3.6.3
pyexasol version - 0.3.26
Pandas Version - 0.23.0

Question regarding the subprocess for http_transport and non-default module search paths

Hi @wildraid,

we saw some strange behavior of pyexasol in AWS Lambda. Everything except the http_transport works.
We saw the following error message:

/var/lang/bin/python3.7: Error while finding module specification for 'pyexasol_utils.http_transport' (ModuleNotFoundError: No module named

With the following stacktrace:

df = C.export_to_pandas(stmt)  # Get the result (prefix/folders to be imported) of the stmt-query into a dataframe

  File "/opt/python/pyexasol/connection.py", line 271, in export_to_pandas
    return self.export_to_callback(cb.export_to_pandas, None, query_or_table, query_params, callback_params, export_params)
  File "/opt/python/pyexasol/connection.py", line 335, in export_to_callback
    raise sql_thread.exc
  File "/opt/python/pyexasol/http_transport.py", line 34, in run
    self.run_sql()
  File "/opt/python/pyexasol/http_transport.py", line 153, in run_sql
    self.connection.execute("\n".join(parts))
  File "/opt/python/pyexasol/connection.py", line 186, in execute
    return self.cls_statement(self, query, query_params)
  File "/opt/python/pyexasol/statement.py", line 55, in __init__
    self._execute()
  File "/opt/python/pyexasol/statement.py", line 159, in _execute
    'sqlText': self.query,
  File "/opt/python/pyexasol/connection.py", line 572, in req
    raise cls_err(self, req['sqlText'], ret['exception']['sqlCode'], ret['exception']['text'])

My guess is that starting the subprocess for the http_transport fails (see the source reference below), because the pyexasol_utils module is not on the default module search path and the search path of the parent process was modified.

https://github.com/badoo/pyexasol/blob/3b5211fa78e4d83ea16e11532048f6cdcaeab43d/pyexasol/http_transport.py#L244

Next, I would try to get additional information about the environment with:

import sys
import sysconfig
import os

print(sys.flags)
print(sys.path)
print(sysconfig.get_paths())
print(os.environ)
print(sysconfig.get_config_vars())

Any thoughts?

Pyexasol crashes in Kubernetes after being incorrectly identified as an orphaned process

When running pyexasol inside a Kubernetes pod, pyexasol crashed because it incorrectly identified its process as an orphan. This behavior was introduced in commit

39d8ae3

Please see below for more details for the error:

RuntimeError: Current process is orphaned, ppid=1
at check_orphaned (/usr/local/lib/python3.6/site-packages/pyexasol/utils.py:120)
at handle_request (/usr/local/lib/python3.6/site-packages/pyexasol/http_transport.py:245)
at <module> (/usr/local/lib/python3.6/site-packages/pyexasol/__main__.py:43)

Below is the ps output for the Kubernetes pod:

$ kubectl exec my-pod -- ps -ef
UID        PID  PPID C STIME TTY          TIME CMD
root        1     0  0 22:07 ?        00:00:01 python some.py
root        8     1  2 22:07 ?        00:00:01 /usr/local/bin/python -m pyexasol http --host x.x.x.x --port yyyy --mode export

Allow ENCODING parameter in ExaSQLImportThread

Hi,

Currently, encoding is not part of the parameters that are accepted by ExaSQLImportThread.
In our configuration, not having "ENCODING = 'UTF-8'" will cause the export to fail.

Could you please add this feature; useful and straightforward :) ?

pyexasol not recording logout_time in exa_dba_audit_sessions

  • pyexasol version 0.12.1 or 0.10.1
  • python 3.7

Closing a connection in a Python script doesn't produce a logout_time in the Exasol metadata table.

To reproduce:

  1. open an exasol connection
  2. run a dummy query
  3. close connection
  4. try to run the query again
>>> import os
>>> import pyexasol
>>> C = pyexasol.connect(dsn=os.getenv("EXASOL_DSN"), user=os.getenv("EXASOL_USR"), password=os.getenv("EXASOL_PWD"))
>>> stmt = C.execute("/* test */ SELECT 1;")
>>> print(stmt)
<ExaStatement session_id=1667199361071797309 stmt_idx=1>
>>> C.close()
>>> C.execute("/* test */ SELECT 1;")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/uid/.virtualenvs/test/lib/python3.7/site-packages/pyexasol/connection.py", line 179, in execute
    return self.cls_statement(self, query, query_params)
  File "/Users/uid/.virtualenvs/test/lib/python3.7/site-packages/pyexasol/statement.py", line 36, in __init__
    raise ExaRuntimeError(self.connection, "Exasol connection was closed")

However when I checked the exa_statistics.exa_dba_audit_sessions table, no logout time was recorded:

SELECT 
  session_id,
  login_time,
  logout_time,
  user_name,
  client,
  driver,
  os_name
FROM exa_statistics.exa_dba_audit_sessions 
WHERE session_id = '1667835135723529859';

Even the next day (accounting for some delays) the logout time was 2 hours later.

Am I missing something?

get_disk_space_usage() is failing if Exasol has just started

The function get_disk_space_usage() in ext.py fails with:

Traceback (most recent call last):
  File "examples/13_ext.py", line 36, in <module>
    occupied_space = C.ext.get_disk_space_usage()
  File "/home/travis/build/expobrain/pyexasol/pyexasol/ext.py", line 200, in get_disk_space_usage
    row['occupied_size'] = int(row['occupied_size'])
TypeError: 'NoneType' object is not subscriptable

if Exasol has just started and the $EXA_STATS_DB_SIZE table is still empty.

Frequent SSL Error

Version: 0.14.1

While executing multiple queries over one open connection, it frequently errors out with an SSL error. This happens on SELECT/INSERT/UPDATE/MERGE queries.

File "/anaconda3/lib/python3.7/site-packages/pyexasol/connection.py", line 183, in execute
return self.cls_statement(self, query, query_params)
File "/anaconda3/lib/python3.7/site-packages/pyexasol/statement.py", line 55, in init
self._execute()
File "/anaconda3/lib/python3.7/site-packages/pyexasol/statement.py", line 159, in _execute
'sqlText': self.query,
File "/anaconda3/lib/python3.7/site-packages/pyexasol/connection.py", line 539, in req
recv_data = self._ws_recv()
File "/anaconda3/lib/python3.7/site-packages/pyexasol/connection.py", line 629, in
self._ws_recv = lambda: zlib.decompress(self._ws.recv())
File "/anaconda3/lib/python3.7/site-packages/websocket/_core.py", line 313, in recv
opcode, data = self.recv_data()
File "/anaconda3/lib/python3.7/site-packages/websocket/_core.py", line 330, in recv_data
opcode, frame = self.recv_data_frame(control_frame)
File "/anaconda3/lib/python3.7/site-packages/websocket/_core.py", line 343, in recv_data_frame
frame = self.recv_frame()
File "/anaconda3/lib/python3.7/site-packages/websocket/_core.py", line 377, in recv_frame
return self.frame_buffer.recv_frame()
File "/anaconda3/lib/python3.7/site-packages/websocket/_abnf.py", line 361, in recv_frame
self.recv_header()
File "/anaconda3/lib/python3.7/site-packages/websocket/_abnf.py", line 309, in recv_header
header = self.recv_strict(2)
File "/anaconda3/lib/python3.7/site-packages/websocket/abnf.py", line 396, in recv_strict
bytes
= self.recv(min(16384, shortage))
File "/anaconda3/lib/python3.7/site-packages/websocket/_core.py", line 452, in _recv
return recv(self.sock, bufsize)
File "/anaconda3/lib/python3.7/site-packages/websocket/socket.py", line 102, in recv
bytes
= _recv()
File "/anaconda3/lib/python3.7/site-packages/websocket/_socket.py", line 84, in _recv
return sock.recv(bufsize)
File "/anaconda3/lib/python3.7/ssl.py", line 1037, in recv
return self.read(buflen)
File "/anaconda3/lib/python3.7/ssl.py", line 913, in read
return self._sslobj.read(len)
ssl.SSLError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2488)

Insert multi cannot insert decimal into text field

Hi,

Is it expected that insert_multi cannot insert a decimal into a text field? Thanks.

Table schema:

CREATE TABLE SAMPLE_SCHEMA.TABLE1(
NNUMBER NUMERIC(30,10),
WWORD VARCHAR(20000)
)

CREATE TABLE SAMPLE_SCHEMA.TABLE2(
NNUMBER VARCHAR(20000),
WWORD VARCHAR(20000)
)

Python code:

import pyexasol
import pandas as pd
print(pyexasol.__version__)
print(pd.__version__)

def get_connection():
    conn = pyexasol.connect(
        dsn="DSN",
        user="USERNAME",
        password="PASSWORD")
    return conn

tuple_list = [
    (3.1415926, "AAA"),(1.41421356237, "BBB")
]
df = pd.DataFrame(tuple_list, columns=["NNUMBER", "WWORD"])
conn = get_connection()

conn.import_from_pandas(df, ("SAMPLE_SCHEMA", "TABLE1"))
conn.commit()
print("Step 1 OK")

conn.import_from_pandas(df, ("SAMPLE_SCHEMA", "TABLE2"))
conn.commit()
print("Step 2 OK")

tuple_list = list(df.itertuples(index=False))
conn.ext.insert_multi(("SAMPLE_SCHEMA", "TABLE1"),tuple_list,columns=list(df.columns))
conn.commit()
print("Step 3 OK")

conn.ext.insert_multi(("SAMPLE_SCHEMA", "TABLE2"),tuple_list,columns=list(df.columns))
conn.commit()
print("Step 4 OK")  # <-- Will never be reached.

Output:

0.12.0
1.0.3
Step 1 OK
Step 2 OK
Step 3 OK
Traceback (most recent call last):
    ............
    raise cls_err(self, req['sqlText'], ret['exception']['sqlCode'], ret['exception']['text'])
KeyError: 'sqlText'

Error when importing from pandas

Hi,

I'm trying to import a data frame into Exasol using the import_from_pandas() method, resulting in the following errors. The debug output says that the preceding connection step is OK. Strangely, the import sometimes works anyway and sometimes doesn't.

Importing data:
2021-03-15 16:31:47.053 [WebSocket response 4]
{
    "responseData": {
        "numResults": 1,
        "results": [
            {
                "resultType": "rowCount",
                "rowCount": 19451
            }
        ]
    },
    "status": "ok"
}
2021-03-15 16:31:47.200 [WebSocket request 5]
{
    "command": "execute",
    "sqlText": "IMPORT INTO T_JS_CRITEO_STAGE FROM CSV\nAT '...' FILE '000.gz'"
}
Traceback (most recent call last):
  File "c:\program files\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\program files\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:...\site-packages\pyexasol_utils\http_transport.py", line 259, in <module>
    server.check_orphaned(args.ppid)
  File "C:...\site-packages\pyexasol_utils\http_transport.py", line 92, in check_orphaned
    raise RuntimeError(f"Current process is orphaned, initial ppid={initial_ppid}, current ppid={current_ppid}")
RuntimeError: Current process is orphaned, initial ppid=17440, current ppid=5732
2021-03-15 16:31:48.800 [WebSocket abort request]
{
    "command": "abortQuery"
}
2021-03-15 16:31:49.596 [WebSocket response 5]
{
    "responseData": {
        "numResults": 1,
        "results": [
            {
                "resultType": "rowCount",
                "rowCount": 19451
            }
        ]
    },
    "status": "ok"
}
2021-03-15 16:31:49.596 [WebSocket request 6]
{
    "command": "disconnect"
}
2021-03-15 16:31:49.609 [WebSocket response 6]
{
    "status": "ok"
}
2021-03-15 16:31:49.609 [WebSocket connection close]
Traceback (most recent call last):
  File "C:\my_project\main.py", line 63, in <module>
    stage()
  File "C:\my_project\main.py", line 56, in stage
    dbconn.import_from_pandas(df, "T_JS_CRITEO_STAGE")
  File "C:...\site-packages\pyexasol\connection.py", line 280, in import_from_pandas
    return self.import_from_callback(cb.import_from_pandas, src, table, callback_params, import_params)
  File "C:...\site-packages\pyexasol\connection.py", line 391, in import_from_callback
    raise e
  File "C:...\site-packages\pyexasol\connection.py", line 370, in import_from_callback
    http_proc.join_with_exc()
  File "C:...\site-packages\pyexasol\http_transport.py", line 263, in join_with_exc
    raise RuntimeError(f"HTTP transport process finished with exitcode: {code}")
RuntimeError: HTTP transport process finished with exitcode: 1

Process finished with exit code 1

Following versions are used:
Python 3.9
Pandas 1.2.3
Pyexasol 0.18.0

How can I show all the tables in a specific schema?

How do I show all the tables in a specific schema?
Say I have a schema called PSG: I can execute "select * from PSG.sometable", but I cannot get the number and names of all its tables, even though I have tried many variations and read many docs.
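
For what it's worth, table metadata lives in Exasol system views, so a hedged sketch using EXA_ALL_TABLES (which also appears in other examples in this document), assuming pyexasol's default placeholder substitution quotes string values:

stmt = C.execute("SELECT table_name FROM EXA_ALL_TABLES WHERE table_schema = {schema}",
                 {'schema': 'PSG'})

for row in stmt:
    print(row)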

what's the schema?

In /docs/LOCAL_CONFIG.md it says:

schema = my_schema

So what's the schema?

Security Risk with Export

When I tried to issue a SQL statement via export_to_pandas with a user who only had read rights, an error occurred. I found out that the data is written to a CSV via SQL EXPORT and then collected via FTP. These extended rights, the FTP transfer, and the CSV file represent a security risk. In my opinion, another approach should be chosen for SELECT statements. I would appreciate feedback or a modification of the SELECT variant.

Issue with parsing double from CSV

Hi,

I faced an issue while working with the library functionality
conn.import_from_file(file_obj, update_table_name):
if we have a double value like "0.0272" in the CSV file, it becomes 0.027200000000000002 in Exasol.

I can't see an ability to set up a format for the CSV like this: (1..57, 58 FORMAT='999999999999999.999999', 59..93)
in the query that was produced by the library:

IMPORT INTO tmp_transactions_7220673 FROM CSV
AT 'http://27.1.0.10:46363' FILE '000.gz'

Unsupported WebSocket opcode

We're seeing low-level errors bubble up through pyexasol which we can't explain. For example:

pyexasol.exceptions.ExaQueryError: 
(
    message     =>  WebSocket: readMessage: Unsupported WebSocket opcode: 11 (Session: <snip>)
    dsn         =>  exasol:8888
    user        =>  sys
    schema      =>  
    code        =>  00000
    session_id  =>  <snip>
    query       =>  MERGE INTO <snip>
USING <snip> landing_inserts
ON ( lake_table.<snip> = landing_inserts.<snip> )
WHEN MATCHED THEN
    UPDATE SET lake_table.<snip> = landing_inserts.<snip> , ..., lake_table.<snip>=CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN
    INSERT VALUES (...)
)

This is after several queries ran through successfully; it is the first MERGE, though.

Re-running the same code yielded the same error with opcode: 12.
We've also seen opcode 14 once on another query, but have not been able to reproduce.

This is running queries with pyexasol 0.11.2 on Python 3.7.6 against exasol/docker-db:6.2.4-d1.

Encryption doesn't happen on Connection

I installed pyexasol as described in pyexasol/docs/ENCRYPTION.md. After initiating the connection to the database with the encryption=True option, or specifying encryption = True in the config file, it still created a non-encrypted connection.

I am checking the encryption status in the exa_user_sessions system table.

Am I missing something here?

Pandas import/export throws cannot serialize error

The following script:

import pyexasol
import json
import pandas

Con = pyexasol.connect(dsn='***:8563', user='', password = '')

##serialize error
data_frame = Con.export_to_pandas("SELECT * FROM STAGE.SQUAWKA_CUR_FIXTURES")

##works
#st = Con.execute("SELECT * FROM STAGE.SQUAWKA_CUR_FIXTURES")
#data_frame = pandas.DataFrame(data=st.fetchall())

print(data_frame.head())

##serialize error
#Con.import_from_pandas(data_frame,'sandbox.cur_fixtures')

##serialize error
#Con.import_from_iterable(data_frame.iterrows(),'sandbox.cur_fixtures')

throws an error when trying to export to or import from a pandas data frame.
Using the fetchall() method works, and data can be loaded into a data frame.

Following trace is shown:

C:\Programmel\Continuum\Anaconda3\envs\tensorflow\python.exe "C:/Users/ado/Documents/betting db/099_dokumentation/Python/test.py"
Traceback (most recent call last):
  File "C:/Users/ado/Documents/betting db/099_dokumentation/Python/test.py", line 8, in <module>
    data_frame = Con.export_to_pandas("SELECT * FROM STAGE.SQUAWKA_CUR_FIXTURES")
  File "C:\Programmel\Continuum\Anaconda3\envs\tensorflow\lib\site-packages\pyexasol\connection.py", line 205, in export_to_pandas
    return self.export_to_callback(cb.export_to_pandas, None, query_or_table, query_params, callback_params, {'with_column_names': True})
  File "C:\Programmel\Continuum\Anaconda3\envs\tensorflow\lib\site-packages\pyexasol\connection.py", line 260, in export_to_callback
    raise e
  File "C:\Programmel\Continuum\Anaconda3\envs\tensorflow\lib\site-packages\pyexasol\connection.py", line 238, in export_to_callback
    http_proc.start()
  File "C:\Programmel\Continuum\Anaconda3\envs\tensorflow\lib\site-packages\pyexasol\http_transport.py", line 213, in start
    super().start()
  File "C:\Programmel\Continuum\Anaconda3\envs\tensorflow\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Programmel\Continuum\Anaconda3\envs\tensorflow\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Programmel\Continuum\Anaconda3\envs\tensorflow\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Programmel\Continuum\Anaconda3\envs\tensorflow\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Programmel\Continuum\Anaconda3\envs\tensorflow\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot serialize '_io.FileIO' object

Process finished with exit code 1

Following versions are used:
Exasol = 6.0.0
Python = 3.6.5
rsa = 3.4.2
pandas = 0.23.0
websocket-client = 0.48.0

Wrong parameter assert for EXPORT

For the "DELIMIT" parameter the correct parameter value for "no delimiters" is "NEVER" rather than "NONE".
The way this is implemented in http_transport.py the paramter does not work with EXASOL 6.X

SSL connection with validation of SAN-entries in the certificate in newer python versions

We encounter some issues when we try to run pyexasol in combination with TLS encryption and validation of the SAN entries in the certificate, but only in newer Python versions: starting with Python 3.7.9, the method we used to connect is not working anymore.

The issue is related to #61

See details below.

Running pyexasol with the options encryption=True and check_hostname=True doesn't work with certificates where the DNS names are in the Subject Alternative Name (SAN).

Running with Python 3.8.6, 3.6.4 and 3.7.9 on Linux:

check_hostname: True — throws an error
check_hostname: False — works, but without validation of the DNS names (SAN entries)

import ssl
import pyexasol

c = pyexasol.connect(dsn='hostname_with_domain:8563',
                     user='USER',
                     password='PWD',
                     debug=True,
                     encryption=True,
                     websocket_sslopt={"cert_reqs": ssl.CERT_REQUIRED,
                                       "check_hostname": True,
                                       "ca_certs": r"C:\test\cert\cacert.pem"})

res = c.execute('SELECT * FROM EXA_ALL_TABLES')
rows = res.fetchmany(10)
for row in rows:
    print(row)

c.close(True)



Running the command with Python 3.6.4:

pyexasol.exceptions.ExaConnectionFailedError:
(
    message => Could not connect to Exasol: hostname 'xx.xxx.xxx.xxx' doesn't match either of 'hostname1.dns.com', 'hostname2.dns.com', 'CertreqId-xxxxxxxxxx'
)

Running the command with Python 3.8.6 and 3.7.9:

pyexasol.exceptions.ExaConnectionFailedError:
(
    message => Could not connect to Exasol: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: IP address mismatch, certificate is not valid for 'xx.xxx.xxx.xxx'. (_ssl.c:1124)
    dsn => hostname_with_domain:8563
    user => USER
    schema =>
    session_id =>
)

Running Python 3.6.4 with the option

ssl.match_hostname = lambda cert, hostname: True

works!


(venv38) pip list
Package Version


certifi 2020.12.5
cffi 1.14.4
chardet 3.0.4
cryptography 3.2.1
EXASOL-DB-API 6.0.0
gevent 20.9.0
greenlet 0.4.17
idna 2.10
pip 20.3.1
pproxy 2.5
psutil 5.7.3
pyasn1 0.4.8
pycparser 2.20
pyexasol 0.15.1
pyOpenSSL 20.0.0
requests 2.25.0
rsa 4.6
setuptools 47.1.0
six 1.15.0
SQLAlchemy 1.3.6
urllib3 1.26.2
websocket 0.2.1
websocket-client 0.57.0
zope.event 4.5.0
zope.interface 5.2.0

