
Comments (16)

littleK0i commented on September 13, 2024

It is a bit weird that the specific error message is not visible. Did you cut it out by any chance? It should be somewhere in this block:

'] (Session: xxxx)
dsn => xxxx
user => xxxx
schema => xxxx
code => xxxx
session_id => xxxx
query => xxxx
AT 'Server_IP:port' FILE '000.csv'
)

Also, you do not really need pandas to import this data. Just try calling .import_from_file if it is already a proper CSV file.

Maybe you do not even need HTTP transport at all. It is possible to import data directly from the HTTP address of a file in a bucket.

from pyexasol.
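A minimal sketch of both suggestions, assuming a hypothetical table MY_TABLE and file data.csv on a reachable Exasol cluster (the CSV sanity-check helper is my own addition, not part of pyexasol):

```python
import csv

def looks_like_csv(path, delimiter=";", sample_rows=5):
    # Cheap sanity check: the first rows parse and share one column count.
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        widths = {len(row) for _, row in zip(range(sample_rows), reader)}
    return len(widths) == 1

def load_without_pandas(dsn, user, password, path, table):
    # Hypothetical credentials; requires a reachable Exasol cluster.
    import pyexasol
    C = pyexasol.connect(dsn=dsn, user=user, password=password)
    if looks_like_csv(path):
        # 1) The file is already a proper CSV: no pandas round-trip needed.
        C.import_from_file(path, table)
    # 2) Or skip the local download entirely and let Exasol fetch the file
    #    itself (plain IMPORT syntax; the URL is a placeholder):
    C.execute("IMPORT INTO MY_TABLE FROM CSV "
              "AT 'https://bucket.example.com' FILE 'data.csv'")
    C.close()
```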

Salehflaconi commented on September 13, 2024

No, I didn't cut the error message,
which is weird, as I don't know how to google it.

I am using pandas because I will handle some data in the file;
that's why I need to download the file first and then do some transformations on the data.


littleK0i commented on September 13, 2024

Hmm, ok. Let's try to find the actual error in EXA_DBA_AUDIT_SQL.

If the Auditing flag is enabled, try to find the failing IMPORT query in this system view. It provides an ERROR_TEXT column. Let's see what it says.
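The lookup could be sketched like this (EXA_DBA_AUDIT_SQL and its ERROR_TEXT / SQL_TEXT columns are Exasol system-view names; the exact filter is an assumption about what to look for):

```python
# Pull ERROR_TEXT for recent failing IMPORT statements from the auditing
# view; requires auditing to be enabled and sufficient privileges.
AUDIT_QUERY = """
    SELECT session_id, start_time, error_text, sql_text
    FROM exa_dba_audit_sql
    WHERE error_text IS NOT NULL
      AND sql_text LIKE 'IMPORT%'
    ORDER BY start_time DESC
    LIMIT 10
"""

def recent_import_errors(connection):
    # `connection` is an open pyexasol connection
    return connection.execute(AUDIT_QUERY).fetchall()
```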


littleK0i commented on September 13, 2024

If that does not help, could you also try enabling the debug=True connection option and post the last request and response here?

Let's see if this error is truly anonymous. Maybe you discovered a completely new type of execute response which has not been seen before.


littleK0i commented on September 13, 2024

Do you have a firewall on your local laptop which blocks connections on custom ports? Could you try disabling all firewalls?


Salehflaconi commented on September 13, 2024

I spent yesterday working on it.

The problem was mismatched data types between the default DataFrame and the data itself.
I don't know why the error message was so strange!

Thank you for your help


littleK0i commented on September 13, 2024

@Salehflaconi , could you attach minimal reproducible code (one row of data in pandas and the table structure) or a request log produced by the connection option debug=True?

The lack of an error message is not normal in such a trivial case. I would be happy to investigate it.

Thank you.


Salehflaconi commented on September 13, 2024

Of course, I would be happy to do that with you,
but could you please help me enable debugging, as I haven't done it before?
I also enabled auditing, but it didn't show any error.

How I solved part of the errors:
I changed the data types of each column of the DataFrame as follows:

#Changing the data types to be able to import
df[0]= df[0].astype(str)
df[1]= df[1].astype(str)
df[2]= pd.to_numeric(df[2])
df[3] = pd.to_datetime(df[3]).dt.date
df[4] = pd.to_datetime(df[4]).dt.date
df[5] = df[5].astype(int)
df[6] = df[6].astype(float)
df[7] = df[7].astype(float)
df[8] = df[8].astype(float)
df[9] = df[9].astype(float)
df[10] = df[10].astype(float)
df[11] = df[11].astype(float)
df[12] = df[12].astype(float)
df[13] = df[13].astype(float)
df[14] = df[14].astype(float)
df[15] = df[15].astype(float)
df[16] = df[16].astype(str)
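For what it's worth, the casts above can be collected into one mapping, which makes a mismatch between a column and its intended type easier to spot (a sketch mirroring the listing above; the column positions are taken from it):

```python
import pandas as pd

# Target dtype per column position, mirroring the cast list above.
CASTS = {0: str, 1: str, 16: str, **{i: float for i in range(6, 16)}}

def apply_casts(df):
    df = df.copy()
    for col, typ in CASTS.items():
        df[col] = df[col].astype(typ)
    # Columns with special conversions, as in the original listing.
    df[2] = pd.to_numeric(df[2])
    df[3] = pd.to_datetime(df[3]).dt.date
    df[4] = pd.to_datetime(df[4]).dt.date
    df[5] = df[5].astype(int)
    return df
```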

And now
I am facing a new issue, which is handling a base64-encoded column,
which I converted with

df[16] = df[16].astype(str)

Sample of my Data is
b'

I set the delimiter as
df = pd.read_csv(outPutName, delimiter=';', skipinitialspace=True, engine='python', quotechar ='"' , header=None)

but the delimiter now is not "
and my string is XML, which contains a lot of special characters
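One way to decode the base64 column safely, assuming the same 'nan'-as-string convention that the astype(str) cast above produces (a sketch, not pyexasol-specific):

```python
import base64
import pandas as pd

def decode_b64_column(series):
    # After astype(str), NaN became the literal string 'nan'; map it to "".
    def decode(value):
        if value in ("nan", ""):
            return ""
        return base64.b64decode(value).decode("utf-8")
    return series.map(decode)
```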

And the Error is

Traceback (most recent call last):
File "C:/saleh/Programs/Git/bi-python/Download_Bonsai_from_S3_to_DWH/Bonsai.py", line 93, in <module>
C.import_from_pandas(df.iloc[[1], [16]], 'BONSAI_APK')
File "......\pyexasol\connection.py", line 213, in import_from_pandas
return self.import_from_callback(cb.import_from_pandas, src, table, callback_params)
File "......\pyexasol\connection.py", line 300, in import_from_callback
raise sql_thread.exc
File "......\pyexasol\connection.py", line 290, in import_from_callback
sql_thread.join()
File "......\pyexasol\http_transport.py", line 53, in join
raise self.exc
File "......\pyexasol\http_transport.py", line 37, in run
self.run_sql()
File "......\pyexasol\http_transport.py", line 165, in run_sql
self.connection.execute(query)
File "......\pyexasol\connection.py", line 140, in execute
self.last_stmt = self.cls_statement(self, query, query_params)
File "......\pyexasol\statement.py", line 47, in __init__
self._execute()
File "......\pyexasol\statement.py", line 141, in _execute
'sqlText': self.query,
File "......\pyexasol\connection.py", line 442, in req
raise cls_err(self, req['sqlText'], ret['exception']['sqlCode'], ret['exception']['text'])
pyexasol.exceptions.ExaQueryError:
(
message => ETL-2105: Error while parsing row=0 (starting from 0) [CSV Parser found at byte 993 (starting with 0 at the beginning of the row) of 993 a single field delimiter in enclosed field or not correct enclosed field in file '000.csv'. Please check for unescaped field delimiters in data fields (they have to be escaped) and correct enclosing of this field] (Session: xxxxx)
dsn => xxxx
user => xxxx
schema => xxxx
code => xxxx
session_id => xxxx
query => xxxx
AT 'IP:port' FILE '000.csv'
)
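ETL-2105 here means a field contains the delimiter or a quote that is not escaped. Writing the intermediate CSV with every field quoted and embedded quotes doubled (the standard CSV escaping) avoids it; a sketch with the stdlib csv module:

```python
import csv
import io

def to_escaped_csv(rows, delimiter=";"):
    # Enclose every field in double quotes and double any embedded quotes,
    # so delimiters inside XML/text fields cannot break the row structure.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, quotechar='"',
                        quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()
```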


littleK0i commented on September 13, 2024

Thank you.

In order to enable debugging, just add an extra connection option debug=True to pyexasol.connect():

Example:

C = pyexasol.connect(dsn='...', user='...', password='...', debug=True)


cyroxx commented on September 13, 2024

I had the same error. For me, downgrading to pandas 0.23.4 helped. So maybe there is an issue with pyexasol+pandas 0.24.x?


Salehflaconi commented on September 13, 2024

@cyroxx I will share my solution with you today.
@wildraid there is another bug I will also share with you today.

I created a workaround to complete my task, which I just finished!


Salehflaconi commented on September 13, 2024

The problem I faced was converting the data types of the pandas DataFrame and loading it into Exasol,
as I couldn't add a new column to the DataFrame!
It was a little bit strange.

That's why I added temp columns with random values at the beginning of my script and deleted them at the end, and it works!

I don't know the reason, but I will send the debug log to you soon.

This is part of my script:

#############################################################################################
# Author : saleh ahmed
# Date of creation : 11-03-2019
# Task description : Downloading the file from s3 and loading it into Exasol
#############################################################################################

import boto3
import botocore
import pyexasol
import exa_cred as exa
import Bucket_info as Bk_info
import pandas as pd
import base64
import numpy as np
import logging
import sys


# Settings for logging
logger = logging.getLogger('logger')
logger.setLevel(logging.INFO)
ch = logging.StreamHandler(sys.stdout)
ch.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s [%(levelname)s] - %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
ch.setFormatter(formatter)
logger.addHandler(ch)

logger.info("Start script ...")


#reading credential info and bucket info from local file
exuser = exa.cred['exuser']
expwd = exa.cred['expwd']
exdsn = exa.cred['exdsn']
schema=exa.cred['schema']

Bucket_name =Bk_info.Bucket['Bucket_name']
C = pyexasol.connect(dsn=exdsn, user=exuser, password=expwd,schema=schema)
logger.info(" Connected to the Exasol ")

#debug=True is to track the errors

#Bucket information
Bucket = Bucket_name
Key = "s3_TABLE.csv"
outPutName = "s3_TABLE.csv"

#Handling exception of not found and downloading the file locally
session = boto3.Session(
    aws_access_key_id='',
    aws_secret_access_key='',
)
s3 = session.resource('s3')
try:
    s3.Bucket(Bucket).download_file(Key, outPutName)
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise

#reading the data with pandas
df = pd.read_csv(outPutName, delimiter=';' , header=None,lineterminator='\n')
logger.info(" Pandas dataframe has been read ")

#CREATING A TEMP COLUMN TO WORK AROUND AN UNKNOWN ISSUE WHICH IS RELATED TO THE CONNECTION
sLength = len(df[1])
df[17] = pd.Series(np.random.randn(sLength), index=df.index)


# #checking DataTypes
# before = df.dtypes
# print(before)
# print(df.head(10))


#Changing the data types to be able to import it
df[0]= df[0].astype(str)
df[1]= df[1].astype(str)
df[2]= pd.to_numeric(df[2])
df[3] = pd.to_datetime(df[3]).dt.date
df[4] = pd.to_datetime(df[4]).dt.date
df[5] = df[5].astype(int)
df[6] = df[6].astype(float)
df[7] = df[7].astype(float)
df[8] = df[8].astype(float)
df[9] = df[9].astype(float)
df[10] = df[10].astype(float)
df[11] = df[11].astype(float)
df[12] = df[12].astype(float)
df[13] = df[13].astype(float)
df[14] = df[14].astype(float)
df[15] = df[15].astype(float)
df[16] = df[16].astype(str)
df[17] = df[17].astype(object)
logger.info(" Datatypes conversion has been successfully Completed    ")

#checking DataTypes
# after = df.dtypes
# print(after)
# print(df.head(10))


#Checking whether the value is null ('nan' after the astype(str) cast)
#and replacing the cell with its decoded value otherwise
for key, value in df[16].iteritems():

    if value == 'nan':
        df.loc[key, 16] = ""
    else:
        df.loc[key, 16] = base64.b64decode(value)
        df.loc[key, 16] = df.loc[key, 16].decode("utf-8")

logger.info(" Decoding A COLUMN  ")
#print (df.to_csv(sep=';', index=False, header=False,quotechar='"'))
#df.to_csv('....csv',sep=';', index=False, header=False,quotechar='"')


#Creating a column for an unknown issue; I have to do that just to skip the problematic part
#I will drop the column at the end of the process

# cleanup tmp table
C.execute("TRUNCATE TABLE STAGE_FLAT_FILE_TMP.s3_TABLE;")
logger.info(" cleaning up the Stage Table   ")


sql_create_column= """ 	 ADDING THE TEMP COLUMN PHYSICALLY TO THE TABLE 
 """
C.execute(sql_create_column)

#Import from pandas DataFrame into Exasol table
C.import_from_pandas(df, 's3_TABLE')

#Adding the MERGE for both the new rows and the updated rows
MERGE_NEW_AND_UPDATED_ROWS = """MERGE INTO   TABLE ..... """
C.execute(MERGE_NEW_AND_UPDATED_ROWS)

 

#Dropping the useless column
sql_drop_column= """ ALTER TABLE  "STAGE_FLAT_FILE_TMP"."s3_TABLE" DROP COLUMN TEMP_COLUMN """
C.execute(sql_drop_column)

logger.info(" Data has been inserted into STAGE_FLAT_FILE.s3_TABLE  ")

C.close()

@wildraid I couldn't send the first debug log, as I solved the issue and I don't know how to reproduce the bug.


littleK0i commented on September 13, 2024

@Salehflaconi , @cyroxx , maybe it is related to this bug in pandas: pandas-dev/pandas#25048

Could you try versions 0.24.0, 0.24.1 and 0.24.2? Maybe it was fixed in the latest version.

I do not have easy access to a Windows machine at the moment.


littleK0i commented on September 13, 2024

@cyroxx , please try updating pyexasol to the latest version, 0.5.5. This should solve the problem for you.


littleK0i commented on September 13, 2024

@Salehflaconi , I suspect you have a genuine mismatch in the number of columns between the DataFrame and the target table in Exasol. Try using the df.info() function to check the real number of columns before calling .import_from_pandas. I think you'll find extra columns, which probably come from the index or result from incorrect parsing of the source file.
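The check suggested above can be made explicit before the import call (a sketch; the expected count would come from the target table definition):

```python
import pandas as pd

def check_column_count(df, expected):
    # Fail fast before .import_from_pandas when the DataFrame width does
    # not match the target table, e.g. due to a leftover index column.
    actual = df.shape[1]
    if actual != expected:
        raise ValueError(f"DataFrame has {actual} columns, "
                         f"target table expects {expected}")
```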


littleK0i commented on September 13, 2024

@Salehflaconi , @cyroxx , hi guys.

Is this issue still relevant?
Thank you.

