Comments (16)
It is a bit odd that the specific error message is not visible. Did you cut it out by any chance? It should be somewhere in this block:
```
'] (Session: xxxx)
dsn => xxxx
user => xxxx
schema => xxxx
code => xxxx
session_id => xxxx
query => xxxx
AT 'Server_IP:port' FILE '000.csv'
)
```
Also, you do not really need pandas to import this data. Just try calling `.import_from_file` if it is a proper CSV file already. Maybe you do not even need HTTP transport at all; it is possible to import data directly from the HTTP address of the file in the bucket.
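For reference, a minimal sketch of the no-pandas path. The file layout, target table name, and `import_params` here are my assumptions, not confirmed details from this thread:

```python
import csv
import os
import tempfile

# Build a small semicolon-delimited CSV similar to the one in the bucket
path = os.path.join(tempfile.gettempdir(), "sample_000.csv")
with open(path, "w", newline="") as f:
    csv.writer(f, delimiter=";", quotechar='"').writerow(["abc", "2019-03-11", "1.5"])

# With an open pyexasol connection C and an existing target table, the file
# could then be loaded without pandas (table name and params are assumptions):
# C.import_from_file(path, "S3_TABLE", import_params={"column_separator": ";"})
```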
from pyexasol.
No, I didn't cut the error, which is weird, as I don't know how to google it. I am using pandas because I will handle some data in the file; that's why I need to download the file first and then do some transformations on the data.
Hmm, ok. Let's try to find the actual error in EXA_DBA_AUDIT_SQL. If the Auditing flag is enabled, try to find the failing IMPORT query in this system view. It provides a column ERROR_TEXT; let's see what it says.
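A sketch of what that lookup could look like. Only the view name and ERROR_TEXT column are confirmed above; the other columns and the WHERE filter are my guesses at the audit view's schema:

```python
# Hypothetical lookup of the failing IMPORT in the Exasol audit view
audit_query = """
    SELECT SESSION_ID, SQL_TEXT, ERROR_TEXT
    FROM EXA_DBA_AUDIT_SQL
    WHERE COMMAND_NAME = 'IMPORT' AND SUCCESS = FALSE
    ORDER BY START_TIME DESC
"""

# With an open pyexasol connection C:
# for row in C.execute(audit_query):
#     print(row)
```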
If that does not help, could you also try enabling the `debug=True` connection option and posting the last request and response here? Let's see if this error is truly anonymous. Maybe you discovered a completely new type of execute response which was not seen before.
Do you have a firewall on your local laptop, which blocks connections on custom ports? Could you try to disable all firewalls?
I spent yesterday working on it. The problem was mismatching datatypes between the default DataFrame dtypes and the data itself. I don't know why the error message was so weird! Thank you for your help.
@Salehflaconi, could you attach minimal reproducible code (1 row of data in pandas and the table structure) or the request log produced by the connection option `debug=True`? The lack of an error message is not normal in such a trivial case. I would be happy to investigate it.
Thank you.
Of course, I will be happy to do that with you, but would you please help me enable debugging, as I haven't done it before? I also enabled auditing, but it didn't show any error.
How I solved part of the errors: I changed the datatypes of each column of the DataFrame as follows:
```python
# Changing the data types to be able to import
df[0] = df[0].astype(str)
df[1] = df[1].astype(str)
df[2] = pd.to_numeric(df[2])
df[3] = pd.to_datetime(df[3]).dt.date
df[4] = pd.to_datetime(df[4]).dt.date
df[5] = df[5].astype(int)
df[6] = df[6].astype(float)
df[7] = df[7].astype(float)
df[8] = df[8].astype(float)
df[9] = df[9].astype(float)
df[10] = df[10].astype(float)
df[11] = df[11].astype(float)
df[12] = df[12].astype(float)
df[13] = df[13].astype(float)
df[14] = df[14].astype(float)
df[15] = df[15].astype(float)
df[16] = df[16].astype(str)
```
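As a side note, the repeated per-column casts can be collapsed into a single `astype()` call taking a dict. This sketch uses a made-up 3-column frame rather than the real 17-column one, and date columns still need `pd.to_datetime` separately, since `astype` cannot parse date strings:

```python
import pandas as pd

# Sample frame standing in for the real one; the column/dtype mapping
# is applied in one call instead of one assignment per column
df = pd.DataFrame({0: ["a"], 1: ["42"], 2: ["3.5"]})
df = df.astype({0: str, 1: int, 2: float})
```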
And now I am facing a new issue, which is handling a base64-encoded column that I converted with `df[16] = df[16].astype(str)`.
A sample of my data is:
b'
I set the delimiter as:
```python
df = pd.read_csv(outPutName, delimiter=';', skipinitialspace=True, engine='python', quotechar='"', header=None)
```
But the delimiter is not the problem now: my string is XML and contains a lot of special characters.
And the error is:
```
Traceback (most recent call last):
  File "C:/saleh/Programs/Git/bi-python/Download_Bonsai_from_S3_to_DWH/Bonsai.py", line 93, in <module>
    C.import_from_pandas(df.iloc[[1], [16]], 'BONSAI_APK')
  File "......\pyexasol\connection.py", line 213, in import_from_pandas
    return self.import_from_callback(cb.import_from_pandas, src, table, callback_params)
  File "......\pyexasol\connection.py", line 300, in import_from_callback
    raise sql_thread.exc
  File "......\pyexasol\connection.py", line 290, in import_from_callback
    sql_thread.join()
  File "......\pyexasol\http_transport.py", line 53, in join
    raise self.exc
  File "......\pyexasol\http_transport.py", line 37, in run
    self.run_sql()
  File "......\pyexasol\http_transport.py", line 165, in run_sql
    self.connection.execute(query)
  File "......\pyexasol\connection.py", line 140, in execute
    self.last_stmt = self.cls_statement(self, query, query_params)
  File "......\pyexasol\statement.py", line 47, in __init__
    self._execute()
  File "......\pyexasol\statement.py", line 141, in _execute
    'sqlText': self.query,
  File "......\pyexasol\connection.py", line 442, in req
    raise cls_err(self, req['sqlText'], ret['exception']['sqlCode'], ret['exception']['text'])
pyexasol.exceptions.ExaQueryError:
(
    message => ETL-2105: Error while parsing row=0 (starting from 0) [CSV Parser found at byte 993 (starting with 0 at the beginning of the row) of 993 a single field delimiter in enclosed field or not correct enclosed field in file '000.csv'. Please check for unescaped field delimiters in data fields (they have to be escaped) and correct enclosing of this field] (Session: xxxxx)
    dsn => xxxx
    user => xxxx
    schema => xxxx
    code => xxxx
    session_id => xxxx
    query => xxxx
AT 'IP:port' FILE '000.csv'
)
```
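ETL-2105 complains about an unescaped field delimiter inside an enclosed field. A small, self-contained sketch of decoding the base64 value and letting Python's `csv` module produce correctly escaped output; the XML payload here is made up:

```python
import base64
import csv
import io

# Decode one base64 field containing XML with embedded delimiters and quotes
encoded = base64.b64encode(b'<a attr="1;2">text</a>').decode()
decoded = base64.b64decode(encoded).decode("utf-8")

# Let csv handle quoting: embedded quotes get doubled and the whole field
# is enclosed, which is exactly the escaping ETL-2105 says is missing
buf = io.StringIO()
csv.writer(buf, delimiter=";", quotechar='"', quoting=csv.QUOTE_MINIMAL).writerow(["id1", decoded])
```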
Thank you.
In order to enable debugging, just add an extra connection option `debug=True` to `pyexasol.connect()`.
Example:
```python
C = pyexasol.connect(dsn='...', user='...', password='...', debug=True)
```
I had the same error. For me, downgrading to pandas 0.23.4 helped. So maybe there is an issue with pyexasol+pandas 0.24.x?
@cyroxx I will share my solution with you today.
@wildraid there is another bug I will share with you today, as I created a workaround to complete my task, which I just finished!
The problem I faced was converting the data types of the pandas DataFrame and loading it into Exasol, as I couldn't add a new column to the DataFrame! It was a little bit strange. That's why I added temp columns with random values at the beginning of my script and deleted them at the end, and it worked! I don't know the reason, but I will send the debug output to you soon.
This is part of my script:
```python
#############################################################################################
# Author : saleh ahmed
# Date of creation : 11-03-2019
# Task description : Downloading the file from s3 and loading it into Exasol
#############################################################################################
import boto3
import botocore
import pyexasol
import exa_cred as exa
import Bucket_info as Bk_info
import pandas as pd
import base64
import numpy as np
import logging
import sys

# Settings for logging
logger = logging.getLogger('logger')
logger.setLevel(logging.INFO)
ch = logging.StreamHandler(sys.stdout)
ch.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s [%(levelname)s] - %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
ch.setFormatter(formatter)
logger.addHandler(ch)
logger.info("Start script ...")

# Reading credential info and bucket info from local file
exuser = exa.cred['exuser']
expwd = exa.cred['expwd']
exdsn = exa.cred['exdsn']
schema = exa.cred['schema']
Bucket_name = Bk_info.Bucket['Bucket_name']

C = pyexasol.connect(dsn=exdsn, user=exuser, password=expwd, schema=schema)
logger.info("Connected to the Exasol")
# debug=True is to track the errors

# Bucket information
Bucket = Bucket_name
Key = "s3_TABLE.csv"
outPutName = "s3_TABLE.csv"

# Handling the "not found" exception and downloading the file locally
session = boto3.Session(
    aws_access_key_id='',
    aws_secret_access_key='',
)
s3 = session.resource('s3')
try:
    s3.Bucket(Bucket).download_file(Key, outPutName)
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise

# Reading the data with pandas
df = pd.read_csv(outPutName, delimiter=';', header=None, lineterminator='\n')
logger.info("Pandas dataframe has been read")

# CREATING TEMP COLUMN TO SOLVE UNKNOWN ISSUE WHICH IS RELATED TO THE CONNECTION
sLength = len(df[1])
df[17] = pd.Series(np.random.randn(sLength), index=df.index)

# # Checking DataTypes
# before = df.dtypes
# print(before)
# print(df.head(10))

# Changing the data types to be able to import it
df[0] = df[0].astype(str)
df[1] = df[1].astype(str)
df[2] = pd.to_numeric(df[2])
df[3] = pd.to_datetime(df[3]).dt.date
df[4] = pd.to_datetime(df[4]).dt.date
df[5] = df[5].astype(int)
df[6] = df[6].astype(float)
df[7] = df[7].astype(float)
df[8] = df[8].astype(float)
df[9] = df[9].astype(float)
df[10] = df[10].astype(float)
df[11] = df[11].astype(float)
df[12] = df[12].astype(float)
df[13] = df[13].astype(float)
df[14] = df[14].astype(float)
df[15] = df[15].astype(float)
df[16] = df[16].astype(str)
df[17] = df[17].astype(object)
logger.info("Datatypes conversion has been successfully completed")

# Checking DataTypes
# after = df.dtypes
# print(after)
# print(df.head(10))

# Checking if the data is null or not and then decoding it;
# replace the cell with the new decoded value
for key, value in df[16].iteritems():
    if value == 'nan':
        # print("true, it's null")
        df.loc[key, 16] = ""
    else:
        # print("false, it's not null")
        # print(key, value)
        df.loc[key, 16] = base64.b64decode(value)
        df.loc[key, 16] = df.loc[key][16].decode("utf-8")
        tmp_var = df.loc[key, 16]
        # print(tmp_var)
        df.loc[key, 16] = df.loc[key, 16]
        # print("This is new line \n")
# print(df.iloc[[1], [16]])
logger.info("Decoding a column")

# print(df.to_csv(sep=';', index=False, header=False, quotechar='"'))
# df.to_csv('....csv', sep=';', index=False, header=False, quotechar='"')

# Creating a column for the unknown issue; I have to do that just to skip the issue.
# I will drop the column at the end of the process.

# Cleanup tmp table
C.execute("TRUNCATE TABLE STAGE_FLAT_FILE_TMP.s3_TABLE;")
logger.info("Cleaning up the stage table")

sql_create_column = """ ADDING THE TEMP COLUMN PHYSICALLY TO THE TABLE """
C.execute(sql_create_column)

# Import from pandas DataFrame into Exasol table
C.import_from_pandas(df, 's3_TABLE')

# Adding the MERGE for both the new rows and the updated rows
MERGE_NEW_AND_UPDATED_ROWS = """MERGE INTO TABLE ....."""
C.execute(MERGE_NEW_AND_UPDATED_ROWS)

# Dropping the useless column
sql_drop_column = """ ALTER TABLE "STAGE_FLAT_FILE_TMP"."s3_TABLE" DROP COLUMN TEMP_COLUMN """
C.execute(sql_drop_column)
logger.info("Data has been inserted into STAGE_FLAT_FILE.s3_TABLE")
C.close()
```
@wildraid I couldn't send the first debug log, as I solved the problem and I don't know how to reproduce the bug.
@Salehflaconi, @cyroxx, maybe it is related to this bug in pandas: pandas-dev/pandas#25048
Could you try versions 0.24.0, 0.24.1 and 0.24.2? Maybe it was fixed in the latest version.
I do not have easy access to Windows machines at the moment.
@cyroxx, please try updating pyexasol to the latest version, 0.5.5. This should solve the problem for you.
@Salehflaconi, I suspect you have a genuine mismatch in the number of columns between the DataFrame and the target table in Exasol. Try using the `df.info()` function to check the real number of columns before calling `.import_from_pandas`. I think you'll find extra columns, which probably come from the index or result from incorrect parsing of the source file.
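A small sketch of that check. The DataFrame here is a stand-in with a deliberately sneaky extra column, and the commented query against EXA_ALL_COLUMNS (schema/table names included) is my assumption about how to count the target table's columns:

```python
import pandas as pd

# Count the DataFrame's real columns before importing; a stray index or a
# mis-parsed row shows up as unexpected extra columns
df = pd.DataFrame({0: [1, 2], 1: ["a", "b"], "extra": [0.1, 0.2]})
n_df_cols = df.shape[1]

# Compare against the target table (schema/table names are placeholders):
# n_table_cols = C.execute(
#     "SELECT COUNT(*) FROM EXA_ALL_COLUMNS "
#     "WHERE COLUMN_SCHEMA = 'MY_SCHEMA' AND COLUMN_TABLE = 'S3_TABLE'"
# ).fetchval()
# assert n_df_cols == n_table_cols
```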
@Salehflaconi , @cyroxx , hi guys.
Is this issue still relevant?
Thank you.