Comments (15)
Hi @lmocsi , thanks for opening the issue with a detailed example provided. In connectorx we parse the data to
NaiveDateTime
rust type first and then fill in to arrowTimestamp
with nano second unit. The valid range of parsing rust NaiveDateTime to nano second is 1677-09-21T00:12:43.145224192 and 2262-04-11T23:47:16.854775807, and we will panic if the data is not in this range.
Someone in another issue wrote this: "This is a connectorx issue as it reads timestamps in nanoseconds (which it shouldn't, as few databases actually support nanosecond precision - the majority top-out at microseconds)."
datetime.datetime(9999, 12, 31) is a valid date in python. How / when will connectorx support it?
from connector-x.
The data here is from a real database.
Python version 3.11.8
return_type=arrow2
protocol=binary
CREATE TABLE date_test (
date_field timestamp NULL
);
INSERT INTO date_test (date_field) VALUES('2023-03-04 00:00:00.000');
INSERT INTO date_test (date_field) VALUES('2223-03-04 00:00:00.000');
INSERT INTO date_test (date_field) VALUES('4073-04-09 23:59:59.000');
Sending this to read_sql
fails in connectorx==0.3.3
, but is successful in connectorx==0.3.2
select date_field from date_test where date_field > '2200-01-01';
This is the expected output.
date_field |
---------------------------+
2223-03-04 00:00:00.000|
4073-04-09 23:59:59.000|
Sending this to read_sql
works fine with both versions.
select date_field from date_test where date_field < '2200-01-01';
from connector-x.
Yeah, 0.3.4a2 solved the problem!
Thx!
from connector-x.
I have the same issue, but with Postgres.
This started happening only on the new 0.3.3
release version.
from connector-x.
Hi @lmocsi , thanks for opening the issue with a detailed example provided. In connectorx we parse the data to NaiveDateTime
rust type first and then fill in to arrow Timestamp
with nano second unit. The valid range of parsing rust NaiveDateTime to nano second is 1677-09-21T00:12:43.145224192 and 2262-04-11T23:47:16.854775807, and we will panic if the data is not in this range.
from connector-x.
I have the same issue, but with Postgres. This started happening only on the new
0.3.3
release version.
Hi @auyer , I tried the example of timestamp 9999-12-31 00:00:01
and it is not supported in both 0.3.3 and 0.3.3-a2 due to the above reason. Can you provide an example that works before 0.3.3 but not 0.3.3? Thanks!
from connector-x.
The data here is from a real database. Python version 3.11.8 return_type=
arrow2
protocol=binary
CREATE TABLE date_test ( date_field timestamp NULL ); INSERT INTO date_test (date_field) VALUES('2023-03-04 00:00:00.000'); INSERT INTO date_test (date_field) VALUES('2223-03-04 00:00:00.000'); INSERT INTO date_test (date_field) VALUES('4073-04-09 23:59:59.000');Sending this to
read_sql
fails inconnectorx==0.3.3
, but is successful inconnectorx==0.3.2
select date_field from date_test where date_field > '2200-01-01';This is the expected output.
date_field | ---------------------------+ 2223-03-04 00:00:00.000| 4073-04-09 23:59:59.000|Sending this to
read_sql
works fine with both versions.select date_field from date_test where date_field < '2200-01-01';
Hi @auyer , thanks for the quick reply!
Indeed the example does not throw an error on 0.3.2
, however the result seems to be wrong. Here is the table:
tpch=# select * from test_date
tpch-# ;
date_field
---------------------
2023-03-04 00:00:00
2223-03-04 00:00:00
4073-04-09 23:59:59
(3 rows)
and this is the result arrow table I got from SELECT * from date_test
:
pyarrow.Table
date_field: timestamp[ns]
----
date_field: [[2023-03-04 00:00:00.000000000,2223-03-04 00:00:00.000000000,1735-01-22 01:41:44.161793536]]
As you can see, 4073-04-09
is parsed into 1735-01-22
without any error indication. In this case I think we probably prefer to panic instead of having the wrong result without noticing.
from connector-x.
Someone in another issue wrote this: "This is a connectorx issue as it reads timestamps in nanoseconds (which it shouldn't, as few databases actually support nanosecond precision - the majority top-out at microseconds)."
datetime.datetime(9999, 12, 31) is a valid date in python. How / when will connectorx support it?
Thanks @lmocsi , I think this is a valid point. We probably need to add different destination timestamp types in terms of different precisions and mapping databases that only support microseconds to the new type. Using arrow2
and postgres
for example, we need to
- add a new type (e.g.
DateTimeTzMicro
that wraps up aDateTime<Utc>
) here - implement the corresponding functions needed for the new
DateTimeTzMicro
type similar with DateTime but in micros seconds instead of nano seconds - change the type transport from postgres to arrow2 on
TimestampTz
from the current one to the newly introduced type here.
I can try to do this week (but I cannot guarantee any timelines for finishing this if I found other issues of this solution). You are also very welcome to contribute if you are interested : )
from connector-x.
Can you try out the new alpha version 0.3.4a1
to see whether it fixes the issue?
from connector-x.
Unfortunately connectorx==0.3.4a1 did not fix the issue. :(
I still get the following error (with polars==0.20.31):
PanicException: out of range DateTime
The above exception was the direct cause of the following exception:
PanicException Traceback (most recent call last)
/tmp/1000780000/ipykernel_850/2531851850.py in <module>
4 v_sql = "select * from my_test"
----> 5 df = pl.read_database_uri(
6 v_sql, uri, engine="connectorx",
7 #schema_overrides={'end_date': pl.String}
/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/io/database/functions.py in read_database_uri(query, uri, partition_on, partition_range, partition_num, protocol, engine, schema_overrides, execute_options)
417 msg = "the 'connectorx' engine does not support use of `execute_options`"
418 raise ValueError(msg)
--> 419 return _read_sql_connectorx(
420 query,
421 connection_uri=uri,
/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/io/database/_utils.py in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol, schema_overrides)
64 # basic sanitisation of /user:pass/ credentials exposed in connectorx errs
65 errmsg = re.sub("://[^:]+:[^:]+@", "://***:***@", str(err))
---> 66 raise type(err)(errmsg) from err
67
68 return from_arrow(tbl, schema_overrides=schema_overrides) # type: ignore[return-value]
from connector-x.
Unfortunately connectorx==0.3.4a1 did not fix the issue. :( I still get the following error (with polars==0.20.31):
PanicException: out of range DateTime The above exception was the direct cause of the following exception: PanicException Traceback (most recent call last) /tmp/1000780000/ipykernel_850/2531851850.py in <module> 4 v_sql = "select * from my_test" ----> 5 df = pl.read_database_uri( 6 v_sql, uri, engine="connectorx", 7 #schema_overrides={'end_date': pl.String} /opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/io/database/functions.py in read_database_uri(query, uri, partition_on, partition_range, partition_num, protocol, engine, schema_overrides, execute_options) 417 msg = "the 'connectorx' engine does not support use of `execute_options`" 418 raise ValueError(msg) --> 419 return _read_sql_connectorx( 420 query, 421 connection_uri=uri, /opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/io/database/_utils.py in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol, schema_overrides) 64 # basic sanitisation of /user:pass/ credentials exposed in connectorx errs 65 errmsg = re.sub("://[^:]+:[^:]+@", "://***:***@", str(err)) ---> 66 raise type(err)(errmsg) from err 67 68 return from_arrow(tbl, schema_overrides=schema_overrides) # type: ignore[return-value]
ops, missed the date type in oralce. How about 0.3.4a2
?
from connector-x.
Will there be a 0.3.4 version out of it?
from connector-x.
Hi @wangxiaoying, can this also be fixed for Trino?
from connector-x.
Hi @Miayl , according to trino's doc, the timestamp type is by default millisecond and could vary according to the parameter P. I believe we can apply similar code change like oracle's fix. Please feel free to submit a PR for this!
from connector-x.
Hi @wangxiaoying, I am using Snowflake. is it applied to Snowflake, too? When do I use this version in pip repository?
from connector-x.
Related Issues (20)
- Parallel Data Reading in Pandas Does Not Support Ordering in Queries (PostgreSQL) HOT 3
- SQL Server Datetime read_sql leads to PanicException HOT 5
- Add OID data type in Postgres-Pandas Type Mapping
- Add geography data type in Postgres-Pandas Type Mapping
- No 0.3.3 Python wheel for Linux on ARM HOT 2
- Does ConnectorX have support for azure managed identity HOT 1
- How to use self signed cert with trino connection HOT 1
- Set TRANSACTION ISOLATION LEVEL ?
- how to build a whl file dependance a lower version in glibc HOT 2
- Does not seem to use .pgpass file HOT 2
- Raise the proper exception in connector-x python when importing module
- Any way to execute commands prior to query? HOT 1
- Is there any plan to make this available for mongodb databases? HOT 2
- How to set database connection timeout for Oracle database?
- Any possible to update the version of polars?
- PostgreSQL: Error while mapping to Arrow types, if table has array columns HOT 3
- python connectorx query database data fail "Packets out of sync"
- MySQL ULongLong should be treated as integer
- Unable to allocate 161. GiB for array with panicked failed HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from connector-x.