Git Product home page Git Product logo

Comments (15)

lmocsi avatar lmocsi commented on August 16, 2024 2

Hi @lmocsi , thanks for opening the issue with a detailed example provided. In connectorx we parse the data to NaiveDateTime rust type first and then fill in to arrow Timestamp with nano second unit. The valid range of parsing rust NaiveDateTime to nano second is 1677-09-21T00:12:43.145224192 and 2262-04-11T23:47:16.854775807, and we will panic if the data is not in this range.

Someone in another issue wrote this: "This is a connectorx issue as it reads timestamps in nanoseconds (which it shouldn't, as few databases actually support nanosecond precision - the majority top-out at microseconds)."

datetime.datetime(9999, 12, 31) is a valid date in python. How / when will connectorx support it?

from connector-x.

auyer avatar auyer commented on August 16, 2024 1

The data here is from a real database.
Python version 3.11.8
return_type=arrow2
protocol=binary

CREATE TABLE date_test (
	date_field timestamp NULL
);


INSERT INTO date_test (date_field) VALUES('2023-03-04 00:00:00.000');
INSERT INTO date_test (date_field) VALUES('2223-03-04 00:00:00.000');
INSERT INTO date_test (date_field) VALUES('4073-04-09 23:59:59.000');

Sending this to read_sql fails in connectorx==0.3.3, but is successful in connectorx==0.3.2

select date_field from date_test where date_field > '2200-01-01';

This is the expected output.

date_field |
---------------------------+
    2223-03-04 00:00:00.000|
    4073-04-09 23:59:59.000|

Sending this to read_sql works fine with both versions.

select date_field from date_test where date_field < '2200-01-01';

from connector-x.

lmocsi avatar lmocsi commented on August 16, 2024 1

Yeah, 0.3.4a2 solved the problem!
Thx!

from connector-x.

auyer avatar auyer commented on August 16, 2024

I have the same issue, but with Postgres.
This started happening only on the new 0.3.3 release version.

from connector-x.

wangxiaoying avatar wangxiaoying commented on August 16, 2024

Hi @lmocsi , thanks for opening the issue with a detailed example provided. In connectorx we parse the data to NaiveDateTime rust type first and then fill in to arrow Timestamp with nano second unit. The valid range of parsing rust NaiveDateTime to nano second is 1677-09-21T00:12:43.145224192 and 2262-04-11T23:47:16.854775807, and we will panic if the data is not in this range.

from connector-x.

wangxiaoying avatar wangxiaoying commented on August 16, 2024

I have the same issue, but with Postgres. This started happening only on the new 0.3.3 release version.

Hi @auyer , I tried the example of timestamp 9999-12-31 00:00:01 and it is not supported in both 0.3.3 and 0.3.3-a2 due to the above reason. Can you provide an example that works before 0.3.3 but not 0.3.3? Thanks!

from connector-x.

wangxiaoying avatar wangxiaoying commented on August 16, 2024

The data here is from a real database. Python version 3.11.8 return_type=arrow2 protocol=binary

CREATE TABLE date_test (
	date_field timestamp NULL
);


INSERT INTO date_test (date_field) VALUES('2023-03-04 00:00:00.000');
INSERT INTO date_test (date_field) VALUES('2223-03-04 00:00:00.000');
INSERT INTO date_test (date_field) VALUES('4073-04-09 23:59:59.000');

Sending this to read_sql fails in connectorx==0.3.3, but is successful in connectorx==0.3.2

select date_field from date_test where date_field > '2200-01-01';

This is the expected output.

date_field |
---------------------------+
    2223-03-04 00:00:00.000|
    4073-04-09 23:59:59.000|

Sending this to read_sql works fine with both versions.

select date_field from date_test where date_field < '2200-01-01';

Hi @auyer , thanks for the quick reply!

Indeed the example does not throw an error on 0.3.2, however the result seems to be wrong. Here is the table:

tpch=# select * from test_date
tpch-# ;
     date_field
---------------------
 2023-03-04 00:00:00
 2223-03-04 00:00:00
 4073-04-09 23:59:59
(3 rows)

and this is the result arrow table I got from SELECT * from date_test:

pyarrow.Table
date_field: timestamp[ns]
----
date_field: [[2023-03-04 00:00:00.000000000,2223-03-04 00:00:00.000000000,1735-01-22 01:41:44.161793536]]

As you can see, 4073-04-09 is parsed into 1735-01-22 without any error indication. In this case I think we probably prefer to panic instead of having the wrong result without noticing.

from connector-x.

wangxiaoying avatar wangxiaoying commented on August 16, 2024

Someone in another issue wrote this: "This is a connectorx issue as it reads timestamps in nanoseconds (which it shouldn't, as few databases actually support nanosecond precision - the majority top-out at microseconds)."

datetime.datetime(9999, 12, 31) is a valid date in python. How / when will connectorx support it?

Thanks @lmocsi , I think this is a valid point. We probably need to add different destination timestamp types in terms of different precisions and mapping databases that only support microseconds to the new type. Using arrow2 and postgres for example, we need to

  1. add a new type (e.g. DateTimeTzMicro that wraps up a DateTime<Utc>) here
  2. implement the corresponding functions needed for the new DateTimeTzMicro type similar with DateTime but in micros seconds instead of nano seconds
  3. change the type transport from postgres to arrow2 on TimestampTz from the current one to the newly introduced type here.

I can try to do this week (but I cannot guarantee any timelines for finishing this if I found other issues of this solution). You are also very welcome to contribute if you are interested : )

from connector-x.

wangxiaoying avatar wangxiaoying commented on August 16, 2024

Can you try out the new alpha version 0.3.4a1 to see whether it fixes the issue?

from connector-x.

lmocsi avatar lmocsi commented on August 16, 2024

Unfortunately connectorx==0.3.4a1 did not fix the issue. :(
I still get the following error (with polars==0.20.31):

PanicException: out of range DateTime

The above exception was the direct cause of the following exception:

PanicException                            Traceback (most recent call last)
/tmp/1000780000/ipykernel_850/2531851850.py in <module>
      4 v_sql = "select * from my_test"
----> 5 df = pl.read_database_uri(
      6     v_sql, uri, engine="connectorx",
      7     #schema_overrides={'end_date': pl.String}

/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/io/database/functions.py in read_database_uri(query, uri, partition_on, partition_range, partition_num, protocol, engine, schema_overrides, execute_options)
    417             msg = "the 'connectorx' engine does not support use of `execute_options`"
    418             raise ValueError(msg)
--> 419         return _read_sql_connectorx(
    420             query,
    421             connection_uri=uri,

/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/io/database/_utils.py in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol, schema_overrides)
     64         # basic sanitisation of /user:pass/ credentials exposed in connectorx errs
     65         errmsg = re.sub("://[^:]+:[^:]+@", "://***:***@", str(err))
---> 66         raise type(err)(errmsg) from err
     67 
     68     return from_arrow(tbl, schema_overrides=schema_overrides)  # type: ignore[return-value]

from connector-x.

wangxiaoying avatar wangxiaoying commented on August 16, 2024

Unfortunately connectorx==0.3.4a1 did not fix the issue. :( I still get the following error (with polars==0.20.31):

PanicException: out of range DateTime

The above exception was the direct cause of the following exception:

PanicException                            Traceback (most recent call last)
/tmp/1000780000/ipykernel_850/2531851850.py in <module>
      4 v_sql = "select * from my_test"
----> 5 df = pl.read_database_uri(
      6     v_sql, uri, engine="connectorx",
      7     #schema_overrides={'end_date': pl.String}

/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/io/database/functions.py in read_database_uri(query, uri, partition_on, partition_range, partition_num, protocol, engine, schema_overrides, execute_options)
    417             msg = "the 'connectorx' engine does not support use of `execute_options`"
    418             raise ValueError(msg)
--> 419         return _read_sql_connectorx(
    420             query,
    421             connection_uri=uri,

/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/io/database/_utils.py in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol, schema_overrides)
     64         # basic sanitisation of /user:pass/ credentials exposed in connectorx errs
     65         errmsg = re.sub("://[^:]+:[^:]+@", "://***:***@", str(err))
---> 66         raise type(err)(errmsg) from err
     67 
     68     return from_arrow(tbl, schema_overrides=schema_overrides)  # type: ignore[return-value]

ops, missed the date type in oralce. How about 0.3.4a2?

from connector-x.

lmocsi avatar lmocsi commented on August 16, 2024

Will there be a 0.3.4 version out of it?

from connector-x.

Miayl avatar Miayl commented on August 16, 2024

Hi @wangxiaoying, can this also be fixed for Trino?

from connector-x.

wangxiaoying avatar wangxiaoying commented on August 16, 2024

Hi @Miayl , according to trino's doc, the timestamp type is by default millisecond and could vary according to the parameter P. I believe we can apply similar code change like oracle's fix. Please feel free to submit a PR for this!

from connector-x.

verystrongjoe avatar verystrongjoe commented on August 16, 2024

Hi @wangxiaoying, I am using Snowflake. is it applied to Snowflake, too? When do I use this version in pip repository?

from connector-x.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.