
Reading date'9999-12-31' from Oracle leads to 'out of range DateTime' error #644

Closed
lmocsi opened this issue Jun 6, 2024 · 16 comments
Labels: bug (Something isn't working)

Comments

@lmocsi commented Jun 6, 2024

What language are you using?

Python

What version are you using?

3.9.13

What database are you using?

Oracle

What dataframe are you using?

polars

Can you describe your bug?

When reading date'9999-12-31' from Oracle, I get the error 'out of range DateTime'.

What are the steps to reproduce the behavior?

Described in detail here: pola-rs/polars#16768

Database setup if the error only happens on specific data or data type
create table my_test nologging pctfree 0 as
select
   'aaa' prod,
   date'2000-01-01' start_date,
   date'9999-12-31' end_date
from dual
Example query / code
import polars as pl

uri = f"oracle://{user}:{password}@{host}:{port}/{service_name}"
v_sql = "select * from my_test"
df = pl.read_database_uri(
    v_sql, uri, engine="connectorx"
)

What is the error?

thread '<unnamed>' panicked at /__w/connector-x/connector-x/connectorx/src/destinations/arrow2/arrow_assoc.rs:312:36:
out of range DateTime


PanicException                            Traceback (most recent call last)
/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/io/database/_utils.py in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol, schema_overrides)
     53     try:
---> 54         tbl = cx.read_sql(
     55             conn=connection_uri,

/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/connectorx/__init__.py in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
    385 
--> 386     result = _read_sql(
    387         conn,

PanicException: out of range DateTime

The above exception was the direct cause of the following exception:

PanicException                            Traceback (most recent call last)
/tmp/1000780000/ipykernel_12937/202598540.py in <module>
      4 v_sql = "select * from my_test"
----> 5 df = pl.read_database_uri(
      6     v_sql, uri, engine="connectorx",

/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/io/database/functions.py in read_database_uri(query, uri, partition_on, partition_range, partition_num, protocol, engine, schema_overrides, execute_options)
    417             msg = "the 'connectorx' engine does not support use of `execute_options`"
    418             raise ValueError(msg)
--> 419         return _read_sql_connectorx(
    420             query,
    421             connection_uri=uri,

/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/io/database/_utils.py in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol, schema_overrides)
     64         # basic sanitisation of /user:pass/ credentials exposed in connectorx errs
     65         errmsg = re.sub("://[^:]+:[^:]+@", "://***:***@", str(err))
---> 66         raise type(err)(errmsg) from err
     67 
     68     return from_arrow(tbl, schema_overrides=schema_overrides)  # type: ignore[return-value]

PanicException: out of range DateTime

lmocsi added the bug label Jun 6, 2024
@auyer (Contributor) commented Jun 7, 2024

I have the same issue, but with Postgres.
It started happening only with the new 0.3.3 release.

@wangxiaoying (Contributor):

Hi @lmocsi, thanks for opening the issue with a detailed example. In connectorx we parse the data into the Rust NaiveDateTime type first and then fill it into an arrow Timestamp with nanosecond unit. The valid range for converting a Rust NaiveDateTime to nanoseconds is 1677-09-21T00:12:43.145224192 to 2262-04-11T23:47:16.854775807, and we panic if the data is outside this range.
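
For reference, here is the arithmetic behind those bounds (a quick Python sketch of the i64-nanosecond range that backs Arrow's timestamp[ns]; an illustration only, not connectorx code):

import datetime as dt

# Arrow's timestamp[ns] stores i64 nanoseconds since the Unix epoch,
# so the representable range is roughly 1677-09-21 to 2262-04-11.
epoch = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)
print(epoch + dt.timedelta(microseconds=(-2**63) // 1000))     # ~1677-09-21 00:12:43
print(epoch + dt.timedelta(microseconds=(2**63 - 1) // 1000))  # ~2262-04-11 23:47:16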

@wangxiaoying (Contributor):

Hi @auyer, I tried the example of timestamp 9999-12-31 00:00:01 and it is not supported in either 0.3.3 or 0.3.3-a2, for the reason above. Can you provide an example that works before 0.3.3 but not on 0.3.3? Thanks!

@auyer (Contributor) commented Jun 7, 2024

The data here is from a real database.
Python version 3.11.8
return_type=arrow2
protocol=binary

CREATE TABLE date_test (
	date_field timestamp NULL
);


INSERT INTO date_test (date_field) VALUES('2023-03-04 00:00:00.000');
INSERT INTO date_test (date_field) VALUES('2223-03-04 00:00:00.000');
INSERT INTO date_test (date_field) VALUES('4073-04-09 23:59:59.000');

Sending this to read_sql fails in connectorx==0.3.3 but succeeds in connectorx==0.3.2.

select date_field from date_test where date_field > '2200-01-01';

This is the expected output.

date_field |
---------------------------+
    2223-03-04 00:00:00.000|
    4073-04-09 23:59:59.000|

Sending this to read_sql works fine with both versions.

select date_field from date_test where date_field < '2200-01-01';

@wangxiaoying (Contributor) commented Jun 8, 2024

Hi @auyer, thanks for the quick reply!

Indeed, the example does not throw an error on 0.3.2; however, the result is wrong. Here is the table:

tpch=# select * from test_date;
     date_field
---------------------
 2023-03-04 00:00:00
 2223-03-04 00:00:00
 4073-04-09 23:59:59
(3 rows)

and this is the arrow table I got back from SELECT * from date_test:

pyarrow.Table
date_field: timestamp[ns]
----
date_field: [[2023-03-04 00:00:00.000000000,2223-03-04 00:00:00.000000000,1735-01-22 01:41:44.161793536]]

As you can see, 4073-04-09 is parsed into 1735-01-22 without any error indication. In this case I think we would rather panic than silently return a wrong result.
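
For the curious, that 1735 value is exactly what i64 wraparound of the nanosecond count produces (a Python sketch for illustration, not connectorx code):

import datetime as dt

epoch = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)
target = dt.datetime(4073, 4, 9, 23, 59, 59, tzinfo=dt.timezone.utc)
ns = int((target - epoch).total_seconds()) * 10**9

# Simulate two's-complement i64 overflow of the nanosecond count.
wrapped = (ns + 2**63) % 2**64 - 2**63
print(epoch + dt.timedelta(microseconds=wrapped // 1000))  # ~1735-01-22 01:41:44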

@lmocsi (Author) commented Jun 9, 2024

Someone in another issue wrote this: "This is a connectorx issue as it reads timestamps in nanoseconds (which it shouldn't, as few databases actually support nanosecond precision - the majority top-out at microseconds)."

datetime.datetime(9999, 12, 31) is a valid date in Python. How/when will connectorx support it?

@wangxiaoying (Contributor) commented Jun 10, 2024

Thanks @lmocsi, I think this is a valid point. We probably need to add destination timestamp types for different precisions and map databases that only support microseconds to the new type. Using arrow2 and Postgres as an example, we need to:

  1. add a new type (e.g. DateTimeTzMicro that wraps a DateTime<Utc>) here
  2. implement the corresponding functions needed for the new DateTimeTzMicro type, similar to DateTime but in microseconds instead of nanoseconds
  3. change the type transport from Postgres to arrow2 on TimestampTz from the current one to the newly introduced type here

I can try to do this within the week (but I cannot guarantee a timeline if I run into other issues with this approach). You are also very welcome to contribute if you are interested : )
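
As a rough sanity check on why microseconds would be enough (a hypothetical back-of-the-envelope calculation in Python, not connectorx code), even the year 9999 fits comfortably in i64 microseconds:

import datetime as dt

epoch = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)
target = dt.datetime(9999, 12, 31, tzinfo=dt.timezone.utc)
us = int((target - epoch).total_seconds()) * 10**6
print(us)               # 253402214400000000
print(us <= 2**63 - 1)  # True: microsecond precision reaches year 9999 (and far beyond)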

@wangxiaoying (Contributor):

Can you try out the new alpha version 0.3.4a1 to see whether it fixes the issue?

@lmocsi (Author) commented Jun 17, 2024

Unfortunately connectorx==0.3.4a1 did not fix the issue. :(
I still get the following error (with polars==0.20.31):

PanicException: out of range DateTime

The above exception was the direct cause of the following exception:

PanicException                            Traceback (most recent call last)
/tmp/1000780000/ipykernel_850/2531851850.py in <module>
      4 v_sql = "select * from my_test"
----> 5 df = pl.read_database_uri(
      6     v_sql, uri, engine="connectorx",
      7     #schema_overrides={'end_date': pl.String}

/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/io/database/functions.py in read_database_uri(query, uri, partition_on, partition_range, partition_num, protocol, engine, schema_overrides, execute_options)
    417             msg = "the 'connectorx' engine does not support use of `execute_options`"
    418             raise ValueError(msg)
--> 419         return _read_sql_connectorx(
    420             query,
    421             connection_uri=uri,

/opt/conda/envs/Python-3.9-Premium/lib/python3.9/site-packages/polars/io/database/_utils.py in _read_sql_connectorx(query, connection_uri, partition_on, partition_range, partition_num, protocol, schema_overrides)
     64         # basic sanitisation of /user:pass/ credentials exposed in connectorx errs
     65         errmsg = re.sub("://[^:]+:[^:]+@", "://***:***@", str(err))
---> 66         raise type(err)(errmsg) from err
     67 
     68     return from_arrow(tbl, schema_overrides=schema_overrides)  # type: ignore[return-value]

wangxiaoying reopened this Jun 18, 2024
wangxiaoying added a commit that referenced this issue Jun 18, 2024
@wangxiaoying (Contributor):

Oops, I missed the DATE type in Oracle. How about 0.3.4a2?

@lmocsi (Author) commented Jun 19, 2024

Yeah, 0.3.4a2 solved the problem!
Thx!

@lmocsi (Author) commented Jun 20, 2024

Will there be a 0.3.4 release with this fix?

@Miayl commented Jul 11, 2024

Hi @wangxiaoying, can this also be fixed for Trino?

@wangxiaoying (Contributor) commented Jul 15, 2024

Hi @Miayl, according to Trino's docs, the timestamp type defaults to millisecond precision and can vary according to the parameter P. I believe we can apply a code change similar to the Oracle fix. Please feel free to submit a PR for this!

@verystrongjoe:

Hi @wangxiaoying, I am using Snowflake. Does the fix apply to Snowflake, too? And when will I be able to install this version from the pip repository?

@nbosc commented Nov 20, 2024

Hi, I get the same error with a MySQL DB despite having version 0.4.0 installed. The date is '0001-01-01', which is odd but valid as far as I know.
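
That would be consistent with the same range problem on the low end, assuming the value still flows through a nanosecond timestamp path somewhere (a quick Python check for illustration, not connectorx code):

import datetime as dt

epoch = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)
ns = int((dt.datetime(1, 1, 1, tzinfo=dt.timezone.utc) - epoch).total_seconds()) * 10**9
print(ns < -2**63)  # True: '0001-01-01' is far below the timestamp[ns] minimum (~1677)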
