pandas.Timestamp time zone and unit get modified when creating a new Polars DataFrame #18127

Open
2 tasks done
u3Izx9ql7vW4 opened this issue Aug 10, 2024 · 2 comments
Labels
A-timeseries Area: date/time functionality bug Something isn't working python Related to Python Polars

u3Izx9ql7vW4 commented Aug 10, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
from datetime import datetime, UTC, timedelta
import pandas as pd

sample_array = [
    datetime(2022,1,1, tzinfo=UTC),
    datetime(2022,1,2, tzinfo=UTC)
]

# Create Pandas dataframe
df_pandas = pd.DataFrame(sample_array)

# Get datetime value
value_pandas = df_pandas.iloc[0,0]

# Create Polars dataframe
df_polars = pl.DataFrame([value_pandas])

# Get same value
value_polars = df_polars[0,0]

# Inconsistency - timezone gets dropped
assert value_pandas.tzinfo == UTC
assert value_polars.tzinfo != UTC

# Inconsistency - unit gets converted from ns to us
assert value_pandas.unit == 'ns'
# Note: datetime.resolution is a class attribute, so this holds for any Python datetime
assert value_polars.resolution == timedelta(microseconds=1)

Constructing the DataFrame from the equivalent NumPy datetime64 value instead raises an error when the value is read back (see the Log output section):

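value_numpy = value_pandas.to_datetime64()

# Create Polars dataframe
df_polars = pl.DataFrame([value_numpy])

# Read the value back - this is the line that raises (see Log output)
value_polars = df_polars[0,0]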

Log output

This is the traceback from reading back the value of a Polars DataFrame created from NumPy's datetime64 value:

    value_polars = df_polars[0,0]
                   ~~~~~~~~~^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/polars/dataframe/frame.py", line 1183, in __getitem__
    return get_df_item_by_key(self, key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/polars/_utils/getitem.py", line 144, in get_df_item_by_key
    return get_series_item_by_key(selection, row_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/polars/_utils/getitem.py", line 52, in get_series_item_by_key
    return s._s.get_index_signed(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: failed to construct datetime.datetime: PyErr { type: <class 'ValueError'>, value: ValueError('year 53971 is out of range'), traceback: None }
python-BaseException
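The year in the panic message is consistent with the nanosecond integer stored by `datetime64[ns]` being read as a count of microseconds. This is an inference from the traceback, not something confirmed from the Polars source. The standard-library arithmetic below shows that misreading 2022-01-01 this way lands in year 53971, which Python's `datetime` cannot represent (its maximum year is 9999).

```python
from datetime import datetime, timezone

# The value from the report: 2022-01-01 00:00:00 UTC.
ts = datetime(2022, 1, 1, tzinfo=timezone.utc)
epoch_seconds = int(ts.timestamp())        # 1_640_995_200
ns_value = epoch_seconds * 1_000_000_000   # integer held by datetime64[ns]

# Assumption: the integer is interpreted as microseconds instead of nanoseconds.
misread_seconds = ns_value // 1_000_000
approx_year = 1970 + misread_seconds / (365.2425 * 86_400)  # mean Gregorian year
print(int(approx_year))  # 53971

# Python's datetime only supports years 1..9999, so building this value fails.
try:
    datetime(int(approx_year), 1, 1)
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

With CPython's C `datetime` implementation the printed message is the same `year 53971 is out of range` that appears in the panic above.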

Issue description

Polars should not silently modify data without notifying the developer. If a value cannot be converted losslessly, an exception should be raised rather than silently erasing information (the time zone) or altering it (the resolution).

In the example above, the time zone is set to None and nanosecond resolution is truncated to microseconds.
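Part of the resolution loss is inherent to Python's `datetime`, which stores at most microseconds. Because `datetime.resolution` is a class attribute, a `.resolution` check holds for every Python datetime and does not by itself show what Polars stored. The standard-library sketch below converts an integer nanosecond timestamp (a made-up value carrying 123 ns) to microseconds and shows that the sub-microsecond part is discarded without any warning.

```python
from datetime import datetime, timedelta, timezone

# datetime.resolution is a class attribute: every datetime reports 1 microsecond,
# regardless of the precision of the source it was built from.
assert datetime.resolution == timedelta(microseconds=1)

# 2022-01-01 00:00:00.000000123 UTC as integer nanoseconds since the epoch (illustrative value).
ns_since_epoch = 1_640_995_200_000_000_123

# Converting to microseconds keeps the quotient and silently drops the remainder.
us_since_epoch, dropped_ns = divmod(ns_since_epoch, 1_000)
as_datetime = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=us_since_epoch)

print(as_datetime.isoformat())  # 2022-01-01T00:00:00+00:00
print(dropped_ns)               # 123
```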

Expected behavior

Both the time zone and the resolution should be preserved. If that is not possible, an exception should be raised.
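A minimal sketch of the "raise instead of silently dropping" behaviour requested here, written as plain Python checks. `LossyConversionError`, `ns_to_us_strict`, and `check_time_zone_kept` are hypothetical names for illustration only; they are not part of Polars or pandas.

```python
from datetime import datetime, timezone


class LossyConversionError(ValueError):
    """Hypothetical error type; not part of Polars."""


def ns_to_us_strict(ns_since_epoch: int) -> int:
    """Hypothetical helper: convert ns to us, refusing to drop sub-microsecond digits."""
    us, remainder_ns = divmod(ns_since_epoch, 1_000)
    if remainder_ns:
        raise LossyConversionError(f"{remainder_ns} ns would be truncated")
    return us


def check_time_zone_kept(source: datetime, result: datetime) -> None:
    """Hypothetical helper: refuse a conversion that turns an aware value naive."""
    if source.tzinfo is not None and result.tzinfo is None:
        raise LossyConversionError(f"time zone {source.tzinfo} would be dropped")


# A whole number of microseconds converts cleanly.
print(ns_to_us_strict(1_640_995_200_000_000_000))  # 1640995200000000

# A value with 123 ns left over is rejected instead of silently truncated.
try:
    ns_to_us_strict(1_640_995_200_000_000_123)
except LossyConversionError as exc:
    print(exc)  # 123 ns would be truncated

aware = datetime(2022, 1, 1, tzinfo=timezone.utc)
try:
    check_time_zone_kept(aware, aware.replace(tzinfo=None))
except LossyConversionError as exc:
    print(exc)  # time zone UTC would be dropped
```

An alternative design would be to widen the target (keep `ns`, keep the zone) instead of raising; the sketch only shows the fallback of failing loudly.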

Installed versions

--------Version info---------
Polars: 1.0.0
Index type: UInt32
Platform: Linux-6.8.0-39-generic-x86_64-with-glibc2.39
Python: 3.11.9 (main, Apr 27 2024, 21:16:11) [GCC 13.2.0]

----Optional dependencies----
adbc_driver_manager:
cloudpickle: 3.0.0
connectorx:
deltalake:
fastexcel:
fsspec: 2024.6.1
gevent:
great_tables:
hvplot:
matplotlib: 3.9.1
nest_asyncio: 1.6.0
numpy: 1.24.4
openpyxl:
pandas: 2.2.2
pyarrow: 16.1.0
pydantic: 2.8.2
pyiceberg:
sqlalchemy: 2.0.31
torch: 2.3.1+cu121
xlsx2csv:
xlsxwriter:
None

@u3Izx9ql7vW4 u3Izx9ql7vW4 added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Aug 10, 2024
@MarcoGorelli
Collaborator

Thanks for the report, will take a look.

@MarcoGorelli MarcoGorelli added A-timeseries Area: date/time functionality and removed needs triage Awaiting prioritization by a maintainer labels Aug 10, 2024
@MarcoGorelli
Collaborator

MarcoGorelli commented Aug 13, 2024

Simpler reproducer:

import polars as pl
from datetime import datetime, UTC, timedelta
import pandas as pd


ts = datetime(2022,1,1, tzinfo=UTC)

print(pl.Series([ts]))
print(pl.Series([pd.Timestamp(ts)]))

Output:
shape: (1,)
Series: '' [datetime[μs, UTC]]
[
        2022-01-01 00:00:00 UTC
]
shape: (1,)
Series: '' [datetime[μs]]
[
        2022-01-01 00:00:00
]

pd.Timestamp is a subclass of datetime, so the time zone issue could be addressed with the following patch:

diff --git a/py-polars/polars/_utils/construction/series.py b/py-polars/polars/_utils/construction/series.py
index f13b9f5b0e..379bdbeb0a 100644
--- a/py-polars/polars/_utils/construction/series.py
+++ b/py-polars/polars/_utils/construction/series.py
@@ -179,7 +179,7 @@ def sequence_to_pyseries(
         python_dtype = type(value)
 
     # temporal branch
-    if python_dtype in py_temporal_types:
+    if issubclass(python_dtype, tuple(py_temporal_types)):
         if dtype is None:
             dtype = parse_into_dtype(python_dtype)  # construct from integer
         elif dtype in py_temporal_types:
diff --git a/py-polars/polars/datatypes/_parse.py b/py-polars/polars/datatypes/_parse.py
index 2649bc7905..5746934c55 100644
--- a/py-polars/polars/datatypes/_parse.py
+++ b/py-polars/polars/datatypes/_parse.py
@@ -76,10 +76,10 @@ def parse_py_type_into_dtype(input: PythonDataType | type[object]) -> PolarsData
         return String()
     elif input is bool:
         return Boolean()
-    elif input is date:
-        return Date()
-    elif input is datetime:
+    elif isinstance(input, type) and issubclass(input, datetime):
         return Datetime("us")
+    elif isinstance(input, type) and issubclass(input, date):
+        return Date()
     elif input is timedelta:
         return Duration
     elif input is time:
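The patch replaces exact-type checks (`in`, `is`) with `issubclass` and moves the `datetime` branch ahead of the `date` branch. The standard-library sketch below uses a hypothetical `FakeTimestamp` subclass as a stand-in for `pd.Timestamp`, and string labels in place of Polars dtypes, to show why both changes matter: an identity check does not match a subclass, and because `datetime` is itself a subclass of `date`, testing `date` first would classify every datetime as a date.

```python
from datetime import date, datetime
from typing import Optional


class FakeTimestamp(datetime):
    """Hypothetical stand-in for pd.Timestamp, which subclasses datetime.datetime."""


def parse_by_identity(tp: type) -> Optional[str]:
    # Mirrors the original checks: exact type identity only.
    if tp is date:
        return "Date"
    if tp is datetime:
        return "Datetime(us)"
    return None


def parse_by_subclass(tp: type) -> Optional[str]:
    # Mirrors the patched checks; datetime is tested first because
    # issubclass(datetime, date) is True.
    if issubclass(tp, datetime):
        return "Datetime(us)"
    if issubclass(tp, date):
        return "Date"
    return None


def parse_wrong_order(tp: type) -> Optional[str]:
    # Same as above with the branches swapped: any datetime matches `date` first.
    if issubclass(tp, date):
        return "Date"
    if issubclass(tp, datetime):
        return "Datetime(us)"
    return None


print(parse_by_identity(FakeTimestamp))   # None
print(parse_by_subclass(FakeTimestamp))   # Datetime(us)
print(parse_wrong_order(FakeTimestamp))   # Date
```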

As for the time_unit: will see, but I agree that it would be good to preserve it if possible.
