BUG/API: round-tripping non-nano datetime64s with to_json/read_json #55827

jbrockmendel · 2023-11-04T16:22:16Z

import pandas as pd
import pandas._testing as tm
from io import StringIO

df = pd.DataFrame(
    {
        "A": pd.to_datetime(["2013-01-01", "2013-01-02"]).as_unit("s"),
        "B": [3.5, 3.5],
    }
)

written = df.to_json(orient="split")

>>> written
'{"columns":["A","B"],"index":[0,1],"data":[[1356,3.5],[1357,3.5]]}'

result = pd.read_json(StringIO(written), orient="split", convert_dates=["A"])

>>> result
      A    B
0  1356  3.5
1  1357  3.5

tm.assert_frame_equal(result, df)   # <- fails

The example here is based on test_frame_non_unique_columns, altered by 1) making the columns into ["A", "B"] and 2) changing the dtype for the first column from M8[ns] to M8[s].

This goes through a check in _try_convert_to_date:

        # ignore numbers that are out of range
        if issubclass(new_data.dtype.type, np.number):
            in_range = (
                isna(new_data._values)
                | (new_data > self.min_stamp)
                | (new_data._values == iNaT)
            )
            if not in_range.all():
                return data, False

When the JSON is produced from M8[s] (or M8[ms]) data, these values all fall below self.min_stamp, so this check short-circuits and we never go through the pd.to_datetime conversion that comes just after (which itself looks sketchy, but that can wait for another day).
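For illustration, here is a stdlib-only sketch of the arithmetic that seems to be at play. Two things in it are assumptions rather than anything confirmed above: that the writer treats the stored integers as nanoseconds and divides by 10**6 to get milliseconds, and that the reader's min_stamp for seconds is 31536000 (one year in seconds). The constant names are made up for this sketch.

```python
from datetime import datetime, timezone

# Assumption: the reader's lower bound for "looks like a timestamp", in seconds.
MIN_STAMP_SECONDS = 31_536_000
# Assumption: the writer divides by this, as if the stored integers were nanoseconds.
NS_PER_MS = 10**6

dates = [
    datetime(2013, 1, 1, tzinfo=timezone.utc),
    datetime(2013, 1, 2, tzinfo=timezone.utc),
]

# What an M8[s] column stores: whole seconds since the epoch.
epoch_seconds = [int(d.timestamp()) for d in dates]

# Misreading those seconds as nanoseconds and converting to milliseconds.
written = [value // NS_PER_MS for value in epoch_seconds]

# The reader's range check: every value must exceed the minimum stamp.
in_range = [value > MIN_STAMP_SECONDS for value in written]

print(epoch_seconds)  # [1356998400, 1357084800]
print(written)        # [1356, 1357]
print(all(in_range))  # False, so the date conversion would be skipped
```

If those assumptions hold, this reproduces the 1356 and 1357 in the written JSON and shows why they fail the range check.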

cc @WillAyd my best guess is that there is nothing we can do at the reading stage and we should convert non-nano to nano at the writing stage, or maybe just warn users that they are doing something that doesn't round-trip?

Surfaced while implementing #55564 (which will cause users to get non-nano in many cases where they currently get nano).


WillAyd · 2023-11-04T17:38:52Z

Hmm shouldn't to_json be writing this as a date to begin with? I would be hesitant to make any guarantees on lossless-ness if we are just writing out the integer in to_json

jbrockmendel · 2023-11-04T19:21:05Z

> shouldn't to_json be writing this as a date to begin with?

No idea about the "should", but it isn't. We still get integers if we don't do the .as_unit (though, surprisingly, those values are not quite just these multiplied by 10**9).

WillAyd · 2023-11-04T19:35:36Z

Gotcha. Well I think that's a bug we need to investigate more with to_json, though ultimately even when we write dates I think we'd have the question as to what precision to read in since that is not going to be embedded in the JSON.

My initial thought is that we should just stick with ns as the default unless a user cares to specify. The alternative I think is to find the best precision to fit a date as it is read in, but I'd be happy to defer that until someone really requests it

WillAyd · 2023-12-01T17:30:01Z

Coming back here after @jbrockmendel's comment in #55901. Thanks for bringing this full circle.

OK, my current point of view, with more context, is that when we write integral values with to_json we should stick with the "historic precedent" of nanoseconds.

Ideally a user writes ISO strings, but when they don't, I think there are too many different ways to interpret this data on a round trip, none seemingly better or worse than the others. So in that case I would just like to stick with how this has implicitly "worked" for quite some time.
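To make the ambiguity concrete, here is a small stdlib-only sketch (the helper name interpret is made up for this example) that reads the same bare integer under each candidate unit. Nothing in the JSON says which reading is the intended one.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# How many of each unit make up one second.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def interpret(value: int, unit: str) -> datetime:
    """Read an integer as a count of `unit` since the epoch.

    Anything finer than a microsecond is truncated, because
    datetime cannot represent it.
    """
    microseconds = value * 10**6 // UNITS_PER_SECOND[unit]
    return EPOCH + timedelta(microseconds=microseconds)


value = 1356998400  # 2013-01-01 if, and only if, the unit is seconds

readings = {unit: interpret(value, unit) for unit in UNITS_PER_SECOND}
for unit, stamp in readings.items():
    print(unit, stamp.isoformat())
```

Only the "s" reading lands in 2013; the others all fall in January 1970, which is why a reader has to pick a default when no unit is supplied.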

lithomas1 · 2023-12-03T18:34:07Z

Does #53757 help in any way?

(I think I implemented this for to_json but not read_json).

WillAyd · 2023-12-03T20:08:11Z

I don't think there is a way to add this to read_json without the presence of additional metadata

lithomas1 · 2023-12-03T20:56:32Z

Ah, what about the ISO string case?

We should be able to infer then, I think.

WillAyd · 2023-12-03T21:02:42Z

I was only thinking about the integer case outlined in the OP, but that's a good point on the string roundtripping. Looks like ISO 8601 allows decimal fractions that can be used to determine the unit
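A rough sketch of what that inference could look like, using only the standard library. The function infer_unit_from_iso is hypothetical, not an existing pandas API, and the digit-count thresholds are my own assumption about how fractional digits would map to units.

```python
import re

# Matches the time part of an ISO 8601 string and captures any
# fractional-second digits (ISO 8601 permits "." or "," as the separator).
_TIME_WITH_FRACTION = re.compile(r"T\d{2}:\d{2}:\d{2}(?:[.,](\d+))?")


def infer_unit_from_iso(stamp: str) -> str:
    """Guess a datetime64 unit from the fractional digits of an ISO string.

    Hypothetical helper for illustration only.
    """
    match = _TIME_WITH_FRACTION.search(stamp)
    if match is None:
        # Assumption: date-only or minute-precision strings fall back to seconds.
        return "s"
    digits = len(match.group(1) or "")
    if digits == 0:
        return "s"
    if digits <= 3:
        return "ms"
    if digits <= 6:
        return "us"
    return "ns"


for example in (
    "2013-01-01T00:00:00",
    "2013-01-01T00:00:00.000Z",
    "2013-01-01T00:00:00.000000",
    "2013-01-01T00:00:00.000000001",
):
    print(example, "->", infer_unit_from_iso(example))
```

This relies on the writer emitting exactly as many fractional digits as the source unit has, so it is a hint about the original dtype rather than a guarantee.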

jbrockmendel · 2023-12-17T19:26:34Z

Have punted on this by xfailing the relevant test in #55901.

jbrockmendel added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels Nov 4, 2023

rhshadrach added the Non-Nano (datetime64/timedelta64 with non-nanosecond resolution) and IO JSON (read_json, to_json, json_normalize) labels Nov 5, 2023

jbrockmendel mentioned this issue Nov 9, 2023

ENH/WIP: resolution inference in pd.to_datetime, DatetimeIndex #55901

Merged

13 tasks

lithomas1 removed the Needs Triage (issue that has not been reviewed by a pandas team member) label Jan 11, 2024

jbrockmendel mentioned this issue Mar 5, 2024

BUG: Convertion fails for columns with datetime64[ms] #57738

Open

3 tasks
