BUG/API: round-tripping non-nano datetime64s with to_json/read_json #55827
Comments
Hmm, shouldn't to_json be writing this as a date to begin with? I would be hesitant to make any guarantees on losslessness if we are just writing out the integer in to_json.
No idea about the "should", but it isn't. Still integers if we don't do the .as_unit (though surprisingly, not quite just multiply-by-
Gotcha. Well, I think that's a bug we need to investigate more with to_json, though ultimately even when we write dates we'd have the question of what precision to read in, since that is not going to be embedded in the JSON. My initial thought is that we should just stick with
Coming back here after @jbrockmendel's comment #55901 (comment) - thanks for bringing this full circle. OK, my current point of view with more context: when we write integral values with to_json, we should stick with the "historic precedent" of nanosecond. Ideally a user writes ISO strings, but when not, I think there are too many different ways to interpret this data on a roundtrip, none seemingly better or worse than the others. So in that case I would just like to stick with how this has implicitly "worked" for quite some time.
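To make the integer round-trip concrete, here is a small sketch assuming pandas 2.x semantics: the writer emits epoch integers (milliseconds by default, per `date_unit`), and the reader tries candidate units until one parses in range. With nanosecond-dtype input this works out; the issue is what happens when the input is non-nano.

```python
import io
import pandas as pd

# Pin the dtype to nanoseconds explicitly (the ".as_unit" mentioned above).
df = pd.DataFrame(
    {"date": pd.to_datetime(["2021-01-01", "2021-01-02"]).as_unit("ns")}
)

# Default date_format="epoch" writes epoch integers in milliseconds.
s = df.to_json()

# The reader recognizes the date-like column name and converts it back.
back = pd.read_json(io.StringIO(s))
# Compare values rather than dtype, since the read-back resolution
# may differ across pandas versions.
assert (back["date"].astype("datetime64[ns]") == df["date"]).all()
```

The comparison via `astype("datetime64[ns]")` is deliberate: the values survive the round trip, but nothing in the JSON records the original unit.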
Does #53757 help in any way? (I think I implemented this for
I don't think there is a way to add this to read_json without the presence of additional metadata.
Ah, what about the ISO string case? We should be able to infer then, I think.
I was only thinking about the integer case outlined in the OP, but that's a good point on the string roundtripping. It looks like ISO 8601 allows decimal fractions that can be used to determine the unit.
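A toy illustration of the inference the ISO case would permit (`infer_unit` is a hypothetical helper, not a pandas API): count the fractional digits in the timestamp and map them to the coarsest datetime64 unit that preserves them.

```python
def infer_unit(iso: str) -> str:
    """Infer a datetime64 unit from the decimal fraction of an ISO 8601 string."""
    if "." not in iso:
        return "s"
    # Digits after the decimal point (strip a trailing UTC designator if present).
    frac = iso.split(".", 1)[1].rstrip("Z")
    if len(frac) <= 3:
        return "ms"
    if len(frac) <= 6:
        return "us"
    return "ns"

assert infer_unit("2021-01-01T00:00:00") == "s"
assert infer_unit("2021-01-01T00:00:00.000") == "ms"
assert infer_unit("2021-01-01T00:00:00.000000001") == "ns"
```

This only works when the writer emits the fraction at the source's precision; it is exactly the metadata that is missing from the plain-integer encoding.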
Have punted on this by xfailing the relevant test in #55901. |
The example here is based on test_frame_non_unique_columns, altered by 1) making the columns into ["A", "B"] and 2) changing the dtype for the first column from `M8[ns]` to `M8[s]`.

This goes through a check in `_try_convert_to_date`: when the json is produced from `M8[s]` (or `M8[ms]`) data, these values are all under `self.min_stamp`, so this check causes us to short-circuit and not go through the `pd.to_datetime` conversion that comes just after (which itself looks sketchy, but that can wait for another day).

cc @WillAyd my best guess is that there is nothing we can do at the reading stage, and we should either convert non-nano to nano at the writing stage or just warn users that they are doing something that doesn't round-trip?
Surfaced while implementing #55564 (which will cause users to get non-nano in many cases where they currently get nano).