feat: add 'millisecond' option to ser_json_timedelta config parameter #1427
src/serializers/config.rs (outdated)
// convert to int via a py timedelta not duration since we know in this case the input would have
// been a py timedelta
let py_timedelta = either_delta.try_into_py(py)?;
let seconds: f64 = Self::total_seconds(&py_timedelta)?.extract()?;
There might be a better way to do this that is eluding me; maybe we could do the multiplication in Python? 🤷🏻♂️
Seems reasonable enough - multiplication here should be faster.
Can we pull out some of the shared logic into a function (both for Float and for Millisecond) and repeat that across the various branches?
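A minimal sketch of the shared helper being asked for, assuming every float-style branch can start from a total number of seconds. `TimedeltaNumberMode` and `timedelta_as_number` are hypothetical names for illustration, not pydantic-core's actual types:

```rust
/// Hypothetical stand-in for the float-style timedelta modes discussed here.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TimedeltaNumberMode {
    Float,
    Millisecond,
}

/// Shared conversion: each serializer branch calls this once instead of
/// repeating the unit arithmetic inline.
fn timedelta_as_number(total_seconds: f64, mode: TimedeltaNumberMode) -> f64 {
    match mode {
        TimedeltaNumberMode::Float => total_seconds,
        TimedeltaNumberMode::Millisecond => total_seconds * 1_000.0,
    }
}
```

With this shape, adding another unit touches one `match` rather than every branch.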
Hm, maybe. A couple of them use logic like:
let py_timedelta = either_delta.try_into_py(py)?;
let seconds: f64 = Self::total_seconds(&py_timedelta)?.extract()?;
However, the serializer then needs to wrap these in map_err calls, so they can't be reused there.
Maybe a Rust wizard could help with some clever refactoring I'm not seeing 🧙🏻
Doing the multiplication in Python would look something like:
let object = Self::total_seconds(&py_timedelta)?.mul(1000)?;
... this requires creating a Python integer 1000, which for best performance we might want to consider caching.
But there is another inefficiency here (and in the other cases): the call to try_into_py creates a Python timedelta object, which then gets thrown away immediately when converting to float. It will probably be better in all cases to use .to_duration(), which avoids the temporary Python object in the case of a Duration Rust value.
The best option would be to go further and add a .total_seconds() method to EitherTimedelta which extracts the fractional seconds from whatever state the EitherTimedelta is currently in (doing the most efficient thing for each case), and then do the multiplication in Rust.
Hmm, yeah, neat suggestion on the EitherTimedelta.
I guess we'd want something like this:
impl<'a> EitherTimedelta<'a> {
    // ...
    pub fn total_seconds(&self, py: Python<'a>) -> f64 {
        match self {
            Self::Raw(timedelta) => ...
            Self::PyExact(py_timedelta) => ...
            Self::PySubclass(py_timedelta) => ...
        }
    }
}
And then we have two cases to deal with: Duration and PyDelta. It looks like we already have some methods for getting the py_timedelta into a Duration object, so then it's just Duration we need to deal with. However (maybe I'm reading the docs wrong!), I can't seem to find a method on there that returns the total seconds as a float.
Probably missed something here!
pub fn total_seconds(&self) -> PyResult<f64> {
    match self {
        Self::Raw(timedelta) => ...,
        Self::PyExact(py_timedelta) => py_timedelta.call_method0(intern!(py_timedelta.py(), "total_seconds"))?.extract(),
        Self::PySubclass(py_timedelta) => py_timedelta.call_method0(intern!(py_timedelta.py(), "total_seconds"))?.extract(),
    }
}
Something like this? Not 100% sure what to do with the Raw case, but doing what we do currently and extracting from the Python object by calling "total_seconds" on it feels like the best way in this case.
I suggest looking at the to_duration method to see how it gets days / seconds / microseconds out of the Python value, as that should already have the most efficient implementation for getting the data out (and then you can do a bit of arithmetic). In general, calling a Python method will be slow-ish, so there might be a faster path for the PyExact case.
For Raw, you need to work with the speedate::Duration object; you can probably get a timestamp value and microseconds and combine those.
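A std-only model of that suggestion, with plain integers standing in for the Rust and Python representations. `RawDuration` assumes a sign-plus-magnitude layout for `speedate::Duration` (check the speedate docs before relying on it), and `PyDeltaParts` mirrors the days / seconds / microseconds a Python `timedelta` exposes; all type names here are hypothetical. Each arm reads fields directly, in the spirit of `to_duration`, sums exactly in integer microseconds, and converts to `f64` once:

```rust
/// Hypothetical stand-in for `speedate::Duration`: sign flag plus non-negative magnitude.
struct RawDuration {
    positive: bool,
    day: u32,
    second: u32,
    microsecond: u32,
}

/// Hypothetical stand-in for a Python timedelta's fields; Python normalises them
/// so that only `days` can be negative.
struct PyDeltaParts {
    days: i32,
    seconds: i32,
    microseconds: i32,
}

/// Hypothetical model of `EitherTimedelta`; both Python variants carry the same data here.
enum EitherTimedeltaModel {
    Raw(RawDuration),
    PyExact(PyDeltaParts),
    PySubclass(PyDeltaParts),
}

impl EitherTimedeltaModel {
    /// Signed total microseconds. Assumes the result fits in an i64
    /// (roughly +/-106,751 days); no overflow handling in this sketch.
    fn total_microseconds(&self) -> i64 {
        match self {
            Self::Raw(d) => {
                let magnitude = (i64::from(d.day) * 86_400 + i64::from(d.second)) * 1_000_000
                    + i64::from(d.microsecond);
                if d.positive { magnitude } else { -magnitude }
            }
            Self::PyExact(p) | Self::PySubclass(p) => {
                (i64::from(p.days) * 86_400 + i64::from(p.seconds)) * 1_000_000
                    + i64::from(p.microseconds)
            }
        }
    }

    /// Fractional seconds, with a single integer-to-float conversion at the end.
    fn total_seconds(&self) -> f64 {
        self.total_microseconds() as f64 / 1_000_000.0
    }
}
```

The two layouts encode negative values differently: Python normalises `timedelta(microseconds=-1)` to days=-1, seconds=86399, microseconds=999999, whereas a sign-plus-magnitude duration would store the same value as a negative flag with one microsecond. Both map to -1 total microseconds.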
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1427      +/-   ##
==========================================
- Coverage   90.21%   89.22%   -1.00%
==========================================
  Files         106      112       +6
  Lines       16339    17765    +1426
  Branches       36       41       +5
==========================================
+ Hits        14740    15850    +1110
- Misses       1592     1895     +303
- Partials        7       20      +13
... and 48 files with indirect coverage changes
please review
CodSpeed Performance Report
Merging #1427 will not alter performance.
Hi! Any timeline on when this could get looked at? Would be a really valuable feature for us. Thanks!
Absolutely - will review tomorrow :).
Gah -- got behind on this with travel. Reviewing today!
Looks reasonable overall - left a couple of quick notes.
Let's get a second review from @davidhewitt regarding the best way to do the multiplication - he's our in-house Rust guru.
Branch updated from 7a996db to 537a484.
@sydney-runkle I've changed the code here to better reflect the discussions in pydantic/pydantic#10293 (comment). On the refactoring, I'm struggling to see a nice way to pull things out, but I'm happy to apply any suggestions you or the team may have :)
I'm also happy to add the other modes here if you'd like; it wouldn't be too much effort.
please review
This has become a bit more involved than I first thought, but I'm enjoying it! I'm sure there will be some comments on total_seconds (maybe we also want a total_milliseconds function), but regardless I think this is in a good state now.
@@ -356,7 +356,7 @@ def to_json(
     by_alias: bool = True,
     exclude_none: bool = False,
     round_trip: bool = False,
-    timedelta_mode: Literal['iso8601', 'float'] = 'iso8601',
+    timedelta_mode: Literal['iso8601', 'seconds_float', 'milliseconds_float'] = 'iso8601',
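On the Rust side, the three literals in this type hint map naturally onto an enum. This is a sketch with a hypothetical `TimedeltaMode` type and a made-up error message; it accepts only the strings in the updated hint, and whether the real parser still accepts `'float'` for backwards compatibility is not shown here:

```rust
use std::str::FromStr;

/// Hypothetical Rust-side mirror of the `timedelta_mode` literal.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TimedeltaMode {
    Iso8601,
    SecondsFloat,
    MillisecondsFloat,
}

impl FromStr for TimedeltaMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "iso8601" => Ok(Self::Iso8601),
            "seconds_float" => Ok(Self::SecondsFloat),
            "milliseconds_float" => Ok(Self::MillisecondsFloat),
            // Illustrative message only; not pydantic-core's actual wording.
            other => Err(format!(
                "invalid timedelta mode {other:?}, expected 'iso8601', 'seconds_float' or 'milliseconds_float'"
            )),
        }
    }
}
```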
@sydney-runkle As requested in the pydantic PR, I've removed float from the type hints in various places.
The pydantic integration check is failing, though I guess this is expected since we're changing some behaviour here with a deprecation.
Overall looks good. There's a bit of precision being lost which I think we might be able to mitigate, see my other comment.
RE pydantic integration, I guess if we intend to remove the setting from
tests/serializers/test_any.py (outdated)
(timedelta(microseconds=1), 0.001, b'0.001', {'0.001': 'foo'}, b'{"0.001":"foo"}', 'milliseconds_float'),
(
    timedelta(microseconds=-1),
    -0.0009999999999763531,
This test (and the https://github.com/pydantic/pydantic-core/pull/1427/files#diff-83a0ae6e1d65a0cd2ffcef088f3e07274f2e38a33e89b77c6bcffe137c3e1ddaR264) seem to still have a floating point error, both are the single negative microseconds case. Curious if you have any insight on this!
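A likely explanation for the negative-microsecond case: Python normalises `timedelta(microseconds=-1)` to days=-1, seconds=86399, microseconds=999999. The float-first formula `86_400_000.0 * days + seconds * 1_000.0 + microseconds / 1_000.0` then evaluates -1000.0 + 999.999, and because 999.999 has no exact binary representation, its rounding error survives in the tiny result. This sketch (hypothetical function names) contrasts that with summing in integer microseconds and converting once:

```rust
/// Float-first formula: every term is rounded to f64 before the terms cancel.
fn millis_float_first(days: i32, seconds: i32, microseconds: i32) -> f64 {
    86_400_000.0 * f64::from(days) + f64::from(seconds) * 1_000.0 + f64::from(microseconds) / 1_000.0
}

/// Integer-first: exact sum in microseconds, then one rounding step at the end.
fn millis_integer_first(days: i32, seconds: i32, microseconds: i32) -> f64 {
    let total_micros =
        (i64::from(days) * 86_400 + i64::from(seconds)) * 1_000_000 + i64::from(microseconds);
    total_micros as f64 / 1_000.0
}
```

For the normalised fields above, the integer-first version yields exactly the `f64` nearest to -0.001, while the float-first version differs from it in the trailing digits.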
please review |
Sorry for the very late reply; combination of leave and sickness.
I think I've worked out a possible solution & have posted below.
src/input/datetime.rs (outdated)
let days: f64 = f64::from(py_timedelta.get_days()); // -999999999 to 999999999
let seconds: f64 = f64::from(py_timedelta.get_seconds()); // 0 through 86399
let microseconds: f64 = f64::from(py_timedelta.get_microseconds()); // 0 through 999999
Ok(86_400_000.0 * days + seconds * 1_000.0 + microseconds / 1_000.0)
I think to keep full precision, we need to keep the full microsecond value in integer arithmetic and do the floating-point cast at the last minute. If we work in i64, then we might overflow on the conversion from days & seconds to micros for large values of days, but in that case the precision on the microseconds won't matter much anyway.
So I get something like this:
-let days: f64 = f64::from(py_timedelta.get_days()); // -999999999 to 999999999
-let seconds: f64 = f64::from(py_timedelta.get_seconds()); // 0 through 86399
-let microseconds: f64 = f64::from(py_timedelta.get_microseconds()); // 0 through 999999
-Ok(86_400_000.0 * days + seconds * 1_000.0 + microseconds / 1_000.0)
+let days: i64 = py_timedelta.get_days().into(); // -999999999 to 999999999
+let seconds: i64 = py_timedelta.get_seconds().into(); // 0 through 86399
+let microseconds = py_timedelta.get_microseconds(); // 0 through 999999
+let days_seconds = (86_400 * days) + seconds;
+if let Some(days_seconds_as_micros) = days_seconds.checked_mul(1_000_000) {
+    let total_microseconds = days_seconds_as_micros + i64::from(microseconds);
+    Ok(total_microseconds as f64 / 1_000.0)
+} else {
+    // Fall back to floating-point operations if the multiplication overflows
+    let total_milliseconds = days_seconds as f64 * 1_000.0 + f64::from(microseconds) / 1_000.0;
+    Ok(total_milliseconds)
+}
... and I guess we can do similar for the other cases.
Co-authored-by: David Hewitt <mail@davidhewitt.dev>
Nice!!! I've applied that change and adapted it for the other cases; it seems to have removed all the precision issues. Should be good now.
Fantastic, thanks for the many rounds of iteration here; this looks great to me!
cc @sydney-runkle if you are happy with moving the float option to be in pydantic only, then this is ready to merge 👍
Great work here - thanks for sticking with this through some significant (api and code) changes.
Parametrized test looks great - thanks!
Change Summary
Adds a millisecond option to ser_json_timedelta, which returns the number of milliseconds in the timedelta.
Note: a corresponding PR will be needed in pydantic.
Related issue number
pydantic/pydantic#10256
Checklist
pydantic-core
(except for expected changes)
Selected Reviewer: @sydney-runkle