Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add 'millisecond' option to ser_json_timedelta config parameter #1427

Merged
merged 23 commits into from
Sep 18, 2024

Conversation

ollz272
Copy link
Contributor

@ollz272 ollz272 commented Aug 30, 2024

Change Summary

Adds a millisecond option to ser_json_timedelta, which returns the number of milliseconds in the timedelta.

Note a corresponding PR will be needed in pydantic

Related issue number

pydantic/pydantic#10256

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with this pydantic-core (except for expected changes)
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @sydney-runkle

// convert to int via a py timedelta not duration since we know this this case the input would have
// been a py timedelta
let py_timedelta = either_delta.try_into_py(py)?;
let seconds: f64 = Self::total_seconds(&py_timedelta)?.extract()?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a better way to do this which is alluding me, maybe we could do the multiplication in python? 🤷🏻‍♂️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable enough - multiplication here should be faster.

Can we pull out some of the shared logic into a function (both for Float and for Millisecond) and repeat that across the various branches?

Copy link
Contributor Author

@ollz272 ollz272 Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, maybe, a couple of them use logic like:

let py_timedelta = either_delta.try_into_py(py)?;
let seconds: f64 = Self::total_seconds(&py_timedelta)?.extract()?;

However then the serializer needs to wrap these in map_err calls, so can't be reused there..

Maybe a rust wizard could help with some clever refactoring im not seeing 🧙🏻

Copy link
Contributor

@davidhewitt davidhewitt Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To do the multiplication in Python would be something like

let object = Self::total_seconds(&py_timedelta)?.mul(1000)?;

... this requires creating a Python integer 1000, which for best performance we might want to consider caching.

But there is another inefficiency here (and in the other cases) which is that the call to try_into_py creates a Python timedelta object which then gets thrown away immediately to convert to float. It will probably be better in all cases to use .to_duration(), which will avoid the temporary Python object in the case of a Duration Rust value.

The best option would be to go further and to add a .total_seconds() method to EitherTimedelta which extracts the fractional seconds from whatever state the EitherTimedelta is currently in (doing the most efficient thing for each case) and then doing the multiplication in Rust.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, yeah neat suggestion on the EitherTimedelta

I guess we'd want something like this:

impl<'a> EitherTimedelta<'a> {
    ....

    pub fn total_seconds(&self, py: Python<'a>) -> f64 {
        match self {
            Self::Raw(timedelta) => ...
            Self::PyExact(py_timedelta) => ...
            Self::PySubclass(py_timedelta) => ...
        }
    }
}

And then we have two cases: Duration and PyDelta to deal with. Looks like we have some methods around that would involve getting the py_timedelta into a Duration object, so then its just Duration we need to deal with. However (maybe im reading the docs wrong!) i can't seem to see a method on there that returns the total_seconds as a float?

Probably missed something here!

Copy link
Contributor Author

@ollz272 ollz272 Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    pub fn total_seconds(&self) -> f64 {
        match self {
            Self::Raw(timedelta) => ...,
            Self::PyExact(py_timedelta) => intern!(py_timedelta.py(), "total_seconds"))?.extract?
            Self::PySubclass(py_timedelta) => intern!(py_timedelta.py(), "total_seconds"))?.extract?
        }
    }

Something like this? Not 100% sure what to do with the Raw case, but doing what we do currently and extracting from the python object by calling "total_seconds" on it feels like the best way in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest looking at the to_duration method to see how that gets days / seconds / microseconds out of the Python value, as that should have the most efficient implementation already set up to get the data out (and then you can do a bit of arithmetic). In general calling a Python method will be slow-ish, so there might be a faster step for PyExact case.

For Raw, you need to work with the speedate::Duration object, can probably get a timestamp value and microseconds and combine those.

Copy link

codecov bot commented Aug 30, 2024

Codecov Report

Attention: Patch coverage is 46.83544% with 42 lines in your changes missing coverage. Please review.

Project coverage is 89.22%. Comparing base (ab503cb) to head (3e23511).
Report is 178 commits behind head on main.

Files with missing lines Patch % Lines
src/input/datetime.rs 34.37% 42 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1427      +/-   ##
==========================================
- Coverage   90.21%   89.22%   -1.00%     
==========================================
  Files         106      112       +6     
  Lines       16339    17765    +1426     
  Branches       36       41       +5     
==========================================
+ Hits        14740    15850    +1110     
- Misses       1592     1895     +303     
- Partials        7       20      +13     
Files with missing lines Coverage Δ
python/pydantic_core/core_schema.py 94.76% <100.00%> (-0.01%) ⬇️
src/serializers/config.rs 94.25% <100.00%> (-0.20%) ⬇️
src/serializers/infer.rs 90.94% <100.00%> (-4.18%) ⬇️
src/serializers/type_serializers/timedelta.rs 100.00% <100.00%> (ø)
src/input/datetime.rs 90.18% <34.37%> (-8.59%) ⬇️

... and 48 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bc0c97a...3e23511. Read the comment docs.

@ollz272
Copy link
Contributor Author

ollz272 commented Aug 30, 2024

please review

Copy link

codspeed-hq bot commented Aug 30, 2024

CodSpeed Performance Report

Merging #1427 will not alter performance

Comparing ollz272:add-millisecond (3e23511) with main (bc0c97a)

Summary

✅ 155 untouched benchmarks

@ollz272
Copy link
Contributor Author

ollz272 commented Sep 4, 2024

Hi! Any timeline on when this could get looked at? Would be a really valuable feature for us. Thanks!

@sydney-runkle
Copy link
Member

Hi! Any timeline on when this could get looked at? Would be a really valuable feature for us. Thanks!

Absolutely - will review tomorrow :).

@sydney-runkle
Copy link
Member

Absolutely - will review tomorrow :).

Gah -- got behind on this with travel. Reviewing today!

Copy link
Member

@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable overall - left a couple of quick notes.

Let's get a second review from @davidhewitt re the best way to do the multiplication - he's our in house rust guru.

tests/serializers/test_any.py Outdated Show resolved Hide resolved
// convert to int via a py timedelta not duration since we know this this case the input would have
// been a py timedelta
let py_timedelta = either_delta.try_into_py(py)?;
let seconds: f64 = Self::total_seconds(&py_timedelta)?.extract()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable enough - multiplication here should be faster.

Can we pull out some of the shared logic into a function (both for Float and for Millisecond) and repeat that across the various branches?

@ollz272
Copy link
Contributor Author

ollz272 commented Sep 11, 2024

@sydney-runkle I've changed the code here to better reflect the discussions in pydantic/pydantic#10293 (comment), on the refactoring im struggling to see a nice way to bring things out, but happy to apply any suggestions you or the team may have :)

@ollz272
Copy link
Contributor Author

ollz272 commented Sep 11, 2024

Also happy to add the other modes here if you'd like, wouldn't be too much effort

@ollz272
Copy link
Contributor Author

ollz272 commented Sep 11, 2024

please review

Copy link
Contributor Author

@ollz272 ollz272 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has become a bit more involved than i first thought, but enjoying it! Im sure some comments on total_seconds, maybe we also want a total_milliseconds function, but regardless i think this is in a good state now.

@@ -356,7 +356,7 @@ def to_json(
by_alias: bool = True,
exclude_none: bool = False,
round_trip: bool = False,
timedelta_mode: Literal['iso8601', 'float'] = 'iso8601',
timedelta_mode: Literal['iso8601', 'seconds_float', 'milliseconds_float'] = 'iso8601',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sydney-runkle as requested in pydantic pr, i've removed float from the type hints in various places.

src/input/datetime.rs Show resolved Hide resolved
src/serializers/config.rs Show resolved Hide resolved
src/serializers/config.rs Show resolved Hide resolved
src/serializers/infer.rs Show resolved Hide resolved
tests/serializers/test_any.py Show resolved Hide resolved
@ollz272
Copy link
Contributor Author

ollz272 commented Sep 12, 2024

Pydantic integration is failing, though i guess this is expected since we're changing some behaviour here with a deprecation..

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. There's a bit of precision being lost which I think we might be able to mitigate, see my other comment.

src/input/datetime.rs Outdated Show resolved Hide resolved
src/serializers/config.rs Show resolved Hide resolved
src/serializers/config.rs Outdated Show resolved Hide resolved
tests/serializers/test_any.py Outdated Show resolved Hide resolved
tests/serializers/test_any.py Outdated Show resolved Hide resolved
@davidhewitt
Copy link
Contributor

RE pydantic integration, I guess if we intend to remove the setting from pydantic-core completely and keep it as a pydantic level thing, then yes the failing test is expected.

(timedelta(microseconds=1), 0.001, b'0.001', {'0.001': 'foo'}, b'{"0.001":"foo"}', 'milliseconds_float'),
(
timedelta(microseconds=-1),
-0.0009999999999763531,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test (and the https://github.com/pydantic/pydantic-core/pull/1427/files#diff-83a0ae6e1d65a0cd2ffcef088f3e07274f2e38a33e89b77c6bcffe137c3e1ddaR264) seem to still have a floating point error, both are the single negative microseconds case. Curious if you have any insight on this!

@ollz272 ollz272 requested a review from davidhewitt September 13, 2024 08:37
@ollz272
Copy link
Contributor Author

ollz272 commented Sep 17, 2024

please review

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the very late reply; combination of leave and sickness.

I think I've worked out a possible solution & have posted below.

src/input/datetime.rs Outdated Show resolved Hide resolved
Comment on lines 152 to 155
let days: f64 = f64::from(py_timedelta.get_days()); // -999999999 to 999999999
let seconds: f64 = f64::from(py_timedelta.get_seconds()); // 0 through 86399
let microseconds: f64 = f64::from(py_timedelta.get_microseconds()); // 0 through 999999
Ok(86_400_000.0 * days + seconds * 1_000.0 + microseconds / 1_000.0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think to keep full precision, we need to try to keep full microsecond precision as integer arithmetic and do floating point cast at the last minute. If we work in i64 then we might overflow on the conversion from days & seconds to micros for large values of days, but in that case the precision on the microseconds won't matter much anyway.

So I get something like this:

Suggested change
let days: f64 = f64::from(py_timedelta.get_days()); // -999999999 to 999999999
let seconds: f64 = f64::from(py_timedelta.get_seconds()); // 0 through 86399
let microseconds: f64 = f64::from(py_timedelta.get_microseconds()); // 0 through 999999
Ok(86_400_000.0 * days + seconds * 1_000.0 + microseconds / 1_000.0)
let days: i64 = py_timedelta.get_days().into(); // -999999999 to 999999999
let seconds: i64 = py_timedelta.get_seconds().into(); // 0 through 86399
let microseconds = py_timedelta.get_microseconds(); // 0 through 999999
let days_seconds = (86_400 * days) + seconds;
if let Some(days_seconds_as_micros) = days_seconds.checked_mul(1_000_000) {
let total_microseconds = days_seconds_as_micros + i64::from(microseconds);
Ok(total_microseconds as f64 / 1_000.0)
} else {
// Fall back to floating-point operations if the multiplication overflows
let total_milliseconds = days_seconds as f64 * 1_000.0 + f64::from(microseconds) / 1_000.0;
Ok(total_milliseconds)
}

... and I guess we can do similar for the other cases.

ollz272 and others added 2 commits September 17, 2024 18:48
@ollz272
Copy link
Contributor Author

ollz272 commented Sep 17, 2024

Sorry for the very late reply; combination of leave and sickness.

I think I've worked out a possible solution & have posted below.

Nice!!! I've applied that change and altered it for the other cases, seems to have removed all the precision issues. Should be good now

@ollz272 ollz272 requested a review from davidhewitt September 18, 2024 05:13
Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic, thanks for the many rounds of iteration here; this looks great to me!

cc @sydney-runkle if you are happy with moving the float option to be in pydantic only, then this is ready to merge👍

Copy link
Member

@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work here - thanks for sticking with this through some significant (api and code) changes.

Parametrized test looks great - thanks!

@sydney-runkle sydney-runkle merged commit e0b4c94 into pydantic:main Sep 18, 2024
29 of 30 checks passed
sydney-runkle added a commit that referenced this pull request Oct 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants