Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsed fixes and documentation #1439

Merged
merged 5 commits into from
Feb 27, 2024
Merged

Conversation

pitdicker
Copy link
Collaborator

@pitdicker pitdicker commented Feb 14, 2024

This is the first part of #1418 and applies to the branch for 0.4.x.

We have two small bugs in our implementation of Parsed:

  • If there is a timestamp and an offset field, the offset is added to the timestamp. But the definition of a Unix timestamp is that the value is in UTC, so this is not correct.
  • We were returning OUT_OF_RANGE if the year value didn't match with the values of year_div_100 or year_mod_100. It should return IMPOSSIBLE instead.

I extended the documentation of Parsed to describe why it exists (the resolution algorithm) and to give an example (fixes #55). Partly taken from the blog post before chrono 0.2: https://lifthrasiir.github.io/rustlog/worklog-2015-02-19.html

All the methods now have more accurate documentation that describes their error causes.

The documentation got a couple of rounds of self-review.

Copy link

codecov bot commented Feb 14, 2024

Codecov Report

Attention: Patch coverage is 96.66667% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 92.14%. Comparing base (02c68d6) to head (3ea3af6).
Report is 3 commits behind head on main.

Files Patch % Lines
src/format/parsed.rs 96.66% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1439      +/-   ##
==========================================
+ Coverage   92.11%   92.14%   +0.03%     
==========================================
  Files          40       40              
  Lines       18026    18079      +53     
==========================================
+ Hits        16604    16659      +55     
+ Misses       1422     1420       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/format/parsed.rs Show resolved Hide resolved
src/format/parsed.rs Show resolved Hide resolved
src/format/parsed.rs Outdated Show resolved Hide resolved
src/format/parsed.rs Outdated Show resolved Hide resolved
@pitdicker pitdicker force-pushed the parsed_fixes branch 2 times, most recently from 26e5fc3 to 755fe4f Compare February 15, 2024 06:44
@pitdicker
Copy link
Collaborator Author

Found more oddity to document for the to_naive_datetime_with_offset method:

The offset is assumed to have a given value. It is not compared against the offset field set in the Parsed type, so it is allowed to be inconsistent.

Actually that whole method is just weird to work with. The logic should be inverted to return a NaiveDateTime, and an optional FixedOffset if there is enough information available. In September I planned to deprecate that method and replace it with something more sensible but never made the PR.

@pitdicker
Copy link
Collaborator Author

  • If there is a timestamp and an offset field, the offset is added to the timestamp. But the definition of a Unix timestamp is that the value is in UTC, so this is not correct.

I have to admit I read the current code wrong. It is working exactly like it should. The timestamp is assumed to be in UTC, it adds an offset, and then checks the rest of the values as local date and time. Dropped the first commit.

src/format/parsed.rs Outdated Show resolved Hide resolved
src/format/parsed.rs Show resolved Hide resolved
@pitdicker pitdicker force-pushed the parsed_fixes branch 2 times, most recently from 4d6aedf to cadf339 Compare February 26, 2024 18:14
@pitdicker
Copy link
Collaborator Author

Found one more missing error case in Parsed::to_fixed_offset. The use of Options covered it up 😄.
If there is no offset field set, it should return NOT_ENOUGH instead of OUT_OF_RANGE.

///
/// - `to_*` methods try to make a concrete date and time value out of set fields.
/// It fully checks any remaining out-of-range conditions and inconsistent/impossible fields.
/// They fully check that all fields are consistent and whether the date/datetime exists.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side thought: arguably the to_() methods should become impl TryFrom<Parsed> for _ impls?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a few cases they can. Good to keep in mind! I'll come back to it later (before 0.5).

/// # Resolving algorithm
///
/// Resolving date/time parts is littered with lots of corner cases, which is why common date/time
/// parsers do not correctly implement it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"do not correctly implement it" -> "do not implement it correctly" is a better word order here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right.

/// Tries to set the [`year`](#structfield.year) field from given value.
/// Set the 'year' field to the given value.
///
/// The value can be negative unlike the 'year divided by 100' and 'year modulo 100' fields.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need commas before "unlike". When referencing values stored in other fields, would suggest referencing the field directly, like year_div_100 or even [`Parsed::year_div_100`]?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When referencing values stored in other fields, would suggest referencing the field directly, like year_div_100 or even [`Parsed::year_div_100`]?

I choose to avoid referencing the fields directly so it is easier to make them private (my proposal for the 0.5.x branch).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, suggest making them #[doc(hidden)] on this branch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I was think the same, I'll do it with a PR that adds the range checks and setter methods.

Comment on lines +274 to +275
/// May return `OUT_OF_RANGE` if `value` is negative or if it is greater than 99.
/// Currently only checks the value is positive and not out of range for an `i32`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sounds pretty weird. "May return blah" but actually "Currently" doesn't? I think we should describe what it does, and not what it doesn't do. So maybe replace "Currently only checks [..]" with "For performance reasons, precise range checks may be deferred in favor of checking on conversion into a target type."?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The part about performance has never really been true or benchmarked. See #1418 (comment).

With incomplete range checks (current):

bench_datetime_parse_from_rfc2822
                        time:   [156.24 ns 157.01 ns 157.85 ns]
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

bench_datetime_parse_from_rfc3339
                        time:   [104.25 ns 104.54 ns 104.86 ns]
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

With complete range checks:

bench_datetime_parse_from_rfc2822
                        time:   [156.47 ns 156.81 ns 157.16 ns]
                        change: [-0.3688% +0.2339% +0.7850%] (p = 0.44 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

bench_datetime_parse_from_rfc3339
                        time:   [102.69 ns 103.07 ns 103.49 ns]
                        change: [-2.4267% -1.7659% -1.1091%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

So basically identical.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sounds pretty weird. "May return blah" but actually "Currently" doesn't? I think we should describe what it does, and not what it doesn't do.

My hope was to open a PR to add the complete range checks (instead of the incomplete current ones) as soon as this one is merged and remove all those lines again. Does that sound okay or should I polish this a bit more?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I guess that's fine.

@djc
Copy link
Member

djc commented Feb 27, 2024

This is shaping up nicely!

@pitdicker pitdicker merged commit 85c70ff into chronotope:main Feb 27, 2024
35 checks passed
@pitdicker pitdicker deleted the parsed_fixes branch February 27, 2024 14:32
@pitdicker
Copy link
Collaborator Author

pitdicker commented Feb 28, 2024

Found one more missing error case in Parsed::to_fixed_offset.

Wow, apparently I found and fixed it last year also in #1042 😆.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Example of Parsing to Parsed in README.MD
2 participants