Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flexible offset parsing [2/3] #1083

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

pitdicker
Copy link
Collaborator

@pitdicker pitdicker commented May 24, 2023

The first commit is shared with #1082.
Then a couple of commits do some cleanup and add a new offset parser.
All the test keep passing, except some of those for #z. The difference is in the error causes, which have improved a bit.

Next are commits to make %:z, %::z and %:::z work as documented:

  • %:z should require a colon (this is what makes it different from %z).
  • %::z should also be able to parse, and even require, seconds.
  • %:::z should only accept hours.

Finally the FromStr implementation of DateTime<FixedOffset> now accepts optional seconds in the offset, so that it can parse the Debug format of DateTime.


TODO: I still need to figure out some way to test all the code paths of the new offset parser.

@pitdicker pitdicker force-pushed the offset_formatting_2 branch from d622df8 to 9f4df92 Compare May 24, 2023 19:42
@@ -219,7 +220,18 @@ fn parse_rfc3339<'a>(parsed: &mut Parsed, mut s: &'a str) -> ParseResult<(&'a st
parsed.set_nanosecond(nanosecond)?;
}

let offset = try_consume!(scan::timezone_offset_zulu(s, |s| scan::char(s, b':')));
let offset = if s.starts_with("UTC") || s.starts_with("utc") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The quote from RFC 3339 in the code comment above suggests the special string to search for would be "Z". Why is that not the special case here?

  1. Why not search the entire remainder of the string s? e.g. why not

    if matches!(s, "UTC" | "utc") {

    (also, matches is more succinct). Would there be a case where s should have trailing data after having matched the timezone in an RFC3339?

It seems like this code expression should be:

let offset = if matches!(s, "Z" | "z") {
    s = &[1..];
    0
} else {
    ...
}
if ! s.empty() {
    return Err(TOO_LONG);
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The quote from RFC 3339 in the code comment above suggests the special string to search for would be "Z". Why is that not the special case here?

"Z" is handled by the parser. But this item does not only parse valid RFC 3339, but also supports "UTC" as offset. This is documented in https://docs.rs/chrono/latest/chrono/format/strftime/index.html#fn7. I suppose it makes some sense to support parsing this.
I'll add a comment.

  1. Why not search the entire remainder of the string s? e.g. why not
    if matches!(s, "UTC" | "utc") {
    (also, matches is more succinct). Would there be a case where s should have trailing data after having matched the timezone in an RFC3339?

I agree it is not likely to have any data after this.
But all parsing items only consume as much as they need to. There can be other items coming after it. And otherwise it is up to the calling function decide what to do with the trailing data.

Copy link
Contributor

@jtmoon79 jtmoon79 May 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Z" is handled by the parser

Oh, it's handled by scan::utc_offset(s, offset_format)) below. Gotcha.

also supports "UTC" as offset.

It is convenient but it's not strict RFC 3339, AFAICT. I'd prefer not allowing it. I won't block on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But all parsing items only consume as much as they need to.... And otherwise it is up to the calling function decide what to do with the trailing data.

Ah right, I recall now

chrono/src/format/parse.rs

Lines 519 to 524 in c01e3a7

// if there are trailling chars, it is an error
if !s.is_empty() {
Err((s, TOO_LONG))
} else {
Ok(s)
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is convenient but it's not strict RFC 3339, AFAICT. I'd prefer not allowing it.

The commit tile says what I think about it 😄:

Remove hack to accept "UTC" in timezone_offset_zulu

  • One place it is used is for the %+ formatting item, with this description:

    Same as %Y-%m-%dT%H:%M:%S%.f%:z, i.e. 0, 3, 6 or 9 fractional digits for seconds and colons in the time zone offset.

    This format also supports having a Z or UTC in place of %:z. They are equivalent to +00:00.

    Note that all T, Z, and UTC are parsed case-insensitively.

    The typical strftime implementations have different (and locale-dependent) formats for this specifier. While Chrono's format for %+ is far more stable, it is best to avoid this specifier if you want to control the exact output.
    So being less strict here seems useful to me.

  • DateTime<FixedOffset>::parse_from_rfc_3339 also uses it, and should be very strict however.

  • A second RFC3339 parser is in impl str::FromStr for DateTime<FixedOffset>.
    That one is very forgiving, which is useful there.
    Maybe parsing %+ should use that implementation?

@pitdicker pitdicker force-pushed the offset_formatting_2 branch 4 times, most recently from 8f0e628 to 257fa20 Compare June 12, 2023 10:22
@pitdicker pitdicker marked this pull request as draft June 12, 2023 10:23
@pitdicker pitdicker force-pushed the offset_formatting_2 branch 2 times, most recently from 438d354 to 86c4719 Compare June 12, 2023 16:57
@pitdicker
Copy link
Collaborator Author

I did not realize %::z and %:::z are fairly recent additions, added in #759. Looking at that PR I understand they never properly added to the parser. That seems like an extra argument to fix them.

I'll keep this PR as a draft for a little longer. #1145 has the proper fix for some changes I had to make here to RFC 3339 parsing code, and it is better if that is merged first.

@pitdicker pitdicker force-pushed the offset_formatting_2 branch 6 times, most recently from d95caa7 to 00437e5 Compare July 1, 2023 20:01
@pitdicker pitdicker force-pushed the offset_formatting_2 branch from 00437e5 to ecb572c Compare July 7, 2023 11:31
@pitdicker pitdicker force-pushed the offset_formatting_2 branch from ecb572c to 1eed59a Compare August 11, 2023 11:46
@codecov
Copy link

codecov bot commented Aug 11, 2023

Codecov Report

Merging #1083 (7964d8f) into 0.4.x (00d389e) will increase coverage by 0.24%.
Report is 6 commits behind head on 0.4.x.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##            0.4.x    #1083      +/-   ##
==========================================
+ Coverage   86.10%   86.35%   +0.24%     
==========================================
  Files          37       37              
  Lines       13522    13776     +254     
==========================================
+ Hits        11643    11896     +253     
- Misses       1879     1880       +1     
Files Changed Coverage Δ
src/format/mod.rs 75.36% <ø> (ø)
src/datetime/tests.rs 96.56% <100.00%> (+0.02%) ⬆️
src/format/parse.rs 97.26% <100.00%> (+0.03%) ⬆️
src/format/scan.rs 99.38% <100.00%> (+0.12%) ⬆️
src/offset/fixed.rs 78.19% <100.00%> (+1.03%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants