Flexible offset parsing [2/3] #1083

pitdicker · 2023-05-24T19:37:30Z

The first commit is shared with #1082.
Then a couple of commits do some cleanup and add a new offset parser.
All the test keep passing, except some of those for #z. The difference is in the error causes, which have improved a bit.

Next are commits to make %:z, %::z and %:::z work as documented:

%:z should require a colon (this is what makes it different from %z).
%::z should also be able to parse, and even require, seconds.
%:::z should only accept hours.

Finally the FromStr implementation of DateTime<FixedOffset> now accepts optional seconds in the offset, so that it can parse the Debug format of DateTime.

TODO: I still need to figure out some way to test all the code paths of the new offset parser.

jtmoon79 · 2023-05-28T02:26:53Z

src/format/parse.rs

@@ -219,7 +220,18 @@ fn parse_rfc3339<'a>(parsed: &mut Parsed, mut s: &'a str) -> ParseResult<(&'a st
        parsed.set_nanosecond(nanosecond)?;
    }

-    let offset = try_consume!(scan::timezone_offset_zulu(s, |s| scan::char(s, b':')));
+    let offset = if s.starts_with("UTC") || s.starts_with("utc") {


The quote from RFC 3339 in the code comment above suggests the special string to search for would be "Z". Why is that not the special case here?

Why not search the entire remainder of the string s? e.g. why not

if matches!(s, "UTC" | "utc") {

(also, matches is more succinct). Would there be a case where s should have trailing data after having matched the timezone in an RFC3339?

It seems like this code expression should be:

let offset = if matches!(s, "Z" | "z") { s = &[1..]; 0 } else { ... } if ! s.empty() { return Err(TOO_LONG); }

The quote from RFC 3339 in the code comment above suggests the special string to search for would be "Z". Why is that not the special case here?

"Z" is handled by the parser. But this item does not only parse valid RFC 3339, but also supports "UTC" as offset. This is documented in https://docs.rs/chrono/latest/chrono/format/strftime/index.html#fn7. I suppose it makes some sense to support parsing this.
I'll add a comment.

Why not search the entire remainder of the string s? e.g. why not
if matches!(s, "UTC" | "utc") {
(also, matches is more succinct). Would there be a case where s should have trailing data after having matched the timezone in an RFC3339?

I agree it is not likely to have any data after this.
But all parsing items only consume as much as they need to. There can be other items coming after it. And otherwise it is up to the calling function decide what to do with the trailing data.

"Z" is handled by the parser

Oh, it's handled by scan::utc_offset(s, offset_format)) below. Gotcha.

also supports "UTC" as offset.

It is convenient but it's not strict RFC 3339, AFAICT. I'd prefer not allowing it. I won't block on this.

But all parsing items only consume as much as they need to.... And otherwise it is up to the calling function decide what to do with the trailing data.

Ah right, I recall now

chrono/src/format/parse.rs

Lines 519 to 524 in c01e3a7

// if there are trailling chars, it is an error

if !s.is_empty() {

Err((s, TOO_LONG))

} else {

Ok(s)

}

It is convenient but it's not strict RFC 3339, AFAICT. I'd prefer not allowing it.

The commit tile says what I think about it 😄:

Remove hack to accept "UTC" in timezone_offset_zulu

One place it is used is for the %+ formatting item, with this description:

Same as %Y-%m-%dT%H:%M:%S%.f%:z, i.e. 0, 3, 6 or 9 fractional digits for seconds and colons in the time zone offset.

This format also supports having a Z or UTC in place of %:z. They are equivalent to +00:00.

Note that all T, Z, and UTC are parsed case-insensitively.

The typical strftime implementations have different (and locale-dependent) formats for this specifier. While Chrono's format for %+ is far more stable, it is best to avoid this specifier if you want to control the exact output.
So being less strict here seems useful to me.

DateTime<FixedOffset>::parse_from_rfc_3339 also uses it, and should be very strict however.

A second RFC3339 parser is in impl str::FromStr for DateTime<FixedOffset>.
That one is very forgiving, which is useful there.
Maybe parsing %+ should use that implementation?

pitdicker · 2023-06-23T10:57:59Z

I did not realize %::z and %:::z are fairly recent additions, added in #759. Looking at that PR I understand they never properly added to the parser. That seems like an extra argument to fix them.

I'll keep this PR as a draft for a little longer. #1145 has the proper fix for some changes I had to make here to RFC 3339 parsing code, and it is better if that is merged first.

codecov · 2023-08-11T11:50:09Z

Codecov Report

Merging #1083 (7964d8f) into 0.4.x (00d389e) will increase coverage by 0.24%.
Report is 6 commits behind head on 0.4.x.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##            0.4.x    #1083      +/-   ##
==========================================
+ Coverage   86.10%   86.35%   +0.24%     
==========================================
  Files          37       37              
  Lines       13522    13776     +254     
==========================================
+ Hits        11643    11896     +253     
- Misses       1879     1880       +1

Files Changed	Coverage Δ
src/format/mod.rs	`75.36% <ø> (ø)`
src/datetime/tests.rs	`96.56% <100.00%> (+0.02%)`	⬆️
src/format/parse.rs	`97.26% <100.00%> (+0.03%)`	⬆️
src/format/scan.rs	`99.38% <100.00%> (+0.12%)`	⬆️
src/offset/fixed.rs	`78.19% <100.00%> (+1.03%)`	⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

refactor when chronotope/chrono#1083 will land

pitdicker force-pushed the offset_formatting_2 branch from d622df8 to 9f4df92 Compare May 24, 2023 19:42

jtmoon79 suggested changes May 28, 2023

View reviewed changes

djc added the waiting-on-review label May 28, 2023

This was referenced Jun 1, 2023

Which parsing bugs can we fix in a minor release? #1118

Open

Add more tests for parse_from_str and specifiers #1122

Draft

pitdicker force-pushed the offset_formatting_2 branch 4 times, most recently from 8f0e628 to 257fa20 Compare June 12, 2023 10:22

pitdicker marked this pull request as draft June 12, 2023 10:23

pitdicker force-pushed the offset_formatting_2 branch 2 times, most recently from 438d354 to 86c4719 Compare June 12, 2023 16:57

pitdicker mentioned this pull request Jun 29, 2023

Offset parsing cleanup #1158

Merged

pitdicker force-pushed the offset_formatting_2 branch 6 times, most recently from d95caa7 to 00437e5 Compare July 1, 2023 20:01

pitdicker force-pushed the offset_formatting_2 branch from 00437e5 to ecb572c Compare July 7, 2023 11:31

pitdicker mentioned this pull request Jul 31, 2023

Parse Zulu suffix with %z #1210

Closed

pitdicker mentioned this pull request Aug 11, 2023

Small fixes to the RFC 3339 parsers #1145

Merged

pitdicker force-pushed the offset_formatting_2 branch from ecb572c to 1eed59a Compare August 11, 2023 11:46

xiangjinwu mentioned this pull request Aug 21, 2023

bug(batch): Do not try to write %#z it is undefined risingwavelabs/risingwave#10283

Open

pitdicker mentioned this pull request Aug 27, 2023

%:::z format incorrectly expects minutes when parsing #1228

Open

pitdicker added 3 commits August 29, 2023 15:37

Add new offset parser

6582df9

Add offset parsing tests

6c84dde

Allow seconds in DateTime::from_str offset

d8e41d0

pitdicker added 5 commits August 29, 2023 15:39

Make FixedOffset::from_str more permissive

5db1c0b

Make parsing %:::z conform to documentation

4de10a2

Make parsing %::z conform to documentation

074c1de

Make parsing %:z conform to documentation

456669b

fix tests

7964d8f

pitdicker force-pushed the offset_formatting_2 branch from 1eed59a to 7964d8f Compare August 29, 2023 13:40

Lurk added a commit to Lurk/yamd that referenced this pull request Oct 14, 2023

consider timestamp without colon in offset as wrong date

d2d3ae5

refactor when chronotope/chrono#1083 will land

Lurk mentioned this pull request Oct 14, 2023

consider timestamp without colon in offset as wrong date Lurk/yamd#31

Merged

This was referenced Feb 5, 2024

support to use dot as timezone offset separator #1413

Closed

Recognize our review capacity is limited for non-trivial work #1417

Open

pitdicker mentioned this pull request Dec 11, 2024

Fix parsing for zone offset seconds #1639

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flexible offset parsing [2/3] #1083

Flexible offset parsing [2/3] #1083

pitdicker commented May 24, 2023 •

edited

Loading

jtmoon79 May 28, 2023

pitdicker May 28, 2023

jtmoon79 May 29, 2023 •

edited

Loading

jtmoon79 May 29, 2023

pitdicker May 29, 2023

pitdicker commented Jun 23, 2023

codecov bot commented Aug 11, 2023 •

edited

Loading

	// if there are trailling chars, it is an error
	if !s.is_empty() {
	Err((s, TOO_LONG))
	} else {
	Ok(s)
	}

Flexible offset parsing [2/3] #1083

Are you sure you want to change the base?

Flexible offset parsing [2/3] #1083

Conversation

pitdicker commented May 24, 2023 • edited Loading

jtmoon79 May 28, 2023

Choose a reason for hiding this comment

pitdicker May 28, 2023

Choose a reason for hiding this comment

jtmoon79 May 29, 2023 • edited Loading

Choose a reason for hiding this comment

jtmoon79 May 29, 2023

Choose a reason for hiding this comment

pitdicker May 29, 2023

Choose a reason for hiding this comment

pitdicker commented Jun 23, 2023

codecov bot commented Aug 11, 2023 • edited Loading

Codecov Report

pitdicker commented May 24, 2023 •

edited

Loading

jtmoon79 May 29, 2023 •

edited

Loading

codecov bot commented Aug 11, 2023 •

edited

Loading