Normative: Allow UTC offset time zones #788

Merged
merged 3 commits into from
Sep 27, 2023

Conversation

gibson042
Contributor

@gibson042 gibson042 commented Jun 1, 2023

Fixes #683

This aligns with ECMA-262, and therefore removes the DefaultTimeZone override.

Example time zones that will now be accepted: Temporal |TimeZoneUTCOffsetName|

  • "+00"
  • "-00"
  • "−00"
  • "+0000"
  • "-0000"
  • "−0000"
  • "+00:00"
  • "-00:00"
  • "−00:00"
  • "+2359"
  • "+23:59"
  • "-2359"
  • "-23:59"
  • "−2359"
  • "−23:59"

Comment on lines 151 to 153
<h1>FormatTimeZoneOffsetString ( _offsetNanoseconds_ )</h1>
<emu-alg>
1. Assert: _offsetNanoseconds_ is an integer.
Contributor

I'd recommend a structured header (and then there's no need for the assertion). We can also limit the type of offsetNanoseconds to "an integer in the range of -86,400,000,000,000 to 86,400,000,000,000". (I forget whether the range is inclusive or exclusive at both ends.)

Contributor Author

This operation is defined in Temporal and will be removed from ECMA-402 and/or this PR once added to ECMA-262. I will propagate any upstream changes.

Contributor

@justingrant justingrant left a comment

Looks great!

1. If the result of IsValidTimeZoneName(_timeZone_) is *false*, then
1. Throw a *RangeError* exception.
1. If IsTimeZoneOffsetString(_timeZone_) is *true*, then
1. Let _offsetNanoseconds_ be ParseTimeZoneOffsetString(_timeZone_).
Contributor

There's a new CanonicalizeTimeZoneOffsetString AO in Temporal. Should this be brought into 402 to replace this line and the one below it?

Contributor Author

I tried to keep changes minimal here, but intend to make use of CanonicalizeTimeZoneOffsetString once it lands in ECMA-262.

Contributor

@FrankYFTang FrankYFTang Sep 8, 2023

I do not understand why we don't just change the definition of HourSubcomponents in ECMA-262 instead.
Currently, it is

HourSubcomponents[Extended] :::
  TimeSeparator[?Extended] MinuteSecond
  TimeSeparator[?Extended] MinuteSecond TimeSeparator[?Extended] MinuteSecond TemporalDecimalFraction<sub>opt</sub>

why not change it to just

HourSubcomponents[Extended] :::
  TimeSeparator[?Extended] MinuteSecond

?

Contributor

The only place that references it is UTCOffset, and the only places that reference UTCOffset are
21.4.1.33.1 IsTimeZoneOffsetString ( offsetString )
and
21.4.1.33.2 ParseTimeZoneOffsetString ( offsetString )

which are exactly the two places we need to change, and both are only used by places dealing with the system time zone.

Contributor Author

@FrankYFTang An ECMA-262 grammar cannot be changed from an ECMA-402 pull request. But all of this will be simplified by/in response to Temporal, as demonstrated in #788 (comment) .

Contributor

One thing I do not understand is why we have this PR for ECMA-402 instead of just letting Temporal change both ECMA-262 and ECMA-402. What is the urgent need for this to happen before Temporal reaches stage 4?

Contributor

AFAIK, the main reason is that offset time zones are currently supported in (pre-Temporal) ECMA-262. My assumption is that all ECMA-262 features should be supported when ECMA-402 is also implemented.

Also, do we know yet whether ICU and/or CLDR changes will be needed to enable formatting of offset time zones? If no, that's great news. If yes, then where do changes need to be made?

@FrankYFTang
Contributor

FrankYFTang commented Jun 1, 2023

Richard pointed out that https://www.ietf.org/archive/id/draft-ietf-sedate-datetime-extended-08.html requires nanoseconds in the TimeZone, but I cannot find such a requirement in that spec.

what I see is

time-zone         = "[" critical-flag  
                        time-zone-name / time-numoffset "]"

date-time and time-numoffset are imported from Section 5.6 of [RFC3339]

   time-hour       = 2DIGIT  ; 00-23
   time-minute     = 2DIGIT  ; 00-59
   time-second     = 2DIGIT  ; 00-58, 00-59, 00-60 based on leap second
                             ; rules
   time-secfrac    = "." 1*DIGIT
   time-numoffset  = ("+" / "-") time-hour ":" time-minute
   time-offset     = "Z" / time-numoffset

   partial-time    = time-hour ":" time-minute ":" time-second
                     [time-secfrac]

so.... where is the nanosecond requirement?
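
For illustration, the `time-numoffset` production quoted above can be transliterated into a regular expression. This is only a sketch: `TIME_NUMOFFSET` and `isTimeNumOffset` are names made up here, not part of RFC 3339 or any ECMAScript spec. It shows that the production, read on its own, stops at minutes.

```javascript
// Sketch only: TIME_NUMOFFSET and isTimeNumOffset are hypothetical names.
// time-numoffset = ("+" / "-") time-hour ":" time-minute
// time-hour is 00-23 and time-minute is 00-59, per the ABNF comments.
const TIME_NUMOFFSET = /^[+-](?:[01][0-9]|2[0-3]):[0-5][0-9]$/;

function isTimeNumOffset(s) {
  return TIME_NUMOFFSET.test(s);
}

console.log(isTimeNumOffset("+01:30"));    // true
console.log(isTimeNumOffset("-23:59"));    // true
console.log(isTimeNumOffset("+01:30:15")); // false: the production has no seconds component
console.log(isTimeNumOffset("+0130"));     // false: the ":" is required
```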

@gibson042
Contributor Author

so.... where is the nanosecond requirement?

It will be discussed upstream at tc39/ecma262#3087 and tc39/proposal-temporal#2593 .

@sffc
Contributor

sffc commented Jun 2, 2023

TG2 discussion: https://github.com/tc39/ecma402/blob/master/meetings/notes-2023-06-01.md#normative-allow-utc-offset-time-zones-788

We need to resolve the minutes/seconds/centiseconds/nanoseconds first, but it seems okay to the group besides that point, which was somewhat controversial for a number of reasons.

@sffc
Contributor

sffc commented Jun 30, 2023

TG2 discussion: https://github.com/tc39/ecma402/blob/master/meetings/notes-2023-06-29.md#normative-allow-utc-offset-timezones

We'd like to wait until the Temporal changes land and then revisit this PR after it has been updated.

@gibson042
Contributor Author

@sffc Let's please also discuss it at the next TG2 meeting regardless.

@FrankYFTang
Contributor

@sffc Let's please also discuss it at the next TG2 meeting regardless.

added to https://github.com/tc39/ecma402/projects/2

@sffc
Contributor

sffc commented Sep 7, 2023

Do we support formatting these with timeZoneName "shortOffset" and "longOffset"?

We should think about the formatting fallback given the six time zone format options: "short", "long", "shortOffset", "longOffset", "shortGeneric", "longGeneric". We discussed the fallback between these at length in the time zone format options proposal a couple of years ago. It shows up in BasicFormatMatcher: https://tc39.es/ecma402/#sec-basicformatmatcher

@FrankYFTang
Contributor

Please write some examples of timeZone values that previously threw but will be accepted after this PR. Thanks.

@gibson042
Contributor Author

PR updated to remove support for UTC offsets with sub-minute precision, just like Temporal.

@gibson042
Contributor Author

Please write some examples of timeZone values that previously threw but will be accepted after this PR. Thanks.

@FrankYFTang added to the PR description.

@sffc
Contributor

sffc commented Sep 7, 2023

1. If IsTimeZoneOffsetString(_timeZone_) is *true*, then
1. Let _parseResult_ be ParseText(StringToCodePoints(_timeZone_), |UTCOffset|).
1. Assert: _parseResult_ is a Parse Node.
1. If _parseResult_ contains more than one |MinuteSecond| Parse Node, throw a *RangeError* exception.
Contributor

@FrankYFTang FrankYFTang Sep 8, 2023

I am not sure how "contains more than one |MinuteSecond| Parse Node" works here.
How is the "contains" operation defined for a Parse Node? I cannot find a concrete definition.

Look at the UTCOffset grammar in https://tc39.es/ecma262/#sec-time-zone-offset-string-format

UTCOffset :::
TemporalSign Hour
TemporalSign Hour HourSubcomponents[+Extended]
TemporalSign Hour HourSubcomponents[~Extended]

So if the concept of "contains" is shallow, then it would be impossible for UTCOffset to contain any MinuteSecond,
since the 1 or 2 MinuteSecond nodes could only be contained inside the HourSubcomponents that is contained by UTCOffset. This step therefore requires a "deep containment" interpretation, and I am not sure that is what ECMA-262 specifies (since I cannot find a clear definition of what "contains" means for a Parse Node).

Contributor Author

@gibson042 gibson042 Sep 8, 2023

@FrankYFTang This use of "contains" is prose of the sort that is also currently in Temporal ParseTimeZoneIdentifier—as suggested by the lack of automatic hyperlinking, there is no concrete definition. However, as I thought would be clear from context in both documents, it is intended to be understood as entailing deep inspection of the tree of Parse Nodes. Can you suggest a change here that would make that more clear?

I could define a precise syntax-directed operation, but really don't want to do that because it would apply cross-document to the ECMA-262 |UTCOffset| nonterminal that will be replaced by Temporal |TimeZoneUTCOffsetName| (creating a situation where this part of ECMA-402 would be incoherent until fixed).

It's also worth noting that this explicit check for multiple |MinuteSecond| Parse Nodes will itself be removed when Temporal lands, along with many of the other steps here that are subsumed by ParseTimeZoneIdentifier:

+        1. Let _timeZoneRecord_ be ? ParseTimeZoneIdentifier(_timeZone_).
-        1. If IsTimeZoneOffsetString(_timeZone_) is *true*, then
-          1. Let _parseResult_ be ParseText(StringToCodePoints(_timeZone_), |UTCOffset|).
-          1. Assert: _parseResult_ is a Parse Node.
-          1. If _parseResult_ contains more than one |MinuteSecond| Parse Node, throw a *RangeError* exception.
-          1. Let _offsetNanoseconds_ be ParseTimeZoneOffsetString(_timeZone_).
-          1. Let _offsetMinutes_ be _offsetNanoseconds_ / (6 × 10<sup>10</sup>).
-          1. Assert: _offsetMinutes_ is an integer.
+        1. If _timeZoneRecord_.[[Name]] is ~empty~, then
+          1. Let _offsetMinutes_ be _timeZoneRecord_.[[OffsetMinutes]].
           1. Set _timeZone_ to FormatOffsetTimeZoneIdentifier(_offsetMinutes_).
-        1. Else if IsValidTimeZoneName(_timeZone_) is *true*, then
-          1. Set _timeZone_ to CanonicalizeTimeZoneName(_timeZone_).
+        1. Else if IsValidTimeZoneName(_timeZoneRecord_.[[Name]]) is *true*, then
+          1. Set _timeZone_ to CanonicalizeTimeZoneName(_timeZoneRecord_.[[Name]]).
         1. Else,
           1. Throw a *RangeError* exception.
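
As a rough illustration of the offset branch in the steps above, the conversion from nanoseconds to minutes and the formatting into `±HH:MM` might look like the following. This is a sketch under assumptions: `formatOffsetTimeZoneIdentifier` borrows the spec operation's name but the body is not spec text, `offsetNanosecondsToMinutes` is a hypothetical helper, and throwing on a non-integer minute count stands in for the separate sub-minute rejection step rather than reproducing how the spec factors it.

```javascript
// Sketch only: helper names and bodies are illustrative, not spec text.
const NS_PER_MINUTE = 6e10; // 6 × 10^10 nanoseconds per minute

function offsetNanosecondsToMinutes(offsetNanoseconds) {
  const offsetMinutes = offsetNanoseconds / NS_PER_MINUTE;
  // Assumption: reject sub-minute offsets here instead of via a parse-tree check.
  if (!Number.isInteger(offsetMinutes)) {
    throw new RangeError("UTC offset has sub-minute precision");
  }
  return offsetMinutes;
}

function formatOffsetTimeZoneIdentifier(offsetMinutes) {
  // Zero (including negative zero) is formatted with "+".
  const sign = offsetMinutes >= 0 ? "+" : "-";
  const abs = Math.abs(offsetMinutes);
  const hh = String(Math.floor(abs / 60)).padStart(2, "0");
  const mm = String(abs % 60).padStart(2, "0");
  return `${sign}${hh}:${mm}`;
}

console.log(formatOffsetTimeZoneIdentifier(offsetNanosecondsToMinutes(90 * NS_PER_MINUTE)));    // "+01:30"
console.log(formatOffsetTimeZoneIdentifier(offsetNanosecondsToMinutes(-1439 * NS_PER_MINUTE))); // "-23:59"
```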

@FrankYFTang FrankYFTang closed this Sep 9, 2023
@FrankYFTang FrankYFTang reopened this Sep 9, 2023
@FrankYFTang
Contributor

Sorry, I clicked the wrong button.

@FrankYFTang
Contributor

Trying to implement in v8 https://chromium-review.googlesource.com/c/v8/v8/+/4854730

@FrankYFTang
Contributor

The ICU SimpleTimeZone created by TimeZone::createInstance will handle the offset time zone, but uses an ID prefixed with "GMT",
so it will be "GMT+01:30" or "GMT-05:33", etc. My V8 implementation just checks whether the ID starts with "GMT" and then removes that "GMT" prefix.
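
The prefix stripping described here could be sketched as follows. `icuIdToOffsetId` is a hypothetical helper written in JavaScript for illustration; it is not the V8 or ICU code. It also assumes a stricter check than a plain "starts with GMT" test, so that IANA names such as "GMT" or "GMT0" pass through unchanged.

```javascript
// Sketch only: icuIdToOffsetId is a hypothetical helper, not V8 or ICU code.
function icuIdToOffsetId(icuId) {
  // Custom ICU zone IDs for offsets look like "GMT+01:30"; anything else passes through.
  return /^GMT[+-]\d{2}:\d{2}$/.test(icuId) ? icuId.slice(3) : icuId;
}

console.log(icuIdToOffsetId("GMT+01:30"));    // "+01:30"
console.log(icuIdToOffsetId("GMT-05:33"));    // "-05:33"
console.log(icuIdToOffsetId("GMT"));          // "GMT"
console.log(icuIdToOffsetId("Europe/Paris")); // "Europe/Paris"
```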

@FrankYFTang
Contributor

One problem we have with the implementation is that we can take the input time zone as
"+00"
"-00"
"−00"
"+0000"
"-0000"
"−0000"
"+00:00"
"-00:00"
"−00:00"

but it will be inefficient to distinguish them from "UTC" in resolvedOptions. Could we simply treat them as "UTC" in the spec text (so that resolvedOptions().timeZone returns "UTC" for these cases)?

FrankYFTang added a commit to FrankYFTang/test262 that referenced this pull request Sep 12, 2023
@FrankYFTang
Contributor

FrankYFTang commented Sep 12, 2023

Please take a look at tc39/test262#3917.
At first I had some concerns about the following names:

"+00"
"-00"
"−00"
"+0000"
"-0000"
"−0000"
"+00:00"
"-00:00"
"−00:00"

But after digging into it more, I think the current PR will normalize them all to timeZone "+00:00", and that should be fine.

@FrankYFTang
Contributor

FrankYFTang commented Sep 12, 2023

My understanding is for the following code

(new Intl.DateTimeFormat(undefined, {timeZone})).resolvedOptions().timeZone

should get the following result:
"+00" => "+00:00"
"-00" => "+00:00"
"−00" => "+00:00"
"+0000" => "+00:00"
"-0000" => "+00:00"
"−0000" => "+00:00"
"+00:00" => "+00:00"
"-00:00" => "+00:00"
"−00:00" => "+00:00"
"+2359" => "+23:59"
"+23:59" => "+23:59"
"-2359" => "-23:59"
"-23:59" => "-23:59"
"−2359" => "-23:59"
"−23:59" => "-23:59"

Please comment if my understanding is incorrect. thanks
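
The mapping listed above can be mimicked without engine support by a small normalizer. This is a sketch, not the spec algorithm: `normalizeOffsetTimeZone` and `OFFSET_ID` are hypothetical names, and the regular expression is an assumption about which inputs are accepted (`±HH`, `±HHMM`, `±HH:MM`, with U+2212 allowed as the minus sign).

```javascript
// Sketch only: normalizeOffsetTimeZone is a hypothetical helper, not spec text.
// Accepts ±HH, ±HHMM and ±HH:MM, with "+", "-" (U+002D) or "−" (U+2212) as sign.
const OFFSET_ID = /^([+\-\u2212])([01][0-9]|2[0-3])(?::?([0-5][0-9]))?$/;

function normalizeOffsetTimeZone(id) {
  const match = OFFSET_ID.exec(id);
  if (match === null) {
    throw new RangeError(`Invalid offset time zone: ${id}`);
  }
  // The minutes group is undefined for ±HH inputs, so default it to "00".
  const [, signChar, hh, mm = "00"] = match;
  // A zero offset is always written with "+", so "-00:00" is never returned.
  const isZero = hh === "00" && mm === "00";
  const sign = signChar === "+" || isZero ? "+" : "-";
  return `${sign}${hh}:${mm}`;
}

console.log(normalizeOffsetTimeZone("-00"));   // "+00:00"
console.log(normalizeOffsetTimeZone("+2359")); // "+23:59"
console.log(normalizeOffsetTimeZone("−2359")); // "-23:59"
```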

@justingrant
Contributor

justingrant commented Sep 12, 2023

"-00" => "-00:00"

Nope. -00, -00:00, -0000, +00, and +0000 should all be normalized to +00:00. Here's why:

String IDs of built-in time zones (either offset or IANA) that are returned by ECMAScript methods (including getters) should always be normalized. For offset zones, normalized means ±HH:MM, with -00:00 not allowed. For IANA time zones, normalized means matching the letter case used in the IANA time zone database. ECMAScript methods should accept non-normalized formats like +00 or -00:00, but should never return a time zone ID string except in a normalized format.

This normalization behavior is specified in https://tc39.es/proposal-temporal/#sec-time-zone-identifiers, as well as in AO spec text in both Temporal and this PR. (If you find a case where an ECMAScript method is allowed to return a non-normalized ID for a built-in offset time zone or built-in named time zone, then it's a spec bug.)

The reason why we require normalized outputs is so that implementations won't have to store the original ID strings that users provided. Instead, implementations are free to store timezone ID slots in a more optimized way, as long as the normalized ID string can be reconstituted later as needed. (e.g. in id or timeZoneId getters).

For example, an implementation could optimize storage of a timezone ID slot using a 16-bit union:

  • one bit to mark the ID as an offset time zone ID or IANA ID
  • if offset, a 12-bit signed number of minutes (from -1439 representing -23:59 to +1439 representing +23:59)
  • if IANA, a 10-bit unsigned index into a cached array of the ~600 IANA ID strings

Hopefully this explains why "-00" => "+00:00" is the expected behavior.
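
To make the 16-bit union idea concrete, here is one possible packing. It is a sketch under assumptions: the bit layout (top bit as the tag, low 12 bits for a two's-complement offset, low 10 bits for an IANA index) and all helper names are invented for illustration and do not describe any engine.

```javascript
// Sketch only: the bit layout and helper names are assumptions for illustration.
const OFFSET_TAG = 0x8000; // top bit set => offset time zone

function packOffsetMinutes(offsetMinutes) {
  if (!Number.isInteger(offsetMinutes) || Math.abs(offsetMinutes) > 1439) {
    throw new RangeError("offset must be a whole number of minutes within ±23:59");
  }
  // Low 12 bits hold the offset in two's complement.
  return OFFSET_TAG | (offsetMinutes & 0x0fff);
}

function packIanaIndex(index) {
  if (!Number.isInteger(index) || index < 0 || index > 0x03ff) {
    throw new RangeError("IANA index must fit in 10 bits");
  }
  return index; // top bit clear => index into a cached array of IANA IDs
}

function unpack(slot) {
  if ((slot & OFFSET_TAG) === 0) {
    return { kind: "iana", index: slot & 0x03ff };
  }
  // Sign-extend the 12-bit field: shift it to the top of a 32-bit int, then back.
  const offsetMinutes = ((slot & 0x0fff) << 20) >> 20;
  return { kind: "offset", offsetMinutes };
}

console.log(unpack(packOffsetMinutes(-1439))); // { kind: 'offset', offsetMinutes: -1439 }
console.log(unpack(packIanaIndex(412)));       // { kind: 'iana', index: 412 }
```

Because only the minute count is stored, the original spelling ("-00", "+0000", and so on) is not recoverable, which is why only the normalized form can be returned.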

@justingrant
Contributor

it will be inefficient to distinguish them from "UTC" in resolvedOptions. Could we simply treat them as "UTC" in the spec text (so that resolvedOptions().timeZone returns "UTC" for these cases)?

I assume that we want new Intl.DateTimeFormat('en', {timeZone}).resolvedOptions().timeZone to always match Temporal.TimeZone.from(timeZone).id. Making them different would be confusing for users. Assuming they should behave the same, then:

  1. Normalizing the offset time zone ID "+00:00" into the IANA time zone ID "UTC" would be a normative change to Temporal and I'm pretty sure also a normative change to pre-Temporal ECMA-262. Without digging into the specs it's hard to know how much churn this change would cause, but the bar for normative changes like this is very high so the default should be to avoid making changes like this.

  2. There's also a user-facing advantage of leaving them separate, because UTC and +00:00 do have a slight semantic difference. UTC has a special meaning as a "computers-only" time zone; no humans are in a place where "UTC" is their time zone ID. On the other hand, +00:00 is interpreted to mean an offshore time zone where humans are, like a boat 500km south of Liberia. So even though those two zones return identical results, it can be helpful for a userland app to know the difference. For example, an app may choose to show a user a warning message, e.g. "No humans live in a place whose time zone ID is 'UTC'. Was this a mistake?".

So unless there's a very large performance advantage of making this change, then I'd like to leave the current spec as-is so that +00:00 and UTC are treated as distinct time zones, even though they always result in the same numeric results.

@FrankYFTang
Contributor

I drop my earlier concern about "+00:00" vs "UTC". It is not a problem.

@Louis-Aime

Louis-Aime commented Sep 14, 2023

I understand this PR is about letting an author specify a time zone with a simple string, as an offset to UTC, in the context of Temporal. I withdraw the concern I had during last meeting. Temporal.TimeZone has all functions for the use cases I had in mind.

However we should then stick to the documentation for timeZone.getOffsetStringFor():

This method is similar to timeZone.getOffsetNanosecondsFor(), but returns the offset formatted as a string, with sign, hours, and minutes.

The present polyfill gives also the seconds:
mytz = new Temporal.TimeZone('Europe/Paris');
mytz.getOffsetStringFor('1900-02-01T12:00Z'); // '00:09:21'

@justingrant
Contributor

justingrant commented Sep 14, 2023

I understand this PR is about letting an author specify a time zone with a simple string, as an offset to UTC, in the context of Temporal. I withdraw the concern I had during last meeting. Temporal.TimeZone has all functions for the use cases I had in mind.

However we should then stick to the documentation for timeZone.getOffsetStringFor():

This method is similar to timeZone.getOffsetNanosecondsFor(), but returns the offset formatted as a string, with sign, hours, and minutes.

The present polyfill gives also the seconds: mytz = new Temporal.TimeZone('Europe/Paris'); mytz.getOffsetStringFor('1900-02-01T12:00Z'); // '00:09:21'

The documentation is wrong. If a built-in IANA time zone (or a custom time zone that inherits from a built-in IANA zone) has sub-minute offsets, then TimeZone.p.getOffsetStringFor and ZonedDateTime.p.offset should return the string including sub-minute values. The limitation to minutes precision applies only to offset time zones, not to IANA zones.

I'm working on a PR (EDIT: tc39/proposal-temporal#2674) of those docs to fix the problem.
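
As a rough sketch of the behavior described for IANA zones, an offset string can include seconds only when the offset is not a whole number of minutes. `formatOffsetString` is a hypothetical helper, not the Temporal polyfill's code; it takes whole seconds rather than nanoseconds for brevity and ignores sub-second parts.

```javascript
// Sketch only: formatOffsetString is a hypothetical helper, not polyfill code.
// Input is a whole number of seconds; sub-second parts are out of scope here.
function formatOffsetString(offsetSeconds) {
  const sign = offsetSeconds < 0 ? "-" : "+";
  const abs = Math.abs(offsetSeconds);
  const pad = (n) => String(n).padStart(2, "0");
  const hh = pad(Math.floor(abs / 3600));
  const mm = pad(Math.floor(abs / 60) % 60);
  const ss = abs % 60;
  // Seconds appear only when the offset is not a whole number of minutes.
  return ss === 0 ? `${sign}${hh}:${mm}` : `${sign}${hh}:${mm}:${pad(ss)}`;
}

console.log(formatOffsetString(561));  // "+00:09:21" (9 minutes 21 seconds)
console.log(formatOffsetString(3600)); // "+01:00"
```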

@FrankYFTang
Contributor

This PR reached consensus at the 2023-09-26 meeting.

@ryzokuken ryzokuken added has consensus (TG1) Has consensus from TC39-TG1 has consensus Has consensus from TC39-TG2 labels Sep 26, 2023
@ryzokuken
Member

Is this still blocked on something or should I hit merge?

ptomato pushed a commit to FrankYFTang/test262 that referenced this pull request Sep 26, 2023
ptomato pushed a commit to tc39/test262 that referenced this pull request Sep 26, 2023
* Add test for ECMA402 PR 788

tc39/ecma402#788

* Fix misunderstanding about "+00:00"

* Fix lint

* Swap actual, expected position

* Update test/intl402/DateTimeFormat/prototype/resolvedOptions/offset-timezone-change.js

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>

* Update test/intl402/DateTimeFormat/prototype/resolvedOptions/offset-timezone-basic.js

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>

* Update test/intl402/DateTimeFormat/prototype/formatToParts/offset-timezone-correct.js

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>

* Update test/intl402/DateTimeFormat/constructor-invalid-offset-timezone.js

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>

* Update test/intl402/DateTimeFormat/constructor-invalid-offset-timezone.js

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>

* Update test/intl402/DateTimeFormat/prototype/resolvedOptions/offset-timezone-change.js

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>

* Update test/intl402/DateTimeFormat/constructor-invalid-offset-timezone.js

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>

* Update test/intl402/DateTimeFormat/prototype/format/offset-timezone-gmt-same.js

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>

* Update test/intl402/DateTimeFormat/prototype/format/offset-timezone-gmt-same.js

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>

* Update test/intl402/DateTimeFormat/prototype/formatToParts/offset-timezone-correct.js

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>

---------

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
@FrankYFTang
Contributor

please merge

@ryzokuken ryzokuken merged commit e25c455 into tc39:master Sep 27, 2023
4 checks passed
@justingrant
Contributor

Congratulations everyone for getting this over the finish line!

Labels
has consensus (TG1) Has consensus from TC39-TG1 has consensus Has consensus from TC39-TG2 normative
Projects
Status: Previously Discussed
Development

Successfully merging this pull request may close these issues.

ECMA-402 should allow numeric-offset host time zones
7 participants