Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datagen: Change time zone untagged enum resolution to work with CLDR 44 #4158

Merged
merged 1 commit into from
Oct 16, 2023

Conversation

sffc
Copy link
Member

@sffc sffc commented Oct 13, 2023

Part of #4155

CLDR 44 now includes data such as the following:

{
  "main": {
    "fi": {
      "dates": {
        "timeZoneNames": {
          "zone": {
            "America": {
              "Thule": {
                "exemplarCity-alt-secondary": "Qaanaaq"
              }
            },

We need to detect that this is a name table and not a sub-region table. Previously we used the presence of either exemplarCity, short, or long to make this determination. I am changing the code to also check for exemplarCity-alt-secondary. I'm also reversing the polarity of the check to make the code cleaner.

@sffc sffc requested review from srl295 and echeran and removed request for robertbastian and Manishearth October 13, 2023 23:55
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add a fn exemplar_city_alt_secondary(&self) -> Option<String> in the impl Location {...} block to match the new field in the Location struct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not unless we need it. It's dead code otherwise. I was considering just deleting the functions and accessing the fields directly.

@sffc
Copy link
Member Author

sffc commented Oct 16, 2023

@srl295 Do you have a suggestion on a better way to achieve this?

We have an object which could be one or the other shape, and the CLDR JSON schema tells us nothing about which one is in that position. I don't know how else to determine whether this is a "SubRegion" or "Location" node other than checking for the presence of certain keys, which is brittle (it broke in CLDR 44).

@sffc
Copy link
Member Author

sffc commented Oct 16, 2023

To clarify: the JSON looks like

          "zone": {
            "America": {
              "Asuncion": {
                "exemplarCity": "Asunción"
              },
              "Bahia_Banderas": {
                "exemplarCity": "Bahia Banderas"
              },
...
              "Havana": {
                "exemplarCity": "Havanna"
              },
              "Indiana": {
                "Vincennes": {
                  "exemplarCity": "Vincennes, Indiana"
                },
                "Petersburg": {
                  "exemplarCity": "Petersburg, Indiana"
                },
...
                "Vevay": {
                  "exemplarCity": "Vevay, Indiana"
                }
              },
              "Jamaica": {
                "exemplarCity": "Jamaika"
              },

Here, zones/America/Asuncion is a Location node (a leaf node) whereas zones/America/Indiana is a SubRegion node (has Location nodes as children). How is a parser supposed to tell the difference?

@srl295
Copy link
Member

srl295 commented Oct 16, 2023

It's like duck typing but some of the ducks flew off.

OK… what about adding a "Indiana": { "_type": "SubRegion", … } ? As a new key, it shouldn't conflict. It would be under "America": { also.

@srl295
Copy link
Member

srl295 commented Oct 16, 2023

What's the urgency here? If this is fragile-but-works, could we propose _type (could note it as a known issue) for CLDR 45?

One observation, is that using / to create subnodes was perhaps counterproductive here. if it was just flat

{
   "zones": {
        "America/Ascuncion": { }
        "America/Indiana/Vincennes": { } 
    }
}

… then you wouldn't have the issue.

@srl295
Copy link
Member

srl295 commented Oct 16, 2023

please file a CLDR issue in any event.

@sffc
Copy link
Member Author

sffc commented Oct 16, 2023

It's like duck typing but some of the ducks flew off.

OK… what about adding a "Indiana": { "_type": "SubRegion", … } ? As a new key, it shouldn't conflict. It would be under "America": { also.

Something along these lines would be helpful. Good to know we can move in this direction.

I don't consider this a CLDR 44 blocker though, because we had to do the duck typing in CLDR 43 as well, just the rules changed. I'll open an issue to investigate this for CLDR 45.

@sffc
Copy link
Member Author

sffc commented Oct 16, 2023

One observation, is that using / to create subnodes was perhaps counterproductive here. if it was just flat

{
   "zones": {
        "America/Ascuncion": { }
        "America/Indiana/Vincennes": { } 
    }
}

… then you wouldn't have the issue.

A flat structure would be easier to parse. Our parser basically needs to transform the table into a form more like this one, with the "/" glue, before it can do anything else.

@srl295
Copy link
Member

srl295 commented Oct 16, 2023

maybe { "_subRegion": true } would be more compact somehow, rather than creating an open ended enumeration.

Wait a second, though, this is trying to make the localization file self-describing. The _alias field of cldr-bcp47/bcp47/timezone.json already has an exhaustive enumeration of all valid tzids. Can't you just use that to define the shape?

  • America/Asuncion
  • America/Indiana/Vincennes

@sffc
Copy link
Member Author

sffc commented Oct 16, 2023

@sffc
Copy link
Member Author

sffc commented Oct 16, 2023

@srl295 I'll merge this since it is an improvement over what we currently have and I am tracking follow-on work (using timezone.json) in #4044

@sffc sffc merged commit fef13b6 into unicode-org:main Oct 16, 2023
28 checks passed
@sffc sffc deleted the datagen-datetime-fix branch October 16, 2023 22:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants