Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i18n: use locale for all numeric formatting in the report #10786

Open
exterkamp opened this issue May 14, 2020 · 12 comments
Open

i18n: use locale for all numeric formatting in the report #10786

exterkamp opened this issue May 14, 2020 · 12 comments
Assignees
Labels
i18n internationalization thangs P1.5

Comments

@exterkamp
Copy link
Member

The titles for the new diagnostics (although I predict any ICU plural we use with an explicit "1") are not rendered properly with non-standard numeric representations.
image

This is because the ICU decleration for the "1" case is explicitly an en "1".

  displayValue: `{nodeCount, plural,
    =0 {No elements found}
    =1 {1 element found} // this uses the literal "1"
    other {# elements found} // this uses the number, like the character "١"
    }`,
@patrickhulce
Copy link
Collaborator

patrickhulce commented May 14, 2020

I'm not sure I understand this one @exterkamp, could you explain a tad more? Isn't it the role of the translators to author the appropriate translated plurals and we're only writing the english ones?

Based on my understanding, 1 element found would never use a number other than 1...

EDIT: I'm also not clear what were we supposed to write instead :)

@exterkamp
Copy link
Member Author

ar uses the eastern arabic numeral set: Eastern Arabic ٠ | ١ | ٢ | ٣ | ٤ | ٥ | ٦ | ٧ | ٨ | ٩ | ١٠
Notice this is "4 elements found", but uses a "٤" instead of "4":
image
Now look at how it handles "1 element found":
image

It should say "١ element found"; however the ICU plural for =1 is the literal "1 element found". And I don't think the translators will use eastern arabic numerals consistently with what we do (we always use them). Additionally I doubt they will use thai or traditional Chinese numerics as well if we ever choose to enforce those characters: https://github.com/GoogleChrome/lighthouse/tree/i18n-numbers
image
image
image

I think we will need to use the replacement for =1 as well. Like this:

  displayValue: `{nodeCount, plural,
    =0 {No elements found}
    =1 {# element found}
    other {# elements found}
    }`,

That said, I don't think this is high priority, since our numeric i18n story is all over the place...
image

@patrickhulce
Copy link
Collaborator

patrickhulce commented May 15, 2020

Notice this is "4 elements found", but uses a "٤" instead of "4":
Now look at how it handles "1 element found":

I got that part, what I was asking is why does it matter? For ar we won't be using the message 1 element found we'll be using something else written by the translator that's written in that locale and that message should use the correct numeral or # in place of it, right?

I don't think the translators will use eastern arabic numerals consistently

I think this answers my real question, but let me double check. I am understanding this correctly that the concern is translators will use whatever numeral we write in the english string and not use whatever numeral is appropriate for the locale they are translating to and/or appropriating substitute # for it?

@exterkamp
Copy link
Member Author

translators will use whatever numeral we write in the english string and not use whatever numeral is appropriate for the locale they are translating to and/or appropriating substitute # for it

Correct. In the same way that our markdown was getting ruined by some translators replacing " ` " with " ' ". Translating is not as consistent as we want it to be. I think if we try to become more consistent with numerals of other languages, then we should probably look into this as an issue.

Is it an issue now? meh
Is it something that we should handle in the future? probably

@connorjclark
Copy link
Collaborator

connorjclark commented May 15, 2020

can we add a test asserting there is no:

  • `
  • [
  • ]
  • (
  • )

maybe not an assert, since we couldn't easily fix it (unless you can just edit it in TC?) maybe just autofix and warn

@exterkamp
Copy link
Member Author

can we add a test asserting there is no:

  • `
  • [
  • ]
  • (
  • )

maybe not an assert, since we couldn't easily fix it (unless you can just edit it in TC?) maybe just autofix and warn

This seems unrelated. Is there a problem with markdown now? I was talking prior to the ICU escaping & CTC files.

@connorjclark
Copy link
Collaborator

this is what we have today

image

in general we need to fix numeral i18n

@connorjclark connorjclark added i18n internationalization thangs P1.5 labels Apr 5, 2022
@connorjclark
Copy link
Collaborator

ok well the numerics are translated properly for other languages in the metrics table.

image

arabic isn't getting localized numbers because intl-messageformat doesn't support it i guess.

We are on an old version of that package, perhaps upgrading is in order.

anyway if we want arabic numerals we should use a different locale code. ar-u-nu-arab see this

not here tho
image

@connorjclark
Copy link
Collaborator

can get arabic numerals from CLI with no code changes:

yarn start https://www.example.com/ -A --view --locale=ar-u-nu-arab

image

(but not the gauge..)

@brendankenny
Copy link
Member

brendankenny commented Apr 6, 2022

arabic isn't getting localized numbers because intl-messageformat doesn't support it i guess.

FWIW, this comes from the browser's formatting which intl-messageformat uses. e.g. (50).toLocaleString('ar') gives '50' while (50).toLocaleString('ar-EG') gives '٥٠'.

Reading about this today, I believe this may be intentional. Basically there are multiple types of digits depending on the region (ar-eg, ar-tn, etc), and the decision was made that ar (aka ar-001 or world-wide ar) would use ascii/latin digits as it would be most widely understood, even if it was likely not the user's usual choice of digit.

This CLDR thread gives some background, though I haven't found the follow up yet that moved it from testing to default.

@connorjclark
Copy link
Collaborator

I agree, it seems the expectation is that "ar" alone is latin numerals. LH is doing things correctly here already.

@connorjclark connorjclark changed the title i18n: ICU plurals using "1" i18n: use locale for all numeric formatting in the report May 23, 2022
@connorjclark
Copy link
Collaborator

connorjclark commented May 23, 2022

Some of this code was recently modified via #13830

We should make sure we are using the number formatter for all numbers in the report. Some places we're (probably) missing today: the gauge, the score legend. stuff in the meta block / tooltips

@connorjclark connorjclark self-assigned this May 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
i18n internationalization thangs P1.5
Projects
None yet
Development

No branches or pull requests

5 participants