Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i18n: localize units in report formatter #13830

Merged
merged 14 commits into from
Apr 19, 2022
Merged

Conversation

connorjclark
Copy link
Collaborator

@connorjclark connorjclark commented Apr 6, 2022

  • All number time formatters now use Intl.NumberFormatter with the unit field
  • So does formatBytes, but the larger byte formatters could not use Intl units because binary units are currently not supported
  • formatDuration is now localized
  • formatNumber now does not drop significant digits (ex: formatNumber(10, 0.1) => 10.0, but before it was 10. This matches the behavior of how granularity was implemented for _byteFormatterForGranularity, and better conveys the precision/rounding to the user (should they even notice)
  • moved all the granularity stuff into a shared function

Even with this change, you'll notice that many time units are still not localized:

image

Only the opportunity audits display values use our report/renderer/i18n.js formatter

displayEl.textContent = Util.i18n.formatSeconds(details.overallSavingsMs, 0.01);

The other audit's display value field is used directly, and is hardcoding the unit:

str_(i18n.UIStrings.seconds, {timeInMs: totalBootupTime}) : '',

seconds: '{timeInMs, number, seconds}\xa0s',

We have the numeric value and the unit in the audit results, so we should format the value in the report directly. We can also look into using our formatter directly in the audit result so that displayValue in JSON is still localized and with the right unit. I suspect it may require a refactor since this formatter module is in the report, so this will be a follow up PR.

@connorjclark connorjclark requested a review from a team as a code owner April 6, 2022 23:04
@connorjclark connorjclark requested review from adamraine and removed request for a team April 6, 2022 23:04
@connorjclark
Copy link
Collaborator Author

One option for formatting KiB/MiB (the binary prefixes) would be to use kilobyte as the unit but for english manually format as KiB/MiB. That way, we stick with our intention to use the less ambiguous unit in english (#10870) but for other languages at least show their equivalent of KB/MB (and for all we know, the same confusion does not exist in their locale)

number = Math.round(number / granularity) * granularity;

// Avoid displaying a negative value that rounds to zero as "0".
if (Object.is(number, -0)) number = 0;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could have done if (number === 0) number = 0 but it was just too much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if JS added ==== so -0 ==== 0 would be false 🤪

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IEEE-754 requires that they compare as equal 🤓


return new Intl.NumberFormat(this._locale, opts).format(number).replace(' ', NBSP2);
}

/**
* Format number.
* @param {number} number
* @param {number=} granularity Number of decimal places to include. Defaults to 0.1.
* @return {string}
*/
formatNumber(number, granularity = 0.1) {
Copy link
Collaborator Author

@connorjclark connorjclark Apr 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps this default should be 1? it would match the previous behavior, such that integers are formatted without a fractional component

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead I made a new function formatInteger

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it's reasonable to show the extra fractional digit when granularity is 0.1 (which is what #11489 did for bytes), and maybe we don't really want 0.1 as the default based on how we usually don't do anything about it, but that's annoying to audit and formatInteger is a great solution :)

@connorjclark
Copy link
Collaborator Author

connorjclark commented Apr 7, 2022

We have the numeric value and the unit in the audit results, so we should format the value in the report directly. We can also look into using our formatter directly in the audit result so that displayValue in JSON is still localized and with the right unit. I suspect it may require a refactor since this formatter module is in the report, so this will be a follow up PR.

Perhaps a better solution could be to upgrade our icu package and use the new skeleton format in our icu message

see:

https://messageformat.github.io/messageformat/guide/#number

https://github.com/unicode-org/icu/blob/main/docs/userguide/format_parse/numbers/skeletons.md

lighthouse-core/audits/bootup-time.js Outdated Show resolved Hide resolved
number = Math.round(number / granularity) * granularity;

// Avoid displaying a negative value that rounds to zero as "0".
if (Object.is(number, -0)) number = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if JS added ==== so -0 ==== 0 would be false 🤪

@@ -147,8 +147,8 @@ class TreemapUtil {
* @param {string} unit
*/
static format(value, unit) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use this function anywhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤭

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol i'll delete it

`${Util.i18n.formatNumber(throttling.downloadThroughputKbps)}${NBSP}Kbps down, ` +
`${Util.i18n.formatNumber(throttling.uploadThroughputKbps)}${NBSP}Kbps up (DevTools)`;
networkThrottling = `${Util.i18n.formatMilliseconds(requestLatencyMs, 1)} HTTP RTT, ` +
`${Util.i18n.formatInteger(throttling.downloadThroughputKbps)}${NBSP}Kbps down, ` +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can do unit: 'kilobit-per-second' for these since it's Kbps (not...Kibps?). It'll be kb/s instead of Kbps though

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds better to me!

* @return {string}
*/
_formatNumberWithGranularity(number, granularity, opts = {}) {
opts = {...opts};

This comment was marked as resolved.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i had my cs prof yelling in my head "don't mutate input parameters!" so i copied it

_formatNumberWithGranularity(number, granularity, opts = {}) {
opts = {...opts};
const log10 = -Math.log10(granularity);
if (!Number.isFinite(log10) || (granularity > 1 && Math.floor(log10) !== log10)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if (!Number.isFinite(log10) || !Number.isInteger(log10)) { is sufficient if we just want powers of 10

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a test that uses 0.5 as a value here:

and given we pass granularity through a details object throwing here may even be a breaking change... but good thing we are in pre-10.0 mode :)

hmmm throwing in report code is probably bad anyhow. wish we had a DCHECK (cc @exterkamp lol)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a test that uses 0.5 as a value here:

ha, well that looks like something that was supported so it was nice to add a test, rather that something we've ever used or wanted to use, but if we want to keep it, if (!Number.isFinite(log10) || (granularity > 1 && !Number.isInteger(log10))) is slightly easier to understand

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for some reason i had no idea isInteger existed


number = Math.round(number / granularity) * granularity;

// Avoid displaying a negative value that rounds to zero as "0".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signDisplay: 'negative' will finally let this be handled how signDisplay should have defaulted in the first place :) Firefox only for now though.

number = Math.round(number / granularity) * granularity;

// Avoid displaying a negative value that rounds to zero as "0".
if (Object.is(number, -0)) number = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IEEE-754 requires that they compare as equal 🤓

// Avoid displaying a negative value that rounds to zero as "0".
if (Object.is(number, -0)) number = 0;

return new Intl.NumberFormat(this._locale, opts).format(number).replace(' ', NBSP2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it should be fine to do the replace thing, though it feels dirty and maybe there's somehow an issue with rtl or something? 🤷

Do we still need it? I assume we do, but it would be cool if our various layout initiatives over the years made it unnecessary to keep them from breaking across lines

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, for new Intl.NumberFormat('fr', {style: 'unit', unit: 'second', unitDisplay: 'long'}).format(5) ('5 secondes'), the second character is NBSP2, but that doesn't appear true for most (all?) other languages.

new Intl.NumberFormat('fr', {style: 'unit', unit: 'second', unitDisplay: 'short'}) has \u202f (narrow no-break space) as its second character.

Also, languages disagree on spacing of units :/
new Intl.NumberFormat('en-US', {style: 'unit', unit: 'byte', unitDisplay: 'narrow'}).format(5) -> '5B'
new Intl.NumberFormat('en-US', {style: 'unit', unit: 'byte', unitDisplay: 'narrow'}).format(5) -> '5 B'

Copy link
Collaborator Author

@connorjclark connorjclark Apr 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

omg this is infuriatingly inconsistent. no way this is due to locale differences, these control characters are formatting and what locale would expect numbers to be disassociated from their unit by newlines?? this has to be a mistake in the localization library...

your fr example is odd because the CLDR data has 32 space for long but narrow non-break space for short, based on Intl output I'd expect the former to show NBSP2. (narrow has no whitespace)

One can sort of see reason for there being a non-breaking space between the number and the short string but not for the longer one, as without that number the shorter strings being further from their number gives them far less context.

(your last example is the same two lines of code btw. i'd expect narrow to never include any whitespace. I see de does though... I think for some languages CLDR doesn't bother defining narrows strings so it defaults to short)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One can sort of see reason for there being a non-breaking space between the number and the short string but not for the longer one, as without that number the shorter strings being further from their number gives them far less context.

totally, but why isn't it the same across languages?? Or at least, why wouldn't english (or german or spanish or...) do it as well?

(your last example is the same two lines of code btw. i'd expect narrow to never include any whitespace)

oops, I meant de for the second one:

new Intl.NumberFormat('en-US', {style: 'unit', unit: 'byte', unitDisplay: 'narrow'}).format(5) -> '5B'
new Intl.NumberFormat('de', {style: 'unit', unit: 'byte', unitDisplay: 'narrow'}).format(5) -> '5 B'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need it? I assume we do, but it would be cool if our various layout initiatives over the years made it unnecessary to keep them from breaking across lines

image

hmm....


return new Intl.NumberFormat(this._locale, opts).format(number).replace(' ', NBSP2);
}

/**
* Format number.
* @param {number} number
* @param {number=} granularity Number of decimal places to include. Defaults to 0.1.
* @return {string}
*/
formatNumber(number, granularity = 0.1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it's reasonable to show the extra fractional digit when granularity is 0.1 (which is what #11489 did for bytes), and maybe we don't really want 0.1 as the default based on how we usually don't do anything about it, but that's annoying to audit and formatInteger is a great solution :)

unit,
unitDisplay: 'narrow',
});
const label = unitFormatter.formatToParts(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like it could be problematic to use 0 here for languages with different (abbreviated) unit names for plurals? Might not be a problem for any of our languages, but this also seems to be a very roundabout way vs concatenating calls to _formatNumberWithGranularity. What's the motivation? May be worth a comment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, better to call that function directly

@@ -431,9 +431,9 @@ class Util {
case 'devtools': {
const {cpuSlowdownMultiplier, requestLatencyMs} = throttling;
cpuThrottling = `${Util.i18n.formatNumber(cpuSlowdownMultiplier)}x slowdown (DevTools)`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably formatInteger on the cpu throttling as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we don't ever assign this value to non-integer numbers, others might via the CLI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair point. 2.0x looks kind of dumb, though :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hold up why don't we just do ${cpuSlowdownMultiplier}x

this entire string gets a F for i18n atm anyway so we still need to do this correctly with a ICU message

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh right, arabic

@brendankenny
Copy link
Member

brendankenny commented Apr 7, 2022

We have the numeric value and the unit in the audit results, so we should format the value in the report directly. We can also look into using our formatter directly in the audit result so that displayValue in JSON is still localized and with the right unit. I suspect it may require a refactor since this formatter module is in the report, so this will be a follow up PR.

this may be worth a more in-depth conversation with our TC contact, because this is running automated Intl.NumberFormat translations up against human-translator-provided translations and wanting them to be consistent. It's possible they've judged '{timeInMs, number, milliseconds}\xa0ms' staying as '{PH} ms' as the better translation for their locale, even if Intl.NumberFormat might give something else.

Maybe it's better, but OTOH maybe the report formatter should be using i18n.UIStrings.ms instead of Intl.NumberFormat?

@connorjclark
Copy link
Collaborator Author

connorjclark commented Apr 7, 2022

Since writing that "next PR steps" comments my opinion has changed to basically this:

maybe the report formatter should be using i18n.UIStrings.ms instead of Intl.NumberFormat?

TBH, I have more trust in these CLDR strings being translated correctly (and by default, always consistent!) than our human translators process, so I'd like to see full reliance on the ICU pipeline for things like unit localization when/if possible, and let the human translators focus on the really hard work of translating the rest of the string.

For that ms string specifically we'd be sending strings with no translation work, which may be kinda funny to them:

ms: '{timeInMs, number, ::unit/.......}'

(tbh I haven't grokked how to use this yet, but the docs suggest we can specify unit and width so should be good)

this may be worth a more in-depth conversation with our TC contact

Just for the consistency angle or also for "can we use a different translation process for units"?

@connorjclark
Copy link
Collaborator Author

connorjclark commented Apr 7, 2022

just looking at a few locales on how our translators might differ in unit localization compared to CLDR

de

our string:

"{timeInMs, number, milliseconds} ms"
Intl.NumberFormat('de', {style: 'unit', unit: 'millisecond', unitDisplay: 'short'}).format(1);
> 1 ms
Intl.NumberFormat('de', {style: 'unit', unit: 'millisecond', unitDisplay: 'long'}).format(1);
> 1 Millisekunden

all good... CLDR specifies the short string in de to inherit from the english string


hu
(Hungarian)

our string:

"{timeInMs, number, milliseconds} ms"
Intl.NumberFormat('hu', {style: 'unit', unit: 'millisecond', unitDisplay: 'short'}).format(1);
> 1 ms
Intl.NumberFormat('hu', {style: 'unit', unit: 'millisecond', unitDisplay: 'long'}).format(1);
> 1 ezredmásodperc

Also good


ko

our string:

"{timeInMs, number, milliseconds} 밀리초"
Intl.NumberFormat('ko', {style: 'unit', unit: 'millisecond', unitDisplay: 'short'}).format(12);
> 12ms
Intl.NumberFormat('ko', {style: 'unit', unit: 'millisecond', unitDisplay: 'long'}).format(12);
> 12밀리초

according to CLDR the preferred value here would have been ms (if we really wished for a short display here)

@brendankenny
Copy link
Member

Just for the consistency angle or also for "can we use a different translation process for units"?

basically what's the best practice here, are the automatic translations good enough, etc. Full consistency would also include things like 'Third-party code blocked the main thread for {timeInMs, number, milliseconds} ms' dropping the ms and letting the number formatter add it. Does interpolating the already unitized number in there work?

@brendankenny

This comment was marked as outdated.

@connorjclark
Copy link
Collaborator Author

connorjclark commented Apr 7, 2022

hmm I'm not sure "interpolating the already unitized number in there" is an accurate representation of what the ICU message formatter does. we do some slight massaging of the values we pass to the formatter via _preformatValues but the formatter is in charge of stringing (heh) all these parts together, and it's gonna do that in a locale-aware manner, including its addition of the unit (should we begin to use this feature)

@connorjclark

This comment was marked as outdated.

@connorjclark
Copy link
Collaborator Author

connorjclark commented Apr 7, 2022

One issue with our human translator process is we don't specify in our translators comments anything about "how much space should this unit take up / what unit display width is preferred", relying on the assumption that if we state ms we want a short abbreviated term, when possible. This ambiguity may result in different translators giving us different terms. We could fix all our translator strings to include this, but that doesn't leave out the possibility of some being overlooked (by us or them)

@brendankenny
Copy link
Member

according to CLDR the preferred value here would have been ms (if we really wished for a short display here)

is that better? I don't know :) That's why I was suggesting we reach out.

One issue with our human translator process is we don't specify in our translators comments anything about "how much space should this unit take up / what unit display width is preferred", relying on the assumption that if we state ms we want a short abbreviated term, when possible. This ambiguity may result in different translators giving us different terms. We could fix all our translator strings to include this, but that doesn't leave out the possibility of some being overlooked (by us or them)

this is definitely our fault, this is an explicit request for writing translation descriptions when the length is important. We do have it in a few places (e.g. "Ideally fits within a ~40 character limit"), but we could probably have some automated annotation for checking short strings across locales automatically if we really wanted to keep that in check.

@brendankenny
Copy link
Member

is this conversation off topic and really about the next PR?

@connorjclark
Copy link
Collaborator Author

connorjclark commented Apr 7, 2022

yeah I think this PR can be merged (after I address feedback) as it's a net-positive (localizing text that wasn't before) but the issue on consistency is worth discussing somewhere

Co-authored-by: Adam Raine <6752989+adamraine@users.noreply.github.com>
@@ -169,6 +189,10 @@ export class I18n {
* @return {string}
*/
formatDuration(timeInMilliseconds) {
Copy link
Collaborator Author

@connorjclark connorjclark Apr 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just realized the comment for this function was wrong

1d 2h 13m 52s

we had NBSP all up in there...

Copy link
Member

@adamraine adamraine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failure looks like a simple fix, so LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants