Address name and literal equality #885

aphillips · 2024-09-13T17:36:53Z

This change defines equality as discussed in the 2024-09-09 teleconference in the following ways:

- It defines _name_ equality as being under NFC
- It defines _literal_ equality as explicitly **not** under NFC
- It moves _name_ before _identifier_ in that section of text to avoid a forward definition.

Note that this deviates from discussion in 2024-09-09's call in that we didn't discuss literals at length. It also doesn't discuss non-name/non-literal values, which I'll point out are limited to ASCII sequences such as keywords.

eemeli · 2024-09-14T10:11:01Z

spec/syntax.md

+Two _literals_ are considered equal if they consist of the same sequence of Unicode
+code points.


This seems a bit excessive. While most name comparisons are internal to the spec, AFAIK the only literal comparison in the spec is for duplicate variant key lists, which tbh I'd prefer to be done with normalization.

All other literal value handling is done by functions, which we should not restrict from applying normalization in their internal processing.

macchiati · 2024-09-14

I agree. I'd go a bit further and encourage function specifiers to normalize when matching, and make the standard functions do so. That is, something along the lines of:

> An NFC comparison (aka Unicode canonically equivalent comparison) produces the same results as if each string value being compared were converted to the Unicode Normalization Form C (NFC). For example, with an NFC comparison against the literal |U\x{308}|, the same result is obtained as if the literal were |\x{DC}|. For more examples, see the Unicode Standard.
>
> When determining whether two variant key lists are duplicates, NFC comparison MUST be used for literals.
>
> When a selector function evaluates matches to literal keys, the matches SHOULD use NFC comparison. Moreover, the implementation of the standard selector functions MUST use NFC comparison. Thus the standard :string selector function MUST match a string input parameter of "U\x{308}" with the literal |\x{DC}|.

BTW: Some selector functions, such as the standard numeric selectors, only match literals consisting entirely of ASCII characters. ASCII literals never change when converted to NFC, and there are only 3 non-ASCII characters that change to ASCII under NFC. So selector functions whose literals don't include ";", "`", or "K" don't need to use NFC comparison; that includes our numeric selectors.

| Char. | Code Point | Name |
|---|---|---|
| ; | U+037E | GREEK QUESTION MARK |
| ` | U+1FEF | GREEK VARIA |
| K | U+212A | KELVIN SIGN |
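
The NFC comparison described above can be sketched in a few lines. This is only an illustration using Python's standard `unicodedata` module; the helper name `nfc_equal` is hypothetical and not something the spec defines.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings as if both had been converted to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# "U" followed by U+0308 COMBINING DIAERESIS vs. the precomposed U+00DC.
decomposed = "U\u0308"
precomposed = "\u00DC"

print(decomposed == precomposed)           # plain code point comparison
print(nfc_equal(decomposed, precomposed))  # NFC comparison

# The three non-ASCII characters from the table and their NFC forms.
ascii_mapped = {
    ch: unicodedata.normalize("NFC", ch)
    for ch in ("\u037E", "\u1FEF", "\u212A")
}
for ch, normalized in ascii_mapped.items():
    print(f"U+{ord(ch):04X} -> {normalized!r}")
```

The code point comparison treats the two spellings of Ü as different strings, while the NFC comparison treats them as equal. The loop shows why ASCII-only key sets are only safe when they avoid ";", "`", and "K".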

aphillips · 2024-09-14

First, do we agree that literals MAY be non-normalized/denormalized?

If a literal can be a non-normalized string, then we should define when two literals match inside MF2. Literal comparison is for duplicate key lists, but also for matching between the sorted results of a selector and the keys in the message (the sorting is done by a function, but not the matching after sorting). This text says nothing about what functions do or are allowed to do with (possibly not normalized) literal values. All it says is when MF2 considers two literals to be equal. I could add text allowing functions to have greater restriction on equality. @macchiati suggests requiring it for :string.

> When determining whether two variant key lists are duplicates, NFC comparison MUST be used for literals.

This is the opposite of what @eemeli is saying? If we allow normalization (but don't require it) we also allow the lack of it.

By not normalizing literals, we allow non-normalized sequences to be used in expressions, option values, or keys. This has positive consequences (for people who know what they're doing when working with combining marks or certain characters) and negative ones (for people who don't).

> When a selector function evaluates matches to literal keys, the matches SHOULD use NFC comparison. Moreover, the implementation of the standard selector functions MUST use NFC comparison. Thus the standard :string selector function MUST match a string input parameter of "U\x{308}" with the literal |\x{DC}|.

Why?

    .local $angstromsAreCool = {Å :string}
    .match $angstromsAreCool
    Å {{U+212B is the only way to be cool}}
    Å {{I'm U+00C5, so almost cool}}
    Å {{I'm A + U+030A, so I combine with cool}}
    * {{I'm not cool}}

I realize this doesn't illustrate a compelling use case. Most of the time the sets of valid keys should be rational, sane, highly-normalized enumerated values and not just random text... in fact, I have a note cautioning people about this right 👇

macchiati · 2024-09-14

I think it is far, far more likely that people will make mistakes with non-NFC literals (or input) than the really, really obscure edge case of someone wanting to match non-normalized text.

eemeli · 2024-09-16

> First, do we agree that literals MAY be non-normalized/denormalized?

As with pattern text, I agree that we should not require the normalization of literal values.

> Literal comparison is [...] also for matching between the sorted results of a selector and the keys in the message (the sorting is done by a function, but not the matching after sorting).

Regarding the latter, we say this:

message-format-wg/spec/formatting.md

Lines 511 to 514 in 80bec52

    The method MatchSelectorKeys is determined by the implementation.
    It takes as arguments a resolved _selector_ value `rv` and a list of string keys `keys`,
    and returns a list of string keys in preferential order.
    The returned list MUST contain only unique elements of the input list `keys`.

That MUST requires the function to return the keys exactly as it received them, so it cannot return normalised values even if it normalises them for its internal processing.

I'd be completely fine with us normalising the keys before they're passed to the function, or at least allowing an implementation to do so.

> > When determining whether two variant key lists are duplicates, NFC comparison MUST be used for literals.
>
> This is the opposite of what @eemeli is saying? If we allow normalization (but don't require it) we also allow the lack of it.

I'm aligned with @macchiati here. We don't need to normalize key values, but we should do their comparison when checking for duplicate key lists as if they were normalized.
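
As a rough sketch of that idea, duplicate key lists can be detected by comparing NFC forms while leaving the stored keys untouched. This assumes Python 3.9+ and the standard `unicodedata` module; `find_duplicate_key_lists` is a hypothetical helper, and the catch-all `*` is treated as an ordinary string here for simplicity.

```python
import unicodedata


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def find_duplicate_key_lists(variants: list[list[str]]) -> list[tuple[int, int]]:
    """Return (first_index, later_index) pairs of key lists that are equal under NFC.

    The key lists themselves are not modified; only the comparison is normalized.
    """
    seen: dict[tuple[str, ...], int] = {}
    duplicates: list[tuple[int, int]] = []
    for index, keys in enumerate(variants):
        normalized = tuple(nfc(k) for k in keys)
        if normalized in seen:
            duplicates.append((seen[normalized], index))
        else:
            seen[normalized] = index
    return duplicates


# Key lists modelled on the Å example earlier in the thread.
variants = [
    ["\u212B"],    # ANGSTROM SIGN
    ["\u00C5"],    # LATIN CAPITAL LETTER A WITH RING ABOVE
    ["A\u030A"],   # A + COMBINING RING ABOVE
    ["*"],
]
duplicate_pairs = find_duplicate_key_lists(variants)
print(duplicate_pairs)
```

Under code point equality all three Å variants are distinct keys. Under NFC comparison the second and third are reported as duplicates of the first.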

aphillips · 2024-09-16T18:18:50Z

In the 2024-09-16 call we agreed to a variety of changes to this PR. Keys, option names, and attribute names are to be NFC.

- Make _key_ require NFC for uniqueness/comparison
- Add a note about NFC
- Make _literal_ **_not_** define equality
- Make text in _name_ identical to that in _key_ for consistency

eemeli

The bit about key values being normalised should be reflected here:

message-format-wg/spec/formatting.md

Lines 505 to 506 in 95ec6d5

    1. Let `ks` be the resolved value of `key`.
    1. Append `ks` as the last element of the list `keys`.

Maybe with a change like this?

-         1. Let `ks` be the resolved value of `key`.
+         1. Let `ks` be the resolved value of `key` in Unicode Normalization Form C.
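
To illustrate what the suggested wording could mean for an implementation, here is a minimal sketch in which keys are converted to NFC as they are collected, so the selector function only ever sees normalized keys. Both `collect_keys` and `match_string_keys` are hypothetical names; the latter is a toy stand-in for MatchSelectorKeys with a `:string`-like selector, not the spec's algorithm.

```python
import unicodedata


def collect_keys(raw_keys):
    """Build the `keys` list, resolving each key to its NFC form."""
    keys = []
    for key in raw_keys:
        ks = unicodedata.normalize("NFC", key)  # resolved value of key, in NFC
        keys.append(ks)
    return keys


def match_string_keys(rv, keys):
    """Toy selector: return the unique keys equal to the NFC form of `rv`.

    Every returned element is taken from `keys` unchanged.
    """
    target = unicodedata.normalize("NFC", rv)
    result = []
    for k in keys:
        if k == target and k not in result:
            result.append(k)
    return result


keys = collect_keys(["U\u0308", "other"])
print(keys)
print(match_string_keys("\u00DC", keys))
print(match_string_keys("U\u0308", keys))
```

Because normalization happens before the keys reach the selector, the function can still return only elements of its input list while matching both spellings of the selector value.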

eemeli

This is an improvement on what we have now. I'd be happier if the suggestion from #885 (comment) was included, but I'm not going to insist on it.

As discussed on the call, I think we should also normalise option names, but that doesn't need to be a part of this change.

aphillips · 2024-09-17T23:18:08Z

In spite of having only one approval, I'm going to merge this, based on consensus from 2024-09-16.

aphillips added 3 commits September 13, 2024 10:36

Typo fix

8d26f7f

Add a note about not requiring implementations to actually normalize

46c80bf

aphillips marked this pull request as ready for review September 13, 2024 17:43

aphillips requested review from catamorphism, eemeli, echeran, mihnita and macchiati September 13, 2024 17:44

aphillips added the syntax (Issues related with MF Syntax), normative, and LDML46 (LDML46 Release (Tech Preview - October 2024)) labels Sep 13, 2024

aphillips mentioned this pull request Sep 13, 2024

Add section on Uniqueness and Equality #869

Closed

eemeli requested changes Sep 14, 2024

Implement changes discussed in 2024-09-16 call.

c1e4982

- Make _key_ require NFC for uniqueness/comparison
- Add a note about NFC
- Make _literal_ **_not_** define equality
- Make text in _name_ identical to that in _key_ for consistency

aphillips requested a review from eemeli September 16, 2024 22:19

Merge branch 'main' into aphillips-name-equality

981dd66

eemeli reviewed Sep 17, 2024

Update formatting.md to include keys in NFC

20cbbe7

eemeli reviewed Sep 17, 2024

aphillips and others added 2 commits September 17, 2024 11:12

Address comments

b5eec2a

Update spec/syntax.md

eb09a95

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

aphillips requested a review from eemeli September 17, 2024 18:17

eemeli approved these changes Sep 17, 2024

Update spec/syntax.md

94e1246

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

aphillips merged commit 6f5ad39 into main Sep 17, 2024
1 check passed

aphillips deleted the aphillips-name-equality branch September 17, 2024 23:18

eemeli mentioned this pull request Sep 24, 2024

MF2 spec updates messageformat/messageformat#429

Merged
