Fix matching user input to datalist values #4814 #7003

Merged
merged 4 commits into from
Sep 8, 2021

Conversation

aphillips
Contributor

@aphillips aphillips commented Aug 31, 2021

Changes the description of datalist matching to use the term
'search' instead of 'substring match' and adds examples of some
of the textual variation issues that user agents might consider.

Adds a reference to CHARMOD-NORM (String Matching).

  • At least two implementers are interested (and none opposed):
  • Tests are written and can be reviewed and commented upon at:
  • Implementation bugs are filed:
    • Chrome: …
    • Firefox: …
    • Safari: …

(See WHATWG Working Mode: Changes for more details.)


/input.html ( diff )
/references.html ( diff )

Member

@domenic domenic left a comment

Text looks great, just some nits on formatting.

source Outdated
@@ -125282,6 +125286,10 @@ INSERT INTERFACES HERE
<dt id="refsCHARMOD">[CHARMOD]</dt>
<dd>(Non-normative) <cite><a href="https://www.w3.org/TR/charmod/">Character Model for the World Wide Web 1.0: Fundamentals</a></cite>, M. D&uuml;rst, F. Yergeau, R. Ishida, M. Wolf, T. Texin. W3C.</dd>

<dt id="refsCHARMODNORM">[CHARMODNORM]</dt> <dd>(Non-normative) <cite><a
Member

Wrapping is confused here. (The great rewrapper does not work well with dt/dds.)

Contributor Author

Done.

@@ -125282,6 +125286,10 @@ INSERT INTERFACES HERE
<dt id="refsCHARMOD">[CHARMOD]</dt>
<dd>(Non-normative) <cite><a href="https://www.w3.org/TR/charmod/">Character Model for the World Wide Web 1.0: Fundamentals</a></cite>, M. D&uuml;rst, F. Yergeau, R. Ishida, M. Wolf, T. Texin. W3C.</dd>

<dt id="refsCHARMODNORM">[CHARMODNORM]</dt> <dd>(Non-normative) <cite><a
href="https://www.w3.org/TR/charmod-norm/">Character Model for the World Wide Web: String
Member

Use https://w3c.github.io/charmod-norm/ instead (we always cite EDs in WHATWG specs, never TR).

Contributor Author

Will fix.

(I don't think there is an ED for charmod just above, which may be an exception to the rule... which I faithfully copied).

@@ -125282,6 +125286,10 @@ INSERT INTERFACES HERE
<dt id="refsCHARMOD">[CHARMOD]</dt>
<dd>(Non-normative) <cite><a href="https://www.w3.org/TR/charmod/">Character Model for the World Wide Web 1.0: Fundamentals</a></cite>, M. D&uuml;rst, F. Yergeau, R. Ishida, M. Wolf, T. Texin. W3C.</dd>

<dt id="refsCHARMODNORM">[CHARMODNORM]</dt> <dd>(Non-normative) <cite><a
href="https://www.w3.org/TR/charmod-norm/">Character Model for the World Wide Web: String
Matching</a></cite>, A.Phillips. W3C.</dd>
Member

Space after "A."

Contributor Author

Done.

@domenic domenic self-assigned this Sep 1, 2021
@r12a

r12a commented Sep 1, 2021

Um. Sorry, but this appears to have lost sight of the main point of the issue raised, which was that (while additional forms of matching would be nice and are certainly encouraged) Unicode normalisation and case folding really SHOULD be an expected baseline. The proposed text put no emphasis or urgency on those two things.

@aphillips in case you didn't see the email i sent to you, here was my suggestion for the edit (it probably needs some improvement, but it tries to emphasise the importance of normalisation and case-folding):

If filtering based on the user's input, user agents should use substring matching against both the suggestions' label and value (as opposed to prefix matching). Such substring matching should be done after Unicode normalization and case folding have been applied. User agents may apply other matching techniques as desired for their user experience, for example accent-stripping, matching kana with kanji, matching against potential misspellings, etc.
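
A minimal sketch of the baseline described in the suggested text above, written in Python only because it exposes full case folding and normalization directly; it is not how any user agent implements datalist filtering. The names `fold_key` and `filter_suggestions`, the `(label, value)` pair representation, and the choice of NFD are assumptions made for illustration.

```python
import unicodedata


def fold_key(text: str) -> str:
    """Build a comparison key: canonical normalization plus full case folding."""
    # Normalize, fold, then normalize again: case folding can produce
    # sequences that are no longer in normalized form.
    folded = unicodedata.normalize("NFD", text).casefold()
    return unicodedata.normalize("NFD", folded)


def filter_suggestions(user_input: str, suggestions: list[tuple[str, str]]):
    """Keep (label, value) pairs whose label or value contains the input.

    Substring matching (not prefix matching) is done on the folded keys.
    """
    needle = fold_key(user_input)
    return [
        (label, value)
        for label, value in suggestions
        if needle in fold_key(label) or needle in fold_key(value)
    ]


# Hypothetical suggestion list; labels use precomposed characters.
suggestions = [("Z\u00fcrich", "ZRH"), ("Stra\u00dfburg", "SXB"), ("Geneva", "GVA")]

# Input typed with a combining diaeresis (u + U+0308) still finds the
# precomposed label, and uppercase input still finds the sharp s.
print(filter_suggestions("zu\u0308r", suggestions))
print(filter_suggestions("STRASS", suggestions))
```

One limitation of this sketch: because the keys are in decomposed form, an input of plain "zu" also matches "Zürich", since the base letter precedes its combining mark. Whether that is desirable is a user-experience decision that the sketch does not settle.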

@aphillips
Contributor Author

@r12a Okay. I tried to follow our original bug report.

What do you think of:

User agents are encouraged to filter the suggestions represented by the suggestions source element when the number of suggestions is large, including only the most relevant ones (e.g. based on the user's input so far). No precise threshold is defined, but capping the list at four to seven values is reasonable. If filtering based on the user's input, user agents should search within both the label and value of the suggestions for matches (as opposed to merely prefix matching). User agents need to consider how input variations affect the matching process. For examples, see Character Model for the World Wide Web: String Matching. Substring matching should be done after Unicode normalization and appropriate case folding have been applied. User agents may also apply other matching techniques as desired for their user experiences, for example ignoring accents, matching kana with kanji, or matching against potential misspellings.
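
The optional parts of the proposed text (capping the list and ignoring accents) could look roughly like the following Python sketch. Everything here is an assumption for illustration: `MAX_SUGGESTIONS`, `strip_accents`, `loose_key`, and `top_matches` are hypothetical names, the cap of 7 is simply the upper end of the "four to seven" range, and "most relevant" is reduced to source order.

```python
import unicodedata

# Hypothetical cap, taken from the "four to seven" range in the proposed text.
MAX_SUGGESTIONS = 7


def strip_accents(text: str) -> str:
    """Crudely ignore accents: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns the canonical combining class; 0 means "not a
    # combining mark". This is only reasonable for some scripts.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def loose_key(text: str) -> str:
    """Case-fold first, then strip accents from the folded text."""
    return strip_accents(text.casefold())


def top_matches(user_input, suggestions, limit=MAX_SUGGESTIONS):
    """Return at most `limit` (label, value) pairs matching the input."""
    needle = loose_key(user_input)
    matches = [
        (label, value)
        for label, value in suggestions
        if needle in loose_key(label) or needle in loose_key(value)
    ]
    # No relevance ranking is attempted; source order is kept.
    return matches[:limit]


# Hypothetical suggestion list.
suggestions = [
    ("Caf\u00e9 Noir", "cafe-noir"),
    ("Cafeti\u00e8re", "cafetiere"),
    ("Tea", "tea"),
]
print(top_matches("CAFE", suggestions))
```

Dropping combining marks this way changes meaning in scripts where the marks are not optional, so it fits the "may" category in the proposed text rather than the baseline.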

Note a few wording insertions, particularly "appropriate" with "case folding". Should we say "language-appropriate"? Suggest edits.

@whatwg whatwg deleted a comment from sowhatmeme Sep 1, 2021
@r12a

r12a commented Sep 2, 2021

@aphillips I'm not clear why you'd want to qualify case-folding with '(language-)appropriate'. This is case folding, not case conversion. With case-folding there's no need to know the language of the text.

User agents need to consider how input variations affect the matching process. For examples, see Character Model for the World Wide Web: String Matching.

I think that that text is redundant. You say the same thing as the first sentence in the rest of the paragraph. Wrt the second sentence, this is searching, rather than string-matching, so the only examples that are relevant are those related to Unicode normalisation and case-folding, so if we keep the link, i'd put it after the "Substring matching should be done..." sentence.
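
The distinction drawn above between case folding and case conversion can be illustrated with a short Python sketch. Python's `str.casefold()` applies full Unicode case folding without consulting any locale, while `str.lower()` is a case conversion; the helper names `caseless_equal` and `lowercase_equal` are hypothetical.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare using full case folding; no language information is needed."""
    return a.casefold() == b.casefold()


def lowercase_equal(a: str, b: str) -> bool:
    """Compare using lowercasing, a case conversion rather than a folding."""
    return a.lower() == b.lower()


# German sharp s: folding maps it to "ss", lowercasing leaves it unchanged.
print(caseless_equal("Stra\u00dfe", "STRASSE"))   # True
print(lowercase_equal("Stra\u00dfe", "STRASSE"))  # False

# Greek: folding maps final sigma and capital sigma to the same letter.
print(caseless_equal("\u03bf\u03b4\u03cc\u03c2", "\u039f\u0394\u038c\u03a3"))  # True
```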

This includes the discussion of the I18N WG in the teleconference of
2021-09-02.
@aphillips
Contributor Author

I have updated the text with the results of our discussion in the I18N teleconference of 2021-09-02. @r12a please check the results.

I notice that what we're describing here is basically a subset of #3539 (window.find()). I didn't add a reference to that operation here, even though I'm sorely tempted to say:

Implementations should be more-or-less consistent with caseless window.find()

Member

@domenic domenic left a comment

LGTM with nit, but I guess we should also get @r12a's signoff?

Thanks so much for working on this.

code point sequences caused by different keyboard- or input-specific mechanisms do not interfere
with the matching process. Case variations should be ignored, which may require language-specific
case mapping. For examples of these, see <cite>Character Model for the World Wide Web: String
Matching</cite> <ref spec=CHARMODNORM>. User agents may also provide other matching features: for
Member

Nit: move the <ref spec=CHARMODNORM> to the end of the paragraph after the final .

Contributor Author

Done.

@r12a r12a left a comment

Nit: i'd put commas in, to make it easier to read:

code point sequences, caused by different keyboard- or input-specific mechanisms, do not

- Added commas for readability.
- Moved CHARMODNORM ref tag to end of paragraph.
@aphillips
Contributor Author

@r12a Thanks for the review. Added the commas.

@domenic After the build runs, should be ready to merge.

@domenic domenic merged commit e022948 into whatwg:main Sep 8, 2021
3 participants