Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use v flag instead of u for pattern RegExps #7908

Merged
merged 1 commit into from
Apr 25, 2023

Conversation

mathiasbynens
Copy link
Member

@mathiasbynens mathiasbynens commented May 9, 2022

This makes the pattern attribute more powerful, enabling the use of RegExp set notation syntax and properties of strings in its values.

Differences with the previous u flag-based behavior:

  • [FEATURE] Previously invalid patterns now become valid, e.g.

    pattern="[\p{ASCII_Hex_Digit}--[Ff]]"
    pattern="\p{RGI_Emoji}"
    pattern="[_\q{a|bc|def}]"
    
  • [BREAKING CHANGE] Some previously valid patterns are now errors, specifically those with a character class including either an unescaped special character ( ) [ ] { } / - \ | or a double punctuator:

    pattern="[(]"
    pattern="[)]"
    pattern="[[]"
    pattern="[{]"
    pattern="[}]"
    pattern="[/]"
    pattern="[-]"
    pattern="[|]"
    pattern="[&&]"
    pattern="[!!]"
    pattern="[##]"
    pattern="[$$]"
    pattern="[%%]"
    pattern="[**]"
    pattern="[++]"
    pattern="[,,]"
    pattern="[..]"
    pattern="[::]"
    pattern="[;;]"
    pattern="[<<]"
    pattern="[==]"
    pattern="[>>]"
    pattern="[??]"
    pattern="[@@]"
    pattern="[``]"
    pattern="[~~]"
    pattern="[_^^]"
    

    Throwing patterns result in inputElement.validity.valid === true for any input value, so the only compatibility risk is that some value/pattern combinations that would previously result in inputElement.validity.valid === false now result in inputElement.validity.valid === true.

  • Other previously valid patterns still behave the same. (Other than the abovementioned features, the v flags only differs in behavior from the u flag w.r.t. case-insensitive matching, but the pattern attribute uses case-sensitive matching.)

Note that the breaking changes apply to somewhat esoteric edge cases that can easily be avoided. In the worst case, this could cause previously invalid input to now be considered valid (since throwing patterns result in inputElement.validity.valid === true for any input value). IMHO making the change is worth it given the powerful new functionality it brings, and the relatively small compatibility risk. This is reminiscent of the discussion in #439 (but in a different direction).

For context, here’s a few pointers w.r.t. when we decided to implicitly enable the u flag for the pattern attribute in the first place:


(See WHATWG Working Mode: Changes for more details.)


/infrastructure.html ( diff )
/input.html ( diff )
/references.html ( diff )

@gibson042

This comment was marked as outdated.

@mathiasbynens

This comment was marked as outdated.

@gibson042

This comment was marked as resolved.

@mathiasbynens

This comment was marked as resolved.

@mathiasbynens
Copy link
Member Author

Chrome 112 is set to include support for the RegExp v flag. (The feature is currently “staged” i.e. gated on the --harmony V8 flag, but will be enabled-by-default later this week.)

Given that, I’d like to ask for initial feedback on this proposal. I’d be happy to write WPT tests + file implementation bugs, but wanted to double-check first if this is something worth pursuing. The functionality seems useful but perhaps the potential breakage is not worth it or warrants a different approach (like a separate attribute). Please let me know your thoughts!

WDYT @domenic @annevk @zcorpan @domfarolino?

@zcorpan
Copy link
Member

zcorpan commented Jan 30, 2023

Have you looked at httparchive for the breaking change cases, or implemented a use counter?

@annevk
Copy link
Member

annevk commented Jan 30, 2023

This generally seems like a good idea, provided it's web compatible.

@mathiasbynens
Copy link
Member Author

I’ve fixed some of the existing <input pattern> tests in web-platform-tests/wpt#38278 and added coverage for the current u flag-based behavior in web-platform-tests/wpt#38281. Once this last one is merged, I’ll start sending follow-up draft PRs for the v-specific behavior described here.

@mathiasbynens
Copy link
Member Author

Have you looked at httparchive for the breaking change cases, or implemented a use counter?

@pthier is implementing a V8-level use counter for the cases listed above occurring in u RegExps: https://chromium-review.googlesource.com/c/v8/v8/+/4212393 Note that this would count any patterns that cannot be switched over to v, regardless of whether they occur in a pattern attribute value or not — so it’s a superset of the truly problematic cases.

@mathiasbynens
Copy link
Member Author

Proposed WPT tests: web-platform-tests/wpt#38325

The compatibility risk is smaller than I previously thought, since throwing patterns result in inputElement.validity.valid === true for any input value. So the only impact would be that some previously invalid value/pattern combinations would now be considered valid.

mathiasbynens added a commit to mathiasbynens/wpt that referenced this pull request Feb 2, 2023
mathiasbynens added a commit to mathiasbynens/wpt that referenced this pull request Feb 2, 2023
@mathiasbynens mathiasbynens marked this pull request as ready for review February 3, 2023 12:01
aarongable pushed a commit to chromium/chromium that referenced this pull request Feb 6, 2023
Some regular expression patterns that were valid with /u are invalid
with /v flag.
This CL adds a UseCounter for such usages in /u to get an idea how often
they are used in the wild.
This is important information w.r.t the proposal to use /v instead of /u
for the pattern attribute (http://go/gh/whatwg/html/pull/7908).

V8 CL: http://go/crrv/c/4212393

Bug: v8:11935
Change-Id: I3f049a2196c2f360eeeb8eb54438d3ea0d534345
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4221395
Commit-Queue: Patrick Thier <pthier@chromium.org>
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1101528}
@mfreed7
Copy link
Contributor

mfreed7 commented Feb 10, 2023

I'm also supportive of this proposal, provided it proves to be web compatible.

aarongable pushed a commit to chromium/chromium that referenced this pull request Feb 13, 2023
This patch removes the ScriptRegexp CharacterMode enum ({BMP, UTF16})
and instead introduces a UnicodeMode enum ({kBmpOnly, kUnicode,
kUnicodeSets}) distinguishing non-u, u, and v RegExps respectively.

The new kUnicodeSets value is required to implement a proposed change
to the HTML pattern attribute [1], to be enabled in a separate CL.

[1]: whatwg/html#7908

Bug: chromium:1412729
Change-Id: I40d3982476b62e517b85b799238f9c093f74e518
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4239709
Reviewed-by: Mason Freed <masonf@chromium.org>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1104723}
tkent-google pushed a commit to web-platform-tests/wpt that referenced this pull request Feb 14, 2023
aarongable pushed a commit to chromium/chromium that referenced this pull request Feb 14, 2023
A proposed change to the HTML pattern attribute [1] swaps the `u`
RegExp flag for the new `v` flag [2], resulting in some potential
incompatibility.

Some previously valid patterns are now errors, specifically those
with a character class including either an unescaped special
character or a double punctuator:

    pattern="[(]"
    pattern="[)]"
    pattern="[[]"
    pattern="[{]"
    pattern="[}]"
    pattern="[/]"
    pattern="[-]"
    pattern="[|]"
    pattern="[&&]"
    pattern="[!!]"
    pattern="[##]"
    pattern="[$$]"
    pattern="[%%]"
    pattern="[**]"
    pattern="[++]"
    pattern="[,,]"
    pattern="[..]"
    pattern="[::]"
    pattern="[;;]"
    pattern="[<<]"
    pattern="[==]"
    pattern="[>>]"
    pattern="[??]"
    pattern="[@@]"
    pattern="[``]"
    pattern="[~~]"
    pattern="[_^^]"

We don’t expect such patterns to be very common. This UseCounter
aims to validate that assumption.

Note that throwing patterns result in `inputElement.validity.valid
=== true` for any input value, so the only compatibility risk is
that some value/pattern combinations that would previously result
in `inputElement.validity.valid === false` now result in
`inputElement.validity.valid === true`.

[1]: whatwg/html#7908
[2]: https://v8.dev/features/regexp-v-flag

Bug: chromium:1412729
Change-Id: Ifa8bcc27dbf6e8a2a7098643dbb27a7633bb97de
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4249120
Reviewed-by: Mason Freed <masonf@chromium.org>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Reviewed-by: Alexei Svitkine <asvitkine@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1105129}
mathiasbynens added a commit to mathiasbynens/wpt that referenced this pull request Feb 16, 2023
mathiasbynens added a commit to mathiasbynens/wpt that referenced this pull request Feb 16, 2023
Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If specified, the attribute's value must match the JavaScript Pattern[+UnicodeMode, +N] production.

It seems this is being modified by the "v" proposal so this probably needs updating as well?

Also, if we want to land this before it's integrated into JavaScript proper we'll need to add references to the proposal as we have done for other JavaScript proposals.

mathiasbynens added a commit to mathiasbynens/wpt that referenced this pull request Apr 3, 2023
mathiasbynens added a commit to mathiasbynens/wpt that referenced this pull request Apr 3, 2023
mathiasbynens added a commit to mathiasbynens/wpt that referenced this pull request Apr 13, 2023
mathiasbynens added a commit to mathiasbynens/wpt that referenced this pull request Apr 14, 2023
aarongable pushed a commit to chromium/chromium that referenced this pull request Apr 17, 2023
A proposed change to the HTML pattern attribute [1] swaps the `u`
RegExp flag for the new `v` flag [2], enabling the use of three new
features:

- set notation
- string literal syntax
- Unicode properties of strings

This CL is an alternative to [3], which simply enables the feature by
default.

Intent to Ship: https://groups.google.com/a/chromium.org/g/blink-dev/c/gIyvMw0n2qw

[1]: whatwg/html#7908
[2]: https://v8.dev/features/regexp-v-flag
[3]: https://chromium-review.googlesource.com/c/chromium/src/+/4253834

Bug: chromium:1412729
Change-Id: Ifa825789ad7ad8bb9347f8e652483d668f69b116
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4414859
Reviewed-by: Mason Freed <masonf@chromium.org>
Code-Coverage: Findit <findit-for-me@appspot.gserviceaccount.com>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1131075}
@mathiasbynens
Copy link
Member Author

mathiasbynens commented Apr 21, 2023

Would it work if we merge when the intent gets approved?

Update: The Intent just got approved. How do we feel about landing the WPT tests + this spec patch now?

W.r.t. the needs-compat-analysis label: I’ve summarized the results of my investigation here: https://groups.google.com/a/chromium.org/g/blink-dev/c/gIyvMw0n2qw/m/3XaP6hFpAgAJ

origin link to analysis continues to work with the change?
https://account.amazon.jobs https://crbug.com/1412729#c23
https://code.earthengine.google.com https://crbug.com/1412729#c41
https://ekartlogistics.com https://crbug.com/1412729#c52
https://go.thepersonalfinancialguide.com https://crbug.com/1412729#c29
https://heliyatra.irctc.co.in https://crbug.com/1412729#c47
https://identity.appen.com https://crbug.com/1412729#c11
https://login.vitalsource.com https://crbug.com/1412729#c17
https://login.yahoo.com https://crbug.com/1412729#c16
https://m.betpix365.com https://crbug.com/1412729#c27
https://m.esportesdasorte.com https://crbug.com/1412729#c21
https://m.estrelabet.com https://crbug.com/1412729#c22
https://m.vaidebet.com https://crbug.com/1412729#c32
https://my1.konami.net https://crbug.com/1412729#c33
https://particuliers.engie.fr https://crbug.com/1412729#c34
https://register.betway.de https://crbug.com/1412729#c25
https://renegociacao.itau.com.br https://crbug.com/1412729#c51
https://signin.costco.com https://crbug.com/1412729#c12
https://smartkey.xertica.com https://crbug.com/1412729#c19
https://sports.coral.co.uk https://crbug.com/1412729#c35
https://sports.ladbrokes.com https://crbug.com/1412729#c31
https://tazkarti.com https://crbug.com/1412729#c36
https://voterportal.eci.gov.in https://crbug.com/1412729#c53
https://www.acesso.gov.pt https://crbug.com/1412729#c26
https://www.air.irctc.co.in https://crbug.com/1412729#c37
https://www.bancoestado.cl https://crbug.com/1412729#c15
https://www.bienlinea.bi.com.gt https://crbug.com/1412729#c50
https://www.coachoutlet.com https://crbug.com/1412729#c49
https://www.enterprise.de https://crbug.com/1412729#c40
https://www.etopaz.az https://crbug.com/1412729#c44
https://www.gamestop.com https://crbug.com/1412729#c38
https://www.iliad.it https://crbug.com/1412729#c39
https://www.irctctourism.com https://crbug.com/1412729#c42
https://www.leroymerlin.fr https://crbug.com/1412729#c18
https://www.mediafire.com https://crbug.com/1412729#c20
https://www.midatacredito.com https://crbug.com/1412729#c43
https://www.milanuncios.com https://crbug.com/1412729#c30
https://www.portaleargo.it https://crbug.com/1412729#c14
https://www.readworks.org https://crbug.com/1412729#c48
https://www.saksfifthavenue.com https://crbug.com/1412729#c46
https://www.thebay.com https://crbug.com/1412729#c45

More details are linked for each entry. The TL;DR is that although there are many distinct sources of UseCounter hits, I haven’t found a single case that actually constituted a compat problem.

aarongable pushed a commit to chromium/chromium that referenced this pull request Apr 21, 2023
A proposed change to the HTML pattern attribute [1] swaps the `u`
RegExp flag for the new `v` flag [2], enabling the use of three new
features:

- set notation
- string literal syntax
- Unicode properties of strings

[1]: whatwg/html#7908
[2]: https://v8.dev/features/regexp-v-flag

Intent to Ship with LGTMs:
https://groups.google.com/a/chromium.org/g/blink-dev/c/gIyvMw0n2qw/m/S8ZVYl89CgAJ

Bug: chromium:1412729
Change-Id: I7165cfc3f862c1feb5417723681f82459d1be6d5
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4455084
Reviewed-by: Mason Freed <masonf@chromium.org>
Commit-Queue: Mason Freed <masonf@chromium.org>
Auto-Submit: Mathias Bynens <mathias@chromium.org>
Reviewed-by: Yoav Weiss <yoavweiss@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1133838}
@zcorpan
Copy link
Member

zcorpan commented Apr 24, 2023

SGTM. I'll let @annevk do the honors.

annevk pushed a commit to web-platform-tests/wpt that referenced this pull request Apr 25, 2023
@annevk annevk merged commit 4868df6 into whatwg:main Apr 25, 2023
@mathiasbynens mathiasbynens deleted the pattern-v-flag branch April 25, 2023 16:34
@mathiasbynens
Copy link
Member Author

mathiasbynens commented Apr 25, 2023

Thanks everyone!

I’ve prepared the follow-up PR for if/when the TC39 proposal (in particular this PR) gets merged into the ECMAScript spec: #9213

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request May 22, 2023
…he RegExp `v` flag, a=testonly

Automatic update from web-platform-tests
HTML: tests for <input pattern> using the RegExp v flag

HTML PR: whatwg/html#7908.
--

wpt-commits: 000b1c60057e614908f7450353d72f9dda117d33
wpt-pr: 38547
ErichDonGubler pushed a commit to erichdongubler-mozilla/firefox that referenced this pull request May 23, 2023
…he RegExp `v` flag, a=testonly

Automatic update from web-platform-tests
HTML: tests for <input pattern> using the RegExp v flag

HTML PR: whatwg/html#7908.
--

wpt-commits: 000b1c60057e614908f7450353d72f9dda117d33
wpt-pr: 38547
@poebrand
Copy link

Chrome 112 is set to include support for the RegExp v flag. (The feature is currently “staged” i.e. gated on the --harmony V8 flag, but will be enabled-by-default later this week.)

Given that, I’d like to ask for initial feedback on this proposal. I’d be happy to write WPT tests + file implementation bugs, but wanted to double-check first if this is something worth pursuing. The functionality seems useful but perhaps the potential breakage is not worth it or warrants a different approach (like a separate attribute). Please let me know your thoughts!

WDYT @domenic @annevk @zcorpan @domfarolino?

Hi @mathiasbynens,

As you previously noted tc39/ecma262#2418 has not been merged yet and https://github.com/tc39/proposal-regexp-v-flag is not part of the official ECMAScript spec yet, if I'm following the conversation correctly. In which case, why are whatwg members like Chrome enabling the v flag by default in their official customer facing versions?

Per https://chromestatus.com/feature/5149507107422208, Chrome 114 has v flag use enabled by default for pattern. On our own sites, we're now seeing dozens of Regexs that are no longer valid... leading to invalid values being accepted. Shouldn't such changes only go out once a feature is officially part of the ECMAScript spec? Especially with something that is a potentially breaking change depending upon specific regexes on a site?

Also, in cases where whatwg is moving forward with potentially breaking changes... shouldn't more effort be done to announce/communicate that to the wider developer community? Most existing documentation on the web includes no mention of the v flag (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Errors/Bad_regexp_flag).

@annevk
Copy link
Member

annevk commented Jun 15, 2023

@poebrand that is unfortunate. We did follow our own process here:

  1. Once a TC39 proposal hits Stage 3 it can be shipped in browsers and is eligible for incorporation into WHATWG standards.
  2. Remove references to RegExp v flag proposal #9213 has been prepared for when the proposal hits Stage 4 and gets incorporated into the main ECMAScript standard. But that's essentially cleanup.

Unfortunately we only added notifying MDN of changes eight months ago and PRs older than that haven't been updated to incorporate it. I'll try to pay better attention to that going forward, although for potentially breaking changes that might still not be sufficient I suppose. Not entirely sure what would be though.

@zcorpan
Copy link
Member

zcorpan commented Jun 15, 2023

Filed:

As far as breakage, I believe the reasoning here was that websites should also do server-side validation (since users can remove the pattern attribute in devtools or submit invalid data in other ways), so the few affected sites should not break if they have server-side validation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

6 participants