From fe01dbd91e9a5d877d6313f301c7a4f8c8671d7a Mon Sep 17 00:00:00 2001
From: Addison Phillips
Date: Mon, 26 Aug 2024 11:49:31 -0700
Subject: [PATCH 1/2] [DESIGN] Update bidi design document to show proposed design

The design I actually think we should adopt is the "hybrid approaches" one.
This is a necessary first step on the highway to UAX31 compliance and, I think, is responsibly contained/managed.
It is a hybrid approach, in that it permits testable strict implementations to be created (particularly for message serialization).

This PR consists of moving text around. I also added one "pro" to one option.
---
 exploration/bidi-usability.md | 53 ++++++++++++++++++++++-------------
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md
index 3f70ed70..5cadfd6c 100644
--- a/exploration/bidi-usability.md
+++ b/exploration/bidi-usability.md
@@ -273,6 +273,37 @@ Not allowing these to mix could produce annoying parse errors.
 
 _Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._
 
+I propose adopting a hybrid approach in which we permit "super-loose isolation".
+This allows users to insert isolates and strongly directional characters into the whitespace
+portions of the syntax in order to make messages appear correctly.
+
+The second part of the hybrid approach would be to recommend ("SHOULD") the "strict isolation"
+design for serializers.
+This syntax is a subset of the super-loose syntax and can be applied selectively to messages that
+have RTL sequences or which have problematic display.
+
+
+## Alternatives Considered
+
+_What other solutions are available?_
+_How do they compare against the requirements?_
+_What other properties they have?_
+
+### Nothing
+We could do nothing.
+
+A likely outcome of doing nothing is that RTL users would insert bidi controls into
+_messages_ in an attempt to make the _pattern_ and/or _placeholders_ display correctly.
+These controls would become part of the output of the _message_,
+showing up inappropriately at runtime.
+Because these characters are invisible, users might be very frustrated trying to manage
+the results or debug what is wrong with their messages.
+
+By contrast, if users insert too many or the wrong controls using the recommended design,
+the _message_ would still be functional and would emit no undesired characters.
+
+### LTR Messages with isolating sequences
+
 The syntax of a _message_ assumes a left-to-right base direction
 both for the complete text of the _message_
 as well as for each line (paragraph) contained therein.
@@ -383,7 +414,7 @@ ns-separator = [bidi] ":"
 bidi = [ %x200E-200F / %x061C ]
 ```
 
-### Open Issues with Proposed Design
+**Open Issues**
 
 The ABNF changes found above put isolates and strongly directional marks into
 specific locations, such as directly next to `{`/`}`/`{{`/`}}` markers
@@ -393,24 +424,6 @@ A more permissive design would add the isolates and strongly directional marks t
 whitespace in the syntax and depend on users/editors to appropriately pair or
 position the marks to get optimal display.
 
-## Alternatives Considered
-
-_What other solutions are available?_
-_How do they compare against the requirements?_
-_What other properties they have?_
-
-### Nothing
-We could do nothing.
-
-A likely outcome of doing nothing is that RTL users would insert bidi controls into
-_messages_ in an attempt to make the _pattern_ and/or _placeholders_ display correctly.
-These controls would become part of the output of the _message_,
-showing up inappropriately at runtime.
-Because these characters are invisible, users might be very frustrated trying to manage
-the results or debug what is wrong with their messages.
-
-By contrast, if users insert too many or the wrong controls using the recommended design,
-the _message_ would still be functional and would emit no undesired characters.
 
 ### Super-loose isolation
 
@@ -610,6 +623,8 @@ adherence to the stricter grammar.
   syntax errors
 - Provides a foundation for tools to claim strict conformance and message normalization
   as well as guidance to implementers to make them want to adopt it
+- Messages are valid while being edited (such as when the open or close isolate has been
+  inserted but the corresponding opposite isolate hasn't been entered yet)
 
 **Cons**
 - Requires additional effort to maintain the grammar
From f2def00f85426b562b2dca78261d6e1986b225f2 Mon Sep 17 00:00:00 2001
From: Addison Phillips
Date: Mon, 26 Aug 2024 16:58:30 -0700
Subject: [PATCH 2/2] Address comments

---
 exploration/bidi-usability.md | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md
index 5cadfd6c..49bfcc1a 100644
--- a/exploration/bidi-usability.md
+++ b/exploration/bidi-usability.md
@@ -279,6 +279,8 @@ portions of the syntax in order to make messages appear correctly.
 
 The second part of the hybrid approach would be to recommend ("SHOULD") the "strict isolation"
 design for serializers.
+(Note that "strict" and "super-loose" use non-identical productions with the name `bidi`.
+These serve different purposes and are consistent with strict being narrower than super-loose.)
 This syntax is a subset of the super-loose syntax and can be applied selectively to messages that
 have RTL sequences or which have problematic display.
 
@@ -431,7 +433,17 @@ Add isolates and strongly directional marks to required and optional whitespace
 This would permit users to get the effects described by the above design,
 as long as they use isolates/marks in a "responsible" way.
 
-(Omitting other changes found in #673)
+The exception to this is the namespace separator, used in `identifier`.
+This requires the ability to insert isolates or strongly directional marks
+between the namespace and name portions, where whitespace is not permitted.
+This is the only location in the syntax where such characters might be needed
+but whitespace is not even optional.
+This could be defined as:
+```abnf
+ns-separator = [bidi] ":" [bidi]
+```
+
+Here are the other ABNF changes:
 
 ```abnf
 ; strongly directional marks and bidi isolates
@@ -460,7 +472,7 @@ s = ( SP / HTAB / CR / LF / %x3000 )
 
 ### Strict isolation all the time
 Apply bidi isolates in a strict way.
-The main differences to the proposed solution is:
+In this design:
 1. The open/close isolate characters are syntactically required to be paired.
    This introduces parse errors for unpaired invisible characters,
    which could lead to bad user experiences.
@@ -480,7 +492,7 @@ markup = "{" [s] "#" identifier [bidi] *(s option) *(s attribute) [s] ["
        / "{" [s] "/" identifier [bidi] *(s option) *(s attribute) [s] "}" ; close
        / "{" LRI [s] "/" identifier [bidi] *(s option) *(s attribute) [s] close-isolate "}" ; close
 identifier = [(namespace ns-separator)] name
-ns-separator = [bidi] ":"
+ns-separator = [bidi] ":" [bidi]
 bidi = [ %x200E-200F / %x061C ]
 ```
 
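
The following Python sketch illustrates two points from this patch series: the revised strict `ns-separator = [bidi] ":" [bidi]` accepts a strongly directional mark on either side of the colon (but still no whitespace), and a half-typed isolate is harmless under super-loose isolation while being unpaired under strict pairing. It is hypothetical and simplified, not part of the patch or of any MessageFormat implementation: `parse_identifier`, `is_loose_whitespace`, and `isolates_paired` are illustrative names; `NAME` is an ASCII-only stand-in for the real `name` production; super-loose whitespace is assumed to be the `s` characters plus the marks and isolates; and pairing is checked over a whole string rather than at the specific positions the strict grammar uses.

```python
import re

# Strict `bidi` production: bidi = [ %x200E-200F / %x061C ]
# i.e. LRM (U+200E), RLM (U+200F), ALM (U+061C).
BIDI_MARKS = "\u200E\u200F\u061C"

# Isolate controls: LRI, RLI, FSI open an isolate; PDI closes one.
OPEN_ISOLATES = "\u2066\u2067\u2068"
CLOSE_ISOLATE = "\u2069"

# Whitespace from the `s` production: SP / HTAB / CR / LF / U+3000.
WHITESPACE = " \t\r\n\u3000"

# ASSUMPTION: simplified, ASCII-only stand-in for the real `name` production.
NAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"
OPTIONAL_MARK = f"[{BIDI_MARKS}]?"

# identifier   = [namespace ns-separator] name
# ns-separator = [bidi] ":" [bidi]
IDENTIFIER = re.compile(f"(?:({NAME}){OPTIONAL_MARK}:{OPTIONAL_MARK})?({NAME})")


def parse_identifier(text):
    """Return (namespace, name) if `text` is a complete identifier, else None."""
    match = IDENTIFIER.fullmatch(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


# ASSUMPTION: super-loose isolation lets marks and isolates appear anywhere
# whitespace may appear, with no pairing requirement.
LOOSE_WHITESPACE = set(WHITESPACE + BIDI_MARKS + OPEN_ISOLATES + CLOSE_ISOLATE)


def is_loose_whitespace(run):
    """True if `run` is a non-empty run of super-loose whitespace."""
    return bool(run) and all(ch in LOOSE_WHITESPACE for ch in run)


def isolates_paired(text):
    """Strict-style check: every open isolate is closed by a later PDI, properly nested."""
    depth = 0
    for ch in text:
        if ch in OPEN_ISOLATES:
            depth += 1
        elif ch == CLOSE_ISOLATE:
            if depth == 0:
                return False  # PDI with no isolate open
            depth -= 1
    return depth == 0


if __name__ == "__main__":
    # RLM before the colon and LRM after it: accepted by the revised ns-separator.
    print(parse_identifier("ns\u200F:\u200Ename"))  # ('ns', 'name')
    # Ordinary whitespace is still not allowed around the colon.
    print(parse_identifier("ns :name"))  # None
    # An RLI typed before its PDI: fine as loose whitespace, unpaired under strict rules.
    half_edited = "\u2067 "
    print(is_loose_whitespace(half_edited), isolates_paired(half_edited))  # True False
```

The regular expression is only meant to show which strings the `ns-separator` variants accept; a conforming parser would follow the full ABNF rather than this approximation.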