Support for BiDi in placeables #28
+100 What I've seen being useful was marking a placeholder / area as RTL / LTR. A lot more often was "smart detection", basically inspect the value of the parameter and "guess" what the best direction would be. The android solution is not ideal, the developer should explicitly "wrap" the parameter And the string would be something like
The wrapper is smart, does not add bidi control characters if not needed. |
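A minimal sketch of what such a "smart" wrapper could look like, for illustration only. The names `strongDirections` and `smartWrap` are hypothetical, and the direction test is a rough heuristic: JavaScript regular expressions expose no Bidi_Class property, so letters in a handful of right-to-left scripts are treated as strong RTL and every other letter as strong LTR.

```javascript
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

const LETTER = /\p{L}/u;
// Assumption: this short script list stands in for real Bidi_Class data.
const RTL_SCRIPT = /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}]/u;

// Collect the directions of all letters found in the text.
function strongDirections(text) {
  const found = new Set();
  for (const ch of text) {
    if (!LETTER.test(ch)) continue; // digits, spaces, punctuation are skipped
    found.add(RTL_SCRIPT.test(ch) ? "rtl" : "ltr");
  }
  return found;
}

// Wrap only when the value contains a letter whose direction is
// opposite to the surrounding context; otherwise return it untouched.
function smartWrap(value, contextDir) {
  const opposite = contextDir === "rtl" ? "ltr" : "rtl";
  return strongDirections(value).has(opposite) ? FSI + value + PDI : value;
}
```

With this heuristic, `smartWrap("John", "ltr")` returns the input unchanged, while a Hebrew name in an LTR context comes back wrapped in FSI/PDI.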
I don't know what you think but to me, this type of feature seems related to text transformation (see related thread). If I'm not mistaken Fluent handles this with function-like wrappers. I'm trying to picture a scenario where you might want to capitalize and change text direction. If we could have a standard way to transform text, it could keep this simple:
|
No, Fluent implicitly wraps all placeables of type String in FSI/PDI to reset directionality. |
+100 doing it by default. Not 100% sure FSI/PDI is the right thing, I would have to spend some time experimenting. |
+1 to providing this by default. Note that when direction metadata is available the FSI should be replaced with the appropriate base-direction isolating control. Note too that @zbraniecki only mentions placeables of type string, but non-string placeables can have spillover effects (for example currency values). |
@mihnita yes! I imagine that we would be able to do something like:
as an option, and then, if that's provided, we can specify the directionality if it differs from the direction of the translation. If it matches, then we can skip directionality marks. If it is unknown, then we can use FSI/PDI. @aphillips yes! In particular, we can know whether the formatter provided its result in the same language as the translation, and wrap it in marks or not accordingly. In the most common scenario, where the currency formatter provides formatted text in the same directionality as the translation, we could skip the marks, but if we had to fall back and the currency is in a different directionality, we would wrap. |
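The three-way rule described in this comment can be sketched as follows. This is an illustration under stated assumptions, not a proposed API: `isolatePlaceable` and its parameters are hypothetical names, and the "skip when matching" branch simply mirrors the rule as stated here.

```javascript
const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const RLI = "\u2067"; // RIGHT-TO-LEFT ISOLATE
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// dir: "ltr" | "rtl" | undefined (unknown); messageDir: "ltr" | "rtl"
function isolatePlaceable(text, dir, messageDir) {
  if (dir === undefined) return FSI + text + PDI; // unknown: let the renderer decide
  if (dir === messageDir) return text;            // matches: skip the marks
  return (dir === "rtl" ? RLI : LRI) + text + PDI; // differs: explicit isolate
}
```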
A few comments:
|
Safari and iOS support FSI/PDI. Edge supports it as well. Windows modern APIs do support it, win32 does not. |
We can also take a look at what Android does. The At runtime that method looks at the value of the parameter, and adds the proper BiDi control characters, a bit smarter than just FSI / PDI, or first strong, or any "fixed" approach. Dart has two methods, one "wraps" using BiDi control characters, the other one uses HTML tags: I'm not advocating for any of these "as is", just submitting them as "prior art" and a source of inspiration. But if we go in this direction then I would call the wrappers inside of MessageFormat, not force the developers to wrap parameters by explicitly calling these kinds of helper methods. |
Can we revisit it now? It seems like we still didn't add it to the spec. I suggest that we by default wrap any placeable in FSI/PDI marks, just like Fluent does, in line with the W3C recommendation for placeables - https://www.w3.org/International/articles/inline-bidi-markup/
We can introduce opt-out logic that allows us to explicitly turn off FSI/PDI for a given message format, as an option that communicates a request to format a message without inserting FSI/PDI.
Finally, we could start building logic that skips the marks in scenarios where the directionality of the surrounding text and the placeable is known to match. For example, a number/date inserted in the same locale as the surrounding message does not need FSI/PDI. Similarly, an inserted string could be marked with explicit directionality:
let mf = new MessageFormat("en");
mf.format("Hello, { $user }", { user: MFString("John", { dir: "ltr" }) });
or as matching:
let mf = new MessageFormat("en");
mf.format("Hello, { $user }", { user: MFString("John", { dir: "matching" }) });
In the former case the algorithm will detect directionality of "en" and if the directionality of |
I'd like to suggest making a decision on it very soon. In my experience a lot of API users are not familiar with the problem space of directionality and the body of code starts growing where people expect to be able to match the output to a particular string and are surprised when FSI/PDI shows up in the output. With Fluent we had to do quite a bit of evangelism - it was always well received, but definitely a paper cut. I'm concerned that if we wait too long the argument of "too late" will pop up. |
I tend to agree with @zbraniecki in general: to the degree possible this wants to be hidden in the "magick I18N stuff" and not be something regular developers have to think about all the time. Educating on bidi handling is hard and doesn't appear to add value until a company decides to do an RTL language. However, I don't agree that inserting FSI/PDI is what W3C recommends. In markup contexts, we prefer that markup be used and include both language and direction metadata (i.e. both For formatted values (that is, where the placeable is a number, date, time, percent, currency value, etc. that is generated by the message formatter) the base direction can be known from the locale. For unknown values (mainly strings), provision of metadata is required and FSI/PDI can be a fallback. Note that some users may want to tailor the behavior because of their runtime environment, such as a few frameworks that don't yet support the isolating controls and show them as tofu. In these cases, |
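Ah, good point on lang+dir, rather than just dir. I think you're bringing two separate dimensions, which I'd categorize as:
1. What information we provide about placeables
2. How we annotate
I'll use the following example: "On January 15th 2022 at 5:45pm, Addison added 5 photos" which in MF2 will look something like this:
let $dateTime = {$timestamp :datetime date=medium time=medium}
let $personName = {$person :person firstName=long}
let $count = {$photoCount :number}
match {$count}
when 1 {On {$dateTime}, {$personName} added { $count } photo.}
when 0 {On {$dateTime}, {$personName} added { $count } photos.}
There are three placeables in this message and we may know the locale of the message itself (or not - is it possible for the lang/dir of the message to be undetermined via new MessageFormat("und")?).
If dateTime is resolved into the same dir/lang as the surrounding message we don't want to annotate, but if the message is in Arabic, and DateTimeFormat doesn't have Arabic data and resolves to English, we should annotate at least with directionality:
On {\uLRI}January 15 2022 at 5:45pm{\uPDI}, Addison added 5 photos.
(we use LRI because we know that datetime is in English, and we either know that the whole message is in Arabic or it is unknown)
For the user name, we may have an API that informs in what lang/dir the name is provided and then compare it to the message lang/dir, or we may not know. If we do, and it differs, we can do the same as with date - LRI/RLI and PDI to pop. If we don't, we can use FSI/PDI. If it doesn't differ we don't inject any.
For $count we repeat the same logic as we did for datetime.
Now, as mentioned in my previous message, the tricky question is how the developer annotates lang/dir of the variable. I suggested MF2 to provide typed variables much like Fluent does with FluentDateTime, FluentNumber etc. This would allow for MF2String("Addison", {lang: "en"}) as optional (if omitted we'll use FSI/PDI).
Second question is how to control what we inject. My initial proposal is something like this:
let mf = new Intl.MessageFormat("en", {
isolates: {
lri: "\u2066", // LRI, or MF2MarkupElement("bdo", {dir: "ltr"})
rli: "\u2067", // RLI, or MF2MarkupElement("bdo", {dir: "rtl"})
fsi: "\u2068", // FSI, or MF2MarkupElement("bdo", {dir: "auto"})
pdi: "\u2069", // PDI, or MF2MarkupElementClose("bdo")
}
});
This way HTML bindings can provide MarkupElements for the same feature, and plain text can use the Unicode isolate characters. If LRI/RLI is set to null then FSI is used. If FSI/PDI is set to null, then nothing is ever injected.
This means that by default (if isolates is not explicitly provided) the API will inject Unicode marks and frameworks can override them.
Attributes
What this doesn't resolve is that in an ideal world a message like: Hello {strong}{$name}{/strong} would resolve to Hello <strong dir="auto">Addison</strong> rather than to Hello <strong><bdo dir="auto">Addison</bdo></strong>.
We may later evolve the logic to allow for population of attributes in cases where markup element is perfectly surrounding a placeable and we want to set dir/lang. |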
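To make the fallback behaviour of such an `isolates` option concrete, here is a small sketch. It assumes the rule that a missing LRI/RLI falls back to FSI and that a missing FSI or PDI disables injection entirely. `wrapWithIsolates` and `DEFAULT_ISOLATES` are hypothetical names, and nothing here is an existing `Intl` API.

```javascript
// Default: the Unicode isolate controls (LRI, RLI, FSI, PDI).
const DEFAULT_ISOLATES = {
  lri: "\u2066",
  rli: "\u2067",
  fsi: "\u2068",
  pdi: "\u2069",
};

// dir: "ltr" | "rtl" | undefined (unknown direction)
function wrapWithIsolates(text, dir, isolates = DEFAULT_ISOLATES) {
  const { lri, rli, fsi, pdi } = isolates;
  // FSI/PDI set to null: nothing is ever injected.
  if (fsi == null || pdi == null) return text;
  // Pick the explicit isolate when the direction is known and configured.
  const explicit = dir === "ltr" ? lri : dir === "rtl" ? rli : null;
  // LRI/RLI set to null (or direction unknown): fall back to FSI.
  return (explicit ?? fsi) + text + pdi;
}
```

A binding that targets some other output format could pass its own opening and closing values in place of the control characters.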
Same as before, +100 :-) But now, with a lot more things already "settled", I think we can dig deeper on what can / can't be done. I've been thinking about it, and we probably need to answer some sub-questions. What to add, exactly? What can a low level library use to wrap placeholders? Unicode control characters? HTML recommends using tags, not control characters. HTML tags? We don't know if the consumer of the result understands HTML. So I think the only thing that the spec can really say is put this info (somehow) in the "format to parts" (this chunk from here to there is RTL). And what part of the "chain" can do it. If the engine does it, all it can do when it sees
I don't think that is a good model. I think we want to allow for functions that in fact generate multiple components. Let's think HTML... And have a matrix formatter, that produces a table. Or a list formatter that produces a drop-box. Or even a regular Maybe So I think this can only be done properly by the functions. Do the translators need to be able to change this, or not? I would argue that yes, they need to. The developer might know "ok, don't mirror the company logo", but you need the translator to tell you about the second one. My proposals after this round of thinking:
Of course, if an implementation is not in a generic library like ICU, but very specific to produce HTML (in a browser), then some of the steps might be short-circuited (produce HTML tags / DOM directly, without format to parts + post-process). |
Each string/substring should have a language and direction attribute (note that this is what W3C I18N is asking TC39 for with the maybe-terribly-named I suspect that MF's Note that |
Ack, thanks. |
About
|
If I correctly interpret what @mihnita @aphillips wrote below my last response we agree on the value and considerations. The only item I'd like to clarify is if @mihnita believes that The question is - what are the next steps? As I mentioned above, I'm concerned about Tech Preview being released without this and I'd like to make sure we don't have any more releases (even if they remain TP) that make testers work with MF output without this feature. |
Not entirely. Language information can be used as a fallback when no direction information is available, but we don't think it is a good general solution.
It seems that way because of JavaScript's historical (and misguided) ambivalence about saying that strings consist of Unicode code points. In reality, the three types @mihnita cites have a clear relationship to their respective representations. The point of
There already exist mappings for RDF and as-a-string serialization schemes in JSON-LD and a number of specifications use what amounts to
Yes! This is entirely an option that is on the table. We would need some group to publish a normative spec (in W3C terms, a "Recommendation" or REC-track document) with the "dictionary" in it which specs could refer to normatively. This is what we asked WebIDL to do, but they "only model things that exist in JavaScript", hence my detour to ask TC39 to make a @zbraniecki noted:
As I mentioned, it could be optional and I suspect it should be optional. Control characters insertion could also be added later, since most consumers probably don't introspect inside strings to find directional boundaries. That is, it might not be a blocker for the preview, but would be Very Nice To Have (compare to current MF, which does nothing). Current formatters, such as I think another interesting question is: does |
I think that And So if there is a part saying "from here to there we have a bidi isolate", For a low level library like ICU that should probably be an option and decided by the developer calling it (or the layers built on top of it). Probably would be good to do the same for ICU4X. In recent years it looks like ICU is going in that direction. For example And I hope we can improve things a bit with MF2. And I think that defining the result of |
My take on this, less verbose, and maybe more clear:
I think my answer would be metadata.
I think we should not concatenate parts and strings. The question is: how do we invoke older formatters which already return strings with controls. |
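One way to picture "metadata on parts, with the output layer deciding how to express it" is the sketch below. The part shape (`type`, `value`, `dir`, `isolate`) is invented for illustration and is not a proposed format. The same parts are reduced once to plain text with Unicode isolates and once to HTML with `<bdi>` elements.

```javascript
const ISOLATE = { ltr: "\u2066", rtl: "\u2067", auto: "\u2068" }; // LRI, RLI, FSI
const PDI = "\u2069";

// Hypothetical formatted parts: the isolate is metadata, not a character.
const parts = [
  { type: "literal", value: "Hello, " },
  { type: "string", value: "אדיסון", dir: "rtl", isolate: true },
  { type: "literal", value: "!" },
];

// Plain-text reducer: express isolation with Unicode control characters.
function toPlainText(parts) {
  return parts
    .map((p) => (p.isolate ? ISOLATE[p.dir ?? "auto"] + p.value + PDI : p.value))
    .join("");
}

function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// HTML reducer: express the same isolation with markup instead.
function toHtml(parts) {
  return parts
    .map((p) => {
      const text = escapeHtml(p.value);
      return p.isolate ? `<bdi dir="${p.dir ?? "auto"}">${text}</bdi>` : text;
    })
    .join("");
}
```

Because the control characters only appear in the plain-text reducer, a consumer that renders parts separately never has to strip them out again.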
A few quick comments.
I'm in agreement that the data model should carry enough information to add
extra info for directional formatting where necessary. When formatting to a
plaintext string, that would be using the Unicode directional characters,
but when formatting into other formats (such as HTML) those mechanisms can
come into play. The choice of mechanism would depend on the formats.
On Fri, Oct 28, 2022 at 7:43 PM Zibi Braniecki ***@***.***> wrote:
Ah, good point on lang+dir, rather than just dir.
I think you're bringing two separate dimensions, which I'd categorize as:
1. What information we provide about placeables
2. How we annotate
I'll use the following example: "On January 15th 2022 at 5:45pm, Addison
added 5 photos" which in MF2 will look something like this:
let $dateTime = {$timestamp :datetime date=medium time=medium}
let $personName = {$person :person firstName=long}
let $count = {$photoCount :number}
match {$count}
when 1 {On {$dateTime}, {$personName} added { $count } photo.}
when 0 {On {$dateTime}, {$personName} added { $count } photos.}
There are three placeables in this message and we may know the locale of
the message itself (or not - is it possible for the lang/dir of the message
to be undetermined via new MessageFormat("und") ?).
While that is theoretically possible, in practice there should always be a
specific locale (at least to the lang code) for any message if there are
any placeholders that require formatting. We can't, however, know the base
direction of the message, because that would depend on the context in which
it is being used.
If dateTime is resolved into the same dir/lang as surrounding message we
don't want to annotate, but if the message is in arabic, but DateTimeFormat
doesn't have arabic data and resolves to English, we should annotate at
least with directionality:
On {\uLRI}January 15 2022 at 5:45pm{\uPDI}, Addison added 5 photos.
(we use LRI because we know that datetime is in English, and we either
know that the whole message is in Arabic or it is unknown)
(Minor) I don't think that is a realistic scenario. If a system is
supporting a language like Arabic in messages, then it would surely support
the basic i18n functionality for Arabic. That would be a terrible UI for
users.
On the other hand, having to shift scripts/directions for names would be a
realistic example, so I think it would be better to focus on that in your
scenario.
For the user name, we may have an API that informs in what lang/dir is the
name provided and then compare it to the message lang/dir, or we may not
know.
If we do, and it differs, we can do the same as with date - LRI/RLI and
PDI to pop. If we don't we can use FSI/PDI. If it doesn't differ we don't
inject any.
For $count we repeat the same logic as we did for datetime.
Again, numbers are so basic that this (in a well designed system) shouldn't
occur.
Now, as mentioned in my previous message, the tricky question is how the
developer annotates lang/dir of the variable. I suggested MF2 to provide
typed variables types much like fluent does with FluentDateTime
FluentNumber etc. This would allow for MF2String("Addison", {lang: "en"})
as optional (if omitted we'll use FSI/PDI).
Note, now that
https://www.unicode.org/reports/tr35/tr35-67/tr35-personNames.html#Contents
is out, I'd recommend using examples from that (and it would be great to
get comments on it). The algorithm for formatting does depend on either
receiving the explicit locale of the name to be formatted, or imputing it.
Not sure it would be a good idea to carry an imputed locale into the MF2
data model.
Second question is how to control what we inject. My initial proposal is
something like this:
let mf = new Intl.MessageFormat("en", {
isolates: {
lri: "\u2066", // LRI, or MF2MarkupElement("bdo", {dir: "ltr"})
rli: "\u2067", // RLI, or MF2MarkupElement("bdo", {dir: "rtl"})
fsi: "\u2068", // FSI, or MF2MarkupElement("bdo", {dir: "auto"})
pdi: "\u2069", // PDI, or MF2MarkupElementClose("bdo")
}});
This way HTML bindings can provide MarkupElements for the same feature,
and plain text can use the Unicode isolate characters. If LRI/RLI is set to
null then FSI is used. If FSI/PDI is set to null, then nothing is ever
injected.
This means that by default (if isolates is not explicitly provided) the
API will inject unicode marks and frameworks can override them.
As I think Mihai noted, exactly how the injection would work would depend a
great deal on the end environment. It might be better to consider that the
process that manipulates the data model to produce something other than
plaintext (e.g. to produce HTML) needs to have enough information about the
placeholders to determine whether they need directional structure (e.g.
markup) or not.
… Attributes
What this *doesn't* resolve is that in ideal world a message like: Hello
{strong}{$name}{/strong} would resolve to Hello <strong
dir="auto">Addison</strong> rather than to Hello <strong><bdo
dir="auto">Addison</bdo></strong>.
We may later evolve the logic to allow for population of attributes in
cases where markup element is perfectly surrounding a placeable and we want
to set dir/lang.
|
I'm concerned about Tech Preview being released without this
I'm confused. The Tech Preview was already released, about 10 days ago. Do
you mean before the production version is released?
…On Sat, Oct 29, 2022 at 11:47 AM Zibi Braniecki ***@***.***> wrote:
If I correctly interpret what @mihnita <https://github.com/mihnita>
@aphillips <https://github.com/aphillips> wrote below my last response we
agree on the value and considerations.
The only item I'd like to clarify is if @mihnita
<https://github.com/mihnita> believes that formatToString should return
the isolation marks or not (you say that the bidi/lang system should
annotate parts, but I don't see your position on the string output).
The question is - what are the next steps? As I mentioned above, I'm
concerned about Tech Preview being released without this and I'd like to
make sure we don't have any more releases (even if they remain TP) that
make testers work with MF output without this feature.
|
@mihnita The older formatters return controls to elicit proper ordering of ambiguous sequences within a formatted string, such as a number (especially currency values) or date. The formatters do not provide exterior wrapping/isolation to prevent spillover effects (which is what we're talking about here). @macchiati I don't agree that:
We need to know the base direction of the string, since the string itself is a placeable into its rendering context. When messages don't have a base direction, they are subject to spillover effects or wrong base direction detection, particularly if they start with a misleading strong character. Worst-case, we can use first-strong. I suppose that this might be the realm of a higher-level protocol, such as a resource language. But if strings don't have a base direction, we won't know how to decorate them automagically to get the right results. Inferring the base from the language is possible if that's all we have. The following examples can be test driven on this demo page. The Arabic pattern means roughly "price {x} + {y} shipping!" First, placeables need isolation to avoid string-internal spillover effects. If you paste this string into the text box (this is also one of the examples in the list box at the top of the page):
You get: Adding a
If we don't know the base direction of the whole string, though, then when we insert it into a page we can get spillover effects that are unwanted. Let's simulate that by putting an opposite direction (English) wrapper around the string:
... which produces the thoroughly broken: Fixing the interior placeables helps:
... but still leaves the exclamation point on the wrong side (other effects can be produced with other strings): |
I think that a given component should take care of its interior needs and then expose its own base paragraph direction. That way, if all the directions align you don't get extra characters providing unnecessary levels of isolation and the component doesn't need to know or be told its context--it just needs to report the base paragraph direction of its output (which it already knows). Your example doesn't make that much sense to me: a date format or compact decimal format in a given locale will be assembling a string with a single base direction and the tokens it emits will be in a specific language. There can be local considerations (my example with RLM on the date If you take your example and turn it into an MF pattern string:
... and it's (Google translated) Arabic friend:
If each formatter function reports the base direction of its output string (e.g. |
It might be useful to approach this by figuring out what the to-string formatted output of MF2 should be. We may have some parts of the output for which we can know the directionality (e.g. literal text in the parent locale or |
Yes. MF2 should make it an extra step to produce multi-directional string output without isolation marks. By default it should use the information it has about placeholder positions to isolate at boundaries. |
@zbraniecki How about cases where we know the directionality matches? For example, in an |
Those should be exempted from marks.
It's a bit more tricky actually. We should evaluate whether the number formatter used to format Also, as Addison pointed out, we may want to evaluate language information alongside direction. I'm a bit less clear on what exactly this meta information should look like, but I imagine that we could have a |
Could we first figure out the absolute minimum that's required in the MF2 spec for formatted string output? That we're all agreed on as being a part of the base layer, while e.g. the shape of the formatted parts might well end up getting defined by specifications building on top of it. Maybe something like this?
Such wording would require a part sequence like LTR/RTL/RTL to include an unnecessary PDI + RLI character pair between the RTL parts if the message as a whole is LTR. Should that be optimised out? |
It doesn't work that way. If you have a base paragraph direction string that is LTR and you have two consecutive RTL insertions, you want isolation in between them to prevent spillover effects. Consider this example:
This has two placeable strings ("1,234.56 AED" and "12.99 USD") with only a space between them. Without isolation they draw like: With isolating controls they draw correctly without spillover effects: The only time that isolating markup or controls can be omitted safely is when: (i) the placeable and the host string have the same base direction This is why unknown strings need FSI/PDI around them. |
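The effect is easier to inspect in code than in rendered text. The sketch below builds the same message with and without RLI/PDI around the two currency placeables and makes the invisible controls visible. The helper names are hypothetical, and the Arabic pattern is the one used in this thread.

```javascript
const RLI = "\u2067"; // RIGHT-TO-LEFT ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

const isolateRtl = (s) => RLI + s + PDI;

const price = "1,234.56 AED";
const shipping = "12.99 USD";

// Same logical content; only the second one isolates the placeables.
const bare = `السعر ${price} + ${shipping} الشحن`;
const isolated = `السعر ${isolateRtl(price)} + ${isolateRtl(shipping)} الشحن`;

// Replace isolate controls (U+2066..U+2069) with a readable tag for debugging.
const showControls = (s) =>
  s.replace(/[\u2066-\u2069]/g, (c) => `<U+${c.codePointAt(0).toString(16).toUpperCase()}>`);

console.log(showControls(isolated));
```

The isolated string is exactly four code units longer than the bare one: one RLI and one PDI per placeable. How each variant is drawn still depends on the renderer's bidi support.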
Ah, had not played around with that example; thank you, that was useful. I wasn't able to observe spillover when omitting inner isolates between parts with the same directionality, but their overall order is indeed affected. So if we're in an LTR context, and the logical order of our message is
Is it |
It is FSI if the direction of the inserted string is unknown. It is LRI or RLI if the direction is known (matching the direction of the string). |
I also think that a big part of the discussion is about who is responsible for adding those control characters, or special-bidi-control parts when we format to parts. It is pretty clear that the "function" should do it, because of situations like this:
Where the formatted date needs internal directional control characters. But should the result of the whole placeholder be wrapped? Here is what I mean:
Should that be done by "the engine" (the part of MessageFormat implementation that is function agnostic, it only invokes functions and "glues" the result together)? The engine:
or the function:
I am inclined to say the function is also responsible for that part. |
I think there's an alternative to:
We could do:
and allow the consumer to decide on injecting marks. |
@mihnita @eemeli @stasm @aphillips @echeran - thoughts? |
I agree that the isolates want to be included in specific parts, not separate elements in the "parts" array. For cases where the direction and language are the same all the way through, it allows implementations to omit isolating controls (or markup or such). For cases where the parts are separately rendered, it allows the caller to extract language and direction metadata for a given span. If we had an
The To @eemeli's point earlier, we could resolve this separately (and potentially later), provided we can agree on the "format-to-string" output. I agree that the code point sequences don't have to be identical to the concatenated Finally, note that |
So, in fact you argue that parts should be:
right? That's a pretty challenging alteration and incompatible with ECMA-402 FormatToParts, but maybe necessary? Or we could assume that people can derive |
Coming from |
A few comments on the following. (Also, I'm assuming that this corresponds
to the information relayed back to the client when the caller asks for the
'deep' model, not just toString call.)
parts = {
elements: [
{type: LITERAL, value: { value: "Expires on ", lang: "en-US", dir: "LTR" }},
{type: DATE, value: someDateValue},
{type: LITERAL, value: { value: ".", lang: "en-US", dir: "LTR" }}
],
lang: "en-US",
dir: "LTR",
}
First, I'm not sure you need the deep structure; flatter is usually simpler.
parts = {
elements: [
{type: LITERAL, value: "Expires on ", lang: "en-US", dir: "LTR"},
{type: DATE, value: someDateValue},
{type: LITERAL, value: ".", lang: "en-US", dir: "LTR"}
],
lang: "en-US",
dir: "LTR",
}
Secondly, language tagging can indeed give better results for a block
of text being Chinese vs Japanese.
However, I think fine-grained tagging for language in constructed
messages is usually unnecessary, and often counter-productive. In
practice you really don't want a message to a Japanese person to
contain a placeholder-substitution that is in a Chinese font. Nor do
you typically want a constructed message for a user's language to have
a segment that line-breaks or hyphenates differently than that user
would expect for their language.
Do you want some German Zuk-
ker?
When a message gets constructed, you really want all the pieces of the
message to be in the same language wherever possible. I don't want a
Czech date in the parts above, but one that is really for en-US.
There are exceptions. If I'm getting voice directions to Zug, I'd like
to hear /teɪk ðə nɛkst raɪt təˈwɔrdz *tsuk*/. But only in the case
that the system knows that I speak both English and German; otherwise
/zʌg/ is probably best. So only in exceptional cases do you need the
lang value to be different than the overall language of the message,
and only in exceptional cases do you want the language of the message
to be different than the language that you ask the message to be
constructed for. So a typical case would be that the language can be
omitted from the enclosing parts.
parts = {
elements: [
{type: LITERAL, value: "Expires on ", dir: "LTR"},
{type: DATE, value: someDateValue},
{type: LITERAL, value: ".", dir: "LTR"}
],
lang: "en-US",
dir: "LTR",
}
For BIDI as well, it is only necessary to convey the status of a piece that
differs from the enclosing parts; so those can also be optional in the
cited case.
parts = {
elements: [
{type: LITERAL, value: "Expires on "},
{type: DATE, value: someDateValue},
{type: LITERAL, value: "."}
],
lang: "en-US",
dir: "LTR",
}
Now, I do think it would be useful to have examples of:
1. a 'deep model' constructed message where the BIDI tags are needed
(Addison had one that could be transformed into this syntax), and
2. where the difference in language needs to be captured.
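The inheritance idea (elements only carry `lang` or `dir` when they differ from the enclosing parts) can be sketched as a small resolver. `resolveParts` is a hypothetical helper, the string type tags stand in for the `LITERAL` and `DATE` constants above, and the date value is a placeholder.

```javascript
// Fill in lang/dir on each element from the enclosing parts object
// unless the element overrides them itself.
function resolveParts(parts) {
  return parts.elements.map((el) => ({
    ...el,
    lang: el.lang ?? parts.lang,
    dir: el.dir ?? parts.dir,
  }));
}

const parts = {
  elements: [
    { type: "LITERAL", value: "Expires on " },
    { type: "DATE", value: "Jan 15, 2022" },                // inherits both
    { type: "STRING", value: "אדיסון", dir: "RTL" },         // overrides dir only
  ],
  lang: "en-US",
  dir: "LTR",
};

const resolved = resolveParts(parts);
```

After resolution every element has an explicit `dir`, so a consumer that needs to isolate placeables can do so without the metadata having been repeated in the data.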
…On Thu, Dec 8, 2022 at 1:50 PM Addison Phillips ***@***.***> wrote:
Coming from resolvedOptions() sounds right.
|
In general, I agree. However:
My example was somewhat pedantic about lang/dir metadata because I'm thinking in terms of "attributed strings" or "attributed values". There can be (and should be) an inheritance model so that data does not need to be replicated on every level. But we need the ability to tag data as appropriate. The implementation we made when I was at Amazon tied the resource format and formatter together. The template structure used selectors (just as we've moved to selectors in MFv2) which resolved to a pattern string (by evaluating plurals, selects, and such) and the resulting pattern string was in a single language and had a single base direction. When we look at Does that make sense? |
This is not correct. Even if the base direction is the same, there are cases where isolation of placeables is desirable to prevent spillover effects. Consider the example السعر 1,234.56 AED + 12.99 USD الشحن This should render:
Note that the second string has RLI/PDI around the placeables--but all of the "parts" are RTL!! The presence of LTR characters and numbers in the currency values does not mean that their locale is not |
This is not correct. Even if the base direction is the same, there are
cases where isolation of placeables is desirable to prevent spillover
effects.
I'm not saying that. What I was saying is that you don't need to carry
the info in the element explicitly; you can inherit from the parts. That
doesn't mean that the information dir: "LTR" isn't there, nor that it can't
be used to avoid spillover effects. It just means that you can get dir:
"LTR" from the parts.
…On Thu, Dec 8, 2022 at 4:50 PM Addison Phillips ***@***.***> wrote:
For BIDI as well, it is only necessary to convey the status of a piece
that differs from the enclosing parts; so those can also be optional in the
cited case.
This is not correct. Even if the base direction is the same, there are
cases where isolation of placeables is desirable to prevent spillover
effects. Consider the example The price is ${price} + ${shipping} in
shipping in Arabic:
السعر 1,234.56 AED + 12.99 USD الشحن
This should render:
السعر 1,234.56 AED + 12.99 USD الشحن
Note that the second string has RLI/PDI around the placeables--but all of
the "parts" are RTL!! The presence of LTR characters and numbers in the
currency values does not mean that their locale is not ar-AE or that
their base direction is not RTL. Also enclosing and ending punctuation
positioning depends on direction.
|
I'm not saying that we don't need the ability to tag BIDI; if the dir on an element isn't equal to a dir on the parts, it needs to be present. That is, I agree with your statement "There can be (and should be) an inheritance model so that data does not need to be replicated on every level. But we need the ability to tag data as appropriate."
On the other hand, I still doubt that the lang attribute is particularly useful. I'm not against having it be an optional attribute. I just have yet to see a convincing case where it is required (as I noted earlier). And in the case you give here, I don't see that it is. An example would help: especially given that the data sources will often not have that information, what would the process do in the presence of the lang attribute that it wouldn't do otherwise?
|
We know, and you listed it yourself, that we'll want it for TTS. I'm ok with it being optional, as it won't be used by toString reducer. |
I'm fine with optional.
…On Thu, Dec 8, 2022 at 5:52 PM Zibi Braniecki ***@***.***> wrote:
I just have yet to see a convincing case where it is required (as I noted
earlier).
We know, and you listed it yourself, that we'll want it for TTS.
I'm ok with it being optional, as it won't be used by toString reducer.
|
I continue to think that the shape of an MF2 formatted parts result should not be defined by the core MF2 spec, but by the spec/implementation layers building on top of it. I believe that @mihnita is working on a PR stating something like that so that we'll be able to close #41 and #272. However, I do think that the formatted string result should be explicitly defined for MF2; on that, I rather like the current shape of #315. There it would be valuable to get some more input on whether the isolation should happen by default either
The current proposal is to go with option 2, but e.g. applying the change from #315 (comment) would switch to the first option. Once we have a definition of how this should work for string output, it'll make it easier for implementations (i.e. the ICU4J tech preview & the JS polyfill) to experiment with formatted-parts and verify that the intended goal is achievable. |
How do you envision it affecting building binding layers on top of MF2 to various frameworks? If ICU4C, ICU4J, ICU4X, ECMA-402 and even maybe SpiderMonkey vs V8 will have different shape of parts and differently encode information, including inevitably that some implementations will provide information allowing bindings to do things that other implementations will not provide sufficient information for? |
@zbraniecki I've replied to you here: #272 (comment). |
@eemeli I agree with you. I think #315 is close. I added a comment just now about tweaking the wording there to isolate by default but permit implementations to "optimize" their output (by omitting isolation when it is not necessary). Such optimization is harder than it looks. It is important that isolation not be limited to cases where the placeable's direction "does not match" the host string's direction. I think isolation is required if one of these conditions is met:
Another way to say this is that the pattern (your element?) string and each of its parts each has a direction attribute that can be queried. When a given part does not have its own direction or language value, it is inherited from the "level above" (generally the pattern string as a whole) or computed from the locale (e.g.
My example wasn't affected by lack of language metadata to be sure. @zbraniecki calls out voice selection. Any kind of language-specific processing (such as font selection in CJK, for example) would also benefit from accurate language metadata. This is more of a corner case for MFv2, since generally we're trying to make a string in a given locale, but data is data and can be multilingual. When it is not in the same language, the ability to query the metadata allows the user to e.g. decorate the text with a language appropriate In this case, every "literal" part of the message that is part of the template string will have the same language as the template string as a whole. It's only when a literal piece of data (such as the book's title, as in the example) has a different language that the metadata appears on a literal. MessageFormat won't generally use the value, but consumers might when consuming the parts. (If we permitted nested patterns, then the language would be necessary for features such as quote generation or to feed nested formatters as the locale)
The
In this case, it's a date value, such as a
Or perhaps:
|
Since placeables can be of mixed directionality, I'd like to suggest that Fluent's FSI/PDI insertion for string placeholders is added to requirements.
This allows a variable like userName to be inserted in a string with different directionality and inform the layout of the possible direction change.
W3C backlog: https://www.w3.org/International/articles/inline-bidi-markup/
Fluent wiki: https://github.com/projectfluent/fluent/wiki/BiDi-in-Fluent
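
A rough sketch of what default FSI/PDI insertion for string placeholders could look like, under the assumption that a message is already parsed into literal text and variable references. `PatternPart` and `formatPattern` are hypothetical names for illustration only and do not reflect Fluent's actual implementation or API.

```typescript
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Hypothetical parsed pattern: literal text or a reference to a variable.
type PatternPart = string | { name: string };

// Hypothetical formatter: literal text passes through unchanged, while
// every string placeable is wrapped in FSI/PDI so that its direction is
// detected from its own content and cannot affect the surrounding text.
function formatPattern(
  parts: PatternPart[],
  args: Record<string, string>
): string {
  return parts
    .map((part) =>
      typeof part === "string"
        ? part
        : FSI + (args[part.name] ?? `{${part.name}}`) + PDI
    )
    .join("");
}

// An RTL user name inserted into an LTR message.
const greeting = formatPattern(
  ["Hello, ", { name: "userName" }, "!"],
  { userName: "אבי" }
);
```

Because FSI takes its direction from the first strong character of the value, the caller does not need to know the direction of userName in advance.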