Stable and compact operator dictionary #104

fred-wang · 2021-08-05T13:30:54Z

In the past, I explained that there are strong requirements for browser implementations. In previous MathML versions, the operator dictionary was constantly changing and was not really covered by exhaustive non-regression testing... something not compatible with modern W3C spec practice. It was also huge and that's why I asked for more normalization & compactification before being able to upstream it to Chromium.

One year later, we are again in a state where the operator dictionary has changed and is out of sync with WPT tests and with what we upstreamed in Chromium. And the non-normative version is out-of-sync with the compact form too (see below).

We should ensure that the operator dictionary is stable and compact ; I really hope we can do it. If people disagree with that then I believe MathML Core should have its own compact & stable version for browser implemention & WPT, separate from the one used by MathML Full / xml-entities specs.

I'm proposing December 2021 for a freeze of the MathML Core dictionary. After that, I'll check again carefully the compactness of dictionary and update the WPT and browser implementations. Some things to notice:

Unicode 14 is intended to be released on September 14 2021. I really hope we don't add more entries though.
We should review the stretch orientation ( issue review operators stretching along the inline axis #102 )
9e11468 modified the assert to prevent unintentional addition of unknown non-BMP characters. Now we have many new supplemental arrows-C in the non-normative version that are not handled by https://w3c.github.io/mathml-core/#operator-dictionary ; do we really need to add all these arrows ? If so, we can add try to add yet-another-special-case but I'd prefer to avoid that.

The text was updated successfully, but these errors were encountered:

…ce they are not handled by https://w3c.github.io/mathml-core/#operator-dictionary #104

This allows to add this info to the non-normative human-readable table. The estimated storage size for this information is also added. #104

davidcarlisle · 2021-08-08T10:16:19Z

@fred-wang I think there is broad consensus that the op dict will need to be stable. The only disagreement if it should have been frozen begore even the first public working draft of mathml core, which seemed premature, but freezing in the timescale you suggest seems perfectly reasonable.

On stretch, I agree we should review and probably explicitly distinguish h and v stretch

supplemental arrows C I added as it seemed somehow the right thing to defer to Unicode here but they are marginal really and could just as easily be considered non mathematical sign writing symbols. So if the group has consensus to remove the operator dictionary entries for this block I wouldn't object (should be same for core and full)

On Unicode 14 there is an updated Unicode 14 unicode.xml in the u14 branch at https://github.com/w3c/xml-entities/tree/u14 but it has no diffs for the operator dictionary. I do hope that (somehow) we can incorporate the script/calliographic alphabets that are added in Unicode 14, but I don't think that should affect core it's just character data for an mi at that level.

davidcarlisle · 2021-11-29T17:25:30Z

Note since this issue was raised the Unicode 14 update has been merged in to main, with no changes to the operator dictionary, the Supplemental arrows c block is

https://w3c.github.io/xml-entities/1F8.html

If this is causing issues we could drop the operator dictionary entries (and drop this 1F8 page suggesting mathematical use) I don't think that would really lose anything, any character can be used in principle even if it has no math-specific spacing.

fred-wang · 2021-11-30T06:29:43Z

As I said, I'd really prefer we don't add new entries for these arrows.

davidcarlisle · 2021-11-30T06:43:08Z

Yes OK, unless someone argues for them in next day or so, I'll lose them.

davidcarlisle · 2021-12-16T19:59:37Z

I updated unicode.xml in xml-entities at

w3c/xml-entities@1a70505

dropping the suppemental arrows C block (U+1F800) from the operator dictionary.

The python scripts here would need to be re-run to generate the versions of the operator dictionary data here. I could do that but leaving that with @fred-wang to coordinate with other changes, unless you ask me to run them.

fred-wang · 2021-12-20T14:04:32Z

Thanks @davidcarlisle ; we will resume the mathml work in january. I'll check and coordinate the mathml op dict changes everywhere.

davidcarlisle · 2021-12-20T15:28:18Z

@fred-wang thanks: sounds good, and have a good and safe festive period.

#104

fred-wang · 2022-01-27T08:56:18Z

Because of step 4 (fallback when not explicit) of https://w3c.github.io/mathml-core/#dfn-algorithm-for-determining-the-properties-of-an-embellished-operator the following entries are problematic:

U+007C is in infixEntriesWithDefaultValues but also in ['prefixEntriesWithSpacing0AndStretchySymmetric', 'postfixEntriesWithSpacing0AndStretchySymmetric']
U+2234 is in infixEntriesWithDefaultValues but also in ['prefixEntriesWithLspace0Rspace0']
U+2235 is in infixEntriesWithDefaultValues but also in ['prefixEntriesWithLspace0Rspace0']
U+223C is in infixEntriesWithDefaultValues but also in ['prefixEntriesWithLspace0Rspace0']

Indeed, the current algo works as if infixEntriesWithDefaultValues was empty. Probably we'd need an extra step in the algo to handle that. But if some of these are essential, it would be good to simplify a bit.

I see it's important for U+007C and it's covered by mathml/presentation-markup/operators/mo-stretch-properties-001.html ; but AFAIK not for the others.

fred-wang · 2022-01-27T08:58:27Z

CL to update in Chromium:
https://chromium-review.googlesource.com/c/chromium/src/+/3419439/1#message-f180e975c56b66947e855dd85100a1c3084a6714

… to one of the existing categories with a strong Python assertion. #104

fred-wang · 2022-01-28T08:21:48Z

I uploaded a new patch on https://chromium-review.googlesource.com/c/chromium/src/+/3419439 ; trying to fix the remaining failures:

(1) mathml/presentation-markup/operators/mo-stretch-properties-001.html

This one is testing U+007C in different (implicit) forms. As I mentioned on #104 (comment) the current algo of the spec is broken for operators that use the default values in the infix forms but that have different values in other forms.

That's also true for U+2234, U+2235 and U+223C ; although I don't know if that's intentional for these three ones. If not, it would be better to avoid this situation. Otherwise, we should add WPT tests for them too I guess.

I added an assert to the Python script operator-dictionary.py in order to avoid having too many operators with this situation. Once we agree on a small list of exceptions, we can tweak the script to accept the corresponding code points.

In any case, since we definitely need that for U+007C, the spec should be changed to be able to handle these cases:

https://w3c.github.io/mathml-core/#operator-dictionary
"This compact form removes about 800 entries from the original operator dictionary that actually correspond to default values. They are not necessary since they are handled by the fallback case of 3.2.4.2 Dictionary-based attributes anyway." => this is true except for the exceptions mentioned above.
"For categories that don't have an encoding in Figure 26 (namely K, M) ..." => we need to handle the cases mentioned above by returning a "ForceDefault" status or similar.

https://w3c.github.io/mathml-core/#ref-for-dfn-algorithm-for-determining-the-properties-of-an-embellished-operator-1
"If no entry is found and the form of embellished operator was not explicitly specified as an attribute on its core operator, then user agents must try other dictionary entries for different values of F in the following order: infix, prefix, postfix;" => we need to skip this fallback for a "ForceDefault" status.

(I'll also probably tweak a bit the spec to clarify the execution and results of the algorithm retrieving the properties from the dictionary.)

(2) external/wpt/mathml/presentation-markup/operators/operator-dictionary-combining.html
This is a bug in the WPT test and was already failing (with different metrics) in the past. The operators.woff font is missing glyphs for the combining characters, fixed by
https://chromium-review.googlesource.com/c/chromium/src/+/3419439/2/third_party/blink/web_tests/external/wpt/mathml/tools/operator-dictionary.py

(3) external/wpt/mathml/presentation-markup/operators/operator-dictionary-spacing-001.html
external/wpt/mathml/presentation-markup/operators/operator-dictionary-spacing-002.html
external/wpt/mathml/presentation-markup/operators/operator-dictionary-spacing-004.html
external/wpt/mathml/presentation-markup/operators/operator-dictionary-spacing-005.html

These failures happened because the multi char ".." was removed from Operators_2_ascii_chars. I'm personally fine with removing exceptions. But it would be good to check whether that was intentional.

(4) external/wpt/mathml/presentation-markup/operators/operator-dictionary-spacing-001.html
This failure happened because the char U+2982 was added in category M which is handled specially. Again, I don't know whether that was intentional and less exception is always better for me. But if it's really justified I guess it's OK.

(5) Internal MathOperatorDictionaryTest
This failure happened because categories K and L got switched somehow. The script automatically orders categories by size, this was probably be due to new operators added/removed. I updated my CL to fix that and also pushed cf9221c

#104

NSoiffer · 2022-01-31T17:53:27Z

I think U+2234 (∴) and U+2235 (∵) are almost always prefix, so the infix versions can be dropped. I need to do a little research verify this though.

U+223C(∼) is similar to the ASCII "~". Probably both are used in interchangeable manners. There really are both prefix and infix usages. I don't think one usages is overwhelmingly common, so it is important to keep both. Again, I need to do some research on usage to check this.

NSoiffer · 2022-01-31T18:03:49Z

I just looked at the algorithm and I don't understand why the entries you list are problematic. Step 4 is

If no entry is found and the form of embellished operator was not explicitly specified as an attribute on its core operator, then user agents must try other dictionary entries for different values of F in the following order: infix, prefix, postfix;

Can you explain why there is a problem with those entries being both prefix and infix given that the algorithm says to look for them (the wording probably should be tweak to say "...in the following order: infix, prefix, postfix and ignoring the value found above")? "+" and "-" have similar properties. I think I must be missing what the point/problem there is with those entries.

bkardell · 2022-01-31T18:16:00Z

Discussed on the call today, resolved to close this issue adding a comment to open a new issue for Step 4, if anyone feels that's needed.

fred-wang · 2022-01-31T19:36:55Z

Reopening since the work I mentioned is not complete.

Can you explain why there is a problem with those entries being both prefix and infix given that the algorithm says to look for them (the wording probably should be tweak to say "...in the following order: infix, prefix, postfix and ignoring the value found above")? "+" and "-" have similar properties. I think I must be missing what the point/problem there is with those entries.

One of the trick of the compact dictionary is to remove all entries that would use the default properties, since they are essentially redundant with the fallback.

However, people don't always group things with <mrow> properly so e.g. in
<mi>a</mi><mo>+</mo><mo>(</mo><mfrac><mi>x</mi><mi>y</mi></mfrac><mo>)</mo>
The inner <mo>(</mo> would be considered infix.

So to work around that kind of issue, step 4 says we should try the different forms as a fallback.

But now if an operator has an infix entry with default properties and another form with non-default properties, the non-default would now be used (since the infix one is not in the compact dictionary anymore).

So to keep making this work, we need to tweak the spec with something like as explained above and that I implemented in the CL.

The Python script checks operators having that issue and there are only four problematic operators: #104 (comment)

Note that the infix forms of + and - don't use the default properties, so we don't have this issue. Actually moving the infix forms of the problematic operators to an existing infix category could be an alternative way to resolve the problem.

... and the algorithm for determining the properties of an embellished operator. There should be no behavior change. However section B.1 now describes two algorithms: - to determine the category of an operator - to set the properties of an operator from its category which are used by the algorithm to set the properties of an embellished operator. This makes it more convenient for implementations and will allow to work around the issue with multiple forms tried on operators that can be both in the Default category for the infix form and in a different category for a non-infix form. #104

fred-wang · 2022-02-01T13:05:50Z

I rewrote the section for the operator dictionary.

This is the TODO list before the issue can be closed:

Check whether U+2982 Z NOTATION TYPE COLON was added on purpose in category M (infix, lspace=0, rspace=0.16666666666666666em, no boolean properties). That seems consistent with U+003A COLON.
Check whether ".." was removed on purpose from the list of 2-ASCII-char operators.
Fix issue with U+007C, U+2234, U+2235, U+223C having an infix form with default values but a non-infix form with different values.
Update spec and tests (and chromium implementation).

For the issue with U+007C, U+2234, U+2235, U+223C, there are several points raised:

Use non-default values for the infix form?
U+2234 and U+2235 are almost always prefix, so the infix versions can be dropped.
U+007C and U+223C have infix/prefix usages. But U+223C TILDE OPERATOR is currently not consistent with U+007E TILDE.
Introduced a small Forced Default category that would be handled as categories L and M and with the same properties as the Default category. This will make step 4 automatically skipped in the "algorithm for determining the properties of an embellished operator".

davidcarlisle · 2022-02-28T21:06:38Z

Z notation type colon was infix with lspace=rspace=5 changed to lspace=0 rspace=3 at
mathml-refresh/xml-entities@0e0e89b

I think this is probably in error it is \mathrel not \mathpunct in unicode-math for latex, and there is discussion and links to the relevant Z specifications at this stackexchange answer https://tex.stackexchange.com/a/315327

I propose we revert this to be symmetric 5 5 (@NSoiffer ?)

.. was removed in the same commit We could put back as infix, (@NSoiffer ?)
issue with having infix matching the defaults seemed to be mainly an issue with the pruning of the operator dictionary which should not drop the entry in these cases were there is a non default entry with a different form. @fred-wang do I understand you are proposing a Forced Default category here to address this (it seems that way from the above comment but I don't see it in the current core draft, so I am not sure?
agree 2234 and 2235 can be prefix-only so we can drop the entries for infix.
007E and 223C ~ and ∼ do not have the same spacing in LaTeX unicode-math either, the ascii ~ is a mathord so gets no space, \sim 223C is a \mathrel so has 5 left and right.

fred-wang · 2022-03-01T07:15:06Z

issue with having infix matching the defaults seemed to be mainly an issue with the pruning of the operator dictionary which should not drop the entry in these cases were there is a non default entry with a different form. @fred-wang do I understand you are proposing a Forced Default category here to address this (it seems that way from the above comment but I don't see it in the current core draft, so I am not sure?

Correct. I've rewritten a bit the spec, but I'm still waiting the final decision before updating the values and adding the Forced Default category. These changes are also performed in the pending CL https://chromium-review.googlesource.com/c/chromium/src/+/3419439

davidcarlisle · 2022-03-01T08:32:50Z

@fred-wang thanks for the clarification.

So to be specific I think we should make the following changes to unicode.xml

revert the change to Z colon, making it \mathrel again
delete infix because and therefore

I think I'd propose not adding .. as either infix or postfx as it's quite rare and even when used probably doesn't want extra space in many cases eg if you show ranges as 1..5 Also I propose not changing U+223C keeping at \mathrel and keeping ~ as \mathord.

so the following change in the xml-entities repository

diff --git a/unicode.xml b/unicode.xml
index 224eacf..75142d6 100644
--- a/unicode.xml
+++ b/unicode.xml
@@ -43593,7 +43593,6 @@ Barbara Beeton for the STIX project
             <desc>alias ISOTECH there4</desc>
          </entity>
          <font name="hlcry" pos="144"/>
-         <operator-dictionary priority="70" form="infix" lspace="5" rspace="5"/>
          <operator-dictionary priority="70" form="prefix" lspace="0" rspace="0"/>
          <description unicode="1.1">THEREFORE</description>
       </character>
@@ -43625,7 +43624,6 @@ Barbara Beeton for the STIX project
             <desc>alias ISOTECH becaus</desc>
          </entity>
          <font name="hlcry" pos="145"/>
-         <operator-dictionary priority="70" form="infix" lspace="5" rspace="5"/>
          <operator-dictionary priority="70" form="prefix" lspace="0" rspace="0"/>
          <description unicode="1.1">BECAUSE</description>
       </character>
@@ -59418,7 +59416,7 @@ Barbara Beeton for the STIX project
       <character id="U02982" dec="10626" type="other" mode="unknown">
          <unicodedata category="Sm" combclass="0" bidi="ON" mirror="N" mathclass="F"/>
          <mathlatex set="unicode-math">\typecolon</mathlatex>
-         <operator-dictionary form="infix" lspace="0" rspace="3" priority="60"/>
+         <operator-dictionary form="infix" lspace="5" rspace="5" priority="60"/>
          <description unicode="3.2">Z NOTATION TYPE COLON</description>
       </character>
       <character id="U02983" dec="10627" mode="math" type="opening">

Then @fred-wang can handle the forced default changes assuming those implementation changes land?

@NSoiffer does that look OK?

fred-wang · 2022-03-15T10:09:10Z

@davidcarlisle So do is there any more change to decide for the op dict before we go ahead updating everything ?

davidcarlisle · 2022-03-15T11:49:14Z

@fred-wang not as far as I know, but we have a WG call on Thursday, we could put it on the agenda to get sign off on this then? (@NSoiffer )

As you probably saw I re-normalised the priority values but that is purely mechanical and doesn't affect the data used by core.

NSoiffer · 2022-03-17T07:06:59Z

Looking at the z-notation colon

The Unicode reference http://www.unicode.org/Public/math/latest/MathClassEx-15.html calls it a fence (clearly wrong)
The TeX reference https://gist.github.com/jaantollander/b49132a36b035cdfe05345dd0c7edf0f calls it mathbin
The Z reference manua doesn't even list it.
The Formal Specification—Z Notation—Syntax, Type and Semantics Draf seems to use relational spacing

There aren't many examples of its usage, but here are two from the formal spec draft that seem to use relational operator spacing:

I'm pretty sure at one point I spent a awhile on the concrete syntax to figure out a few of the Z notation values, but my brain doesn't seem to see now what I saw earlier.

Given the examples, I agree with @davidcarlisle to go with relational op spacing.

fred-wang · 2022-03-19T13:05:26Z

OK, I understand no more changes are planned, so will proceed to the update next week

davidcarlisle · 2022-03-19T13:21:35Z

@fred-wang yes I thought I left a comment here after the working group meeting on Thursday, but apparently I didn't post it, sorry.

fred-wang · 2022-03-21T09:10:52Z

OK thanks. I pushed 3b54b10

NSoiffer · 2022-04-23T00:20:07Z

This was merged into the spec.

…onary-categories-values Issue #104 http://0.0.0.0:8000/#operator-dictionary-categories-values

fred-wang added a commit that referenced this issue Aug 7, 2021

Remove non-BMP arrows from the non-normative human-readable form, sin…

17f7757

…ce they are not handled by https://w3c.github.io/mathml-core/#operator-dictionary #104

fred-wang added a commit that referenced this issue Aug 7, 2021

Move table of operators with an inline stretch axis to a Python script

7ef780d

This allows to add this info to the non-normative human-readable table. The estimated storage size for this information is also added. #104

fred-wang added a commit that referenced this issue Jan 27, 2022

Compact operator dictionary: make the script generate a C++ example.

a80a3ca

#104

fred-wang added a commit that referenced this issue Jan 27, 2022

Add a check for #104 (comment)

6c533ee

bkardell added the Agenda label Jan 27, 2022

bkardell mentioned this issue Jan 27, 2022

MathML-Core meeting, Monday Jan 31, 2022 Agenda #113

Closed

fred-wang added a commit that referenced this issue Jan 28, 2022

Operator dictonary : replace workaround for entries that don't belong…

9e57710

… to one of the existing categories with a strong Python assertion. #104

fred-wang added a commit that referenced this issue Jan 28, 2022

Fix categories needed direct verifications

cf9221c

#104

bkardell closed this as completed Jan 31, 2022

fred-wang reopened this Jan 31, 2022

bkardell mentioned this issue Feb 24, 2022

MathML-Core meeting, Monday Feb 28, 2022 Agenda #114

Closed

davidcarlisle mentioned this issue Mar 3, 2022

remove infix because and therefore, make Z notation colon a mathrel w3c/xml-entities#8

Merged

davidcarlisle mentioned this issue Mar 17, 2022

Fixes for operator dictionary w3c/mathml#87

Closed

bkardell mentioned this issue Mar 25, 2022

MathML-Core meeting, Monday March 28, 2022 Agenda #118

Closed

dginev mentioned this issue Apr 22, 2022

MathML-Core meeting, Monday April 25, 2022 Agenda #143

Closed

NSoiffer closed this as completed Apr 23, 2022

fred-wang mentioned this issue Sep 21, 2022

Operator dictionary: Order of preference for forms #167

Closed

fred-wang added a commit that referenced this issue Sep 21, 2022

Add ForceDefault to https://w3c.github.io/mathml-core/#operator-dicti…

3f7714d

…onary-categories-values Issue #104 http://0.0.0.0:8000/#operator-dictionary-categories-values

hbghlyj mentioned this issue Feb 3, 2023

\middle| doesn't have MathML attribute stretchy="true" mathjax/MathJax#2996

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stable and compact operator dictionary #104

Stable and compact operator dictionary #104

fred-wang commented Aug 5, 2021

davidcarlisle commented Aug 8, 2021

davidcarlisle commented Nov 29, 2021

fred-wang commented Nov 30, 2021

davidcarlisle commented Nov 30, 2021

davidcarlisle commented Dec 16, 2021

fred-wang commented Dec 20, 2021

davidcarlisle commented Dec 20, 2021

fred-wang commented Jan 27, 2022

fred-wang commented Jan 27, 2022

fred-wang commented Jan 28, 2022 •

edited

Loading

NSoiffer commented Jan 31, 2022

NSoiffer commented Jan 31, 2022

bkardell commented Jan 31, 2022

fred-wang commented Jan 31, 2022

fred-wang commented Feb 1, 2022 •

edited

Loading

davidcarlisle commented Feb 28, 2022

fred-wang commented Mar 1, 2022 •

edited

Loading

davidcarlisle commented Mar 1, 2022

fred-wang commented Mar 15, 2022

davidcarlisle commented Mar 15, 2022

NSoiffer commented Mar 17, 2022

fred-wang commented Mar 19, 2022

davidcarlisle commented Mar 19, 2022

fred-wang commented Mar 21, 2022

NSoiffer commented Apr 23, 2022

Stable and compact operator dictionary #104

Stable and compact operator dictionary #104

Comments

fred-wang commented Aug 5, 2021

davidcarlisle commented Aug 8, 2021

davidcarlisle commented Nov 29, 2021

fred-wang commented Nov 30, 2021

davidcarlisle commented Nov 30, 2021

davidcarlisle commented Dec 16, 2021

fred-wang commented Dec 20, 2021

davidcarlisle commented Dec 20, 2021

fred-wang commented Jan 27, 2022

fred-wang commented Jan 27, 2022

fred-wang commented Jan 28, 2022 • edited Loading

NSoiffer commented Jan 31, 2022

NSoiffer commented Jan 31, 2022

bkardell commented Jan 31, 2022

fred-wang commented Jan 31, 2022

fred-wang commented Feb 1, 2022 • edited Loading

davidcarlisle commented Feb 28, 2022

fred-wang commented Mar 1, 2022 • edited Loading

davidcarlisle commented Mar 1, 2022

fred-wang commented Mar 15, 2022

davidcarlisle commented Mar 15, 2022

NSoiffer commented Mar 17, 2022

fred-wang commented Mar 19, 2022

davidcarlisle commented Mar 19, 2022

fred-wang commented Mar 21, 2022

NSoiffer commented Apr 23, 2022

fred-wang commented Jan 28, 2022 •

edited

Loading

fred-wang commented Feb 1, 2022 •

edited

Loading

fred-wang commented Mar 1, 2022 •

edited

Loading