-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stable and compact operator dictionary #104
Comments
This allows to add this info to the non-normative human-readable table. The estimated storage size for this information is also added. #104
@fred-wang I think there is broad consensus that the op dict will need to be stable. The only disagreement if it should have been frozen begore even the first public working draft of mathml core, which seemed premature, but freezing in the timescale you suggest seems perfectly reasonable. On stretch, I agree we should review and probably explicitly distinguish h and v stretch supplemental arrows C I added as it seemed somehow the right thing to defer to Unicode here but they are marginal really and could just as easily be considered non mathematical sign writing symbols. So if the group has consensus to remove the operator dictionary entries for this block I wouldn't object (should be same for core and full) On Unicode 14 there is an updated Unicode 14 unicode.xml in the u14 branch at https://github.com/w3c/xml-entities/tree/u14 but it has no diffs for the operator dictionary. I do hope that (somehow) we can incorporate the script/calliographic alphabets that are added in Unicode 14, but I don't think that should affect core it's just character data for an mi at that level. |
Note since this issue was raised the Unicode 14 update has been merged in to main, with no changes to the operator dictionary, the Supplemental arrows c block is https://w3c.github.io/xml-entities/1F8.html If this is causing issues we could drop the operator dictionary entries (and drop this 1F8 page suggesting mathematical use) I don't think that would really lose anything, any character can be used in principle even if it has no math-specific spacing. |
As I said, I'd really prefer we don't add new entries for these arrows. |
Yes OK, unless someone argues for them in next day or so, I'll lose them. |
I updated unicode.xml in xml-entities at dropping the suppemental arrows C block (U+1F800) from the operator dictionary. The python scripts here would need to be re-run to generate the versions of the operator dictionary data here. I could do that but leaving that with @fred-wang to coordinate with other changes, unless you ask me to run them. |
Thanks @davidcarlisle ; we will resume the mathml work in january. I'll check and coordinate the mathml op dict changes everywhere. |
@fred-wang thanks: sounds good, and have a good and safe festive period. |
Because of step 4 (fallback when not explicit) of https://w3c.github.io/mathml-core/#dfn-algorithm-for-determining-the-properties-of-an-embellished-operator the following entries are problematic: U+007C is in infixEntriesWithDefaultValues but also in ['prefixEntriesWithSpacing0AndStretchySymmetric', 'postfixEntriesWithSpacing0AndStretchySymmetric'] Indeed, the current algo works as if infixEntriesWithDefaultValues was empty. Probably we'd need an extra step in the algo to handle that. But if some of these are essential, it would be good to simplify a bit. I see it's important for U+007C and it's covered by mathml/presentation-markup/operators/mo-stretch-properties-001.html ; but AFAIK not for the others. |
… to one of the existing categories with a strong Python assertion. #104
I uploaded a new patch on https://chromium-review.googlesource.com/c/chromium/src/+/3419439 ; trying to fix the remaining failures: (1) mathml/presentation-markup/operators/mo-stretch-properties-001.html This one is testing U+007C in different (implicit) forms. As I mentioned on #104 (comment) the current algo of the spec is broken for operators that use the default values in the infix forms but that have different values in other forms. That's also true for U+2234, U+2235 and U+223C ; although I don't know if that's intentional for these three ones. If not, it would be better to avoid this situation. Otherwise, we should add WPT tests for them too I guess. I added an assert to the Python script In any case, since we definitely need that for U+007C, the spec should be changed to be able to handle these cases: https://w3c.github.io/mathml-core/#operator-dictionary https://w3c.github.io/mathml-core/#ref-for-dfn-algorithm-for-determining-the-properties-of-an-embellished-operator-1 (I'll also probably tweak a bit the spec to clarify the execution and results of the algorithm retrieving the properties from the dictionary.) (2) external/wpt/mathml/presentation-markup/operators/operator-dictionary-combining.html (3) external/wpt/mathml/presentation-markup/operators/operator-dictionary-spacing-001.html These failures happened because the multi char ".." was removed from (4) external/wpt/mathml/presentation-markup/operators/operator-dictionary-spacing-001.html (5) Internal MathOperatorDictionaryTest |
I think U+2234 (∴) and U+2235 (∵) are almost always prefix, so the infix versions can be dropped. I need to do a little research verify this though. U+223C(∼) is similar to the ASCII "~". Probably both are used in interchangeable manners. There really are both prefix and infix usages. I don't think one usages is overwhelmingly common, so it is important to keep both. Again, I need to do some research on usage to check this. |
I just looked at the algorithm and I don't understand why the entries you list are problematic. Step 4 is
Can you explain why there is a problem with those entries being both prefix and infix given that the algorithm says to look for them (the wording probably should be tweak to say "...in the following order: infix, prefix, postfix and ignoring the value found above")? "+" and "-" have similar properties. I think I must be missing what the point/problem there is with those entries. |
Discussed on the call today, resolved to close this issue adding a comment to open a new issue for Step 4, if anyone feels that's needed. |
Reopening since the work I mentioned is not complete.
One of the trick of the compact dictionary is to remove all entries that would use the default properties, since they are essentially redundant with the fallback. However, people don't always group things with So to work around that kind of issue, step 4 says we should try the different forms as a fallback. But now if an operator has an infix entry with default properties and another form with non-default properties, the non-default would now be used (since the infix one is not in the compact dictionary anymore). So to keep making this work, we need to tweak the spec with something like as explained above and that I implemented in the CL. The Python script checks operators having that issue and there are only four problematic operators: #104 (comment) Note that the infix forms of |
... and the algorithm for determining the properties of an embellished operator. There should be no behavior change. However section B.1 now describes two algorithms: - to determine the category of an operator - to set the properties of an operator from its category which are used by the algorithm to set the properties of an embellished operator. This makes it more convenient for implementations and will allow to work around the issue with multiple forms tried on operators that can be both in the Default category for the infix form and in a different category for a non-infix form. #104
I rewrote the section for the operator dictionary. This is the TODO list before the issue can be closed:
For the issue with U+007C, U+2234, U+2235, U+223C, there are several points raised:
|
I think this is probably in error it is \mathrel not \mathpunct in unicode-math for latex, and there is discussion and links to the relevant Z specifications at this stackexchange answer https://tex.stackexchange.com/a/315327 I propose we revert this to be symmetric 5 5 (@NSoiffer ?)
|
Correct. I've rewritten a bit the spec, but I'm still waiting the final decision before updating the values and adding the Forced Default category. These changes are also performed in the pending CL https://chromium-review.googlesource.com/c/chromium/src/+/3419439 |
@fred-wang thanks for the clarification. So to be specific I think we should make the following changes to unicode.xml
I think I'd propose not adding so the following change in the xml-entities repository
Then @fred-wang can handle the forced default changes assuming those implementation changes land? @NSoiffer does that look OK? |
@davidcarlisle So do is there any more change to decide for the op dict before we go ahead updating everything ? |
@fred-wang not as far as I know, but we have a WG call on Thursday, we could put it on the agenda to get sign off on this then? (@NSoiffer ) As you probably saw I re-normalised the priority values but that is purely mechanical and doesn't affect the data used by core. |
Looking at the z-notation colon
There aren't many examples of its usage, but here are two from the formal spec draft that seem to use relational operator spacing: I'm pretty sure at one point I spent a awhile on the concrete syntax to figure out a few of the Z notation values, but my brain doesn't seem to see now what I saw earlier. Given the examples, I agree with @davidcarlisle to go with relational op spacing. |
OK, I understand no more changes are planned, so will proceed to the update next week |
@fred-wang yes I thought I left a comment here after the working group meeting on Thursday, but apparently I didn't post it, sorry. |
OK thanks. I pushed 3b54b10 |
This was merged into the spec. |
In the past, I explained that there are strong requirements for browser implementations. In previous MathML versions, the operator dictionary was constantly changing and was not really covered by exhaustive non-regression testing... something not compatible with modern W3C spec practice. It was also huge and that's why I asked for more normalization & compactification before being able to upstream it to Chromium.
One year later, we are again in a state where the operator dictionary has changed and is out of sync with WPT tests and with what we upstreamed in Chromium. And the non-normative version is out-of-sync with the compact form too (see below).
We should ensure that the operator dictionary is stable and compact ; I really hope we can do it. If people disagree with that then I believe MathML Core should have its own compact & stable version for browser implemention & WPT, separate from the one used by MathML Full / xml-entities specs.
I'm proposing December 2021 for a freeze of the MathML Core dictionary. After that, I'll check again carefully the compactness of dictionary and update the WPT and browser implementations. Some things to notice:
Unicode 14 is intended to be released on September 14 2021. I really hope we don't add more entries though.
We should review the stretch orientation ( issue review operators stretching along the inline axis #102 )
9e11468 modified the assert to prevent unintentional addition of unknown non-BMP characters. Now we have many new supplemental arrows-C in the non-normative version that are not handled by https://w3c.github.io/mathml-core/#operator-dictionary ; do we really need to add all these arrows ? If so, we can add try to add yet-another-special-case but I'd prefer to avoid that.
The text was updated successfully, but these errors were encountered: