[Discussion] A proposal for disentangling markup patterns from semantics for accessibility #217

Closed
brucemiller opened this issue May 24, 2020 · 25 comments
Labels
accessibility Issues related to improving accessibility intent Issues involving the proposed "intent" attr MathML 4 Issues affecting the MathML 4 specification

Comments

@brucemiller
Contributor

Background

I would like to explore approaches to adding at least minimal semantic information to Presentation MathML, primarily for the purpose of accessibility. The obvious first idea is to encode the "meaning" of each symbol as an attribute on its token. That covers a lot of cases in a natural way, but begins to fail when we encounter the various purposes of sub/superscripts (eg. powers x^2, operator application like A^T, indexing) or constructs representing special notations such as binomial coefficients. In these cases there may be no (or many) tokens which deserve this meaning attribute, and it fails to capture the fact that the entire construct (eg. msup, mrow) is relevant.

It is tempting in such cases to assign the meaning at a higher level (Say put "transpose" on the msup instead of the T, or "binomial" on the mrow), but then we must conceive a large dictionary of meanings (eg. transpose, conjugate, adjoint, ...; binomial, legendre-symbol,...) along with their corresponding markup patterns. Each such markup pattern must encode which of the node descendants will also need to be translated. Moreover, we would have to distinguish different possible markup patterns associated with the same meaning (eg. transpose as superscript, transpose as function,...; different notations for binomial coefficients).

Proposal

I would like to explore here the feasibility of abstracting a (hopefully) small set of markup patterns, separately from the meanings, for distinguishing these cases. The presentation markup would thus be annotated with 2 attributes, which (for purposes of discussion) I'll call "meaning" and "composition". The set of composition keywords would need dictionary entries, with each making clear which children play the role of arguments. For many purposes, however, the set of possible meanings could be open-ended.

Simple example

Superscript seems to be used almost entirely for the purposes of powers, operator application, and (tensor) indexing. Take composition=power to indicate the arg2 power of arg1. So

<msup composition="power">
  <mi>x</mi>
  <mi>n</mi>
</msup>

can be read as "x to the power of n", or an agent may choose to examine the children for special cases, like "x squared".

An example of operator application might be:

<msup composition="sup-operator">
  <mi>A</mi>
  <mi meaning="transpose">T</mi>
</msup>

which could be read as "transpose of A". It also easily generalizes to meaning="conjugate", "adjoint" or even the not-yet-popular "Tralfamadorian inverse", without needing any additional dictionary entries.

Less simple example

A large set of notations have markup patterns like:

<mrow><open/><a/> [<punct/> <b/>]* <close/></mrow>
<msub><mrow><open/><a/> [<punct/> <b/>]* <close/></mrow><i/></msub>
<mrow><open/><mfrac><a/><b/></mfrac><close/></mrow>
<mrow><open/><mtable>[<mtr>[<a/>*]</mtr>]*</mtable><close/></mrow>

with various delimiters and punctuation, with or without a visible dividing line, etc. These include common notations for binomial, Jacobi and Legendre symbols, Eulerian numbers, Pochhammer symbols, Clebsch-Gordan coefficients, 3j, 6j and similar symbols, distributions, vectors, matrices, determinants, inner products, and so on. Of course, many of these also have other commonly used notations, so it would be a shame to lock each "semantic" to a single markup pattern.

Naming each of these patterns might simplify the task. For example, the common binomial notation might be represented by

<mrow composition="stacked-fenced" meaning="binomial">
  <mo>(</mo>
  <mfrac linethickness="0pt">
    <mi>n</mi>
    <mi>m</mi>
  </mfrac>
  <mo>)</mo>
</mrow>

Presumably the internal representation of the stacked-fenced composition would make clear where the arguments are and that a decent default reading is "binomial of n and m". And of course an implementation is free to special-case "binomial" to obtain the reading "n choose m". In any case, this pattern is trivially extended to 2D vectors, Jacobi symbols and Eulerian numbers.

Moreover, we haven't wedded binomials to a single notation, since we can still write:

<msubsup composition="base-operation">
  <mi meaning="binomial">C</mi>
  <mi>n</mi>
  <mi>m</mi>
</msubsup>

which could presumably end up with exactly the same readings as above.

Implementation note

A dictionary of the composition keywords would need to encode the paths to the children which act as arguments to whatever semantic is being applied, and provide some sort of template for output. This might be a simple pattern, possibly per-language as well as for Braille or other formats, along the lines of "the #2 power of #1" or "the #1 of #2" or similar. The appropriate translations of the children could simply be inserted into the pattern. Perhaps something fancier is needed?

The meaning attribute might well be open-ended, and read out as-is by default. Of course, that doesn't preclude recommending a standard set for common cases, nor does it preclude an implementation including a meaning dictionary to improve translations.
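As a rough illustration of the dictionary described above, here is a minimal Python sketch. It is only one possible reading of the idea: the `COMPOSITIONS` table, the path encoding (tuples of 0-based child indices), the `#m` placeholder for the meaning, and the function names `follow` and `read` are all assumptions made for this example, not part of any proposal text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical dictionary entries:
#   keyword -> (path to the node carrying "meaning", argument paths, output template)
# A path is a tuple of 0-based child indices; () is the element itself.
COMPOSITIONS = {
    "power":          (None, [(0,), (1,)],     "#1 to the power of #2"),
    "sup-operator":   ((1,), [(0,)],           "#m of #1"),
    "stacked-fenced": ((),   [(1, 0), (1, 1)], "#m of #1 and #2"),
    "base-operation": ((0,), [(1,), (2,)],     "#m of #1 and #2"),
}

def follow(node, path):
    """Walk from node down the given child indices."""
    for index in path:
        node = node[index]
    return node

def read(node):
    """Return a default reading for node, filling the template of its composition."""
    entry = COMPOSITIONS.get(node.get("composition"))
    if entry is None:
        # No known composition: read the meaning as-is, else the text content.
        return node.get("meaning") or "".join(node.itertext()).strip()
    meaning_path, arg_paths, template = entry

    def fill(match):
        key = match.group(1)
        if key == "m":
            return follow(node, meaning_path).get("meaning", "unknown")
        return read(follow(node, arg_paths[int(key) - 1]))

    return re.sub(r"#(m|\d+)", fill, template)

transpose = ET.fromstring(
    '<msup composition="sup-operator"><mi>A</mi><mi meaning="transpose">T</mi></msup>'
)
print(read(transpose))  # transpose of A
```

An implementation that wants "n choose m" or "x squared" would special-case those meanings before falling back to the template.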

Summary

This idea has a lot of detail to be worked out, including at least

  • the set (or mini-language) for compositions
  • the set of MathML container elements which give rise to such constructs
    and need corresponding composition keywords

But before going down that road, it would be good to get general reaction and feedback.
@NSoiffer added the accessibility and MathML 4 labels May 26, 2020
@davidfarmer
Contributor

Here is simultaneously a report on notation used in two
different calculus texts (spoiler: there are very few
differences) and some example notation for testing any
proposed semantic markup.

The two books are:
Active Calculus by Matt Boelkins
APEX Calculus by Greg Hartman

I will write everything in LaTeX notation.

APEX writes vectors as italic letters with an arrow above.
ACTIVE writes vectors as boldface (non-italic) letters.

|x|
    APEX: absolute value of x
    Active: absolute value of x, or magnitude of the vector x

a^b
    exponent

\left( \frac{a}{b} \right)
    a fraction in parentheses
    (Other contexts: Legendre symbol)

(a, b)
    could mean the open interval a < x < b,
    or a point in the Cartesian plane

[a, b]
    the closed interval a \le x \le b
    (In other subjects it could be the commutator of a and b,
    or the Lie bracket of a and b.)

a \cdot b
    single variable calculus: ordinary multiplication
    multivariable calculus: dot product

a \times b
    single variable calculus: ordinary multiplication
    multivariable calculus: usually cross product,
    but occasionally "by" as in 2 \times 2 matrix

\Delta y
    change in y
    (In other subjects, the Laplacian.)

\overline{x}_i
    midpoint of the ith interval

\overline{PQ}
    line segment from P to Q

\overrightarrow{PQ}
    vector from point P to point Q

{ a_n }
    the sequence a_n

P(a, b)
    the point with polar coordinates r=a and \theta=b

\lVert x \rVert
    the length of the segment x
    (looks like ||x||, with the pairs of vertical lines close together)

a \parallel b
    vector a is parallel to vector b
    (In other contexts: a exactly divides b)

\langle a, b \rangle
    the vector with coordinates a, b

[ a ]
    the 1 \times 1 matrix with entry a

@NSoiffer
Contributor

@davidfarmer: were the only two differences between the books the first two you mention? The rest are calc notations used by both?

@brucemiller
Contributor Author

brucemiller commented Jun 1, 2020

I've moved this material to a github page at
https://mathml-refresh.github.io/mathml/docs/layout-semantics
for further development.

@davidcarlisle
Collaborator

I've moved this material to a github page at
https://github.com/mathml-refresh/mathml/blob/gh-pages/docs/layout-semantics.md
for further development.

which appears as

https://mathml-refresh.github.io/mathml/docs/layout-semantics

in github pages.

@davidcarlisle
Collaborator

Note @NSoiffer has a modified version of @brucemiller's proposal at

https://mathml-refresh.github.io/mathml/docs/function-semantics

Neil's version has the advantage of not requiring a fixed enumeration of notation= layout schema. However, I do not think we should rely so heavily on counting child elements. If we do count, it should be 1-based, not 0-based (both XPath and CSS selectors are 1-based). Also, once you get beyond basic child elements, it means specifying a third "competing" query construct for DOM trees, and I think it would be better to avoid that.

Exposing the element nesting (@1@0) is tricky, as it means that you may have to remove (or expose by counting) redundant <mrow> elements. Previously an mrow that wraps a single child was always allowed and had no effect. When generating Presentation MathML (e.g. from Content), it is very natural to generate, e.g.,

<mfrac>
  <mrow> recursively transform1 </mrow>
  <mrow> recursively transform1 </mrow>
</mfrac>

This ensures the mfrac is well-formed, but if the recursive transformations just produce <mn>1</mn> and <mn>2</mn> then the <mrow> elements are redundant. Depending on the transformation technology, it isn't always convenient to do a second pass to remove them.
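For illustration, the second pass mentioned here could look something like the following Python sketch. It is an assumption-laden example, not something from either draft: the function name `unwrap_single_child_mrows` is made up, and the choice to keep any `<mrow>` that carries attributes is a guess about what "redundant" should mean.

```python
import xml.etree.ElementTree as ET

def unwrap_single_child_mrows(node):
    """Replace, in place, each attribute-free <mrow> that wraps exactly one element
    by that element. The root node itself is never replaced."""
    for index, child in enumerate(list(node)):
        unwrap_single_child_mrows(child)  # clean the subtree first
        wraps_one = child.tag == "mrow" and len(child) == 1
        if wraps_one and not child.attrib and not (child.text or "").strip():
            inner = child[0]
            inner.tail = child.tail  # keep whatever followed the wrapper
            node[index] = inner
    return node

frac = ET.fromstring("<mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac>")
unwrap_single_child_mrows(frac)
print(ET.tostring(frac, encoding="unicode"))  # <mfrac><mn>1</mn><mn>2</mn></mfrac>
```

The point of the sketch is that any scheme based on counting children gives different paths before and after such a pass, which is why the counting approach makes these wrappers significant.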

I think it may be possible to combine the two proposals with the following (not fully baked) form.

notation as in Bruce's proposal.
The possible values would be fixed, possibly the list as in Bruce's document, or possibly a shorter list, but extended with a new possibility of overriding the default determination of the arguments, so the notation forms can be used with any layout.

Each notation/layout schema would specify the default position of the main operator and arguments.

These could be overridden with operator and arg attributes.

operator takes an operator name, and is essentially a renamed meaning from Bruce's proposal.

arg takes an integer value. If it is used, then the collection of descendant arg attributes (ignoring any in nested notation subtrees) should produce the integer range 1 ... n for some n, and specify the arguments of the operator.

So binomial might be

<msubsup notation="operator-args">
  <mi operator="binomial">C</mi>
  <mi>m</mi>
  <mi>n</mi>
</msubsup>

or for a reversed convention

<msubsup notation="operator-args">
  <mi operator="binomial">C</mi>
  <mi arg="2">m</mi>
  <mi arg="1">n</mi>
</msubsup>

The sub and sup forms (and mroot and mfrac and 2-child mrow) could, I think, be combined. Basically, in Bruce's document notation=sup is for a two-argument operation (power, or specified elsewhere) and sup-operator is a 1-argument operation specified by the second child. So here they are renamed to args and operator-args.

By default (unless over-ridden by arg= attributes) the arguments of the operator consist of the children of the element, in order, but ignoring <mo>, <mspace>

so transpose

<msup notation="operator-args">
  <mi>A</mi>
  <mi operator="transpose">T</mi>
</msup>

power

<msup notation="args" > <!-- operator="power" -->
  <mi>x</mi>
  <mi>n</mi>
</msup>

factorial

<mrow notation="operator-args">
  <mi>a</mi>
  <mo operator="factorial">!</mo>
</mrow>

for an element with notation=args the operator should be specified on the element (rather than on a child) with default values power on <msup>, division on mfrac , root on ```

the nth derivative case could be marked as

<msup notation="args" operator="derivative-implicit-variable">
  <mi>f</mi>
  <mrow>
    <mo>(</mo>
    <mi>n</mi>
    <mo>)</mo>
  </mrow>
</msup>

with meaning derivative(f,(n)) with the parens around n being taken as part of the value,

but, especially for notations using more fancy decoration, you could use the arg attributes to explicitly ignore the syntax decoration:

<msup notation="args" operator="derivative-implicit-variable">
  <mi arg="1">f</mi>
  <mrow>
    <mo>(</mo>
    <mi arg="2">n</mi>
    <mo>)</mo>
  </mrow>
</msup>

with meaning derivative(f,n) with the parens around n being ignored,

For binary infix, eg dot product this leads to closer to Bruce's form (no @ counting)

<mrow notation="infix">
  <mi mathvariant="bold">a</mi>
  <mo operator="inner-product">&#x22C5;</mo>
  <mi mathvariant="bold">b</mi>
</mrow>

Arguably infix isn't needed here and you could use operator-args still, but maybe that's a simplification too far.

But I think we should distinguish the multiple operator case where the operators are being combined (with missing mrow) using implied precedence or associativity rules from the case where it is really an n-ary operator like (plus... just being written as repeated infix by convention. Something like

<mrow notation="infix" operator="plus">
  <mi>a</mi>
  <mo>+</mo>
  <mi>b</mi>
  <mo>-</mo>
  <mi>c</mi>
  <mo>+</mo>
  <mi>d</mi>
</mrow>

with meaning (plus a b (minus c) d)

and

<mrow notation="infix">
  <mi>a</mi>
  <mo operator="plus">+</mo>
  <mi>b</mi>
  <mo operator="minus">-</mo>
  <mi>c</mi>
  <mo operator="plus">+</mo>
  <mi>d</mi>
</mrow>

with meaning a+b-c+d with some implicit disambiguation rules (👋👋)

Intervals would just need notation=args operator=open-interval, e.g., not notation="open-interval(@start, @end)"

<mrow notation="args" operator="open-interval">
  <mo>]</mo>
  <mi>a</mi>
  <mo>,</mo>
  <mi>b</mi>
  <mo>[</mo>
</mrow>

as the ] and [ mo elements would be ignored by the default rule for the args layout.

Pochhammer could use a named fenced-sub layout as in Bruce's document (with meaning= replaced by operator=), but if this layout is not thought sufficiently common, it could use args, though it would need arg= attributes to get inside the mrow, so

<msub notation="args" operator="Pochhammer">
  <mrow>
    <mo>(</mo>
    <mi arg="1">a</mi>
    <mo>)</mo>
  </mrow>
  <mi arg="2">n</mi>
</msub>

In fact I now realise there is no need to have a general scheme at all, the args and operator-args cover the general case if you add one operator= attribute and enough arg= attributes to specify the mapping to prefix apply form. (at this point I re-wrote most of the above:-)
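To make the mapping to prefix apply form concrete, here is a small Python sketch of how a consuming agent might apply the rules above. It is a sketch under stated assumptions: the attribute names (notation, operator, arg) follow this comment, but the function names `in_scope` and `to_prefix`, the `DEFAULT_OPERATORS` table, and the textual output format are hypothetical, and the infix notation is not handled.

```python
import xml.etree.ElementTree as ET

DEFAULT_OPERATORS = {"msup": "power", "mfrac": "division"}  # defaults named above
IGNORED = {"mo", "mspace"}  # skipped when arguments are taken by default

def in_scope(node):
    """Yield descendants of node without entering nested notation subtrees."""
    for child in node:
        yield child
        if child.get("notation") is None:
            yield from in_scope(child)

def to_prefix(node):
    """Render node as operator(arg1, arg2, ...) using operator= and arg= attributes."""
    if node.get("notation") is None:
        if len(node) == 0:
            return (node.text or "").strip()
        return "".join(to_prefix(child) for child in node)  # e.g. "(n)"
    scoped = list(in_scope(node))
    operator = node.get("operator")
    if operator is None:  # operator-args: look for operator= on a descendant
        operator = next((d.get("operator") for d in scoped if d.get("operator")), None)
    if operator is None:
        operator = DEFAULT_OPERATORS.get(node.tag, "unknown")
    explicit = sorted((d for d in scoped if d.get("arg")), key=lambda d: int(d.get("arg")))
    if explicit:
        args = explicit  # arg= attributes override the default rule
    else:
        args = [c for c in node if c.tag not in IGNORED and c.get("operator") is None]
    return f"{operator}({', '.join(to_prefix(a) for a in args)})"

binomial = ET.fromstring(
    '<msubsup notation="operator-args"><mi operator="binomial">C</mi>'
    '<mi arg="2">m</mi><mi arg="1">n</mi></msubsup>'
)
print(to_prefix(binomial))  # binomial(n, m)
```

Note that every notation node requires a scan of its in-scope descendants for operator and arg attributes, even though they will usually be absent.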

@NSoiffer
Contributor

NSoiffer commented Jun 21, 2020 via email

@NSoiffer
Contributor

NSoiffer commented Jun 21, 2020 via email

@brucemiller
Contributor Author

Some preliminary thoughts on @davidcarlisle's suggestions: there's a lot to like (and dislike) here.

I think rather than replace "meaning" by "operator", you'd want to keep "meaning" (by whatever name), and let arg="0" designate the operator. For one thing, you need to be able to assert that an entire subtree has a given "meaning". Secondly, the operator often will be more than just a name, it may be a tree itself, embellished, with arguments, whatever.

I'm concerned about whether the presence of a "notation" attribute properly scopes the operator/arg attributes; in a complexly nested expression, will it be completely clear which operator/arg belong to which notation? (perhaps)

It's kinda cool having a generic notation (although I'm puzzled by the term "args"), so that you don't have to use any other notation keywords, and conversely that you can override one or all of a notation's default positions. But this basically puts a lot of work on the agent consuming this: For every node with a notation, it has to search all children for (in scope) operator and arg attributes, which presumably wouldn't be present very often.

@NSoiffer
Contributor

NSoiffer commented Jun 22, 2020 via email

@davidcarlisle
Collaborator

@brucemiller it's a bit of a stretch to call mine a "proposal". I deliberately wrote it as a comment here rather than as a new draft doc or a PR on one of the two existing docs, as it was supposed to be just a comment. But it got long, then I copied in code examples from the existing drafts and it got longer, and crucially I completely changed my idea halfway through writing the comment, as I realised the general form was possibly not too bad and so could replace many of your named forms rather than just being an additional one for special cases.

So naming and all details are not fully baked.

I think the one line version of my comment would be:

I don't think we should introduce a new selector syntax and I don't think we should use numeric element counting, but I do like from @NSoiffer 's proposal that the mechanism is open ended and doesn't involve enumerating so many layout schemes, so I was trying to do a merge of the two.....

@brucemiller
Contributor Author

I suppose I can point to a preliminary draft at https://mathml-refresh.github.io/mathml/docs/semantics-mini
although we're still working out the kinks and adding examples.

@NSoiffer
Contributor

NSoiffer commented Jun 23, 2020 via email

@dginev
Contributor

dginev commented Jun 23, 2020

Quick comment: I see the suggestion has evolved to contain the named arguments also in the notation root. If you next rename the attribute "arg" to "id", and avoid value clashes globally, you arrive at the id-based part of our selector draft. That said, the current intuition of the "arg" approach is to give the same argument of the same notation the same value, so you are also really close to the HTML "class" attribute in function. class="arg-from" , class="arg-to" definitely look sensible even as pure web-development hooks.

But most importantly, the moment you have annotations at both the root of a notation and the argument nodes, you are functionally equivalent to our id-pointing scheme. And that seems to be getting closer to a consensus position?

As to our selector approach, the rest of the child-counting-selection was needed to 1) stay open-ended while 2) still provide a vocabulary of standard notation names. If you want to include e.g. a standard "binomial" in the specification, which assumes reasonable (unannotated) children to include as arguments, you end up having to formalize that relationship. Which, as ugly as it is, looks like a descendant-counting-path selector in the general case. So once you concede that you're introducing names with fixed expectations for positional arguments, might as well expose that capability to document authors, so that they also cover currently out-of-scope syntax.

@brucemiller
Contributor Author

@NSoiffer: I've somewhat addressed the issue you raised, and changed a few examples to use ids rather than paths. Feel free to point the discussion group to it.

@NSoiffer
Contributor

NSoiffer commented Jun 24, 2020 via email

@samdooley
Contributor

samdooley commented Jun 25, 2020 via email

@davidcarlisle
Collaborator

davidcarlisle commented Jun 25, 2020 via email

@dginev
Contributor

dginev commented Jul 23, 2020

I still find myself today wishing that we minimize the learning curve, and domain-specific novelties we introduce, in the annotation scheme. The simple references and function-call style annotations achieve that nicely, since #op(#1,#2,#3) is a form anyone can quickly learn and use. Meanwhile, David's statement of:

if you know you are using stacked-fenced

goes in the other direction. It assumes annotators will spend a bit of time training themselves into spotting the various notation patterns, from a list of pre-defined patterns the specification offers, and then use them judiciously. That is certainly going to add difficulty to becoming an annotator.

Conversely, given a presentation MathML tree, annotated with this minimal operator structure annotation #op(#1,#2), it is easy to do a tree walk which determines the notation which has been used. Non-exhaustive examples:

| notation | child pattern | required parent |
| --- | --- | --- |
| prefix | siblings annotated: #op #1 | mrow |
| postfix | siblings annotated: #1 #op | mrow |
| binary infix | siblings annotated: #1 #op #2 | mrow |
| n-ary infix | sibling annotated #1 #op #2 [unannotated sibling(s)] #3 [unannotated sibling(s)] ... | mrow |
| n-ary prefix | sibling annotated #op #1 [unannotated sibling(s)] #2 [unannotated sibling(s)] #3 [unannotated sibling(s)] ... | mrow |
| n-ary postfix | sibling annotated #1 [unannotated sibling(s)] #2 [unannotated sibling(s)] #3 [unannotated sibling(s)] ... #op | mrow |
| scripted operator | siblings annotated #1 #op | msup, msub |
| scripted implied op | siblings annotated #1 #2 | msup or msub, semantic someliteral(#1,#2) |
| full scripted | all siblings have arg annotations | msubsup |
| fenced | unannotated open/close fence as first/last child | mrow |
| stacked fenced | siblings <mo>(</mo> <mfrac>...</mfrac> <mo>)</mo> | mrow |
| piecewise | siblings <mo>{</mo><mtable>...</mtable> | mrow |
| atom | semantic attribute only contains a literal | any |
| ... | | |

Where all references such as #op, #1, #2 are used for the sake of example. I am quite open to leaving those open-ended for the convenience of the annotator (#bvar, #denominator etc.), or since Neil expressed a preference to simplify further - to only permit #op and consecutive natural numbers as values. That part should be workable either way.

My main intuition here is that it is a lot easier for an implementer who uses the spec to infer the patterns from the tree, than for an annotator to both predict the final MathML tree (they could be authoring in TeX or Word, using a transformer to MathML), and then to learn the domain-specific terminology needed to annotate it. Worst of all, we can not realistically expect to capture all notations in the specification, as they are open-ended and too many. I added a piecewise row in my examples to throw in something we've never talked about but we all knew about, and that's a smaller set than the notations we don't even know about yet.

Edit: I missed mentioning another pragmatic point. Since a11y software developers can not expect to have all MathML their users consume to be perfectly annotated, it is realistic that they will retain code which will process classic presentation MathML, with no annotations. That code will have the same task of inferring the notation and its fixity, but would have to do more guessing as it doesn't in advance know the (semantic) operator tree. So I can imagine a generalized software component that can extract a "notation-used" via a pMML tree walk, with and without semantic attribute assistance. Changes would be of the kind: "we know this node is the operator" vs "can this node be the operator?".
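A minimal Python sketch of such a tree walk, covering only a few rows of the table, might look like this. Everything concrete here is an assumption made for the example: the references #op, #1, #2 are modelled as a hypothetical `arg` attribute with values "op", "1", "2", ..., the function name `infer_notation` is invented, and the rule separating the binary and n-ary cases by argument count is an extrapolation from the table rather than something stated in it.

```python
import xml.etree.ElementTree as ET

def infer_notation(node):
    """Guess the notation of node from the order of its annotated children."""
    roles = [child.get("arg") for child in node if child.get("arg")]
    if node.tag in ("msup", "msub"):
        if roles == ["1", "op"]:
            return "scripted operator"
        if roles == ["1", "2"]:
            return "scripted implied op"
    elif node.tag == "msubsup":
        if len(roles) == len(node) == 3:  # every child carries an annotation
            return "full scripted"
    elif node.tag == "mrow" and roles.count("op") == 1:
        numbers = [role for role in roles if role != "op"]
        # Arguments must appear as 1, 2, ..., n in document order for these rows.
        if numbers and numbers == [str(i) for i in range(1, len(numbers) + 1)]:
            position = roles.index("op")
            if position == 0:
                return "prefix" if len(numbers) == 1 else "n-ary prefix"
            if position == len(roles) - 1:
                return "postfix" if len(numbers) == 1 else "n-ary postfix"
            if position == 1:
                return "binary infix" if len(numbers) == 2 else "n-ary infix"
    return "unknown"

row = ET.fromstring(
    '<mrow><mi arg="1">a</mi><mo arg="op">+</mo><mi arg="2">b</mi></mrow>'
)
print(infer_notation(row))  # binary infix
```

For unannotated presentation MathML the same walk would have to replace the attribute lookups with guesses, which is the "can this node be the operator?" case.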

@davidcarlisle
Collaborator

davidcarlisle commented Jul 23, 2020 via email

@NSoiffer
Contributor

NSoiffer commented Jul 28, 2020 via email

@brucemiller
Contributor Author

Simple is good! But if a syntaxy semantic cannot be avoided in many cases, it's probably better not to switch back and forth between a syntax-free semantic and a syntax based one.

JSON is a nice way to provide a block of (quasi)structured data as a separate file or <script> block in an HTML. Connecting that to the MathML elsewhere in the document would require ids or something similar, which seems not to be the favored approach.

I'm not too clear on what @+ is meant to imply in your example semantic = "function-apply(@f, @+(@A, times(@2, @b)))"; are you assuming <mo arg="+">+</mo> within the mrow?

While we can, and perhaps should, still encourage proper mrow structure, I think we'll limit usefulness if we require a very specific mrow structure. There may be other reasons besides laziness of remediators why a given structure is desirable that may not match the accessibility requirements.

@dginev
Contributor

dginev commented Jul 28, 2020

FWIW, and I am only reminding of this for the sake of technical completeness (not my preference, and potentially eyebrow-raising), the MathML spec conceptually allows a JSON annotation for an individual formula via something akin to: <annotation encoding="application/json"> within a <semantics> parent. You could even host it externally, and link to it via src.

But I also think Neil's JSON point was something different. If I read it right, he's suggesting an alternative syntax for the value of semantic, say:

<mrow semantic='{ "function-apply": ["@f", {"@+": ["@A", {"times": ["@2", "@b"]}]} ] }'>
  • pros: no need for custom parsing, JSON is ubiquitous on all potential platforms. We still may need a "JSON Schema" definition to enumerate what we expect though.
  • cons: harder to read and write by humans compared to the functional-style syntax. I had two validation errors when quickly writing this by hand (balancing the trailing ]}]} ] } was hard). Remediators will suffer.
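To illustrate the "no custom parsing" point, the following Python sketch reads the JSON value above with the standard `json` module and prints it back in the functional style. The one-key-per-object convention is inferred from the example, and the helper name `to_functional` is hypothetical.

```python
import json

def to_functional(term):
    """Render a parsed JSON term as head(arg, arg, ...)."""
    if isinstance(term, str):
        return term  # a reference such as "@f" or a literal name
    (head, arguments), = term.items()  # assumes exactly one key per object
    return f"{head}({', '.join(to_functional(a) for a in arguments)})"

semantic = '{ "function-apply": ["@f", {"@+": ["@A", {"times": ["@2", "@b"]}]} ] }'
print(to_functional(json.loads(semantic)))
# function-apply(@f, @+(@A, times(@2, @b)))
```

Going the other way (functional syntax to a tree) is where a small custom parser would be needed.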

@NSoiffer
Contributor

NSoiffer commented Jul 29, 2020 via email

@davidcarlisle added the intent label Jul 15, 2022
@brucemiller
Contributor Author

This discussion seemed to have served its purpose as it gradually led to a more concrete (if perhaps not yet perfect) proposal. The proposal in this issue is several levels too "Meta" to be pursued :>

So, shall we close?

@davidcarlisle
Collaborator

closing looks good to me @brucemiller

6 participants