Skip to content
This repository has been archived by the owner on Jul 5, 2023. It is now read-only.

Equation Inspection/Editing #11

Open
mjakeman opened this issue Oct 29, 2021 · 17 comments
Open

Equation Inspection/Editing #11

mjakeman opened this issue Oct 29, 2021 · 17 comments
Assignees
Labels
enhancement New feature or request

Comments

@mjakeman
Copy link
Owner

Implement some form of WYSIWYG equation editing with Lasem.

A good first step would be to support picking - e.g. the user can hover over an element within the DOM and the LsmMathmlView is able to resolve the mouse coordinates to a specific element or subtree.

Following steps might include:

  • Maths cursor which navigates through a 'flattened' tree
  • Placeholder elements, to indicate no content e.g. in the top of a fraction
  • Selection of one or more elements
    • What happens if we select two elements from different subtrees?
  • The actual insertion of elements (via hotkeys, etc) should be left to consumers
  • Should expose this as toolkit-independent API (no dependency on GTK)
@mjakeman mjakeman added the enhancement New feature or request label Oct 29, 2021
@mjakeman mjakeman self-assigned this Oct 29, 2021
@mjakeman
Copy link
Owner Author

mjakeman commented Nov 1, 2021

From the MathML 3 standard, we can use <mi></mi> to represent a blank term (i.e. needing to be filled). Refer to https://www.w3.org/TR/MathML3/chapter3.html#id.3.2.3.3.

An mi element with no content is allowed; <mi></mi> might, for example, be used by an "expression editor" to represent a location in a MathML expression which requires a "term" (according to conventional syntax for mathematics) but does not yet contain one.

@mjakeman
Copy link
Owner Author

mjakeman commented Nov 1, 2021

Some notes on how cursor navigation could work:

  • MathML content is stored in a tree structure
  • Lasem implements a DOM, so navigation should be structured around this
  • Navigation options might include:
    • DFS-like navigation - explore each subtree fully using left/right arrows (currently implemented in Cursor Experiment #14)
    • Leaf navigation - cursor moves between leaf nodes of the DOM. This is problematic for a few reasons as it would prevent inserting between two fractions, for example.
    • Parent/Child navigation - similar to the DFS approach, except left/right arrow keys navigate siblings and up/down keys navigate parent/child elements. Most reliable but not so intuitive?
    • XY Navigation - Cursor navigates to the nearest "insertion point" in either the X or Y directions. Plays nicely with mouse selection. Could implement something similar to UWP's XYFocusNavigationStrategy (probably the RectilinearDistance approach).
  • Preferred option by far is XY Navigation, although this won't be too easy to implement.
    • The problem with XY navigation is it can render some elements unreachable.
    • Using distance as the selection criteria works for table-esque layouts but might be problematic when dealing with nested fractions, etc
    • Alternative approach is to use a box-based approach. Each element accepts a direction, and resolves the closest element within itself in that direction (e.g. fraction consumes up/down to navigate between top and bottom child). If it cannot resolve, it recursively asks the parent to resolve.
  • Further questions:
    • How do we define an "insertion point"?
    • If using XY Navigation, should Parent/Child navigation be included as a fallback?
    • Should this be part of lasem or split off into eqx?

@stefnotch
Copy link

stefnotch commented Dec 6, 2021

I'd recommend looking at how other editors handle this, mathlive definitely handles a lot of tricky questions rather nicely.

Also, it would be quite possible to have a flattened tree purely for navigating. See https://github.com/jacobp100/technicalc-core/tree/master/packages/technicalc-editor

The alternative, box-based approach is definitely an interesting one, I'd very much like to see an editor that implements that.

@stefnotch
Copy link

Alternative approach is to use a box-based approach. Each element accepts a direction, and resolves the closest element within itself in that direction (e.g. fraction consumes up/down to navigate between top and bottom child). If it cannot resolve, it recursively asks the parent to resolve.

Out of sheer curiosity, I tried out this approach. So, if I understood it correctly, the algorithm would be

  1. Have a current-element
  2. Get the parent
  3. Check if there is a current-element sibling in that direction. We have to use some special rules here (depending on the parent). For example, fractions support moving "up" and "down"
  4. If we found a sibling, we travel down the tree until we find a text element. Done.
  5. If there was no sibling, we set the parent to be the current-element and repeat.

For simplicity, this description ignores the part about trying to navigate inside the current element, if it's a bottom-most element and there is text in it.

@stefnotch
Copy link

After some fiddling, I ended up implementing this (Try it out with Firefox. Chrome doesn't support MathML yet)
https://jsfiddle.net/5k6y3apt/

It does not implement the "special rules here (depending on the parent)" part. So moving up in a fraction doesn't work with it, but could be added rather easily.
Interestingly MathML is laid out in such a way that editing works surprisingly ok. Like, implementing those primitive rules automatically gets you the 1_2^3 behavior, where you first go and edit the element, then the subscript and then the superscript.

I also quite recommend putting some bigger and trickier cases into the jsfiddle.

@stefnotch
Copy link

After some more fiddling and trying it out, I realized that there is at least one case that totally breaks the way I represent my cursor/caret.
If one types
image
then the MathML looks as follows. The tricky part here is: How do I represent the cursor being on top of the fraction, but outside of the square root?

<mfrac>
  <msqrt>
    <mi> π </mi>
  </msqrt>
  <mn>6</mn>
</mfrac>

Here are all caret locations that should be for that expression
image

@mjakeman
Copy link
Owner Author

Hi, thanks for the interest!

In the time since I wrote that comment, I've had the chance to prototype some of the other approaches.

To summarise my findings:

  • DFS-like search: See Cursor Experiment #14. We can probably rule this out, since it's not intuitive at all to work with.
  • XY Navigation: I've created a simple demo of this (See XY Cursor Navigation Experiment #15), albeit using the mouse cursor rather than arrow keys. It's very hit-or-miss in the sense that the closest caret location (here called an "insertion point") is mathematically closest but not at all what the user would expect to be selected. e.g. In a lopsided fraction where the denominator's contents is much larger than the numerator's, an insertion point in the bottom half of the fraction might be "closest" even when the mouse cursor is above the fraction line.

Your prototype of the box-based approach is really interesting. At a high level, it works surprisingly well using just the basic set of rules. I suppose it's quite close to what I had in mind for "Leaf Navigation", where we'd take all the leaf nodes (e.g. caret locations) and lay them out linearly. Left/Right would simply go back or forward one index, while Up/Down might do something more complex involving parent/child nodes.

After some more fiddling and trying it out, I realized that there is at least one case that totally breaks the way I represent my cursor/caret.

This is the exact same problem I had with all the methods I've tried so far.

My current thinking is to have elements define their own caret locations, and have some rules defining how and when caret locations should be "collapsed". We could start by giving every non-layout MML element two caret locations, at the start and the end:

image

And say that for every element which has a left sibling within an mrow, we collapse its starting caret location. That already simplifies the above equation into:

image

Which gives us the caret locations we want. This is easily extendable to give intermediate caret positions for e.g. text.

I'll have a go at implementing this in Lasem and see how viable it is. I think it could be a reasonable starting point paired with your cursor navigation approach from the jsfiddle.

@mjakeman
Copy link
Owner Author

So as it turns out, this approach works remarkably well:

Lasem.Cursor.Demo.mp4

Fractions also have an ending caret location, just not shown in the video.

Presentation aside, the cursor largely moves where it should in response to Left/Right arrow keys. Adding special casing for fractions and other containers (tables/matrices/etc) shouldn't be too difficult.

My current ruleset is dead simple:

  • Add starting caret location if:
    • Not a root (math) element
    • Not a row element
    • Not a script element (collapsing in favour of the child element's caret location)
  • Add an ending caret location if:
    • Not a root element
    • Not a row element
    • Does not have a sibling to the right when parent is a row element

@stefnotch
Copy link

Hi, thanks for the interest!

In the time since I wrote that comment, I've had the chance to prototype some of the other approaches.

To summarise my findings:

* **DFS-like search:** See [Cursor Experiment #14](https://github.com/mjakeman/lasem/pull/14). We can probably rule this out, since it's not intuitive at all to work with.

* **XY Navigation:** I've created a simple demo of this (See [XY Cursor Navigation Experiment #15](https://github.com/mjakeman/lasem/pull/15)), albeit using the mouse cursor rather than arrow keys. It's very hit-or-miss in the sense that the closest caret location (here called an "insertion point") is mathematically closest but not at all what the user would expect to be selected. e.g. In a lopsided fraction where the denominator's contents is much larger than the numerator's, an insertion point in the bottom half of the fraction might be "closest" even when the mouse cursor is above the fraction line.

Sweet, glad to see that you've gone ahead and built prototypes of them. Now we know for sure which approaches are good and which ones don't quite work out.

So as it turns out, this approach works remarkably well:
Lasem.Cursor.Demo.mp4

Fractions also have an ending caret location, just not shown in the video.

Presentation aside, the cursor largely moves where it should in response to Left/Right arrow keys. Adding special casing for fractions and other containers (tables/matrices/etc) shouldn't be too difficult.

My current ruleset is dead simple:

* Add starting caret location if:
  
  * Not a root (math) element
  * Not a row element
  * Not a script element (collapsing in favour of the child element's caret location)

* Add an ending caret location if:
  
  * Not a root element
  * Not a row element
  * Does not have a sibling to the right when parent is a row element

I just attempted to implement that approach, the shoddy code is over here. It works beautifully well!
However, I opted for a slightly different ruleset, where every element gets visited in order and sets a "skipNext" boolean flag. That basically says "skip the next opening caret". Here is an example of what it ends up generating
image
Downside of my approach is that I have to pre-generate all valid caret positions for any given element before being able to move around with the caret.

@stefnotch
Copy link

stefnotch commented Dec 23, 2021

After some further testing, I've come to the conclusion that your ruleset can elegantly handle pretty much anything. The only one that needs to be extended a bit is

  • Does not have a sibling to the right when parent is a row element

There are at least 3 slightly special cases:

  1. The parent can apparently be a math element instead of a row element. That should be handled the same way.
  2. 1+2^3 should have only one caret between the + and the 2. Which means we need some special handling for msup and co.
  3. Elements, such as mrow and msup can be nested. Note how there are two different ways of nesting them: a^(b^c) and (a^b)^c
<math>
      <mo>-</mo>
      <msup><mn>2</mn><mn>3</mn></msup>
      <mo>+</mo>
      <msup>
        <mn>2</mn>
        <msup><mn>4</mn><mn>8</mn></msup>
      </msup>
      <mo>*</mo>
      <msup>
        <msup><mn>2</mn><mn>3</mn></msup>
        <mn>3</mn>
      </msup>
    </math>
<math>
  <mrow>
    <mo>-</mo>
    <mrow>
      <mrow><mn>2</mn></mrow>
    </mrow>
  </mrow>
</math>

image

@stefnotch
Copy link

stefnotch commented Dec 23, 2021

After thinking a bit about the special cases, I've settled for the following solution.
Essentially, the idea is to change the rule to "Does not have a visible sibling to the right when parent is a row/math/msup/... element. If there is none, recursively go up the tree".

@stefnotch
Copy link

I fiddled around more with this and now have to ask: Is the plan to build a MathML editor or an equation editor? And depending on the answer, how should one deal with cases such as

  • empty mrow <mrow></mrow>
  • msub with too few elements <msub><mn>1</mn></msub>
  • mpadded
  • typing anything when the cursor is in front of an msub
    image
    could turn into either of those, in one case the result ends up being inside the msub, in the other case it very much does not
    image
    image
  • etc.

If one wants to simply build a equation editor, it probably means that the core focus is on the mathematical editing experience and not on being able to edit every single possible MathML tree. And then, the approach outlined below might be interesting:
https://github.com/jacobp100/technicalc-core/tree/master/packages/technicalc-editor

@mjakeman
Copy link
Owner Author

I fiddled around more with this and now have to ask: Is the plan to build a MathML editor or an equation editor?

That's a very good question actually. I'm mainly interested in supporting inline equation editing within rich text for this word processor that I'll hopefully finish someday. I've chosen ODT as the primary format, which uses MathML 2.0 for storing equation data.

My primary goal is to seamlessly integrate caret positions between the text and equation a bit like how Microsoft Word handles things, rather than the equation being a separate object within the text content. So the focus is definitely on the core editing experience rather than handling arbitrary MathML.

(I'd also like to build a standalone editor, so I'm aiming to reuse as much as possible between the two)

If one wants to simply build a equation editor, it probably means that the core focus is on the mathematical editing experience and not on being able to edit every single possible MathML tree. And then, the approach outlined below might be interesting:
https://github.com/jacobp100/technicalc-core/tree/master/packages/technicalc-editor

I've been leaning very heavily towards going down this route. While MathML works at the moment for simple equations, I can see it easily becoming a limitation on the editor in the future.

I think I'll explore the approach you've linked. Does the AST replace the MathML DOM or work parallel to it?

@stefnotch
Copy link

I fiddled around more with this and now have to ask: Is the plan to build a MathML editor or an equation editor?

That's a very good question actually. I'm mainly interested in supporting inline equation editing within rich text for this word processor that I'll hopefully finish someday. I've chosen ODT as the primary format, which uses MathML 2.0 for storing equation data.

Sounds like a solid plan then! I wonder how different MathML 2.0 is from MathML Core. The latter definitely has a nice and simple specification.

If one wants to simply build a equation editor, it probably means that the core focus is on the mathematical editing experience and not on being able to edit every single possible MathML tree. And then, the approach outlined below might be interesting:
https://github.com/jacobp100/technicalc-core/tree/master/packages/technicalc-editor

I've been leaning very heavily towards going down this route. While MathML works at the moment for simple equations, I can see it easily becoming a limitation on the editor in the future.

I think I'll explore the approach you've linked. Does the AST replace the MathML DOM or work parallel to it?

The AST approach works by taking the MathML, turning it into an AST and doing all operations on it. And to render it, you turn it back into MathML. So, I'd say it's more of a replacement.
I'll also note that there are a few different approaches regarding how to represent the AST. I'm trying out a slightly less-flat variation of it.

@stefnotch
Copy link

stefnotch commented Jan 21, 2022

I have a quick update regarding the AST approach, now that I've finally implemented it and have basic caret movement working.

I settled on the following concept:

  1. A type for "rows" where the caret can be placed.
  2. A type for longer "strings" where the caret can be placed. (mstring and merror)
  3. Container elements, which contain rows. (e.g. fraction, square root, superscripts)
  4. Symbols, as in every digit, variable, letter, etc.

And then, stuff like "a caret can only be placed in a row or a string" can be exploited to make the code much easier to reason about. I also made it so that directly nested rows are not allowed. Furthermore, every container element always contains a fixed number of rows and nothing else.

video.mp4

Here is my MathIR for the equation above

{
  "type": "row",
  "values": [
    {
      "type": "root",
      "values": [
        {
          "type": "row",
          "values": [ {"type": "symbol", "value": "2"} ]
        },
        {
          "type": "row",
          "values": [
            {
              "type": "root",
              "values": [
                {
                  "type": "row",
                  "values": [ {"type": "symbol", "value": "2"} ]
                },
                {
                  "type": "row",
                  "values": [
                    { "type": "symbol", "value": "x" },
                    { "type": "symbol", "value": "x" },
                    { "type": "symbol", "value": "x" }
                  ]
                }
              ]
            },
            {"type": "symbol", "value": "x"}
          ]
        }
      ]
    },
    {
      "type": "frac",
      "values": [
        {
          "type": "row",
          "values": [ {"type": "symbol", "value": "x"} ]
        },
        {
          "type": "row",
          "values": [ {"type": "symbol", "value": "x"} ]
        }
      ]
    }
  ]
}

@mjakeman
Copy link
Owner Author

mjakeman commented Feb 1, 2022

Looks very promising!

Coming back with fresh eyes (after taking a short break from text), I've largely come to the same conclusion. I've found that using a DOM-like structure for storing documents is not well suited for actual text manipulation. I think a similar case applies with equation rendering here; MathML makes sense for storing interoperable maths data, but using it for an editor is a "square peg round hole" sort of issue.

And then, stuff like "a caret can only be placed in a row or a string" can be exploited to make the code much easier to reason about. I also made it so that directly nested rows are not allowed. Furthermore, every container element always contains a fixed number of rows and nothing else.

I think these simplifications make a lot of sense. A problem I had with the 'mathml editor' is the scope creep of wanting to support the entire MathML standard. Whereas an equation editor can convert this to a more convenient internal format, like you've done here.

With my rich text editor, I'm leaning towards adopting something based on quill's parchment or maybe a custom AST. Continuing on from that, I'm looking into representing equations in a compatible format and how this could work with undo/redo, which is always a tricky topic.

@stefnotch
Copy link

This makes sense, I never realized how much the DOM was designed for layouting and other non-editing purposes. Using a custom AST or quill's parchment sounds like a solid approach.

One question for a math editor is: How flexible should it be? And how smart?
For instance, if the user attempts to input AB ⊥ CD, then the ideal result would be

        <mi>A</mi>
        <mi>B</mi>
        <mo>⊥</mo>
        <mi>A</mi>
        <mi>C</mi>

Notice how the is an <mo>. It means "perpendicular to"

However, if the user inputs A∧⊥, the the result should be

        <mi>A</mi>
        <mo>∧</mo>
        <mi>⊥</mi>

In this case, the is an <mi>. It means "true".

Should the editor automatically figure this out? Or should it be up to the user?

Finally, the usual way I've seen undo/redo being done is by never modifying the datastructure directly. Instead, one generates commands that both have a "redo" and an "undo" callback. And then one makes sure to implement those correctly.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants