
MathML Support #161

Closed
m-a-t opened this issue Dec 21, 2017 · 14 comments

Comments

@m-a-t

m-a-t commented Dec 21, 2017

MathML seems to be the standard technology for embedding formulas in web pages.

Please consider adding support for it in openhtmltopdf.

There is a project based on flyingsaucer that adds this support: https://mvnrepository.com/artifact/com.github.rjolly/flying-saucer/9.1.1
(However, it is GPL licensed.) Maybe rjolly could contribute it to openhtmltopdf.

@rjolly

rjolly commented Dec 23, 2017

Hi mat,

I would be pleased to do so; however, there are some caveats. The license is not an issue: I chose GPL because my fork is mainly a browser and not a library. But I do not even know whether I have the right to do this, as flyingsaucer is LGPL. So do not hesitate to borrow anything from my project. The commit that adds MathML is rjolly/flying-saucer@f4c4a9d.

However, at the moment it works only for screen rendering, not for PDF. The reason is that I am using JEuclid, which itself uses FOP, whereas flying-saucer uses iText. So getting it to work in openhtmltopdf/PDFBox is, I think, a little more work than it seems.

@rototor
Contributor

rototor commented Dec 24, 2017

@rjolly Having only "screen" rendering, i.e. drawing to a Graphics2D, is perfectly fine. This is what PDFBox-Graphics2D is for. Even integrating JEuclid into openhtmltopdf for a quick test is easy using object drawers #78:

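        import java.io.StringReader
        import javax.xml.transform.stream.StreamSource
        import net.sourceforge.jeuclid.DOMBuilder
        import net.sourceforge.jeuclid.parser.Parser

        // Render <object type="custom/mathml"> elements with JEuclid through an object drawer.
        objectDrawerFactory.registerDrawer("custom/mathml", { e, x, y, width, height, outputDevice, ctx, dotsPerPixel
            ->
            // Collect the MathML source from all text and CDATA children of the <object> element
            // (0 until length: item(length) would be out of range).
            val src = buildString {
                for (i in 0 until e.childNodes.length) {
                    val item = e.childNodes.item(i)
                    if (item is org.w3c.dom.Text)
                        append(item.data)
                }
            }.trim().removePrefix("<[")
            // Parse the MathML and build JEuclid's layout DOM.
            val node = Parser.getInstance().parse(StreamSource(StringReader(src)))
            val jeuclidDom = DOMBuilder.getInstance().createJeuclidDom(node)
            // Convert the box size from output-device dots to CSS pixels.
            val realWidth = width / dotsPerPixel
            val realHeight = height / dotsPerPixel
            outputDevice.drawWithGraphics(x.toFloat(), y.toFloat(), realWidth.toFloat(), realHeight.toFloat(), { gfx ->
                val viewer = jeuclidDom.defaultView
                // Optional: stretch the formula to fill the box (does not keep the aspect ratio).
                //gfx.scale(realWidth / viewer.width, realHeight / (viewer.ascentHeight + viewer.descentHeight))

                // Paint: x is the left edge, y is the baseline; 20f is a fixed offset.
                viewer.draw(gfx, 0f, 20f)
            })
        })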
	<object type="custom/mathml" content="" class="mathml">
		<![CDATA[
		<math xmlns="http://www.w3.org/1998/Math/MathML">
			<mi>W</mi>
			<mo>&#x2009;</mo>
			<mo>=</mo>
			<mo>&#x2009;</mo>
			<mfrac>
				<mrow>
					<mi>Q</mi>
					<mo>&#x2009;</mo>
					<mo>x</mo>
					<mo>&#x2009;</mo>
					<mn>100</mn>
				</mrow>
				<mi>G</mi>
			</mfrac>
		</math>
		]]>
	</object>

This code is written in Kotlin, but you should get the idea. Not much work is needed to integrate it cleanly into OpenHTMLToPDF so that it handles the math tag. It would work similarly to the SVG integration.

BUT: JEuclid does not work on JDK 9. It does some strange reflection tricks to register its elements. I only had this working on JDK 8. As nearly all my projects are on JDK 9 now (or at least targeting JDK 9), this is a show-stopper for me. For the time being, I faked the fractions I needed using a table...

So to integrate MathML support into OpenHTMLToPDF you would first need to:

  • Fork JEuclid and fix it for JDK9
  • Move the classes to your own domain packages (i.e. moving net.sourceforge.jeuclid to e.g. com.github.rjolly.jeuclid)
  • Release it on maven central (which is only possible if you release it with your own domain prefix).

As I don't really need MathML support, I have not pursued this further.

@danfickle
Owner

I just started working on this. However, besides the Java 9 problem, JEuclid depends on Batik 1.7, while our SVG support requires Batik 1.9. This would make it difficult or impossible to build a project that needs both. We really need a fork, as @rototor suggests. Anyone game?

@rototor
Contributor

rototor commented Dec 24, 2017

@danfickle I'll give it a try, I will fork jeuclid and try to get it working with Batik 1.9 and Java 9.

@rototor
Contributor

rototor commented Dec 24, 2017

@danfickle I got it working in this fork: https://github.com/rototor/jeuclid. The main problem with JDK 9 is that Batik brings in some extra classes for the org.w3c.dom.events package, and JDK 9's java.xml module also defines that package. Since Java 9, classes on the classpath can no longer extend a package that is defined in a module, so the additional classes Batik provides for org.w3c.dom.events are no longer found on Java 9...
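A small check, assuming Java 9 or later and using only the JDK, makes this split-package effect visible: org.w3c.dom.events belongs to the java.xml module, so a class that only a classpath JAR adds to that package (for example org.w3c.dom.events.CustomEvent, which Batik's dependencies provide) cannot be loaded. The helper names moduleOf and isLoadable are hypothetical, chosen for illustration.

```kotlin
// Name of the named module that defines the class, or null if the class comes
// from the classpath (the unnamed module). Requires Java 9+.
fun moduleOf(className: String): String? = Class.forName(className).module.name

// On Java 9+, a package owned by a named module (here java.xml) is loaded only
// from that module, so extra classes that a classpath JAR puts into the same
// package are never found, even when the JAR is present.
fun isLoadable(className: String): Boolean =
    try {
        Class.forName(className)
        true
    } catch (e: ClassNotFoundException) {
        false
    }
```

On Java 9+, `moduleOf("org.w3c.dom.events.Event")` returns "java.xml", while `isLoadable("org.w3c.dom.events.CustomEvent")` returns false.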

You can check out the source and run mvn install. You could then use version 3.1.10-SNAPSHOT as a dependency.

I'll make a de.rototor.jeuclid release, but first I must clean up and rename some things in JEuclid for that. I don't think @maxberger would want to release JEuclid under the old namespace with these changes, as it loses some features (mainly dynamic DOM change support). I'll also remove FOP and SWT support. So it may take some time until there is an official Maven Central artifact.

@rototor
Contributor

rototor commented Dec 26, 2017

@danfickle I've just released version 3.1.11 of de.rototor.jeuclid (had a test driver problem with 3.1.10, so the release:perform did not finish ...)

Feel free to start the integration work.

@maxberger

maxberger commented Dec 27, 2017 via email

@rototor
Contributor

rototor commented Dec 28, 2017

@maxberger Sorry, I won't take over maintainership of JEuclid, as I only need the subset required for the integration in openhtmltopdf. I plan to add a small (La)TeX -> MathML converter so that TeX math can be used to enter formulas in OpenHTMLToPDF, because MathML is, sorry, a tag mess, especially if you auto-format the HTML sources in IntelliJ. So I will use JEuclid purely as a renderer.

@rototor
Contributor

rototor commented Jan 4, 2018

@danfickle Are you going to work on this? Otherwise I will try to integrate it. For LaTeX integration I found SnuggleTeX, which is not yet available on Maven Central and uses JEuclid under the hood. I will likely fork it and release it on Maven Central, as I did with JEuclid.

@danfickle
Owner

danfickle commented Jan 10, 2018

Thanks @m-a-t @maxberger @rjolly @rototor

@rototor - sorry I missed your offer of help on this; however, the good news is that there is plenty to do! I've uploaded something that works on one example, but the following is still to do:

  • fonts
  • sizing
  • whether to include the STIX math font
  • Java 2D support
  • MathML XML entities
  • Documentation

@danfickle
Owner

@rototor - I have been investigating fonts for MathML and there are a couple of issues:

  • The DefaultFontFactory picks up fonts from the environment. This is a recipe for a mess, as it may work in development environments but not on servers, etc. The simplest solution would probably be to move the two constructor methods to separate methods.
  • The font factory is set once per JVM to a DefaultFontFactory instance. This means that, in theory, one cannot use different font setups for different runs. This is probably only a theoretical issue, but if you think it is worth fixing, the easiest way might be to add a method on FontFactory that sets a per-thread FontFactory stored in a thread local. Then getInstance returns the thread-local copy, or the default if the thread local is null.

Thanks, and let me know if you need a pull-request for this.

@rototor
Contributor

rototor commented Jan 13, 2018

@danfickle Feel free to send me a pull request for this. It should be possible to change the font factory per thread, as otherwise this is only going to be a big mess in a container (e.g. Tomcat, JBoss, ...) environment.

So FontFactory.getInstance() should be backed by a ThreadLocal and should also be settable. Of course it should default to the DefaultFontFactory. And when cleaning up in the PdfBoxRenderer, it should be set back to null if possible to avoid memory/class loader leaks. But this is something I can do later on.
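A minimal sketch of this design, assuming a simplified font factory: FontFactory, DefaultFontFactory, FontFactories and withFontFactory below are hypothetical stand-ins, not JEuclid's actual classes or API. A per-thread override falls back to a shared default, and a scoped helper restores the previous value in `finally`, so pooled container threads do not keep a factory (and its class loader) alive.

```kotlin
// Hypothetical stand-ins for a font factory hierarchy (not JEuclid's real classes).
interface FontFactory {
    val name: String
}

class DefaultFontFactory : FontFactory {
    override val name = "default"
}

object FontFactories {
    // Shared JVM-wide fallback, created on first use.
    private val fallback: FontFactory by lazy { DefaultFontFactory() }

    // Optional per-thread override.
    private val current = ThreadLocal<FontFactory?>()

    // The calling thread's override if one is set, otherwise the shared default.
    fun getInstance(): FontFactory = current.get() ?: fallback

    // The raw per-thread value (null when no override is installed).
    fun getThreadInstance(): FontFactory? = current.get()

    // Installs the override for the calling thread only; null removes it.
    fun setThreadInstance(factory: FontFactory?) {
        if (factory == null) current.remove() else current.set(factory)
    }
}

// Runs block with factory installed for the current thread, then restores the
// previous value (removing the entry when there was none).
fun <T> withFontFactory(factory: FontFactory, block: () -> T): T {
    val previous = FontFactories.getThreadInstance()
    FontFactories.setThreadInstance(factory)
    try {
        return block()
    } finally {
        FontFactories.setThreadInstance(previous)
    }
}
```

Restoring the previous value rather than always clearing keeps nested renders on the same thread correct; when there was no previous override, the entry is removed, which matches the "set back to null" clean-up described above.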

I won't have time to look into this myself until next weekend. But if you send me a pull request, I can release a new version, as that does not take long.

danfickle added a commit that referenced this issue Jan 15, 2018
This also meant altering the SVGDrawer interface so we can pass in a
box and css context from which can be obtained fonts, or any other box
property.
@danfickle
Owner

danfickle commented Feb 15, 2018

The STIX 2.0.0 math fonts were only available in OTF format, so I ran them through FontSquirrel to generate TrueType versions and uploaded them here, together with a stylesheet to use them.

Note: Entity support was not yet baked in when this was posted; MathML now has entity support.

Please advise here if this font package is useful or has issues, so we can consider adding it as resources in a future version.
stix-fonts.zip

<!DOCTYPE html PUBLIC
"-//OPENHTMLTOPDF//MATH XHTML Character Entities With MathML 1.0//EN"
"">
<html>
<head>
<link rel="stylesheet" href="stix-fonts/stylesheet.css" />
<style>
body {
 font-family: sans-serif;
}
math {
  width: 100%;
}
</style>
</head>
<body>
<h1>MathML</h1>
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>
  <mi>x</mi>
  <mo>=</mo>
  <mfrac>
    <mrow>
      <mrow>
        <mo>-</mo>
        <mi>b</mi>
      </mrow>
      <mo>&#x000B1;</mo>
      <msqrt>
        <mrow>
          <msup>
            <mi>b</mi>
            <mn>2</mn>
          </msup>
          <mo>-</mo>
          <mrow>
            <mn>4</mn>
            <mo>&#8290;</mo>
            <mi>a</mi>
            <mo>&#8290;</mo>
            <mi>c</mi>
          </mrow>
        </mrow>
      </msqrt>
    </mrow>
    <mrow>
      <mn>2</mn>
      <mo>&#8290;</mo>
      <mi mathvariant="bold-italic">a</mi>
    </mrow>
  </mfrac>
</mrow>
</math>

<h2>End</h2>

</body>
</html>

Result: [image: mathml-support]

@rototor rototor mentioned this issue Feb 15, 2018
danfickle added a commit that referenced this issue Mar 12, 2018
Created two DTDs, one with just character entities for XHTML and one
combined character entities for XHTML and MathML. Also:
+ Minor refactoring of entity resolver and catalog.
+ Documented the two new doctypes in author’s guide.
+ Made sure that other doctypes resolve to the empty string.
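The commit message does not include the resolver itself. A rough sketch of the behaviour it describes, using only the JDK's SAX `EntityResolver`: known public IDs map to DTD text bundled with the application, and every other doctype resolves to an empty string, so the parser never fetches a DTD over the network. The CatalogEntityResolver class, the parse helper and the one-entity DTD are hypothetical placeholders, not openhtmltopdf's actual classes or DTD files.

```kotlin
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document
import org.xml.sax.EntityResolver
import org.xml.sax.InputSource

// Public ID from the doctype in the example above; the DTD text is a one-entity placeholder.
const val MATH_PUBLIC_ID = "-//OPENHTMLTOPDF//MATH XHTML Character Entities With MathML 1.0//EN"
const val MATH_DTD = "<!ENTITY PlusMinus \"&#xB1;\">"

// Hypothetical resolver: known public IDs map to bundled DTD text; anything else
// resolves to an empty string, so no DTD is ever fetched over the network.
class CatalogEntityResolver(private val catalog: Map<String, String>) : EntityResolver {
    override fun resolveEntity(publicId: String?, systemId: String?): InputSource {
        val dtd = publicId?.let { catalog[it] } ?: ""
        return InputSource(StringReader(dtd))
    }
}

// Parses XML from a string, resolving every external entity through the resolver.
fun parse(xml: String, resolver: EntityResolver): Document {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    builder.setEntityResolver(resolver)
    return builder.parse(InputSource(StringReader(xml)))
}
```

With `CatalogEntityResolver(mapOf(MATH_PUBLIC_ID to MATH_DTD))`, a document using the math doctype can write `&PlusMinus;` and gets the ± character, while an XHTML 1.0 doctype parses without any network access.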
@danfickle
Owner

MathML rendering support is now finished. Please open a new issue if you have any issues with it. Thanks.

danfickle added a commit that referenced this issue Jun 27, 2019
Now, always keep the correct aspect ratio.
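The commit message does not show how the ratio is kept. One common approach, sketched here with `java.awt.geom.AffineTransform`, is to scale both axes by the smaller of the two box-to-content ratios and centre the result. The helper names are hypothetical, and the content height is assumed to be the formula's ascent plus descent, as in the commented-out `gfx.scale` call in the drawer above.

```kotlin
import java.awt.geom.AffineTransform

// Largest uniform scale at which content of the given natural size still fits
// inside the box; the same factor is used on both axes, so nothing is distorted.
fun uniformScale(boxWidth: Double, boxHeight: Double, contentWidth: Double, contentHeight: Double): Double {
    require(contentWidth > 0.0 && contentHeight > 0.0) { "content size must be positive" }
    return minOf(boxWidth / contentWidth, boxHeight / contentHeight)
}

// Transform that scales the content uniformly and centres it in the box.
fun fitTransform(boxWidth: Double, boxHeight: Double, contentWidth: Double, contentHeight: Double): AffineTransform {
    val s = uniformScale(boxWidth, boxHeight, contentWidth, contentHeight)
    val tx = AffineTransform()
    tx.translate((boxWidth - contentWidth * s) / 2, (boxHeight - contentHeight * s) / 2)
    tx.scale(s, s)
    return tx
}
```

Inside a drawer, the result could be applied with `gfx.transform(fitTransform(...))` before painting. The non-uniform `gfx.scale(w / viewer.width, h / height)` variant stretches the formula whenever the box and the formula have different proportions.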
danfickle added a commit that referenced this issue Jun 28, 2019
The MathML renderer (JEuclid) produces slightly different results on JDK8 vs JDK11 so is not suitable for auto testing.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants