SVG text is rendered as shapes instead of glyphs #475

hbergmey · 2020-05-07T16:30:39Z

I am rendering a document with embedded fonts and SVG. I have managed to generate, place and scale my SVG element correctly and the end result is looking nice. It is displayed with the correct font and is not rasterized but consists of vector shapes.

But unfortunately the literal text is lost. When I open the resulting PDF I can mark, copy and search for every other HTML text, but not for the SVG text. I am rendering quite complex graphs and I have to make sure every node can be found by text searching for its label.

Check my attached full example in Scala adapted from the standard red circle example. I am embedding Arial via @font-face using a custom loader.
TestSvgText.zip

     <svg xmlns="http://www.w3.org/2000/svg" height="200" width="100">
       <circle cx="50" cy="50" r="40" stroke="black" stroke-width="3" fill="red" />
       <text x="20" y="50" font-family="'Arial'" font-size="14px">
         Simply a bit of text
       </text>
     </svg>

I have used pdfbox-graphics2d before and searchable text rendering worked well. But since I have to support hyper linking and flexible page layouts I switched over to openhtmltopdf. I am very happy so far, but if I cannot enable searching of graph labels, it breaks the whole point of my solution.

I'd be happy to contribute to the project if there's some implementation missing in openhtmltopdf. But I really don't know where to start. Apart from that, I am sure I am simply missing something simple, as the developer of pdfbox-graphics2d, @rototor, has actually contributed the SVG rendering code here. This makes me very confident that there is a solution.

Thanks in advance for any kind help you can give me.

rototor · 2020-05-07T19:12:49Z

Are you generating the SVG using the Graphics2D->SVG Batik Adapter? Then the solution should be easy.

The underlying problem is that Batik always renders text as GlyphVectors, i.e. vector shapes. And Batik is used here to render the SVG. So as soon as you are rendering a SVG you have simply lost regarding the text. It will always be a vector shape, which not only is not searchable/selectable but also bloats your PDF massively. The PdfBoxGraphics2D adapter only ever gets to draw vector shapes and never sees any text in this case.

So if your rendering code uses a Graphics2D you can simply use the ObjectDrawers. There is no wiki page for that yet, but the openhtmltopdf-objects package contains some samples. An ObjectDrawer looks like:

https://github.com/danfickle/openhtmltopdf/blob/open-dev-v1/openhtmltopdf-objects/src/main/java/com/openhtmltopdf/objects/jfreechart/JFreeChartBarDiagramObjectDrawer.java

A usage example is here (look for <object type="jfreechart/bar" ...):

https://github.com/danfickle/openhtmltopdf/blob/open-dev-v1/openhtmltopdf-examples/src/main/resources/freemarker/featuredocumentation.ftl

You then have to register the object drawer. To do so you must provide an ObjectDrawerFactory, e.g.

	DefaultObjectDrawerFactory objectDrawerFactory = new DefaultObjectDrawerFactory();
	builder.useObjectDrawerFactory(objectDrawerFactory);
	objectDrawerFactory.registerDrawer("custom/imagerenderer",
			(e, x, y, width, height, outputDevice, ctx, dotsPerPixel) -> {
				double realWidth = width / dotsPerPixel;
				double realHeight = height / dotsPerPixel;
				String id = e.getAttribute("contentid");
				/*
				 * If there is no ID, the value was probably null, so simply
				 * draw nothing.
				 */
				if (StringUtil.isBlankOrNull(id))
					return null;
				IImageRenderer renderer = imageRenderers.get(id);
				checkNotNull(renderer, "There is no renderer with the ID " + id);

				outputDevice.drawWithGraphics((float) x, (float) y, (float) realWidth, (float) realHeight,
						gfx -> {
							/*
							 * Scale to the target size
							 */
							gfx.scale(realWidth / renderer.getWidth(), realHeight / renderer.getHeight());
							/*
							 * And then paint
							 */
							renderer.render(gfx);
						});
				return null;
			});

...
	private Map<String, IImageRenderer> imageRenderers = new HashMap<>();

	protected void registerImageRenderer(String id, IImageRenderer renderer) {
		imageRenderers.put(id, renderer);
	}

This is an example from a commercial project of mine. IImageRenderer is some internal interface, which just has a render(Graphics2D) method (and a getHeight()/getWidth()). It may not be perfectly clean from a "separation of concerns" viewpoint, but in some projects, the beans can provide an IImageRenderer to draw their content... Which has proven to be very useful.

You can then use this in the HTML like:

<object type="custom/imagerenderer"
		style="width:400px;height:400px;-fs-page-break-min-height:400px"
		title="Diagram" contentid="content1234">
</object>

While generating the HTML for the PDF for each custom-drawn Graphics2D a unique ID is generated and put into contentid inside the HTML. And the registerImageRenderer() is called with this ID.
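
The ID bookkeeping described above could be sketched with plain JDK types roughly as follows. This is only an illustration, not code from the project: RendererRegistry, register, lookup and objectTag are hypothetical names, and Consumer<Graphics2D> stands in for the internal IImageRenderer interface.

```java
import java.awt.Graphics2D;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical helper: hands out unique content IDs and remembers the renderer for each. */
class RendererRegistry {
    private final Map<String, Consumer<Graphics2D>> renderers = new HashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    /** Stores the renderer and returns the ID to put into the contentid attribute. */
    String register(Consumer<Graphics2D> renderer) {
        String id = "content" + counter.incrementAndGet();
        renderers.put(id, renderer);
        return id;
    }

    /** Looks up a renderer by ID, as the object drawer would do while the PDF is generated. */
    Consumer<Graphics2D> lookup(String id) {
        return renderers.get(id);
    }

    /** Builds the object tag for the HTML; sizes are CSS pixels (an assumption of this sketch). */
    static String objectTag(String id, int widthPx, int heightPx) {
        return "<object type=\"custom/imagerenderer\" style=\"width:" + widthPx
                + "px;height:" + heightPx + "px;-fs-page-break-min-height:" + heightPx
                + "px\" title=\"Diagram\" contentid=\"" + id + "\"></object>";
    }
}
```

While the HTML is generated, each custom drawing would call register(...) once and write the returned ID into its object tag; the registered drawer then resolves the ID via lookup(...).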

When you then generate the PDF from the HTML, the renderers are called at the right spots and get a PdfBoxGraphics2D as Graphics2D. You could even customize the PdfBoxGraphics2D settings if you wish; you just have to cast the given graphics context.

The JFreeChartBarDiagramObjectDrawer sample accesses the DOM elements of the object tag to get its data. Depending on your data/needs you could also just register one static ObjectDrawer and use the custom tags to define the information you need to draw your graphics.

Hope this helps.

hbergmey · 2020-05-08T07:48:30Z

Thanks a lot for this immediate and in-depth answer. I am going to dig into this now.

Yes, when I found pdfbox-graphics2d my original approach was to use Java Swing components and layout managers to arrange everything text-related and draw graph connections on top. I had been using Swing a lot around the millennium and until the middle of the first decade, so it felt a little like coming home. The Swing layout managers are very powerful and saved a lot of work in my current project. So thanks for realizing pdfbox-graphics2d.

Maybe I would have stuck to "printing" customized Swing components onto Graphics2D, but then I would have had to implement page breaking, page numbering, linking and calculating referenced page numbers myself. OpenHtmlToPDF is a great answer to that.

But linking is the one thing I still will have to solve with the Graphics2D approach. JLabel is able to render HTML snippets to a Graphics context, but I would not expect hyper links and anchors to be rendered to PDF properly annotated. Or is the ObjectDrawer capable of even that?

The plan I have in mind is to generate customized Swing components with links through a factory and render the active links in a second layer based on the component locations I can determine via SwingUtilities.convertRectangle. That said, until I have tried, I am unsure whether I will be able to realize a multi-layer approach with the ObjectDrawer. Do you think there is a simpler solution?

Thanks again. On to coding this...

rototor · 2020-05-08T07:57:11Z

Using Swing components in print is "interesting". But if it works, why not.

The drawObject() method of the FSObjectDrawer interface returns a Map<Shape, String>. This map if not null can contain shapes and their target URLs. That can be something like "#section1" to reference <a id='section1'> tags in the HTML. See the JFreeChartBarDiagramObjectDrawer example I linked to. That uses the layout information of the JFreeChartBar and builds the shape<->link map for that. You would need to get this information in some other way of course.
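
A minimal sketch of building such a shape-to-link map with JDK types only. LinkMapSketch and Node are hypothetical names, and which coordinate space the shapes have to be in is not covered here, so treat the rectangles as placeholders:

```java
import java.awt.Shape;
import java.awt.geom.Rectangle2D;
import java.util.LinkedHashMap;
import java.util.Map;

class LinkMapSketch {
    /** Hypothetical description of one drawn node: its bounds and the anchor it should link to. */
    static final class Node {
        final Rectangle2D bounds;
        final String anchorId; // e.g. "section1" for <a id='section1'>; null means no link

        Node(Rectangle2D bounds, String anchorId) {
            this.bounds = bounds;
            this.anchorId = anchorId;
        }
    }

    /** Collects the clickable areas; the result has the type that drawObject() returns. */
    static Map<Shape, String> buildLinkMap(Iterable<Node> nodes) {
        Map<Shape, String> links = new LinkedHashMap<>();
        for (Node n : nodes) {
            if (n.anchorId != null && !n.anchorId.isEmpty()) {
                links.put(n.bounds, "#" + n.anchorId); // internal link to an anchor in the HTML
            }
        }
        return links;
    }
}
```

Nodes without an anchor are simply skipped, so a drawer with nothing to link ends up with an empty map.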

hbergmey · 2020-05-08T09:09:11Z

Thanks for pointing that out. I am already brooding over this part of the code. I hope I'll be able to share some of my results here later, but for this I would first have to generate some data without confidential information.

I know, using Swing components for this sounds awkward at first, but it is really useful to have some LayoutManagers ready instead of having to implement all on your own, especially if some absolute positioning is required, which is harder to do in HTML. I am rendering the nodes as JPanels, arranging them according to a customized Sugiyama algorithm and drawing the links on a layer on top. The things most important in that approach are mapping fonts correctly between the UIManager and PDF and setting all containers to non-opaque by default. I still have some problems with clipped texts under some specific conditions, but that is another issue.

Back to ObjectDrawer now...

hbergmey · 2020-05-08T11:48:17Z

Ok, my ObjectDrawer is never called.

<div class="page">
        <a name="585-graph">
          585-graph
        </a>
        <object type="custom/decisiontreegraph" treeId="585" title="585" style="width:100%;height:100%;-fs-page-break-min-height:800px">   
      </object>
</div>

DefaultObjectDrawerFactory.isReplacedObject returns true on the element, but paintReplacedElement is never called.

rototor · 2020-05-08T15:00:19Z

Stupid question, but you registered the ObjectDrawer correctly in the factory and also set the factory into the PdfRendererBuilder using useObjectDrawerFactory()? Did you set a breakpoint in DefaultObjectDrawerFactory.createDrawer()? Is it called? What items are in the map. Is the spelling of the contentType right? I.e. the same in the HTML and when registering the ObjectDrawer?

danfickle · 2020-05-08T15:45:15Z

So as soon as you are rendering a SVG you have simply lost regarding the text. It will always be a vector shape...

I investigated the source of this issue. It turns out that, in addition to issues in our code (which I think I have addressed in 670a386), Batik has a test that decides whether it will use glyphs or vectors:

https://github.com/apache/xmlgraphics-batik/blob/trunk/batik-gvt/src/main/java/org/apache/batik/gvt/font/AWTGVTGlyphVector.java

    // This is true if GlyphVector.getGlyphOutline returns glyph outlines
    // that are positioned (if it is false the outlines are always at 0,0).
    private static final boolean outlinesPositioned;
    // This is true if Graphics2D.drawGlyphVector works for the
    // current JDK/OS combination.
    private static final boolean drawGlyphVectorWorks;
    // This is true if Graphics2D.drawGlyphVector will correctly
    // render Glyph Vectors with per glyph transforms.
    private static final boolean glyphVectorTransformWorks;

    static {
        String s = System.getProperty("java.specification.version");
        if ("1.6".compareTo(s) <= 0) {
            outlinesPositioned = true;
            drawGlyphVectorWorks = false;       // [GA] not verified; needs further research
            glyphVectorTransformWorks = true;
        } else if ("1.4".compareTo(s) <= 0) {
            // TODO Java 5
            outlinesPositioned = true;
            drawGlyphVectorWorks = true;
            glyphVectorTransformWorks = true;
        } else if (Platform.isOSX) {
            outlinesPositioned = true;
            drawGlyphVectorWorks = false;
            glyphVectorTransformWorks = false;
        } else {
            outlinesPositioned = false;
            drawGlyphVectorWorks = true;
            glyphVectorTransformWorks = false;
        }
    }

This test turns off glyph rendering in Java 1.6 and above and uses vectors instead. We could perhaps file an issue with Batik to at least make it configurable.
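
To see which branch a given JVM ends up in, the string comparison can be replayed outside of Batik. The sketch below only mirrors the branching of the snippet above; the class and method names are made up for illustration, and the isOSX flag stands in for Batik's Platform.isOSX.

```java
class GlyphVectorVersionCheck {
    /**
     * Replays the branching of the static initializer quoted above for a given
     * "java.specification.version" value. Note that compareTo is a plain
     * lexicographic string comparison, not a numeric version comparison.
     */
    static boolean drawGlyphVectorWorks(String specVersion, boolean isOSX) {
        if ("1.6".compareTo(specVersion) <= 0) {
            return false; // "1.6", "1.8", but also "9", "11", "17", ...
        } else if ("1.4".compareTo(specVersion) <= 0) {
            return true;  // "1.4" and "1.5"
        } else if (isOSX) {
            return false;
        } else {
            return true;
        }
    }

    public static void main(String[] args) {
        String current = System.getProperty("java.specification.version");
        System.out.println("This JVM reports " + current
                + ", drawGlyphVectorWorks would be " + drawGlyphVectorWorks(current, false));
    }
}
```

Since Java 9 the property is reported as "9", "11", "17" and so on; these strings also compare greater than "1.6", so every current JVM takes the first branch.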

danfickle · 2020-05-08T15:50:14Z

As part of work on #472, I noticed I couldn't get a custom object drawer to work unless it was in its own layer. Could you try forcing a layer using the position: relative surrounding position: absolute trick:

<div class="page">
        <a name="585-graph">
          585-graph
        </a>
        <div style="position: relative;width:100%;height:100%;-fs-page-break-min-height:800px">
           <object type="custom/decisiontreegraph" treeId="585" title="585"  style="width:100%;height:100%;position: absolute;">
           </object>
        </div>
</div>

hbergmey · 2020-05-11T08:54:03Z

Stupid question, but you registered the ObjectDrawer correctly in the factory and also set the factory into the PdfRendererBuilder using useObjectDrawerFactory()

      val objectDrawerFactory = new DefaultObjectDrawerFactory()
      objectDrawerFactory.registerDrawer("custom/decisiontreegraph", new DecisionTreeObjectDrawer(dtPanels))   // register the drawer with the factory

      val builder: PdfRendererBuilder = new PdfRendererBuilder
      builder.useFastMode
      val doc = new ElemExtras(html).toJdkDoc
      builder.useProtocolsStreamImplementation(classPathLoader, "classpath")
      builder.useProtocolsStreamImplementation(fontLoader, "windowsfonts")
      builder.withW3cDocument(doc, "classpath:///")
      builder.toStream(os)
      builder.useObjectDrawerFactory(objectDrawerFactory)  // activate object drawer
      builder.useSVGDrawer(new BatikSVGDrawer())
      builder.run()

Yes. The fact that DefaultObjectDrawerFactory.isReplacedObject returns true, proves the factory to be registered and the key matches the type attribute of the object.

Did you set a breakpoint in DefaultObjectDrawerFactory.createDrawer()? Is it called?

Yes and yes.

What items are in the map. Is the spelling of the contentType right? I.e. the same in the HTML and when registering the ObjectDrawer?

Looking correct.

As part of work on #472, I noticed I couldn't get a custom object drawer to work unless it was in its own layer. Could you try forcing a layer using the position: relative surrounding position: absolute trick:
Yes.

    <div style="position: relative;width:100%;height:100%;-fs-page-break-min-height:800px;">
      <object type="custom/decisiontreegraph" treeId={decisionTree.id.value} title={decisionTree.id.value}
              style="width:100%;height:100%;position: absolute;">
       </object>
    </div>

First success: drawObject is invoked now. But nothing becomes visible in the end result. drawObject is called with a width of 13554.0, but a height of -1.0. I guess the graphics are clipped.

This is how I am drawing the component to the graphics. Sorry, I hope you don't have a hard time reading Scala. SwingUtilities.paintComponent is invoked, but dotHeight is calculated as 0.05.

class DecisionTreeObjectDrawer(decisionTreePanels: Map[String, DecisionTreeGraphPanel]) extends FSObjectDrawer with LazyLogging {
  override def drawObject(
    e: Element,
    x: Double,
    y: Double,
    width: Double,
    height: Double,
    outputDevice: OutputDevice,
    ctx: RenderingContext,
    dotsPerPixel: Int
  ): util.Map[Shape, String] = {
    logger.info("drawing tree '{}'", e.getAttribute("treeId"))
    Option(e.getAttribute("treeId"))
      .flatMap(decisionTreePanels.get)
      .map { dtPanel =>
        val dotWidth: Float = (width / dotsPerPixel).toFloat
        val dotHeight: Float = (height / dotsPerPixel).toFloat
        outputDevice.drawWithGraphics(x.toFloat, y.toFloat, dotWidth, dotHeight, (g2d: Graphics2D) => {
          SwingHelper.layoutComponent(dtPanel)
          val dim = dtPanel.getPreferredSize
          val scaleToFitPage = Math.min(dim.getWidth / dim.width, (dim.getHeight - 50) / dim.height)
          g2d.scale(scaleToFitPage, scaleToFitPage)
          val crp = new CellRendererPane()
          SwingUtilities.paintComponent(g2d, dtPanel, crp, 0, 0,dotWidth.toInt, dotHeight.toInt)
        })
        Map.empty[Shape, String].asJava    // not returning links, yet
      } match {
      case Some(shapeLinks) => shapeLinks
      case None =>
        throw new Exception(s"Object element in HTML without valid treeId attribute value: $e")
    }
  }
}

I guess height:100%; is no longer calculated correctly now.

rototor · 2020-05-11T10:48:35Z

This is just guessing on my side, but:

Does it work to render something if you specify a fixed width/height, e.g. 10cm x 10cm or something like this?
I would scale the components up for rendering, i.e. do something like this:

val SCALE_FACTOR = 1000.0
g2d.scale(1 / SCALE_FACTOR, 1 / SCALE_FACTOR)
SwingUtilities.paintComponent(g2d, dtPanel, crp, 0, 0, (SCALE_FACTOR * dotWidth).toInt, (SCALE_FACTOR * dotHeight).toInt)

The Swing components may not handle very small widths/heights correctly, as they only operate on integers. So scaling the component size up may help here. On the other hand, scaling the g2d down cancels this scaling out again. I always use this if I need to draw something using an integer-only API.
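
The arithmetic behind this trick, as a small JDK-only sketch. The class and method names are invented for illustration, and 0.05 is the dotHeight value reported earlier in this thread:

```java
class IntegerApiScaling {
    static final double SCALE_FACTOR = 1000.0;

    /** Naive conversion for an integer-only API: fractional sizes collapse to 0. */
    static int truncated(double size) {
        return (int) size;
    }

    /** Scale up before converting, so the fractional part survives as whole units. */
    static int scaledUp(double size) {
        return (int) Math.round(size * SCALE_FACTOR);
    }

    /** Size that ends up on the page once the Graphics2D is scaled by 1 / SCALE_FACTOR. */
    static double effectiveSize(int scaledSize) {
        return scaledSize / SCALE_FACTOR;
    }

    public static void main(String[] args) {
        double dotHeight = 0.05;
        System.out.println(truncated(dotHeight));               // integer API sees no height at all
        System.out.println(scaledUp(dotHeight));                // integer API sees a usable height
        System.out.println(effectiveSize(scaledUp(dotHeight))); // compensated by g2d.scale
    }
}
```

The scaled-up value is what goes into the integer API, while the matching g2d.scale call keeps the drawn result at its intended size.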

hbergmey · 2020-05-11T11:00:20Z

As a quick test I have implemented it as follows, but the page remains empty.

          SwingHelper.layoutComponent(dtPanel)
          val dim = dtPanel.getPreferredSize
          val scaleToFitPage = Math.min(dim.getWidth / dim.width, (dim.getHeight - 50) / dim.height)
          g2d.scale(scaleToFitPage, scaleToFitPage)
          val crp = new CellRendererPane()
          val (renderWidth, renderHeight) = ((dim.width * scaleToFitPage).toInt, (dim.height * scaleToFitPage).toInt)
          SwingUtilities.paintComponent(g2d, dtPanel, crp, 0, 0,renderWidth, renderHeight)

I've checked that the g2d does not have a Clip set, so that is not the cause either. Next I am going to try to set an absolute Pixel height in the outer div.

hbergmey · 2020-05-11T11:17:55Z

THAT is it. If I set an absolute pixel height in the outer DIV the content is drawn.

    <div style="position: relative;width:100%;height:750px;-fs-page-break-min-height:800px;">
      <object type="custom/decisiontreegraph" treeId={decisionTree.id.value} title={decisionTree.id.value}
              style="width:100%;height:100%;position: absolute;">
       </object>
    </div>

This is comparable to the usual vertical centering hassle in web pages. height:100% means 0 if the container has no explicit height > 0.

@danfickle: the wrapping div is not required if an absolute height is set directly on the object:

      <object type="custom/decisiontreegraph" treeId={decisionTree.id.value} title={decisionTree.id.value}
              style="width:100%;height:750px;-fs-page-break-min-height:800px;">
       </object>

This works, too.

So having figured this out, what are your suggestions regarding best practice for realizing object rendering in combination with paging? One way is obviously to define a container DIV explicitly sized to the printable area, but then we lose automatic page distribution for large objects, don't we? And that could mean the whole page has to be laid out explicitly. Do you see any way around this?

rototor · 2020-05-11T11:27:39Z

Do you have an estimate of how much content is in your graph? I usually try to estimate the height as well as possible, i.e. calculating the height based on the data. Within a normal page, what would 100% mean? The whole page? Or the size of the whole document spanning multiple pages? What should be the maximum here? How many pages should this graph span? You have to somehow set a guideline for how high your content should be. Width 100% is no problem, as the page just has its maximum width.

I would also suggest to use cm or in (depending on the metrics you use) to specify the height.
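
One possible way to derive such a height from the data, as a rough sketch. It assumes the drawing's preferred size is known in pixels, that it is scaled down to fit an available width, and that one CSS pixel is 1/96 inch; the class and method names are hypothetical:

```java
import java.util.Locale;

class ObjectHeightEstimate {
    static final double CSS_PX_PER_INCH = 96.0; // CSS reference pixel (assumption for this sketch)
    static final double CM_PER_INCH = 2.54;

    /** Height in cm of a drawing with the given preferred size once it fits the available width. */
    static double heightCm(double prefWidthPx, double prefHeightPx, double availableWidthPx) {
        double scale = Math.min(1.0, availableWidthPx / prefWidthPx); // shrink only, never enlarge
        return prefHeightPx * scale / CSS_PX_PER_INCH * CM_PER_INCH;
    }

    /** Style attribute value for the object element with an absolute height. */
    static String style(double heightCm) {
        return String.format(Locale.ROOT, "width:100%%;height:%.2fcm", heightCm);
    }
}
```

The resulting string would go into the style attribute of the object element while the HTML is generated, so the layout gets an absolute height to work with.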

hbergmey · 2020-05-11T12:18:11Z

I know the extents of the graph exactly from the following step:

SwingHelper.layoutComponent(dtPanel)
val dim = dtPanel.getPreferredSize

object SwingHelper { // my own Swing tools
  def layoutComponent(c : Component) : Unit = {
    c.getTreeLock.synchronized {
      c.doLayout()
      c match {
        case container: Container =>
          val cmps = container.getComponents
          cmps.foreach(layoutComponent)
        case _ =>
      }
    }
  }
}

This invokes the LayoutManagers in the containment hierarchy recursively and returns the resulting preferred size of the outermost container. This mechanism is the reason why I chose to use Swing layouts in the first place. That way I can use GridBagLayout, FlowLayout, BorderLayout, BoxLayout, BorderFactory and so on and get to query pixel-exact positions of contained components using the standard SwingUtilities.convertRectangle operations. I needed these exact positions to render graph edges on another layer.

From the graph size I can obviously calculate a regular tile split of the graph if I know the dimensions of the available space. But cutting a large image is not feasible for my use case, because graph nodes are not homogeneously distributed over the rectangular bounds. A tree, for example, has a lot of free space around the trunk, but less around the leaves. And there are several more constraints. You would not want to cut straight through nodes and thus split labels. And you will want to keep strongly related neighbours together. Some edges consequently have to cross several tiles. You get the idea.

I am accounting for that with a custom layered graph layout that is easier to split over rectangles. But the resulting rectangles will be of varying sizes, and edges to nodes that are not on the same page will be represented by local proxy nodes displaying page references and hyperlinks.

So, to calculate good splits of the graph and optimize the distribution of sub graphs over pages, I will have to know how much space is left on a page. I think, I will have to completely keep the graph section separate from the containing chapter and calculate a page separation myself.

hbergmey · 2020-05-11T12:46:11Z

But I recognize all this is leading away from the original question. I consider my issue solved, even though this might still raise a few follow-up questions.

To sum it up, for custom rendering involving selectable text:

1. Depend on "com.openhtmltopdf" % "openhtmltopdf-objects" % openHtmlToPdfVersion
2. Implement an FSObjectDrawer, which reads attributes and children from an object XML element to determine the content to display and then renders it using Graphics2D in a callback to outputDevice.drawWithGraphics
3. Register the resulting object drawer with a custom Content Type ID in a DefaultObjectDrawerFactory
4. Enable the factory instance with builder.useObjectDrawerFactory(objectDrawerFactory)
5. Generate an object element in HTML with the Content Type ID as the value of the type attribute and further attributes or children as required by the custom object drawer.
6. Lay out the XML so the object element either has an absolute height or is contained in an element with an absolute height.
7. Let the object drawer return a map of the generated shapes that should provide a hyperlink.

This is a very powerful feature and absolutely worth its own Wiki page.

Thank you very much for your support, the very quick and deep response and all in all for realizing this awesome library.

Joniras · 2023-03-10T14:12:13Z

@hbergmey does the provided solution explain how to solve this problem (of the issue explanation):

When I open the resulting PDF I can mark, copy and search for every other HTML text, but not for the SVG-Text

If yes, I would love to get a link to a wiki or example code showing how to make the SVG text selectable; I am struggling with the same problem.

hbergmey · 2023-03-10T16:35:42Z

You can see the recipe in my last post, and the relevant code snippets are in the thread, too. If you render using Graphics2D, the text will remain selectable. I know it is a bit lengthy, but I am sure you can figure out the important bits for you by reading this from top to bottom.

danfickle added a commit that referenced this issue May 8, 2020

#475 Better text rendering in SVG graphics. [ci skip]

670a386

Paves way for text as glyphs rather than vectors in SVG output once Batik is fixed with drawGlyphVectorWorks resolving to false on platforms later than JRE 1.5.

hbergmey closed this as completed May 11, 2020

hbergmey mentioned this issue May 12, 2020

Object Renderer link placement #477

Closed

rototor mentioned this issue May 14, 2020

Is there a replacement for onEndPage(...) to add a Watermark for every page without an additional loop through the pages #472

Closed
