Use the new 'simpleFigure' builder function in the readers. #7364

argent0 · 2021-06-09T13:14:08Z

Reading input with figures

Reading figures is now supported for two formats natively and one format through a Lua filter. This functionality has to be enabled with a new pandoc extension: native_figures.

#### Extension: `native_figures` ####

Use pandoc's native `Figure` element for content inside `<figure>` tags in HTML, or inside `figure` environments in LaTeX. This, in turn, allows some writers to produce more accurate representations of figures. It also makes the `Figure` element available to filters for custom figure output.

This extension can be enabled/disabled for the following formats:

input formats
: `latex` `html`

The choice of an extension was made to introduce the new behavior with minimal disruption of the old one.
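As a rough illustration of how a format spec such as `html+native_figures` toggles the extension, here is a minimal Python sketch. It is not pandoc's parser: `parse_format_spec` and `uses_native_figures` are hypothetical names, and the sketch assumes the extension is off by default and only meaningful for the `html` and `latex` readers, as described above.

```python
import re

# Readers that accept the extension, per the list above.
FORMATS_WITH_NATIVE_FIGURES = {"html", "latex"}


def parse_format_spec(spec):
    """Split e.g. 'html+native_figures' into (base, enabled, disabled)."""
    base = re.match(r"[^+-]*", spec).group(0)
    enabled, disabled = set(), set()
    # Later toggles override earlier ones for the same extension name.
    for sign, name in re.findall(r"([+-])(\w+)", spec[len(base):]):
        if sign == "+":
            enabled.add(name)
            disabled.discard(name)
        else:
            disabled.add(name)
            enabled.discard(name)
    return base, enabled, disabled


def uses_native_figures(spec):
    """Assumption: the extension is off unless explicitly enabled."""
    base, enabled, _ = parse_format_spec(spec)
    return base in FORMATS_WITH_NATIVE_FIGURES and "native_figures" in enabled
```

With this sketch, `uses_native_figures("html+native_figures")` is true, while plain `html` keeps the old behavior.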

Reading HTML 5 figures

This version of pandoc can now read figures from HTML into its internal representation:

Now

$ pandoc -f html+native_figures -t native
<figure class="important">
  <img src="../media/rId25.jpg" />
  <ul> <li> ITEM </li> </ul>
  <figcaption> CAP2 </figcaption>
</figure>
^D
[Figure ("",["important"],[]) (Caption Nothing [Plain [Str "CAP2"]])
	[ Plain [Image ("",[],[]) [] ("../media/rId25.jpg","")]
	, BulletList [[Plain [Str "ITEM"]]]]]

Unlike the old handling, the content of a figure is no longer limited to a single image:

Before

$ pandoc-before -f html -t native
...
^D
[Para [Image ("",[],[]) [Str "CAP2"] ("../media/rId25.jpg","fig:")]]

Notice that ITEM is missing from the old output. Handling of attributes and classes has also become more granular and accurate: the old output also drops the important class.
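For reference, the old representation can be recognized mechanically: a paragraph holding a single image whose title starts with `fig:`. The Python sketch below checks for that shape on pandoc's JSON encoding of the AST. `is_legacy_figure` is a hypothetical helper written for illustration, not a function from pandoc or pandoc-types.

```python
def is_legacy_figure(block):
    """True for a Para containing exactly one Image whose title starts with 'fig:'."""
    if block.get("t") != "Para":
        return False
    inlines = block["c"]
    if len(inlines) != 1 or inlines[0].get("t") != "Image":
        return False
    # In pandoc's JSON, an Image carries [attr, caption inlines, [src, title]].
    _attr, _caption, (_src, title) = inlines[0]["c"]
    return title.startswith("fig:")


# The "Before" output above, written in pandoc's JSON shape.
legacy = {
    "t": "Para",
    "c": [{
        "t": "Image",
        "c": [["", [], []],
              [{"t": "Str", "c": "CAP2"}],
              ["../media/rId25.jpg", "fig:"]],
    }],
}

# An ordinary inline image without the 'fig:' marker.
plain_image = {
    "t": "Para",
    "c": [{"t": "Image", "c": [["", [], []], [], ["../media/rId25.jpg", ""]]}],
}
```

Only one image and one inline caption fit in this shape, which is why the list item and the class had nowhere to go.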

Reading LaTeX figures

It can also read LaTeX figures:

Now

pandoc -f latex+native_figures -t native
\begin{figure}
  \begin{subfigure}[b]{0.5\textwidth}
    \begin{subfigure}[b]{0.5\textwidth}
      \centering
      \includegraphics{test/media/rId25.jpg}
      \caption{CAP1.1}
    \end{subfigure}
    \begin{subfigure}[b]{0.5\textwidth}
      \centering
      \includegraphics{test/media/rId25.jpg}
      \caption{CAP1.2}
    \end{subfigure}
    \caption{CAP1}
    \label{fig:inner1}
  \end{subfigure}
  \begin{subfigure}[b]{0.5\textwidth}
    \includegraphics{test/media/rId25.jpg}
    \caption{CAP2}
    \label{fig:inner2}
  \end{subfigure}
  \caption{CAP}
  \label{fig:outer}
\end{figure}
^D
[Figure ("fig:outer",[],[]) (Caption Nothing [Plain [Str "CAP"]]) 
	[ Figure ("fig:inner1",[],[]) (Caption Nothing [Plain [Str "CAP1"]])
		[ Figure ("",[],[]) (Caption Nothing [Plain [Str "CAP1.1"]])
			[Plain [Image ("",[],[]) [] ("test/media/rId25.jpg","")]]
		, Figure ("",[],[]) (Caption Nothing [Plain [Str "CAP1.2"]])
			[Plain [Image ("",[],[]) [] ("test/media/rId25.jpg","")]]]
	, Figure ("fig:inner2",[],[]) (Caption Nothing [Plain [Str "CAP2"]])
		[Plain [Image ("",[],[]) [] ("test/media/rId25.jpg","")]]]]

It preserves the figure and sub-figure hierarchy, along with labels and captions.

Before

pandoc -f latex -t native
...
^D
[Para [Image ("",[],[]) [Str "CAP1.1"] ("test/media/rId25.jpg","fig:")]
,Para [Image ("",[],[]) [Str "CAP1.2"] ("test/media/rId25.jpg","fig:")]
,Para [Image ("fig:inner2",[],[]) [Str "CAP2"] ("test/media/rId25.jpg","fig:")]]

Figures with sub-figures were flattened into a list of figures using the construction mentioned in the original figure handling section. Some captions and labels were lost.
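To make the loss concrete, here is a small Python model of the flattening. It only reproduces the shape of the "Before" output above and is not the old reader's code. The dict layout and the `flatten` function are assumptions made for illustration.

```python
def flatten(fig):
    """Keep only innermost figures, as (id, caption, image source) triples."""
    nested = [b for b in fig["content"] if "content" in b]
    if not nested:
        return [(fig["id"], fig["caption"], b["image"])
                for b in fig["content"] if "image" in b]
    flat = []
    for sub in nested:
        # The id and caption of `fig` itself are dropped at this point.
        flat.extend(flatten(sub))
    return flat


IMG = {"image": "test/media/rId25.jpg"}

# The nested LaTeX example above, in the simplified model.
outer = {
    "id": "fig:outer", "caption": "CAP",
    "content": [
        {"id": "fig:inner1", "caption": "CAP1", "content": [
            {"id": "", "caption": "CAP1.1", "content": [IMG]},
            {"id": "", "caption": "CAP1.2", "content": [IMG]},
        ]},
        {"id": "fig:inner2", "caption": "CAP2", "content": [IMG]},
    ],
}

flat = flatten(outer)
```

In this model, `flat` holds the three leaf figures; the captions CAP and CAP1 and the labels fig:outer and fig:inner1 do not survive.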

Markdown figures

Handling of markdown figures was implemented through a Lua filter. We opted for this approach because there is not yet an agreed-upon syntax for figures in markdown (in CommonMark, for example). Pandoc has its own markdown flavor, but we have opted to propose our syntax by implementing it as a filter. The code of the filter has been submitted in the lua-filters pull request.

I've written two filters that implement two possible markdown syntax extensions for figures. Both use pandoc's markdown div syntax with special classes.

The explicit caption syntax

::: { .figure }

content.

:::: {.caption }
caption
::::

:::

In this syntax, a figure is a pandoc div with a figure class, and the caption, if present, is a div with a caption class.

Here is an example that converts this to HTML:

$ pandoc -f markdown -t html --lua-filter=../lua-filters/markdown-figures/md-figure-explicit.lua
...

<figure>
<p>content.</p>
<figcaption><p>caption</p></figcaption>
</figure>
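The explicit rule can be sketched in a few lines of Python. This is not the Lua filter from the lua-filters pull request. The block dictionaries and `figure_from_explicit_div` are hypothetical, and concatenating several caption divs is an assumption, since the description above only covers a single one.

```python
def figure_from_explicit_div(div):
    """Turn a div with class 'figure' into a figure; return None otherwise."""
    if div.get("type") != "Div" or "figure" not in div["classes"]:
        return None
    caption, content = [], []
    for block in div["blocks"]:
        if block.get("type") == "Div" and "caption" in block["classes"]:
            # Assumption: several caption divs are simply concatenated.
            caption.extend(block["blocks"])
        else:
            content.append(block)
    return {"type": "Figure", "caption": caption, "content": content}


# The explicit-syntax example above, in the simplified model.
explicit_div = {
    "type": "Div", "classes": ["figure"],
    "blocks": [
        {"type": "Para", "text": "content."},
        {"type": "Div", "classes": ["caption"],
         "blocks": [{"type": "Para", "text": "caption"}]},
    ],
}

explicit_figure = figure_from_explicit_div(explicit_div)
```

A div without a caption child yields a figure with an empty caption, matching the "if present" wording above.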

The implicit caption syntax

::: { .figure }
figure content

figure caption.
:::

This, more concise, syntax uses the last paragraph inside the div as the caption for the figure.

$ pandoc -f markdown -t html --lua-filter=../lua-filters/markdown-figures/md-figure-implicit.lua
...

<figure>
<p>figure content</p>
<figcaption><p>figure caption.</p></figcaption>
</figure>
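The implicit rule is even shorter. Again a Python sketch rather than the actual Lua filter: `figure_from_implicit_div` and the block dictionaries are hypothetical, and treating a div with a single block as a figure without caption is my assumption, since that case is not specified above.

```python
def figure_from_implicit_div(div):
    """Use the last paragraph of a 'figure' div as the caption."""
    if div.get("type") != "Div" or "figure" not in div["classes"]:
        return None
    blocks = div["blocks"]
    # Assumption: a lone block is content, not a caption.
    if len(blocks) >= 2 and blocks[-1].get("type") == "Para":
        return {"type": "Figure", "caption": [blocks[-1]], "content": blocks[:-1]}
    return {"type": "Figure", "caption": [], "content": list(blocks)}


# The implicit-syntax example above, in the simplified model.
implicit_div = {
    "type": "Div", "classes": ["figure"],
    "blocks": [
        {"type": "Para", "text": "figure content"},
        {"type": "Para", "text": "figure caption."},
    ],
}

implicit_figure = figure_from_implicit_div(implicit_div)
```

The trade-off against the explicit syntax is that a figure whose content ends in a paragraph cannot go without a caption.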

Generating output with figures

Once figures can be described in pandoc's internal representation, it is the Writers that translate them into various output formats. Not all output formats can represent figures, and for those that can, we have decided to focus on the ones that would make the least intrusive modifications first.

Next, I'll briefly enumerate the output resulting from the Figure constructor in various formats.

HTML

% pandoc -f native -t html5
[Figure ("fig-id",[],[]) (Caption Nothing [Plain [Str "caption"]]) [Para [Str "content"]]]

^D

<figure id="fig-id">
<p>content</p>
<figcaption>caption</figcaption>
</figure>

Figures are represented as <figure> tags.
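The mapping is direct enough to sketch in Python. This is not the HTML writer's code. `render_figure_html` is a hypothetical function that takes already rendered content lines, and leaving out the figcaption when the caption is empty is an assumption.

```python
from html import escape


def render_figure_html(fig_id, caption, content_html):
    """Wrap rendered content lines in <figure>, with an optional <figcaption>."""
    if fig_id:
        open_tag = '<figure id="{}">'.format(escape(fig_id, quote=True))
    else:
        open_tag = "<figure>"
    lines = [open_tag, *content_html]
    if caption:
        # Assumption: an empty caption produces no <figcaption> element.
        lines.append("<figcaption>{}</figcaption>".format(escape(caption)))
    lines.append("</figure>")
    return "\n".join(lines)


rendered = render_figure_html("fig-id", "caption", ["<p>content</p>"])
```

For the native input above, `rendered` is the same four lines as the pandoc output shown.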

Org mode format

% pandoc -f native -t org
[Figure ("fig-id",[],[]) (Caption Nothing []) [Para [Str "content"]]]
^D

<<fig-id>>
content

The Emacs Org mode writer adds an anchor before the content of the figure.

Textile

% pandoc -f native -t textile
[Figure ("fig-id",[],[]) (Caption Nothing []) [Para [Image ("",[],[]) [] ("foo.png", "")]]]
^D

<figure id="fig-id">

!foo.png!


</figure>

The textile format constructs an HTML5 figure.

Texinfo

% pandoc -f native -t texinfo
[Figure ("fig-id",[],[])
	(Caption Nothing [Para [Str "Caption"]])
	[Para [Image ("",[],[]) [] ("foo.png", "fig:")]]]

^D

@node Top
@top Top

@float Figure
@image{foo,,,Caption,png}
@caption{Caption}
@end float

Figures are represented as Texinfo @float environments.

RST

% pandoc -f native -t rst
[Figure ("fig-id",[],[]) (Caption Nothing [Para [Str "Caption"]])
	[Para [Image ("",[],[]) [] ("foo.png", "fig:")]]]

^D

.. container:: float
   :name: fig-id

   .. figure:: foo.png
      :alt:

Figures are represented as containers.

Markdown

% pandoc -f native -t markdown
[Figure ("fig-id",[],[]) (Caption Nothing [Para [Str "Caption"]])
	[Para [Image ("",[],[]) [] ("foo.png", "fig:")]]]
^D

::: {#fig-id .figure}
![](foo.png)
:::

Figures are represented as a pandoc div with the .figure class.

MediaWiki

% pandoc -f native -t mediawiki
[Figure ("fig-id",[],[]) (Caption Nothing [Para [Str "Caption"]])
	[Para [Image ("",[],[]) [] ("foo.png", "fig:")]]]

^D

<div id="fig-id" class="figure">

[[File:foo.png|thumb|none]]


</div>

Figures are represented as a div with the figure class.

JATS

% pandoc -f native -t jats
[Figure ("fig-id",[],[]) (Caption Nothing [Para [Str "Caption"]]) [Para [Str "Text"],
Para [Image ("fig-id-2",[],[]) [] ("foo.png", "fig:")]]]

^D

<boxed-text id="fig-id">
  <p>Text</p>
  <fig id="fig-id-2">
    <graphic mimetype="image" mime-subtype="png" xlink:href="foo.png" xlink:title="" />
  </fig>
</boxed-text>

Figures are represented with the boxed-text tag in JATS.

XWiki

% pandoc -f native -t xwiki
[Figure ("fig-id",[],[]) (Caption Nothing []) [Para [Str "content"]]]

^D

(((
{{id name="fig-id" /}}content
)))

Figures are represented as groups.

Other formats

All other formats handle figures like they handle pandoc's divs.
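A minimal Python sketch of that fallback, modeled on the Markdown and MediaWiki samples above: the figure becomes a div that keeps the identifier and content, gains a figure class, and loses the caption. `figure_as_div` is a hypothetical helper. Whether every remaining writer adds the class or drops the caption is an assumption, not something I'm stating about the writers' code.

```python
def figure_as_div(figure):
    """Rewrite a figure as a div carrying the same attributes and content."""
    ident, classes, attrs = figure["attr"]
    if "figure" not in classes:
        classes = classes + ["figure"]
    # Assumption: the caption is dropped, as in the Markdown sample above.
    return {
        "type": "Div",
        "attr": (ident, classes, attrs),
        "blocks": list(figure["content"]),
    }


figure = {
    "type": "Figure",
    "attr": ("fig-id", [], []),
    "caption": ["Caption"],
    "content": [{"type": "Para", "text": "content"}],
}

div = figure_as_div(figure)
```

After this rewrite, a writer only needs its existing div handling.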

Testing

To test these formats, command-line tests can now be placed in a subfolder (here, command/figures):

$ test-pandoc -pfigure
pandoc tests
  Command folder: command
    5474-figures.md
      #1:                  OK (0.03s)
    html-read-figure.md
      #1:                  OK (0.01s)
      #2:                  OK (0.01s)
      #3:                  OK (0.02s)
      #4:                  OK (0.02s)
      #5:                  OK (0.02s)
    jats-figure-alt-text.md
      #1:                  OK (0.02s)
  Command folder: command/figures
    figures-haddock.md
      #1:                  OK (0.02s)
    figures-org.md
      #1:                  OK (0.01s)
      #2:                  OK (0.02s)
    figures-fb2.md
      #1:                  OK (0.02s)
    figures-zimwiki.md
      #1:                  OK (0.02s)
    figures-textile.md
      #1:                  OK (0.02s)
      #2:                  OK (0.02s)
    figures-texinfo.md
      #1:                  OK (0.02s)
      #2:                  OK (0.02s)
      #3:                  OK (0.02s)
      #4:                  OK (0.02s)
    figures-rst.md
      #1:                  OK (0.01s)
    figures-mediawiki.md
      #1:                  OK (0.01s)
    figures-markdown.md
      #1:                  OK (0.01s)
    figures-jats.md
      #1:                  OK (0.01s)
    figures-jira.md
      #1:                  OK (0.01s)
    figures-xwiki.md
      #1:                  OK (0.01s)
    figures-html.md
      #1:                  OK (0.01s)
      #2:                  OK (0.01s)
      #3:                  OK (0.01s)
      #4:                  OK (0.01s)
      #5:                  OK (0.01s)
      #6:                  OK (0.01s)
    figures-latex.md
      #1:                  OK (0.01s)
      #2:                  OK (0.02s)
  Readers
    Org
      Basic Blocks
        Figures
          Labelled figure: OK

All 33 tests passed (0.57s)

Readers

Writers

argent0 · 2021-09-01T22:56:26Z

Hi @jgm, GSoC is over. I'm interested in what you think about this.

jgm · 2021-09-02T15:34:11Z

Great! I will look at it when I have a chance.
Can you do these things, first?

* rebase on to master
* reorganize/rebase the commits (e.g. I don't think there need to be separate commits for adding the feature to each output format; it's easier later if there's just one commit adding to all -- but some of the commits here should still be separate, like capturing alt text in jats)
* make sure CI tests are passing?

Then I can have a look.

argent0 · 2021-09-09T22:30:01Z

I've squashed the commits while re-basing to current master. CI works now. Please, let me know what you think.

colindean · 2022-09-01T23:44:07Z

I'd love to see this merged. I've got a use case where I want to include a description of a figure in the caption but supply a figure title that will be shown next to the figure number and used for the TOC/LOF.

jgm · 2023-01-17T03:05:28Z

@argent0 @tarleb I assume this is now obsolete and can be closed?

tarleb · 2023-01-17T06:10:37Z

Yes, this was all merged as part of 909ced5.

argent0 marked this pull request as draft June 9, 2021 13:14

argent0 force-pushed the figures-gsoc branch 5 times, most recently from 9648414 to c3aa2de Compare June 17, 2021 13:24

argent0 force-pushed the figures-gsoc branch 4 times, most recently from 37d6e10 to 7f19e7f Compare June 22, 2021 15:46

argent0 marked this pull request as ready for review June 22, 2021 16:44

argent0 force-pushed the figures-gsoc branch 5 times, most recently from 33fa95f to 3e2ab7a Compare June 29, 2021 09:37

argent0 force-pushed the figures-gsoc branch 3 times, most recently from 63da432 to 8fe4216 Compare August 19, 2021 12:31

argent0 mentioned this pull request Aug 19, 2021

simpleFigure builder [GSoC 2021] jgm/pandoc-types#90

Closed

argent0 force-pushed the figures-gsoc branch 3 times, most recently from 461382c to d2b01ca Compare August 25, 2021 17:16

argent0 force-pushed the figures-gsoc branch from d2b01ca to 56d4537 Compare September 1, 2021 23:06

argent0 force-pushed the figures-gsoc branch 3 times, most recently from dd4fa65 to 6dc0e1f Compare September 9, 2021 22:12

Use the Figure Block constructor.

899e0c0

* It provides a specific representation for figures in pandoc's AST.
* It uses the `SimpleFigure` pattern synonym to replace the previous construction: [Para [Image ("",[],[]) [Str "CAP2"] ("../media/rId25.jpg","fig:")]]

Capture alt-text in JATS figures

6dc0e1f

Merge branch 'master' into figures-gsoc

fe42884

argent0 mentioned this pull request Sep 15, 2021

Add the SimpleFigure bidirectional pattern synonym. jgm/pandoc-types#93

Merged

argent0 mentioned this pull request Oct 6, 2021

Use the SimpleFigure in Readers #7611

Closed

This was referenced Nov 20, 2021

Capture alt-text in JATS figures #7703

Merged

Support complex figures. #7704

Merged

tarleb closed this Jan 17, 2023
