Use the new 'simpleFigure' builder function in the readers. #7364

Closed
wants to merge 3 commits into from

Contributor

@argent0 argent0 commented Jun 9, 2021

Reading input with figures

Reading figures is now supported for two formats natively and one format through a Lua filter. This functionality has to be enabled with a new pandoc extension: native_figures.

#### Extension: `native_figures` ####

Use pandoc's native `Figure` element for content inside `<figure>` tags, in the case of HTML, or `figure` environments, in the case of LaTeX. This, in turn, allows some writers to produce more accurate representations of figures. It also makes the `Figure` element available to filters, for custom figure output.

This extension can be enabled/disabled for the following formats:

input formats
: `latex` `html`

An extension was chosen so that the new behavior can be introduced with minimal disruption to the old one.

Reading HTML 5 figures

This version of pandoc can now read figures from HTML into its internal representation:

Now

$ pandoc -f html+native_figures -t native
<figure class="important">
  <img src="../media/rId25.jpg" />
  <ul> <li> ITEM </li> </ul>
  <figcaption> CAP2 </figcaption>
</figure>
^D
[Figure ("",["important"],[]) (Caption Nothing [Plain [Str "CAP2"]])
	[ Plain [Image ("",[],[]) [] ("../media/rId25.jpg","")]
	, BulletList [[Plain [Str "ITEM"]]]]]

Unlike the old handling, the content of a figure is no longer limited to a single image:

Before

$ pandoc-before -f html -t native
...
^D
[Para [Image ("",[],[]) [Str "CAP2"] ("../media/rId25.jpg","fig:")]]

Notice that ITEM is missing from the old output. Handling of attributes and classes has also become more granular and accurate: the important class is missing from the old output as well.
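To make the mapping above concrete, here is a minimal Python sketch of how a `<figure>` element could be turned into a record with an identifier, classes, a caption, and a list of content blocks. It is a simplified model built on the standard library's `html.parser`, not pandoc's Haskell reader. The names `SketchFigure`, `FigureSketchParser`, and `read_figure` are hypothetical, and nested figures are not handled.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

VOID_TAGS = {"img", "br", "hr"}  # elements that never have a closing tag


@dataclass
class SketchFigure:
    """Hypothetical stand-in for a figure node (not pandoc's real type)."""
    identifier: str = ""
    classes: list = field(default_factory=list)
    caption: str = ""
    content: list = field(default_factory=list)  # (tag, src-or-text) pairs


class FigureSketchParser(HTMLParser):
    """Collects the first <figure>: attributes, caption, and direct children."""

    def __init__(self):
        super().__init__()
        self.figure = None
        self.done = False
        self.depth = 0           # nesting depth below <figure>
        self.in_caption = False
        self.current = None      # [tag, text] of the direct child being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.done:
            return
        if self.figure is None:
            if tag == "figure":
                classes = (attrs.get("class") or "").split()
                self.figure = SketchFigure(attrs.get("id") or "", classes)
            return
        if self.depth == 0:
            if tag == "figcaption":
                self.in_caption = True
            elif tag in VOID_TAGS:
                self.figure.content.append((tag, attrs.get("src") or ""))
            else:
                self.current = [tag, ""]
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.figure is None or self.done or tag in VOID_TAGS:
            return
        if self.depth == 0:      # this is </figure> itself
            self.done = True
            return
        self.depth -= 1
        if self.depth == 0:
            if self.in_caption:
                self.in_caption = False
                self.figure.caption = " ".join(self.figure.caption.split())
            elif self.current is not None:
                name, text = self.current
                self.figure.content.append((name, " ".join(text.split())))
                self.current = None

    def handle_data(self, data):
        if self.figure is None or self.done:
            return
        if self.in_caption:
            self.figure.caption += data
        elif self.current is not None:
            self.current[1] += data


def read_figure(html_text):
    """Parse html_text and return the first figure found, or None."""
    parser = FigureSketchParser()
    parser.feed(html_text)
    parser.close()
    return parser.figure
```

Fed the HTML from the example above, `read_figure` keeps the class, the caption, and both content blocks (the image and the list), which is the information the old single-image representation could not hold.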

Reading LaTeX figures

It can also read LaTeX figures:

Now

$ pandoc -f latex+native_figures -t native
\begin{figure}
  \begin{subfigure}[b]{0.5\textwidth}
    \begin{subfigure}[b]{0.5\textwidth}
      \centering
      \includegraphics{test/media/rId25.jpg}
      \caption{CAP1.1}
    \end{subfigure}
    \begin{subfigure}[b]{0.5\textwidth}
      \centering
      \includegraphics{test/media/rId25.jpg}
      \caption{CAP1.2}
    \end{subfigure}
    \caption{CAP1}
    \label{fig:inner1}
  \end{subfigure}
  \begin{subfigure}[b]{0.5\textwidth}
    \includegraphics{test/media/rId25.jpg}
    \caption{CAP2}
    \label{fig:inner2}
  \end{subfigure}
  \caption{CAP}
  \label{fig:outer}
\end{figure}
^D
[Figure ("fig:outer",[],[]) (Caption Nothing [Plain [Str "CAP"]]) 
	[ Figure ("fig:inner1",[],[]) (Caption Nothing [Plain [Str "CAP1"]])
		[ Figure ("",[],[]) (Caption Nothing [Plain [Str "CAP1.1"]])
			[Plain [Image ("",[],[]) [] ("test/media/rId25.jpg","")]]
		, Figure ("",[],[]) (Caption Nothing [Plain [Str "CAP1.2"]])
			[Plain [Image ("",[],[]) [] ("test/media/rId25.jpg","")]]]
	, Figure ("fig:inner2",[],[]) (Caption Nothing [Plain [Str "CAP2"]])
		[Plain [Image ("",[],[]) [] ("test/media/rId25.jpg","")]]]]

The reader preserves the figure and subfigure hierarchy, along with the labels and captions at every level.

Before

$ pandoc -f latex -t native
...
^D
[Para [Image ("",[],[]) [Str "CAP1.1"] ("test/media/rId25.jpg","fig:")]
,Para [Image ("",[],[]) [Str "CAP1.2"] ("test/media/rId25.jpg","fig:")]
,Para [Image ("fig:inner2",[],[]) [Str "CAP2"] ("test/media/rId25.jpg","fig:")]]

Figures with subfigures were flattened into a list of figures using the construction mentioned in the original figure handling section. Some captions and labels were lost: CAP, CAP1, fig:outer and fig:inner1 do not appear in the old output.
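The loss of information can be illustrated with a small Python model. This is only a sketch that reproduces the shape of the old output shown above, under the assumption that only figures directly wrapping an image survive flattening. It is not pandoc's actual reader code, and `Figure`, `Image`, and `flatten` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Image:
    src: str


@dataclass
class Figure:
    """Hypothetical model of a figure: identifier, caption, nested content."""
    identifier: str
    caption: str
    content: List[Union["Figure", Image]] = field(default_factory=list)


def flatten(fig: Figure) -> List[Tuple[str, str, str]]:
    """Model the old behavior: keep only figures that directly wrap an image.

    Each result is (identifier, caption, src). Captions and identifiers of
    figures that merely group other figures are dropped.
    """
    flat = []
    for child in fig.content:
        if isinstance(child, Image):
            flat.append((fig.identifier, fig.caption, child.src))
        else:
            flat.extend(flatten(child))
    return flat


# The nested structure from the LaTeX example above.
img = "test/media/rId25.jpg"
outer = Figure("fig:outer", "CAP", [
    Figure("fig:inner1", "CAP1", [
        Figure("", "CAP1.1", [Image(img)]),
        Figure("", "CAP1.2", [Image(img)]),
    ]),
    Figure("fig:inner2", "CAP2", [Image(img)]),
])

for identifier, caption, src in flatten(outer):
    print(repr(identifier), caption, src)
```

In this model the three printed entries carry the captions CAP1.1, CAP1.2 and CAP2, and only the last one keeps an identifier, mirroring the three `Para [Image ...]` blocks in the old output.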

Markdown figures

Handling of Markdown figures was implemented through a Lua filter. We opted for this approach because there is not yet an agreed-upon syntax for figures in Markdown (for example, in CommonMark). Pandoc has its own Markdown flavor, but we preferred to propose our syntax by implementing it as a filter. The code of the filter has been submitted in the lua-filters pull request.

I've written two filters that represent two possible Markdown syntax extensions for figures. Both use pandoc's Markdown div syntax with special classes.

The explicit caption syntax

::: { .figure }

content.

:::: {.caption }
caption
::::

:::

In this syntax a figure is a pandoc div with a figure class and the caption,
if present, is a div with a caption class.

Here is an example that converts this to HTML:

$ pandoc -f markdown -t html --lua-filter=../lua-filters/markdown-figures/md-figure-explicit.lua
...
<figure>
<p>content.</p>
<figcaption><p>caption</p></figcaption>
</figure>
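The rule behind the explicit caption syntax can be sketched in a few lines of Python. This is a model of the transformation only, assuming a div with the figure class becomes a figure and any child div with the caption class supplies the caption. It is not the Lua filter itself, and `Para`, `Div`, `Figure`, and `figure_from_explicit_div` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Para:
    text: str


@dataclass
class Div:
    classes: List[str]
    content: List[Union["Div", Para]] = field(default_factory=list)


@dataclass
class Figure:
    caption: List[Para]
    content: list


def figure_from_explicit_div(div: Div) -> Optional[Figure]:
    """Turn a .figure div into a Figure; return None for any other div."""
    if "figure" not in div.classes:
        return None
    caption, body = [], []
    for block in div.content:
        if isinstance(block, Div) and "caption" in block.classes:
            caption.extend(block.content)   # contents of the .caption div
        else:
            body.append(block)              # everything else is figure content
    return Figure(caption=caption, content=body)


doc = Div(["figure"], [Para("content."), Div(["caption"], [Para("caption")])])
print(figure_from_explicit_div(doc))
```

Because the caption is marked explicitly, a figure without a caption div simply ends up with an empty caption in this model.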

The implicit caption syntax

::: { .figure }
figure content

figure caption.
:::

This more concise syntax uses the last paragraph inside the div as the caption for the figure.

$ pandoc -f markdown -t html --lua-filter=../lua-filters/markdown-figures/md-figure-implicit.lua
...
<figure>
<p>figure content</p>
<figcaption><p>figure caption.</p></figcaption>
</figure>
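The implicit rule can be modeled the same way. The Python sketch below assumes that the last paragraph becomes the caption only when the div holds at least two blocks. How a single-paragraph div should be treated is an assumption made here for illustration, not something the filter's behavior is known to guarantee. `Para`, `Div`, `Figure`, and `figure_from_implicit_div` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Para:
    text: str


@dataclass
class Div:
    classes: List[str]
    content: List[Para] = field(default_factory=list)


@dataclass
class Figure:
    caption: List[Para]
    content: List[Para]


def figure_from_implicit_div(div: Div) -> Optional[Figure]:
    """Turn a .figure div into a Figure, using its last paragraph as caption."""
    if "figure" not in div.classes:
        return None
    blocks = list(div.content)
    # Assumption: a lone paragraph is content, not a caption.
    if len(blocks) >= 2 and isinstance(blocks[-1], Para):
        return Figure(caption=[blocks[-1]], content=blocks[:-1])
    return Figure(caption=[], content=blocks)


doc = Div(["figure"], [Para("figure content"), Para("figure caption.")])
print(figure_from_implicit_div(doc))
```

The trade-off against the explicit syntax is visible here: the input is shorter, but a figure whose last paragraph is meant as content cannot be expressed without adding an extra block.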

Generating output with figures

Once figures can be described in pandoc's internal representation, it is the writers that translate them into the various output formats. Not all output formats can represent figures, and among those that can, we decided to focus first on the ones requiring the least intrusive modifications.

Next, I'll briefly enumerate the output resulting from the Figure constructor in various formats.

HTML

% pandoc -f native -t html5
[Figure ("fig-id",[],[]) (Caption Nothing [Plain [Str "caption"]]) [Para [Str "content"]]]

^D
<figure id="fig-id">
<p>content</p>
<figcaption>caption</figcaption>
</figure>

Figures are represented as <figure> tags.
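As a rough illustration of what a writer does with the element, the following Python sketch renders a simplified figure record to the HTML shown above. It assumes plain-text paragraphs and a plain-text caption, and it is not pandoc's HTML writer. `Figure` and `figure_to_html` are hypothetical names.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class Figure:
    """Hypothetical, text-only model of a figure."""
    identifier: str
    caption: str
    paragraphs: List[str] = field(default_factory=list)


def figure_to_html(fig: Figure) -> str:
    """Render the figure as a <figure> element with an optional caption."""
    if fig.identifier:
        lines = [f'<figure id="{escape(fig.identifier)}">']
    else:
        lines = ["<figure>"]
    lines += [f"<p>{escape(p)}</p>" for p in fig.paragraphs]
    if fig.caption:
        lines.append(f"<figcaption>{escape(fig.caption)}</figcaption>")
    lines.append("</figure>")
    return "\n".join(lines)


print(figure_to_html(Figure("fig-id", "caption", ["content"])))
```

The identifier goes on the outer element rather than on any image inside it, which is what lets a figure without an image still be a link target.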

Org mode format

% pandoc -f native -t org
[Figure ("fig-id",[],[]) (Caption Nothing []) [Para [Str "content"]]]
^D
<<fig-id>>
content

In Emacs Org mode output, the writer adds an anchor before the content of the figure.

Textile

% pandoc -f native -t textile
[Figure ("fig-id",[],[]) (Caption Nothing []) [Para [Image ("",[],[]) [] ("foo.png", "")]]]
^D
<figure id="fig-id">

!foo.png!


</figure>

The Textile writer emits an HTML5 figure element around the content.

Texinfo

% pandoc -f native -t texinfo
[Figure ("fig-id",[],[])
	(Caption Nothing [Para [Str "Caption"]])
	[Para [Image ("",[],[]) [] ("foo.png", "fig:")]]]

^D
@node Top
@top Top

@float Figure
@image{foo,,,Caption,png}
@caption{Caption}
@end float

Figures are represented as Texinfo @float Figure blocks.

RST

% pandoc -f native -t rst
[Figure ("fig-id",[],[]) (Caption Nothing [Para [Str "Caption"]])
	[Para [Image ("",[],[]) [] ("foo.png", "fig:")]]]

^D
.. container:: float
   :name: fig-id

   .. figure:: foo.png
      :alt: 

Figures are represented as containers.

Markdown

% pandoc -f native -t markdown
[Figure ("fig-id",[],[]) (Caption Nothing [Para [Str "Caption"]])
	[Para [Image ("",[],[]) [] ("foo.png", "fig:")]]]
^D
::: {#fig-id .figure}
![](foo.png)
:::

Figures are represented as a pandoc div with the .figure class.

MediaWiki

% pandoc -f native -t mediawiki
[Figure ("fig-id",[],[]) (Caption Nothing [Para [Str "Caption"]])
	[Para [Image ("",[],[]) [] ("foo.png", "fig:")]]]

^D
<div id="fig-id" class="figure">

[[File:foo.png|thumb|none]]


</div>

Figures are represented as a div with the figure class.

JATS

% pandoc -f native -t jats
[Figure ("fig-id",[],[]) (Caption Nothing [Para [Str "Caption"]]) [Para [Str "Text"],
Para [Image ("fig-id-2",[],[]) [] ("foo.png", "fig:")]]]

^D
<boxed-text id="fig-id">
  <p>Text</p>
  <fig id="fig-id-2">
    <graphic mimetype="image" mime-subtype="png" xlink:href="foo.png" xlink:title="" />
  </fig>
</boxed-text>

Figures are represented with the boxed-text tag in JATS.

XWiki

% pandoc -f native -t xwiki
[Figure ("fig-id",[],[]) (Caption Nothing []) [Para [Str "content"]]]

^D
(((
{{id name="fig-id" /}}content
)))

Figures are represented as groups.

Other formats

All other formats handle figures like they handle pandoc's divs.

Testing

To test these formats, command-line tests can now be grouped in a dedicated folder (command/figures in the run below):

$ test-pandoc -pfigure
pandoc tests
  Command folder: command
    5474-figures.md
      #1:                  OK (0.03s)
    html-read-figure.md
      #1:                  OK (0.01s)
      #2:                  OK (0.01s)
      #3:                  OK (0.02s)
      #4:                  OK (0.02s)
      #5:                  OK (0.02s)
    jats-figure-alt-text.md
      #1:                  OK (0.02s)
  Command folder: command/figures
    figures-haddock.md
      #1:                  OK (0.02s)
    figures-org.md
      #1:                  OK (0.01s)
      #2:                  OK (0.02s)
    figures-fb2.md
      #1:                  OK (0.02s)
    figures-zimwiki.md
      #1:                  OK (0.02s)
    figures-textile.md
      #1:                  OK (0.02s)
      #2:                  OK (0.02s)
    figures-texinfo.md
      #1:                  OK (0.02s)
      #2:                  OK (0.02s)
      #3:                  OK (0.02s)
      #4:                  OK (0.02s)
    figures-rst.md
      #1:                  OK (0.01s)
    figures-mediawiki.md
      #1:                  OK (0.01s)
    figures-markdown.md
      #1:                  OK (0.01s)
    figures-jats.md
      #1:                  OK (0.01s)
    figures-jira.md
      #1:                  OK (0.01s)
    figures-xwiki.md
      #1:                  OK (0.01s)
    figures-html.md
      #1:                  OK (0.01s)
      #2:                  OK (0.01s)
      #3:                  OK (0.01s)
      #4:                  OK (0.01s)
      #5:                  OK (0.01s)
      #6:                  OK (0.01s)
    figures-latex.md
      #1:                  OK (0.01s)
      #2:                  OK (0.02s)
  Readers
    Org
      Basic Blocks
        Figures
          Labelled figure: OK

All 33 tests passed (0.57s)

Readers

  • JATS, also capture the alt-text tag in figures and test.
  • HTML
  • Markdown
  • LaTeX
  • MediaWiki: This could be refactored when I find a pattern repeated in other readers.
  • VimWiki
  • DokuWiki
  • Ipynb
  • Docx
  • ODT
  • OrgMode
  • RST

Writers

  • HTML
  • JATS
  • Markdown
  • LaTeX
  • MediaWiki
  • Org
  • RST
  • AsciiDoc
  • ConTeXt
  • Docbook
  • Docx
  • DokuWiki
  • FB2
  • Haddock
  • ICML
  • Muse. It works directly on Inline
  • OpenDocument
  • TEI. Code is commented. Seems like it's fixable. Figure descriptions have a 'fig:' prefix in the output.
  • Texinfo
  • Textile
  • ZimWiki

@argent0 argent0 marked this pull request as draft June 9, 2021 13:14
@argent0 argent0 force-pushed the figures-gsoc branch 5 times, most recently from 9648414 to c3aa2de Compare June 17, 2021 13:24
@argent0 argent0 force-pushed the figures-gsoc branch 4 times, most recently from 37d6e10 to 7f19e7f Compare June 22, 2021 15:46
@argent0 argent0 marked this pull request as ready for review June 22, 2021 16:44
@argent0 argent0 force-pushed the figures-gsoc branch 5 times, most recently from 33fa95f to 3e2ab7a Compare June 29, 2021 09:37
@argent0 argent0 force-pushed the figures-gsoc branch 3 times, most recently from 63da432 to 8fe4216 Compare August 19, 2021 12:31
@argent0 argent0 force-pushed the figures-gsoc branch 3 times, most recently from 461382c to d2b01ca Compare August 25, 2021 17:16
@argent0
Contributor Author

argent0 commented Sep 1, 2021

Hi @jgm, GSoC is over. I'm interested in what you think about this.

@jgm
Owner

jgm commented Sep 2, 2021

Great! I will look at it when I have a chance.
Can you do these things, first?

  • rebase onto master
  • reorganize/rebase the commits (e.g. I don't think there need to be separate commits for adding the feature to each output format; it's easier later if there's just one commit adding to all -- but some of the commits here should still be separate, like capturing alt text in jats)
  • make sure CI tests are passing?

Then I can have a look.

@argent0 argent0 force-pushed the figures-gsoc branch 3 times, most recently from dd4fa65 to 6dc0e1f Compare September 9, 2021 22:12
* It provides a specific representation for figures in pandoc's AST.
* It uses the `SimpleFigure` pattern synonym to replace the previous
  construction:

  [Para [Image ("",[],[]) [Str "CAP2"] ("../media/rId25.jpg","fig:")]]
@argent0
Contributor Author

argent0 commented Sep 9, 2021

I've squashed the commits while re-basing to current master. CI works now. Please, let me know what you think.

@colindean

I'd love to see this merged. I've got a use case where I want to include a description of a figure in the caption but supply a figure title that will be shown next to the figure number and used for the TOC/LOF.

@jgm
Owner

jgm commented Jan 17, 2023

@argent0 @tarleb I assume this is now obsolete and can be closed?

@tarleb
Collaborator

tarleb commented Jan 17, 2023

Yes, this was all merged as part of 909ced5.

@tarleb tarleb closed this Jan 17, 2023
4 participants