Add markdown figure filters. #187

Open. Wants to merge 1 commit into base: master.


argent0 commented Jul 29, 2021

This PR requires a Figure constructor in pandoc's AST.

The code for a pandoc fork that has such a constructor can be found here.

Details

This filter provides two syntaxes to represent figures in Markdown.

Explicit syntax

The explicit syntax uses a div with the "figure" class. The caption is also
specified with a div, but with the "caption" class.

Here is an example.

::: { .figure }

content.

:::: {.caption }
caption
::::

:::

Every image inside the figure that has no caption and sits in its own
paragraph becomes an HTML img tag.

Here is an example of a figure containing two images and a caption.

::: { .figure }

![](test/media/rId25.jpg "")

![](test/media/rId25.jpg "")

:::: {.caption }
caption
::::

:::

This will result in a single figure containing multiple images.

$ pandoc -f markdown -t native --lua-filter=md-figure-explicit.lua fig-explicit.md

[Figure ("",[],[]) (Caption (Just []) [Para [Str "caption"]])
	[ Plain [Image ("",[],[]) [] ("test/media/rId25.jpg","")]
	, Plain [Image ("",[],[]) [] ("test/media/rId25.jpg","")]]]

The corresponding HTML output is:

<figure>
<img src="test/media/rId25.jpg" />
<img src="test/media/rId25.jpg" />
<figcaption><p>caption</p></figcaption>
</figure>
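The explicit rule can be sketched in plain Lua. This is not the filter from this PR and it does not use pandoc's Lua API: blocks and inlines are modelled as hypothetical tables (for example { t = "Para", content = ... }) loosely shaped like the native output above, and explicit_figure, has_class and is_bare_image_para are names made up for illustration. It shows the two steps described: the nested "caption" div supplies the caption, and each paragraph holding only an image with an empty caption becomes a Plain block.

```lua
-- Pure-Lua model of the explicit figure syntax. Blocks and inlines are
-- hypothetical plain tables shaped loosely like pandoc's native AST;
-- this is NOT pandoc's Lua filter API.

local function has_class(el, name)
  for _, c in ipairs(el.classes or {}) do
    if c == name then return true end
  end
  return false
end

-- True for a Para whose only inline is an Image with an empty caption.
local function is_bare_image_para(block)
  return block.t == "Para"
    and #block.content == 1
    and block.content[1].t == "Image"
    and #block.content[1].caption == 0
end

-- Convert a Div with class "figure" into a Figure table, or return nil.
local function explicit_figure(div)
  if div.t ~= "Div" or not has_class(div, "figure") then
    return nil
  end
  local caption, content = {}, {}
  for _, block in ipairs(div.content) do
    if block.t == "Div" and has_class(block, "caption") then
      caption = block.content          -- the caption div's blocks
    elseif is_bare_image_para(block) then
      content[#content + 1] = { t = "Plain", content = block.content }
    else
      content[#content + 1] = block    -- anything else is kept unchanged
    end
  end
  return { t = "Figure", caption = caption, content = content }
end

-- The two-image example from above.
local img = { t = "Image", caption = {}, src = "test/media/rId25.jpg" }
local fig = explicit_figure({
  t = "Div", classes = { "figure" },
  content = {
    { t = "Para", content = { img } },
    { t = "Para", content = { img } },
    { t = "Div", classes = { "caption" },
      content = { { t = "Para", content = { { t = "Str", text = "caption" } } } } },
  },
})
print(fig.t, #fig.content, fig.content[1].t, fig.caption[1].content[1].text)
```

An image with non-empty alt text is deliberately left as a Para in this sketch, since the rule above only covers images without a caption.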


Implicit syntax

The second syntax uses the last paragraph inside the figure as the caption.

::: { .figure }


![](test/media/rId25.jpg "")

![](test/media/rId25.jpg "")

This is a caption with
multiple lines

:::

This results in the following output:

$ pandoc -f markdown -t native --lua-filter=md-figure-implicit.lua fig-implict.md
[Figure ("",[],[])
	(Caption
		(Just [])
		[ Para [Str "This",Space,Str "is",Space,Str "a",Space,Str "caption",Space,Str "with",SoftBreak,Str "multiple",Space,Str "lines"]]) 
	[Plain [Image ("",[],[]) [] ("test/media/rId25.jpg","")],Plain [Image ("",[],[]) [] ("test/media/rId25.jpg","")]]]

The corresponding HTML output is:

<figure>
<img src="test/media/rId25.jpg" />
<img src="test/media/rId25.jpg" />
<figcaption><p>This is a caption with multiple lines</p></figcaption>
</figure>
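A similar pure-Lua sketch for the implicit syntax uses the same hypothetical table shapes (again not pandoc's API; implicit_figure is an illustrative name). The text only says that the last paragraph becomes the caption; this sketch additionally assumes that a trailing paragraph which is itself a bare image is not a caption, so a figure containing only images ends up with an empty caption.

```lua
-- Pure-Lua model of the implicit figure syntax, using hypothetical
-- tables loosely shaped like pandoc's native AST (not pandoc's Lua API).

local function has_class(el, name)
  for _, c in ipairs(el.classes or {}) do
    if c == name then return true end
  end
  return false
end

-- True for a Para whose only inline is an Image with an empty caption.
local function is_bare_image_para(block)
  return block.t == "Para"
    and #block.content == 1
    and block.content[1].t == "Image"
    and #block.content[1].caption == 0
end

-- Convert a "figure" Div, taking its final paragraph as the caption.
local function implicit_figure(div)
  if div.t ~= "Div" or not has_class(div, "figure") then
    return nil
  end
  local blocks = div.content
  local last = blocks[#blocks]
  local caption, body_end = {}, #blocks
  -- Assumption: a trailing bare-image paragraph is content, not a caption.
  if last ~= nil and last.t == "Para" and not is_bare_image_para(last) then
    caption = { last }
    body_end = #blocks - 1
  end
  local content = {}
  for i = 1, body_end do
    local block = blocks[i]
    if is_bare_image_para(block) then
      content[#content + 1] = { t = "Plain", content = block.content }
    else
      content[#content + 1] = block
    end
  end
  return { t = "Figure", caption = caption, content = content }
end

-- The implicit-syntax example from above (caption inlines simplified).
local img = { t = "Image", caption = {}, src = "test/media/rId25.jpg" }
local fig = implicit_figure({
  t = "Div", classes = { "figure" },
  content = {
    { t = "Para", content = { img } },
    { t = "Para", content = { img } },
    { t = "Para", content = {
        { t = "Str", text = "This is a caption with" },
        { t = "SoftBreak" },
        { t = "Str", text = "multiple lines" },
    } },
  },
})
print(fig.t, #fig.content, #fig.caption)
```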

Sample Firefox HTML rendering

For the implicit syntax example, this is Firefox's rendering:

[image: Firefox render]

argent0 force-pushed the markdown-figures branch from b0a6556 to 50cbeba on July 30, 2021 13:45

alerque (Collaborator) commented Jul 31, 2021

Very interesting indeed. Ironically, this (at least the implicit syntax) is almost exactly the same input Markdown I conjured up for a book project just 3 days ago. I didn't prefix the class name with a dot, but that's the only difference in the input. Since the details of my production workflow are completely different and I don't want to distract too much from this issue, I'll hide them away here; the curious can click for details.

Sample implementation typesetting figures from Markdown in SILE

For my use case I didn't use a filter at all, but I am using the SILE writer in my Pandoc fork. This writer just takes the div syntax and outputs block wrappers based on the classes, so basically what I get in SILE is a Div content block with a class attribute of figure. Then from the Lua side I can easily handle the nested image and the remaining content as the caption. For example:

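Markdown input (the Turkish caption reads "Grossmünster Cathedral in Zürich, Switzerland, where Huldrych Zwingli preached between 1519 and 1531"):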

::: figure
![Grossmünster Katedrali](resimler/grossmunster.jpg)

*Huldrych Zwingli’nin 1519–1531 yılları arasında vaaz verdiği İsviçre Zürih’teki Grossmünster Katedrali.*
:::

Gets converted to SIL format thus:

\begin[classes="figure"]{Div}
\img[src=resimler/grossmunster.jpg,title=fig:]{Grossmünster Katedrali}

\Emph{Huldrych Zwingli’nin 1519–1531 yılları arasında vaaz verdiği İsviçre Zürih’teki Grossmünster Katedrali.}
\end{Div}

To typeset this, I have a Lua function in the project for the figure class that makes assumptions about how to lay out the figures for that book. It overrides the image function to make the images the full frame width and centers everything on the page:

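SILE.registerCommand("class:figure", function (options, content)
  -- Save the current img command so it can be restored afterwards
  local old_img = SILE.Commands["img"]
  -- Inside figures, images take the full frame width and are followed by a 1en skip
  SILE.registerCommand("img", function (options, content)
    options.width = "100%fw"
    old_img(options, content)
    SILE.call("skip", { height = "1en" })
  end)
  -- Book-specific layout: new double page, vertical fill, centered content
  SILE.call("open-double-page", { double = true, odd = false })
  SILE.call("topfill")
  SILE.call("vfill")
  SILE.call("center", {}, function ()
    SILE.process(content)
  end)
  -- Restore the original img command for the rest of the document
  SILE.Commands["img"] = old_img
end)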

The finished result is this page:

[image: the finished page]

Obviously my implementation is just overloading the block syntax without introducing a new AST element. This works because of the flexibility I have on the typesetter side, but it may or may not work well for all output formats. There are pros and cons to overloading an existing object and giving it "magic" smarts vs. having a dedicated content type.

Back to your filter (and your Pandoc fork). I'm not sure we want to merge anything here that doesn't work out of the box with Pandoc, but I'm very interested in seeing something worked out that makes this easier on everybody. I'd be happy to play along with other implementations in the name of keeping things standardized and hence interoperable if possible.

What are your thoughts on the Pandoc fork? Has the idea of a new AST object for this been brought up yet?

argent0 marked this pull request as draft on August 2, 2021 12:27

argent0 (Author) commented Aug 2, 2021

Yes, the idea of a new AST object has been under discussion since 2016. There were concrete proposals before our work, which I've merged into my fork.

We are working on improving pandoc's figure support and keep our discussions public online. We mainly focus on HTML, LaTeX, and Markdown.

So my fork includes the Figure AST element, and most output formats can handle it. On the input side, we've decided to go with this method for Markdown (which doesn't support floats yet), and I'm working on the HTML input (the <figure> tag, which is currently handled in a limited fashion). For example, take this input file, test/figures/figure-simple.html:

<figure class="important">
<ul> <li> Delete me? </li> </ul>
<figcaption> CAP2 </figcaption>
</figure>

Pandoc 2.14

$ pandoc -f html -t native test/figures/figure-simple.html
[BulletList
 [[Plain [Str "Delete",Space,Str "me?"]]]
,Para [Str "CAP2"]]
$ pandoc -f html -t html test/figures/figure-simple.html
<ul>
<li>Delete me?</li>
</ul>
<p>CAP2</p>

My Fork

$ pandoc-fork -f html+native_figures -t native test/figures/figure-simple.html
[Figure ("",["important"],[]) (Caption Nothing [Plain [Str "CAP2"]]) [BulletList [[Plain [Str "Delete",Space,Str "me?"]]]]]
$ pandoc-fork -f html+native_figures -t html test/figures/figure-simple.html
<figure class="important">
<ul>
<li>Delete me?</li>
</ul>
<figcaption>CAP2</figcaption>
</figure>
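The mapping visible in the fork's output (the figcaption becomes the caption, the remaining child elements become the content, and the figure's class is kept) can be sketched the same way. This is a hypothetical model, not pandoc's HTML reader: HTML elements are assumed to be tables of the form { tag = ..., classes = ..., children = ... } with text nodes as plain strings, figure_from_html is an illustrative name, and figcaption text is simply trimmed into one Str rather than split into words.

```lua
-- Pure-Lua model of reading <figure> into a Figure. Element tables and
-- the Figure shape are hypothetical; this is NOT pandoc's HTML reader.

-- Strip leading and trailing whitespace.
local function trim(s)
  return s:match("^%s*(.-)%s*$")
end

local function figure_from_html(node)
  if node.tag ~= "figure" then
    return nil
  end
  local caption, content = {}, {}
  for _, child in ipairs(node.children) do
    if type(child) == "table" and child.tag == "figcaption" then
      -- Caption text becomes a Plain block, as in the fork's native output.
      local inlines = {}
      for _, text in ipairs(child.children) do
        if type(text) == "string" then
          inlines[#inlines + 1] = { t = "Str", text = trim(text) }
        end
      end
      caption = { { t = "Plain", content = inlines } }
    elseif type(child) == "table" then
      content[#content + 1] = child  -- e.g. the <ul>, kept unconverted here
    end
    -- Bare text directly inside <figure> is ignored in this sketch.
  end
  return { t = "Figure", classes = node.classes or {},
           caption = caption, content = content }
end

-- The figure-simple.html example from above.
local fig = figure_from_html({
  tag = "figure", classes = { "important" },
  children = {
    "\n",
    { tag = "ul", children = { { tag = "li", children = { " Delete me? " } } } },
    "\n",
    { tag = "figcaption", children = { " CAP2 " } },
    "\n",
  },
})
print(fig.classes[1], #fig.content, fig.caption[1].content[1].text)
```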

Thoughts

A dedicated Figure AST element (which should be understood as a float) will improve consistency and flexibility in many output formats.

argent0 marked this pull request as ready for review on August 19, 2021 20:34