Feature request: Ability to specify a YAML metadata file for all reader types #1960

ghost · 2015-02-19T20:41:36Z

An idea I thought would be useful. Many of the Readers have little or no way to set metadata, and the -M option on the command line only accepts strings and not arbitrary YAML. Right now you can include a separate file of YAML metadata for Markdown formats (which is simply concatenated with the markdown files during parsing).

This idea would be to specify a metadata file on the command line ("pandoc -Y metafile.yml" or something), which would be parsed separately and the contents added to the document metadata, regardless of the input file type.

Thanks

mpickering · 2015-02-20T09:20:21Z

When this comes up, people generally suggest using Makefiles.

meowsqueak · 2015-03-18T22:59:55Z

One place where this would be useful is to allow the reading of YAML bibliography files. At the moment (unless I'm mistaken), a YAML references file (as you might insert in the document directly) is not an accepted input format for pandoc-citeproc. This makes it difficult to share a YAML bibliography file between multiple documents. If pandoc could bring in arbitrary YAML files, then both documents could bring in the same YAML bibliography file.

Note: it is desirable (to me) to use the YAML bibliography because it supports URL references better than other formats.

jgm · 2015-03-19T07:52:27Z

If YAML bibliographies are not accepted by pandoc-citeproc, they could certainly be added. Open a ticket on jgm/pandoc-citeproc, once you've confirmed that they don't work (I can't recall whether they do).

meowsqueak · 2015-03-22T20:42:24Z

I double-checked and it seems that external YAML files are supported via both the --bibliography option and via inline YAML "bibliography: " when --filter pandoc-citeproc is specified, provided the YAML file is correctly formatted - there is no warning or error if it's not formatted correctly.

I raised this point because this is not documented as supported at http://johnmacfarlane.net/pandoc/README.html in the citations extension section (doc has no anchors).

So would this suggest I open both a documentation ticket against pandoc, and a ticket against pandoc-citeproc for the lack of warnings? Or is pandoc itself suppressing them?

jgm · 2015-04-13T04:35:13Z

> So would this suggest I open both a documentation ticket against pandoc, and a ticket against pandoc-citeproc for the lack of warnings?

Yes, that sounds right. And this can then be closed.

DivineDominion · 2015-11-13T16:30:01Z

I'd still prefer to write metadata in YAML over XML/Dublin Core. Wouldn't this be possible to parse even when LaTeX or something else is the input format?

infinity0 · 2015-12-05T21:59:22Z

+1 for this. If it's too hard to parse in the input document, an easier option could be to specify --metadata-file=XXX. At the moment the other options only allow you to specify a string value for a specific key; this file would allow us to set complex (nested) values for arbitrary keys.

This is useful when using non-default templates. For example, I am trying to generate docbook output from rst input, and adding <author><affiliation><address><email>..</**> into the existing <articleinfo> is pretty hard.

tarleb · 2016-08-20T11:06:24Z

Allowing for an additional YAML meta-data file would bring one problem: Which markup would be allowed for text in the YAML file? I see three possibilities, none of which I really like:

1. Always use Markdown. Inconvenient and unexpected for people used to other formats.
2. Use the same format as the reader: would make this feature close to useless for epub, docx, etc.
3. Allow the format for the meta-data file to be specified separately. Likely to be complex and unintuitive.

vyp · 2016-08-20T11:16:07Z

What about 2 but do 1 for formats like epub, docx etc?

tarleb · 2016-08-20T11:26:40Z

Let's call that 4. Though better than the other three, it would still be inconvenient for users unfamiliar with Markdown.

infinity0 · 2016-08-20T14:07:46Z

Why not just do (2)? If it's useless for epub/docx people would simply not use it for that format. It still helps with the other formats, though.

tarleb · 2016-08-20T17:19:21Z

The benefit of option (2) is that it's consistent. I experimented a bit, here is a proof of concept for an equivalent but slightly different approach: How about simply supporting the yaml_metadata_block extension for more readers? The linked code implements it for the org reader, but with the restriction that the YAML block is allowed at the top of the document only. The approach builds on existing options and is basically identical to (2) as one can simply cat the two files together.

tarleb · 2016-08-20T19:48:31Z

Extended PoC, adding YAML support to Org, RST, and LaTeX.

tarleb · 2016-08-22T19:36:05Z

No opposing opinions have been voiced yet, so I opened PR #3084 for this.

iandol · 2016-08-25T01:51:26Z

What about the priority of which metadata variable to use when the same variable is specified twice: I want to have a default YAML meta-data file for all conversions, but if there is YAML metadata in the source file (markdown), then that gets priority. So for example define a standard mainfont in metadata.yaml but this could be overridden using the YAML block in the source.md file when converting to LaTeX?

tarleb · 2016-08-25T12:17:29Z

What you want is already doable by passing the YAML file containing the defaults as the last argument:

pandoc -f markdown -s your-input-file.md defaults.yaml

Metadata definitions seen first are kept and left unchanged, even if conflicting data is parsed at a later point.
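
To make the precedence rule concrete, here is a minimal Python sketch of a first-definition-wins merge. It is only a model of the behaviour described above, not pandoc's implementation; `merge_first_wins` and the sample dictionaries are hypothetical, and the sketch looks at top-level keys only.

```python
def merge_first_wins(*sources):
    """Merge metadata mappings left to right; a key seen earlier is never overwritten."""
    merged = {}
    for meta in sources:
        for key, value in meta.items():
            # setdefault only stores the value if the key is not present yet
            merged.setdefault(key, value)
    return merged


# Hypothetical metadata: the document comes first, the defaults file last.
document_meta = {"title": "My Paper", "mainfont": "Georgia"}
defaults_meta = {"mainfont": "Times New Roman", "fontsize": "11pt"}

merged = merge_first_wins(document_meta, defaults_meta)
print(merged)
```

With the document listed first, its `mainfont` survives and the defaults only fill in keys the document left unset.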

iandol · 2016-08-26T03:00:57Z

OK, I had tried that previously and it never worked, but just found that it was a small error in my Markdown file (my last line was a figure block and not terminated with a newline which caused the yaml to become appended as plain text). Adding a couple of newlines and yaml is now correctly parsed with the document metadata correctly taking precedence. Thank you!
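
The failure mode here comes from plain concatenation: if one input lacks a final newline, the `---` that opens the next file's YAML block ends up glued to the previous line. Below is a small Python sketch of the effect, and of a join that guards against it. The helper `join_inputs` and the sample strings are made up for illustration, and the sketch only models concatenation, not pandoc's own input handling.

```python
def join_inputs(*texts):
    """Join input texts so each ends with a newline and a blank line separates them."""
    normalized = [t if t.endswith("\n") else t + "\n" for t in texts]
    return "\n".join(normalized)


# Hypothetical inputs: the document ends in a figure without a trailing newline.
document = "Some text.\n\n![A figure](figure.png)"
defaults = "---\nmainfont: Georgia\n---\n"

naive = document + defaults            # what a bare concatenation produces
safe = join_inputs(document, defaults)

print(naive.splitlines()[2])   # the figure line with '---' stuck to its end
print(safe.splitlines()[2:5])  # figure line, blank line, then '---' on its own line
```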

ickc · 2017-01-17T19:28:56Z

I came across this issue from pandoc-discuss, and I found a way that kind of works currently. The idea is to convert the source yml and the source document to native first and cat them together (plus a little detail):

pandoc -f markdown -t native -s metadata.yml | sed '$ d' > metadata.native
pandoc -f <fromFormat> -t native -o document.native document.<fromFormat>
pandoc -f native -t <toFormat> -s -o document.<toFormat> metadata.native document.native

The extra detail is the sed: metadata.yml is treated as a markdown document with no body, so the last line of the output is [], which you need to remove. Another way of removing it is head -n -1 (this would not work with Mac's default head). From my tests the meta in native always seems to be on one line; if that is true, head -n1 will work too (and it also works on Mac).

Any cli options should be added to the last line only (to avoid having extra metadata somewhere else).

This approach is kind of hacky since metadata.native can only contain meta and document.native cannot contain one. And the syntax in native is not well-known so I'm not sure if there's any other gotcha.

But it seems this is the only currently working method (alternatively one can convert the document to markdown first and cat from there, but the extra conversion can introduce extra loss.)

Edit: fixed some typos and added some more comments:

This is basically @tarleb's (1), but it works today. The 3 lines are long, but a thin wrapper using a shell script or a makefile can hide them away.
Unix only, since I used the shell. But the idea should be applicable to Windows too.
The script above is a sketch. But I tested the idea on real documents to verify it works.

bpj · 2017-01-18T20:11:31Z

Why must the text in a metadata file necessarily be interpreted as any format rather than as plain text?

One alternative would be to make a top-level field like _metadata_format: markdown 'magic'.

jgm · 2017-01-19T09:53:08Z

+++ Benct Philip Jonsson [Jan 18 17 12:11 ]:

> Why must the text in a metadata file necessarily be interpreted as any format rather than as plain text?

Well, if you're writing an abstract for example, it's nice to be able to include formatting. If you have a title with math in it, you'd probably like to include math. Plain text is too limiting.

> One alternative would be to make a top-level field like _metadata_format: markdown 'magic'.

This is prioritizing the less common case over the more common one. People will find it confusing if *this* makes emphasis in the body of the text but not the title. Better to provide a special way to create a "raw" metadata field when this is needed. See #2139 for that.

ickc · 2017-01-28T22:57:22Z

I made the 3 commands I suggested above shorter. This requires bash though (it uses process substitution).

YAML=metadata.yml; INPUT=document.md; OUTPUT=document.pdf
pandoc -f native -s -o "$OUTPUT" <(pandoc -f markdown -t native -s "$YAML" | sed '$ d') <(pandoc -t native "$INPUT")

I will later add it to Pandoc Tricks · jgm/pandoc Wiki.

jgm · 2017-11-16T23:14:19Z

See https://groups.google.com/d/msg/pandoc-discuss/6KLbZk7NVWk/0XMWewhLCQAJ for a way to do this using Lua filters.

mb21 · 2017-11-17T09:16:43Z

> Which markup would be allowed for text in the YAML file?

It could be argued that if you want to use a specific format to specify metadata, you should use that format's metadata block syntax inside the document (e.g. .. meta:: for RST). If that doesn't work for you for some reason, you can use an external YAML file, but at that point you have to learn both YAML and markdown. This would at least keep this mechanism simple and predictable.

If you absolutely must, you can also use generic raw snippets and use whatever syntax you like inside "markdown".

mb21 · 2018-03-27T09:25:44Z

I stand by my last comment: let's introduce a --metadata-file option that takes a YAML file (or JSON file, determined by file suffix) where the strings are interpreted as markdown. (Definitions in the file have lower priority than the ones inside the document, solving #3115.)

We can always add more things later, like:

parsing .. meta:: in RST or <meta> in HTML (which would act analogous to the current YAML metadata blocks in markdown)
adding an additional option that specifies the markup language the metadata is interpreted as (overriding the default which would be set to markdown).
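
A rough Python sketch of the two rules in this proposal: the file format is picked from the suffix, and values from the file act as defaults that the document can override. The function names are hypothetical, only JSON is actually parsed here (YAML would need a third-party parser), and nothing below is taken from pandoc's code.

```python
import json
from pathlib import Path


def metadata_format(path):
    """Guess the metadata file format from its suffix."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        return "json"
    if suffix in (".yaml", ".yml"):
        return "yaml"
    raise ValueError(f"unsupported metadata file suffix: {suffix!r}")


def with_file_defaults(document_meta, file_meta):
    """Combine metadata so that document values override values from the file."""
    # Later entries win in a dict merge, so the document goes last.
    return {**file_meta, **document_meta}


# Hypothetical contents of a JSON metadata file, including a nested value
# that a plain string option could not express.
file_meta = json.loads(
    '{"mainfont": "Georgia",'
    ' "author": [{"name": "A. Writer", "email": "a@example.org"}]}'
)
document_meta = {"title": "Report", "mainfont": "Palatino"}

combined = with_file_defaults(document_meta, file_meta)
print(metadata_format("metadata.yml"), combined["mainfont"])
```

The nested `author` entry comes through untouched, while `mainfont` from the document replaces the file's default.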

jgm · 2018-03-27T16:22:33Z

I think I like @mb21's suggestion. It's simple, and it would help in some of the practical cases described above.

ssolidus · 2018-03-29T05:09:42Z

Re: the thread, I have been using gfm+yaml_metadata_block and passing in a .yml file in the inputs. Or, I use --include-in-header=$file.tex.
@jgm re: use-cases, a very common one, which you can find many instances of on forums, TeX StackOverflow and so on, is the ability to text-wrap code in fenced code blocks, as well as apply other styling information to it, such as line numbers. This is highly desirable in many different kinds of documentation, but there is currently no practical way to do it.

What I would like to see is the equivalent of stuff like --variable urlcolor=$color for more/all LaTeX options (at least the styling ones), or, as mentioned above, the ability to pass through custom LaTeX options more easily than is currently possible.

A problem with the JSON/YML solution is that it is more technical than a lot of users require, and so a lot of people would simply give up and move to another solution rather than continue to fiddle with Pandoc arguments and config files.

jgm · 2018-03-29T16:35:39Z

> @jgm re: use-cases, a very common one, which you can find many instances of on forums, TeX StackOverflow and so on, is the ability to text-wrap code in fenced code blocks, as well as apply other styling information to it, such as line numbers. This is highly desirable in many different kinds of documentation, but there is currently no practical way to do it.

Sorry, I didn't understand this comment or what it has to do with the topic of this thread.

> What I would like to see is the equivalent of stuff like --variable urlcolor=$color for more/all LaTeX options (at least the styling ones), or, as mentioned above, the ability to pass through custom LaTeX options more easily than is currently possible.

This is just a matter of template design. You can always create a custom template that allows you to control some LaTeX option with a variable. And you can also propose modifications to the default template along these lines.

mb21 · 2018-03-31T11:50:46Z

I had a quick look at implementing this, but unfortunately the YAML parsing is quite intertwined with the rest of the Markdown reader.

This is due to the fact that we share state between the YAML metadata block and the rest of the markdown document (I'm guessing for footnotes etc?). This is not going to happen when the YAML is read in from an external file and merged with the document metadata after the reader has produced a Pandoc Meta [Block], and it wouldn't work for other input formats anyway. Thus we'll just have to make users aware that there's a small difference between pandoc --metadata-file m.yaml input.md and pandoc m.yaml input.md.

Still, we have a choice:

1. Either we refactor the existing YAML parsing and export it as a function from the Markdown reader: yamlToMeta :: PandocMonad m => Yaml.Value -> m Meta (or even one taking a ByteString so we could reuse the decoding with error handling). Then all Strings in the YAML metadata file would share one markdown reader state.
2. Or we reimplement the actual YAML parsing somewhere else and apply readMarkdown to each String individually, in which case state wouldn't be shared (and possibly a few more inconsistencies might pop up).
3. Finally, we could even do (2), but also use the new implementation in the Markdown reader. This seems the cleanest solution (especially if we'd want to parse other syntax than markdown in the future), but possibly might break some existing documents in subtle ways?

I'm unsure what the implications of (not) sharing ParserState are, in practice, with regard to markdown parsing...

jgm · 2018-03-31T18:35:15Z

Mauro Bieg <notifications@github.com> writes:

> This is due to the fact that we share state between the YAML metadata block and the rest of the markdown document (I'm guessing for footnotes etc?).

Yes, exactly.

> This is not going to happen when the YAML is read in from an external file and merged with the document metadata after the reader has produced a `Pandoc Meta [Block]`, and it wouldn't work for other input formats anyway. Thus we'll just have to make users aware that there's a small difference between `pandoc --metadata-file m.yaml input.md` and `pandoc m.yaml input.md`.

Agreed.

> Still, we have a choice: 1. either we refactor the existing YAML parsing and export it as a function from the Markdown reader: `yamlToMeta :: PandocMonad m => Yaml.Value -> m Meta` (or even one taking a `ByteString` so we could reuse the decoding with error handling). Then all Strings in the YAML metadata file would share one markdown reader state.

This seems simplest to me, and I don't see a drawback to sharing state. This way, for example, you could define footnotes and reference links within the yaml metadata file. Of course they'd only work within that file, but still people might expect they can do this. Is there a downside?

> 3. Finally, we could even do (2), but also use the new implementation in the Markdown reader. This seems the cleanest solution (especially if we'd want to parse other syntax than markdown in the future), but possibly might break some existing documents in subtle ways?

If other syntaxes are the issue, then we might try to decouple the markdown-specific parts of the function from the parts that deal with YAML. Perhaps the reader could be passed in as a function? Maybe we could do this in such a way that we don't hard-code use of ParserState?
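
One way to picture this decoupling, as a Python sketch rather than Haskell: a generic walker handles the YAML-shaped data and takes the text reader as an argument, so nothing in it is specific to Markdown. `convert_meta` and `toy_reader` are invented for this illustration and do not correspond to functions in pandoc.

```python
def convert_meta(value, read_text):
    """Walk nested metadata and apply read_text to every string it contains."""
    if isinstance(value, str):
        return read_text(value)
    if isinstance(value, list):
        return [convert_meta(item, read_text) for item in value]
    if isinstance(value, dict):
        return {key: convert_meta(item, read_text) for key, item in value.items()}
    return value  # numbers, booleans and None pass through unchanged


def toy_reader(text):
    """Stand-in for a real reader: splits into words instead of parsing markup."""
    return text.split()


# Hypothetical data, shaped like what a YAML parser would hand back.
raw = {"title": "A *short* title", "author": [{"name": "Jane Doe"}], "draft": True}

meta = convert_meta(raw, toy_reader)
print(meta)
```

Swapping in a different reader function changes how strings are interpreted without touching the walker. If state had to be shared between fields, the reader could be a closure or an object that carries that state across calls.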

mpickering added the enhancement label Feb 20, 2015

DavidAntliff mentioned this issue Apr 13, 2015

Documentation: external YAML files for citations not documented as supported #2076

Closed

tarleb mentioned this issue Aug 22, 2016

Support yaml_metadata_block extension in more formats #3084

Closed

tarleb mentioned this issue Apr 2, 2017

Lua module: add readers submodule #3550

Merged

tarleb mentioned this issue Apr 17, 2017

Provide method for metadata processing in lua filters #3584

Merged

mb21 mentioned this issue Mar 27, 2018

Allow complex metadata fields to be defined on the command line (e.g. using JSON) #3732

Closed

mb21 mentioned this issue Apr 23, 2018

the documentation about metadata is a bit fragmented #4584

Open

mb21 added a commit to mb21/pandoc that referenced this issue Apr 27, 2018

introduce --metadata-file option, closes jgm#1960

fbe7891

mb21 mentioned this issue Apr 27, 2018

Introduce --metadata-file option #4604

Merged

mb21 added a commit to mb21/pandoc that referenced this issue May 2, 2018

introduce --metadata-file option, closes jgm#1960

3ffbaa1

mb21 mentioned this issue Sep 14, 2018

* There is no abstract outputed into docx with command option "--metadata=abstract:METADATA"? #4900

Closed

jgm closed this as completed in 6aa5fca Sep 15, 2018

greut mentioned this issue Sep 22, 2018

Is compatibility to the RMarkdown format a good idea? mb21/panrun#1

Open

cavo789 mentioned this issue Sep 10, 2019

Question: support of YAML config as pandoc command line arguments? Wandmalfarbe/pandoc-latex-template#117

Closed

mb21 mentioned this issue Nov 18, 2019

--metadata-file with non-markdown contents? #5914

Open

mb21 mentioned this issue Nov 16, 2020

Allow to choose Markdown extensions used to parse metadata-file #6832

Closed

jgm mentioned this issue Feb 17, 2022

Raw header-includes #7926

Closed

Koala mentioned this issue Dec 2, 2022

[BUG] Unknown option --metadata-file OliverBalfour/obsidian-pandoc#74

Open
