
Feature request: Ability to specify a YAML metadata file for all reader types #1960

Closed
ghost opened this issue Feb 19, 2015 · 29 comments

@ghost

ghost commented Feb 19, 2015

An idea I thought would be useful. Many of the readers have little or no way to set metadata, and the -M option on the command line only accepts strings, not arbitrary YAML. Right now you can include a separate file of YAML metadata for Markdown formats (which is simply concatenated with the Markdown files during parsing).

This idea would be to specify a metadata file on the command line ("pandoc -Y metafile.yml" or something), which would be parsed separately and the contents added to the document metadata, regardless of the input file type.

Thanks

@mpickering
Collaborator

When this gets suggested, people generally recommend using Makefiles.

@meowsqueak

One place where this would be useful is to allow the reading of YAML bibliography files. At the moment (unless I'm mistaken), a YAML references file (as you might insert in the document directly) is not an accepted input format for pandoc-citeproc. This makes it difficult to share a YAML bibliography file between multiple documents. If pandoc could bring in arbitrary YAML files, then both documents could bring in the same YAML bibliography file.

Note: it is desirable (to me) to use the YAML bibliography because it supports URL references better than other formats.

@jgm
Owner

jgm commented Mar 19, 2015 via email

@meowsqueak

I double-checked and it seems that external YAML files are supported via both the --bibliography option and via inline YAML "bibliography: " when --filter pandoc-citeproc is specified, provided the YAML file is correctly formatted - there is no warning or error if it's not formatted correctly.

I raised this point because this is not documented as supported at http://johnmacfarlane.net/pandoc/README.html in the citations extension section (doc has no anchors).

So would this suggest I open both a documentation ticket against pandoc, and a ticket against pandoc-citeproc for the lack of warnings? Or is pandoc itself suppressing them?

@jgm
Owner

jgm commented Apr 13, 2015

So would this suggest I open both a documentation ticket against pandoc, and a ticket against pandoc-citeproc for the lack of warnings?

Yes, that sounds right. And this can then be closed.

@DivineDominion

I'd still prefer to write metadata in YAML over XML/Dublin Core. Wouldn't this be possible to parse even when LaTeX or something else is the input format?

@infinity0

+1 for this. If it's too hard to parse in the input document, an easier option could be to specify --metadata-file=XXX. At the moment the other options only allow you to specify a string value for a specific key; this file would allow us to set complex (nested) values for arbitrary keys.

This is useful when using non-default templates. For example, I am trying to generate DocBook output from RST input, and adding <author><affiliation><address><email>...</email></address></affiliation></author> into the existing <articleinfo> is pretty hard.

@tarleb
Collaborator

tarleb commented Aug 20, 2016

Allowing for an additional YAML metadata file would bring one problem: which markup would be allowed for text in the YAML file? I see three possibilities, none of which I really like:

  1. Always use Markdown. Inconvenient and unexpected for people used to other formats.
  2. Use the same format as the reader: would make this feature close to useless for epub, docx, etc.
  3. Allow specifying the format of the metadata file separately. Likely to be complex and unintuitive.

@vyp

vyp commented Aug 20, 2016

What about 2 but do 1 for formats like epub, docx etc?

@tarleb
Collaborator

tarleb commented Aug 20, 2016

Let's call that 4. Though better than the other three, it would still be inconvenient for users unfamiliar with Markdown.

@infinity0

Why not just do (2)? If it's useless for epub/docx people would simply not use it for that format. It still helps with the other formats, though.

@tarleb
Collaborator

tarleb commented Aug 20, 2016

The benefit of option (2) is that it's consistent. I experimented a bit, here is a proof of concept for an equivalent but slightly different approach: How about simply supporting the yaml_metadata_block extension for more readers? The linked code implements it for the org reader, but with the restriction that the YAML block is allowed at the top of the document only. The approach builds on existing options and is basically identical to (2) as one can simply cat the two files together.

@tarleb
Collaborator

tarleb commented Aug 20, 2016

Extended PoC, adding YAML support to Org, RST, and LaTeX.

@tarleb
Collaborator

tarleb commented Aug 22, 2016

No opposing opinions have been voiced yet, so I opened PR #3084 for this.

@iandol
Contributor

iandol commented Aug 25, 2016

What about the priority of which metadata variable to use when the same variable is specified twice? I want to have a default YAML metadata file for all conversions, but if there is YAML metadata in the source (Markdown) file, that should take priority. For example, could I define a standard mainfont in metadata.yaml and override it with the YAML block in source.md when converting to LaTeX?

@tarleb
Collaborator

tarleb commented Aug 25, 2016

What you want is already doable by passing the YAML file containing the defaults as the last argument:

pandoc -f markdown -s your-input-file.md defaults.yaml

Metadata definitions seen first are kept and left unchanged, even if conflicting data is parsed at a later point.
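The first-seen-wins rule can be illustrated with a minimal shell sketch. It is not how pandoc implements the merge (pandoc works on the parsed metadata, not on text), the helper name `merge_first_wins` is hypothetical, and it assumes flat `key: value` lines only, with no nesting, lists, or multi-line values.

```bash
# Hypothetical helper illustrating "metadata seen first is kept".
# Assumption: files contain only flat top-level `key: value` lines
# (plus optional `---` delimiters); this is NOT pandoc's YAML parser.
merge_first_wins() {
  awk '
    /^[A-Za-z_][A-Za-z0-9_-]*:/ {
      key = $0; sub(/:.*/, "", key)           # text before the first colon
      val = $0; sub(/^[^:]*:[ \t]*/, "", val) # text after "key:" and spaces
      if (!(key in seen)) {                   # later duplicates are ignored
        seen[key] = val
        order[++n] = key                      # remember first-seen order
      }
    }
    END { for (i = 1; i <= n; i++) print order[i] ": " seen[order[i]] }
  ' "$@"
}
```

Calling `merge_first_wins your-input-file.md defaults.yaml` keeps a mainfont set in the document's YAML block and only fills in keys that the document does not define, which is the ordering trick described above.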

@iandol
Contributor

iandol commented Aug 26, 2016

OK, I had tried that previously and it never worked, but I just found that this was due to a small error in my Markdown file: my last line was a figure block not terminated with a newline, which caused the YAML to be appended as plain text. After adding a couple of newlines, the YAML is now parsed correctly, with the document metadata taking precedence. Thank you!
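One way to avoid the missing-final-newline problem is to concatenate the files yourself and force a blank line after each one before piping the result to pandoc. A YAML metadata block that is not at the start of a document has to be preceded by a blank line, which is what the extra newline guarantees. This is a minimal sketch; `cat_with_blank_lines` is a hypothetical helper name.

```bash
# Hypothetical helper: concatenate files so that each one ends with a
# newline and is followed by a blank line. A YAML block in a later file
# then starts on its own paragraph instead of being glued onto the last
# line of the previous file.
cat_with_blank_lines() {
  for f in "$@"; do
    cat "$f"
    # tail -c 1 prints the last byte; command substitution strips a trailing
    # newline, so the result is non-empty only if the newline is missing.
    if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
      printf '\n'
    fi
    printf '\n'   # blank line separating this file from the next one
  done
}
```

Usage: `cat_with_blank_lines your-input-file.md defaults.yaml | pandoc -f markdown -s -o output.tex` (pandoc reads standard input when no input file is given).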

@ickc
Contributor

ickc commented Jan 17, 2017

I came across this issue from pandoc-discuss, and I found a way that kind of works currently. The idea is to convert the source YAML and the source document to native first and cat them together (plus a little detail):

pandoc -f markdown -t native -s metadata.yml | sed '$ d' > metadata.native
pandoc -f <fromFormat> -t native -o document.native document.<fromFormat>
pandoc -f native -t <toFormat> -s -o document.<toFormat> metadata.native document.native

The extra detail is the sed: metadata.yml is regarded as a Markdown document with no body, so the last line of the output is [], which you need to remove. Another way of removing it is head -n -1 (which does not work with the default head on macOS). From my tests it seems the meta in native is always on one line; if that is true, head -n1 will also work (including on macOS).

Any cli options should be added to the last line only (to avoid having extra metadata somewhere else).

This approach is kind of hacky since metadata.native can only contain meta and document.native cannot contain one. And the syntax in native is not well-known so I'm not sure if there's any other gotcha.

But it seems this is the only currently working method (alternatively one can convert the document to markdown first and cat from there, but the extra conversion can introduce extra loss.)

Edit: fixed some typos and added some more comments:

  1. This is basically @tarleb's (1), but it works today. The 3 lines are long, but a thin wrapper (a shell script or a Makefile) can hide them away.

  2. Unix only, since I used the shell. But the idea should be applicable to Windows too.

  3. The script above is a sketch. But I tested the idea on real documents to verify it works.
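The "thin wrapper" mentioned in point 1 could look roughly like the following function. It is a sketch that simply wraps the three commands above: the name `pandoc_with_meta` and the `PANDOC` variable (for pointing at a specific pandoc binary) are hypothetical, and it inherits the same Unix-only assumptions.

```bash
# Hypothetical wrapper around the three-step native trick.
# pandoc_with_meta METADATA.yml FROM INPUT TO OUTPUT [EXTRA_PANDOC_OPTS...]
pandoc_with_meta() {
  local meta=$1 from=$2 input=$3 to=$4 output=$5
  shift 5
  local pandoc="${PANDOC:-pandoc}"
  local tmp
  tmp=$(mktemp -d) || return 1
  local status=0

  # 1. Read the YAML as a body-less Markdown document and drop the final
  #    "[]" line (note: the pipeline's status is sed's, not pandoc's).
  # 2. Convert the real document to native, as in the second command.
  # 3. Combine both; extra CLI options go on this last call only.
  { "$pandoc" -f markdown -t native -s "$meta" | sed '$ d' > "$tmp/metadata.native" &&
    "$pandoc" -f "$from" -t native -o "$tmp/document.native" "$input" &&
    "$pandoc" -f native -t "$to" -s -o "$output" "$@" \
      "$tmp/metadata.native" "$tmp/document.native"; } || status=$?

  rm -rf "$tmp"
  return "$status"
}
```

Usage: `pandoc_with_meta metadata.yml rst document.rst latex document.tex --toc`.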

@bpj

bpj commented Jan 18, 2017

Why must the text in a metadata file necessarily be interpreted as any format rather than as plain text?

One alternative would be a 'magic' top-level field such as _metadata_format: markdown.

@jgm
Owner

jgm commented Jan 19, 2017 via email

@ickc
Contributor

ickc commented Jan 28, 2017

I made the 3 commands I suggested above shorter. This requires bash, though (it uses process substitution).

YAML=metadata.yml; INPUT=document.md; OUTPUT=document.pdf
pandoc -f native -s -o "$OUTPUT" <(pandoc -f markdown -t native -s "$YAML" | sed '$ d') <(pandoc -t native "$INPUT")

I will later add it to Pandoc Tricks · jgm/pandoc Wiki.

@jgm
Owner

jgm commented Nov 16, 2017

See https://groups.google.com/d/msg/pandoc-discuss/6KLbZk7NVWk/0XMWewhLCQAJ
for a way to do this using lua filters.

@mb21
Collaborator

mb21 commented Nov 17, 2017

Which markup would be allowed for text in the YAML file?

It could be argued that if you want to use a specific format to specify metadata, you should use that format's metadata block syntax inside the document (e.g. .. meta:: for RST). If that doesn't work for you for some reason, you can use an external YAML file, but at that point you have to learn both YAML and Markdown. This would at least keep this mechanism simple and predictable.

If you absolutely must, you can also use generic raw snippets and use whatever syntax you like inside "markdown".

@mb21
Collaborator

mb21 commented Mar 27, 2018

I stand by my last comment: let's introduce a --metadata-file option that takes a YAML file (or JSON file, determined by file suffix) where the strings are interpreted as markdown. (Definitions in the file have lower priority than the ones inside the document, solving #3115.)
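Choosing between YAML and JSON by file suffix, as proposed, amounts to a simple dispatch. This is a sketch with a hypothetical helper name, `metadata_file_format`; it only shows the suffix check, not how pandoc would actually decode or merge the file.

```bash
# Hypothetical helper: decide how a metadata file should be decoded,
# based only on its suffix (everything after the last dot, case-insensitive).
metadata_file_format() {
  local suffix
  suffix=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$suffix" in
    json)     echo json ;;
    yaml|yml) echo yaml ;;
    *)        echo "unsupported metadata file: $1" >&2; return 1 ;;
  esac
}
```

Keeping this decision in one place would make it easy to add further formats later.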

We can always add more things later, like:

  • parsing .. meta:: in RST or <meta> in HTML (which would act analogous to the current YAML metadata blocks in markdown)
  • adding an additional option that specifies the markup language the metadata is interpreted as (overriding the default which would be set to markdown).

@jgm
Owner

jgm commented Mar 27, 2018

I think I like @mb21's suggestion. It's simple, and it would help in some of the practical cases described above.

@ssolidus

ssolidus commented Mar 29, 2018

Re: the thread, I have been using gfm+yaml_metadata_block and passing in a .yml file in the inputs. Or, I use --include-in-header=$file.tex.
@jgm re: use-cases, a very common one you can find many instances of on forums, TeX StackExchange and so on, is the ability to text-wrap code in fenced code blocks, as well as to apply other styling to it, such as line numbers. This is highly desirable in many different kinds of documentation, but there is currently no practical way to do it.

What I would like to see is the equivalent of stuff like --variable urlcolor=$color for more/all LaTeX options (at least the styling ones), or, as mentioned above, the ability to pass through custom LaTeX options more easily than is currently possible.

A problem with the JSON/YML solution is it is more technical than a lot of users require, and so a lot of people would simply give up and move to another solution than continuing to fiddle with Pandoc arguments and config files.

@jgm
Owner

jgm commented Mar 29, 2018

@jgm re: use-cases, a very common one you can find many instances of on forums, TeX StackExchange and so on, is the ability to text-wrap code in fenced code blocks, as well as to apply other styling to it, such as line numbers. This is highly desirable in many different kinds of documentation, but there is currently no practical way to do it.

Sorry, I didn't understand this comment or what it has to do with the topic of this thread.

What I would like to see is the equivalent of stuff like --variable urlcolor=$color for more/all LaTeX options (at least the styling ones), or, as mentioned above, the ability to pass through custom LaTeX options more easily than is currently possible.

This is just a matter of template design. You can always create a custom template that allows you to control some LaTeX option with a variable. And you can also propose modifications to the default template along these lines.

@mb21
Collaborator

mb21 commented Mar 31, 2018

I had a quick look at implementing this, but unfortunately the YAML parsing is quite intertwined with the rest of the Markdown reader.

This is due to the fact that we share state between the YAML metadata block and the rest of the markdown document (I'm guessing for footnotes etc?). This is not going to happen when the YAML is read in from an external file and merged with the document metadata after the reader has produced a Pandoc Meta [Block], and it wouldn't work for other input formats anyway. Thus we'll just have to make users aware that there's a small difference between pandoc --metadata-file m.yaml input.md and pandoc m.yaml input.md.

Still, we have a choice:

  1. either we refactor the existing YAML parsing and export it as a function from the Markdown reader, yamlToMeta :: PandocMonad m => Yaml.Value -> m Meta (or even one taking a ByteString, so we could reuse the decoding with error handling). Then all Strings in the YAML metadata file would share one Markdown reader state.
  2. or we reimplement the actual YAML parsing somewhere else and apply readMarkdown to each String individually, in which case state wouldn't be shared (and possibly a few more inconsistencies might pop up).
  3. Finally, we could even do (2), but also use the new implementation in the Markdown reader. This seems the cleanest solution (especially if we'd want to parse other syntax than markdown in the future), but possibly might break some existing documents in subtle ways?

I'm unsure what the implications of (not) sharing ParserState are, in practice, with regard to markdown parsing...

@jgm
Owner

jgm commented Mar 31, 2018 via email
