Feature request: Ability to specify a YAML metadata file for all reader types #1960
Comments
When this gets suggested, people generally suggest to use Makefiles. |
One place where this would be useful is to allow the reading of YAML bibliography files. At the moment (unless I'm mistaken), a YAML references file (as you might insert in the document directly) is not an accepted input format for pandoc-citeproc. This makes it difficult to share a YAML bibliography file between multiple documents. If pandoc could bring in arbitrary YAML files, then both documents could bring in the same YAML bibliography file. Note: it is desirable (to me) to use the YAML bibliography because it supports URL references better than other formats. |
If YAML bibliographies are not accepted by pandoc-citeproc,
they could certainly be added. Open a ticket on jgm/pandoc-citeproc,
once you've confirmed that they don't work (I can't recall whether
they do).
|
I double-checked and it seems that external YAML files are supported via both the --bibliography option and via inline YAML "bibliography: " when --filter pandoc-citeproc is specified, provided the YAML file is correctly formatted - there is no warning or error if it's not formatted correctly. I raised this point because this is not documented as supported at http://johnmacfarlane.net/pandoc/README.html in the citations extension section (doc has no anchors). So would this suggest I open both a documentation ticket against pandoc, and a ticket against pandoc-citeproc for the lack of warnings? Or is pandoc itself suppressing them? |
Yes, that sounds right. And this can then be closed. |
I'd still prefer to write metadata in YAML over XML/Dublin Core. Wouldn't this be possible to parse even when LaTeX or something else is the input format? |
+1 for this. If it's too hard to parse in the input document, an easier option could be to specify This is useful when using non-default templates. For example, I am trying to generate docbook output from rst input, and adding |
Allowing for an additional YAML meta-data file would bring one problem: Which markup would be allowed for text in the YAML file? I see three possibilities, none of which I really like:
|
What about 2 but do 1 for formats like epub, docx etc? |
Let's call that 4. Though better than the other three, it would still be inconvenient for users unfamiliar with Markdown. |
Why not just do (2)? If it's useless for epub/docx people would simply not use it for that format. It still helps with the other formats, though. |
The benefit of option (2) is that it's consistent. I experimented a bit, here is a proof of concept for an equivalent but slightly different approach: How about simply supporting the |
Extended PoC, adding YAML support to Org, RST, and LaTeX. |
No opposing opinions have been voiced yet, so I opened PR #3084 for this. |
What about the priority of which metadata variable to use when the same variable is specified twice: I want to have a default YAML meta-data file for all conversions, but if there is YAML metadata in the source file (markdown), then that gets priority. So for example define a standard |
What you want is already doable by passing the defaults containing YAML file as the last argument:
Metadata definitions seen first are kept and left unchanged, even if conflicting data is parsed at a later point. |
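To make the first-wins rule concrete, here is a minimal Python sketch. It is not pandoc's actual implementation: metadata is modeled as plain dictionaries, and the function name `merge_first_wins` and the sample fields are assumptions made purely for illustration.

```python
def merge_first_wins(*metadata_blocks):
    """Merge metadata dicts in input order; the first definition of a key is kept."""
    merged = {}
    for block in metadata_blocks:
        for key, value in block.items():
            # setdefault stores the value only if the key is not present yet,
            # so later conflicting definitions are ignored.
            merged.setdefault(key, value)
    return merged


# Hypothetical metadata: the document comes first, the defaults file last.
document_meta = {"title": "My Report", "author": "A. Writer"}
defaults_meta = {"author": "Default Author", "lang": "en"}

merged = merge_first_wins(document_meta, defaults_meta)
print(merged)
```

Under this model, passing the defaults last means they only fill in fields the document did not set itself.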
OK, I had tried that previously and it never worked, but just found that it was a small error in my Markdown file (my last line was a figure block and not terminated with a newline which caused the yaml to become appended as plain text). Adding a couple of newlines and yaml is now correctly parsed with the document metadata correctly taking precedence. Thank you! |
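The pitfall described here comes from how the inputs are joined: a YAML metadata block that is not at the start of the document has to be preceded by a blank line, and a final line without a terminating newline breaks that. One way to guard against it, assuming you concatenate the files yourself before handing the result to pandoc, is sketched below. The helper name `join_sources` and the sample texts are hypothetical and not part of pandoc.

```python
def join_sources(*texts):
    """Concatenate source texts, separating them with a blank line."""
    parts = []
    for text in texts:
        # Strip trailing newlines, then re-add exactly one, so every part
        # ends cleanly regardless of how the file was saved.
        parts.append(text.rstrip("\n") + "\n")
    # Joining with "\n" leaves one empty line between consecutive parts.
    return "\n".join(parts)


# Hypothetical inputs: the document ends without a trailing newline.
document = "Some text.\n\n![A figure](figure.png)"
defaults = "---\nauthor: Default Author\n---\n"

combined = join_sources(document, defaults)
print(combined)
```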
I came across this issue from pandoc-discuss, and I found a way that kind of works currently. The idea is to convert the source yml and source document to native first and
pandoc -f markdown -t native -s metadata.yml | sed '$ d' > metadata.native
pandoc -f <fromFormat> -t native -o document.native document.<fromFormat>
pandoc -f native -t <toFormat> -s -o document.<toFormat> metadata.native document.native
The extra detail is the Any CLI options should be added to the last line only (to avoid having extra metadata somewhere else). This approach is kind of hacky since But it seems this is the only currently working method (alternatively one can convert the document to markdown first and Edit: Fixed some typos and added some more comments:
|
Why must the text in a metadata file necessarily be interpreted as any format rather than as plain text? One alternative would be to make a top-level field like |
+++ Benct Philip Jonsson [Jan 18 17 12:11 ]:
> Why must the text in a metadata file necessarily be interpreted as any
> format rather than as plain text?
Well, if you're writing an abstract for example, it's nice
to be able to include formatting. If you have a title with
math in it, you'd probably like to include math. Plain text
is too limiting.
> One alternative would be to make a top-level field like
> _metadata_format: markdown 'magic'.
This is prioritizing the less common case over the more
common one. People will find it confusing if *this*
makes emphasis in the body of the text but not the title.
Better to provide a special way to create a "raw"
metadata field when this is needed. See #2139 for that.
|
I made the 3 commands I suggested above shorter. This requires bash though (using process substitution).
YAML=metadata.yml; INPUT=document.md; OUTPUT=document.pdf
pandoc -f native -s -o $OUTPUT <(pandoc -f markdown -t native -s $YAML | sed '$ d') <(pandoc -t native $INPUT)
I will later add it to Pandoc Tricks · jgm/pandoc Wiki. |
See https://groups.google.com/d/msg/pandoc-discuss/6KLbZk7NVWk/0XMWewhLCQAJ |
It could be argued that if you want to use a specific format to specify metadata, you should use that format's metadata block syntax inside the document (e.g. If you absolutely must, you can also use generic raw snippets and use whatever syntax you like inside "markdown". |
I stand by my last comment: let's introduce a We can always add more things later, like:
|
I think I like @mb21's suggestion. It's simple, and it would help in some of the practical cases described above. |
Re: the thread, I have been using What I would like to see is the equivalent of stuff like A problem with the JSON/YML solution is that it is more technical than a lot of users require, so a lot of people would simply give up and move to another solution rather than continue to fiddle with Pandoc arguments and config files. |
Sorry, I didn't understand this comment or what it has to do with the topic of this thread.
This is just a matter of template design. You can always create a custom template that allows you to control some LaTeX option with a variable. And you can also propose modifications to the default template along these lines. |
I had a quick look at implementing this, but unfortunately the YAML parsing is quite intertwined with the rest of the Markdown reader. This is due to the fact that we share state between the YAML metadata block and the rest of the markdown document (I'm guessing for footnotes etc?). This is not going to happen when the YAML is read in from an external file and merged with the document metadata after the reader has produced a Still, we have a choice:
I'm unsure what the implications of (not) sharing |
Mauro Bieg <notifications@github.com> writes:
> block and the rest of the markdown document (I'm guessing for
> footnotes etc?).
Yes, exactly.
> This is not going to happen when the YAML is read in
> from an external file and merged with the document metadata after the
> reader has produced a `Pandoc Meta [Block]`, and it wouldn't work for
> other input formats anyway. Thus we'll just have to make users aware
> that there's a small difference between `pandoc --metadata-file m.yaml
> input.md` and `pandoc m.yaml input.md`.
Agreed.
> Still, we have a choice:
> 1. either we refactor the existing YAML parsing and export it as a
> function from the Markdown reader: `yamlToMeta :: PandocMonad m =>
> Yaml.Value -> m Meta` (or even one taking a `ByteString` so we could
> reuse the decoding with error handling). Then all Strings in the YAML
> metadata file would share one markdown reader state.
This seems simplest to me, and I don't see a drawback to sharing
state. This way, for example, you could define footnotes and
reference links within the yaml metadata file. Of course they'd
only work within that file, but still people might expect they
can do this. Is there a downside?
> 3. Finally, we could even do (2), but also use the new implementation
> in the Markdown reader. This seems the cleanest solution (especially
> if we'd want to parse other syntax than markdown in the future), but
> possibly might break some existing documents in subtle ways?
If other syntaxes are the issue, then we might try to decouple
the markdown-specific parts of the function from the parts that
deal with YAML. Perhaps the reader could be passed in as a
function? Maybe we could do this in such a way that we
don't hard-code use of ParserState?
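The idea of passing the reader in as a function can be illustrated with a small sketch. This is Python rather than pandoc's Haskell, it does not model shared parser state, and every name in it (`yaml_to_meta`, `read_text`, `toy_reader`) is hypothetical. The YAML is assumed to be already decoded into nested dicts, lists, and scalars.

```python
from typing import Any, Callable


def yaml_to_meta(value: Any, read_text: Callable[[str], Any]) -> Any:
    """Walk a decoded YAML value and convert every string with `read_text`."""
    if isinstance(value, str):
        return read_text(value)
    if isinstance(value, list):
        return [yaml_to_meta(item, read_text) for item in value]
    if isinstance(value, dict):
        return {key: yaml_to_meta(item, read_text) for key, item in value.items()}
    # Numbers, booleans, and None are passed through unchanged.
    return value


def toy_reader(text: str) -> dict:
    # Stand-in for a real markup reader: it only tags the string.
    return {"format": "toy", "text": text}


decoded = {"title": "A *title*", "keywords": ["one", "two"], "draft": True}
meta = yaml_to_meta(decoded, toy_reader)
print(meta)
```

The traversal knows nothing about Markdown; swapping `toy_reader` for another function would change how strings are interpreted without touching the YAML-handling part.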
|
An idea I thought would be useful. Many of the Readers have little or no way to set metadata, and the -M option on the command line only accepts strings and not arbitrary YAML. Right now you can include a separate file of YAML metadata for Markdown formats (which is simply concatenated with the markdown files during parsing).
This idea would be to specify a metadata file on the command line ("pandoc -Y metafile.yml" or something), which would be parsed separately and the contents added to the document metadata, regardless of the input file type.
Thanks