Source map for code blocks #4657

Closed
ehildenb opened this issue May 16, 2018 · 9 comments

Comments

@ehildenb

I found this discussion on pandoc-discuss: https://groups.google.com/forum/#!searchin/pandoc-discuss/source$20line$20numbers$20for$20code%7Csort:date/pandoc-discuss/pDs41Da4KjA/SKt35OWyhxAJ

Did anything ever become of it? We use pandoc for literate programming in the K language without having to write a literate programming language (Markdown sources, generate K files). See examples at https://github.com/kframework/evm-semantics.

Currently we use pandoc-tangle, a filter at https://github.com/ehildenb/pandoc-tangle, to select the code blocks to include in each resulting K file (basically, pandoc-tangle is a Writer for code). Every line of input outside the selected code blocks is replaced by a blank line, so that the target K files have the same line numbers as the source Markdown. This is a bit fragile, and assumes that people write "sane-ish" markdown.
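A minimal sketch of the blank-line padding idea described above, in plain Python rather than as a pandoc filter. It is not the actual pandoc-tangle code: the function name `tangle_preserving_lines` is hypothetical, and it assumes only backtick-fenced blocks with a simple info string (`k` or `{.k .standalone}`), which is exactly the kind of "sane-ish" input the approach depends on.

```python
FENCE = "`" * 3  # a backtick fence marker, built this way to keep the sketch readable


def tangle_preserving_lines(markdown: str, wanted_class: str) -> str:
    """Keep the lines of fenced blocks tagged `wanted_class`; blank out the rest.

    The output has exactly as many lines as the input, so line N of the
    generated file corresponds to line N of the Markdown source.
    """
    out = []
    in_block = False  # currently inside some fenced block
    keep = False      # currently inside a fenced block we want to emit
    for line in markdown.split("\n"):
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if not in_block:
                in_block = True
                # Info string after the fence, e.g. "k" or "{.k .standalone}"
                info = stripped[len(FENCE):].strip().strip("{}")
                classes = [c.lstrip(".") for c in info.split()]
                keep = wanted_class in classes
            else:
                in_block = False
                keep = False
            out.append("")  # the fence line itself is never emitted
        elif keep:
            out.append(line)
        else:
            out.append("")  # prose and unselected code become blank lines
    return "\n".join(out)
```

Nested fences, tilde fences and indented code blocks are not handled here, which illustrates why the padding approach is fragile compared with source positions supplied by the reader.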

A better solution would be to have access to the source information of each CodeBlock as attributes. I have a patch which adds source attributes to CodeBlocks in the Markdown reader, along with a command-line flag --insert-source-attrs and reader option readerInsertSourceAttr. I'm working on tests; let me know if this has a chance of getting merged, or if a different approach should be taken.
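To illustrate how a filter might consume such attributes, here is a rough sketch that walks pandoc's JSON AST and collects every CodeBlock together with its key-value attributes. The JSON shape of CodeBlock (an Attr triple followed by the code text) is pandoc's own, but the attribute key `source-line` is purely an assumption for illustration; the thread does not say what names the patch uses.

```python
def iter_code_blocks(node):
    """Yield (classes, attributes, code) for every CodeBlock in a pandoc JSON AST."""
    if isinstance(node, dict):
        if node.get("t") == "CodeBlock":
            # "c" is [Attr, text], where Attr is [identifier, classes, key-value pairs]
            (_ident, classes, keyvals), text = node["c"]
            yield classes, dict(keyvals), text
        else:
            for value in node.values():
                yield from iter_code_blocks(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_code_blocks(value)


# A hand-written document in the shape that `pandoc -t json` produces.
# The "source-line" key is hypothetical.
sample = {
    "pandoc-api-version": [1, 17],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [{"t": "Str", "c": "Intro"}]},
        {"t": "CodeBlock", "c": [["", ["k"], [["source-line", "12"]]], "rule a => b"]},
    ],
}

for classes, attrs, code in iter_code_blocks(sample):
    print(classes, attrs.get("source-line"), code)
```

Because the position travels with the block as an ordinary attribute, a filter needs no knowledge of the original Markdown layout.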

I've thought about whether it makes sense to add these source attributes to every Block constructor that contains an Attr, and would be willing to do that as well, though I don't think it will be useful for literate programming using Markdown.

@jgm
Owner

jgm commented May 16, 2018

I can see the use of this. It might make more sense to develop it as a markdown extension, so you'd do -f markdown+source_attributes or something like that (+code_source_attributes?).

Adding source attributes to everything is impossible, since only a few things currently have attributes. Adding just to code blocks seems a bit ad hoc -- it's useful for you, but how generally useful is it? So I'm not sure what to think about this. (Of course, I wish I'd designed for source position information from the beginning, and I am doing that in the brand new markdown parsing library I'm working on, commonmark-hs.)

@ehildenb
Author

I thought about doing it as an extension, but the file which has extensions seemed to indicate that extensions are specifically for syntactic changes in the source, and this does not change the Markdown syntax, just the data collected at parse time. I don't really mind implementing it either way.

I could be persuaded to add this to more Block constructors or just to CodeBlock, but am not sure either way. I can only think of my own use (literate programming), and for that I want accurate mappings back to Markdown line numbers, which can be inserted using http://gcc.gnu.org/onlinedocs/cpp/Line-Control.html or the like (to be done in pandoc-tangle, not pandoc) before writing each CodeBlock. So I have no need for source attributes on anything but CodeBlocks. That said, I guess knowing the source position of a Header might be useful.
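As a sketch of the line-control idea, the function below prefixes each code block with a C-preprocessor-style `#line` directive, so that a tool which understands such directives reports positions in the Markdown source. The function name `with_line_directives` and the `(start_line, code)` input format are assumptions for illustration, not part of pandoc or pandoc-tangle.

```python
def with_line_directives(blocks, source_name):
    """Join code blocks, preceding each with a `#line N "file"` directive.

    `blocks` is an iterable of (start_line, code) pairs, where start_line is
    the Markdown line number of the first line of code in that block.
    """
    parts = []
    for start_line, code in blocks:
        # `#line N "file"` declares that the NEXT line is line N of `file`.
        parts.append(f'#line {start_line} "{source_name}"')
        parts.append(code)
    return "\n".join(parts) + "\n"


print(with_line_directives([(12, "int x = 1;"), (40, "int y = 2;")], "doc.md"))
```

Unlike blank-line padding, this keeps the generated file compact, but it only helps for target languages whose tooling honours such directives.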

In terms of general usefulness, I guess I've been thinking of pandoc as my literate programming tool for a few years now. pandoc-tangle has seen several iterations, starting as a tool for making it so I could specify (i) assignment given code, (ii) assignment example tests, (iii) assignment solutions, and (iv) assignment PDF spec, all as one document for a PL course I was helping with. I like to think of it as "language independent literate programming", similar to org-mode or WEB/noweb, but with markdown sources (so it displays nicely online).

I also have a patched version of Pandoc which adds "code-writers" directly, so that you can do --to sh for instance, and it will comment out non-sh-code-blocks using # appropriately. It was a bit too much to ask people to install that instead of downloading the Lua filter though.

I think I agree that the "correct" way to do it is to have every node of the Block and Inline AST have enough parse-time information stored that it's possible to have pandoc --from X --to X --preserve-parse == cat.

@gpoore

gpoore commented May 16, 2018

This would be very useful, even if only CodeBlocks get source information. I'm currently working on a program that will allow code blocks in arbitrary languages to be executed and optionally include stdout and stderr in the filtered markdown. Without something like this, I'll be parsing the --trace output or using regex to try to determine source information.

The one additional thing that would be helpful for my case is source information for inline Code.

@ehildenb
Author

@gpoore have you seen Panpipe? https://hackage.haskell.org/package/panpipe

@gpoore

gpoore commented May 16, 2018

@ehildenb No, I hadn't seen that. Thanks!

I think what I'm working on is more general. I want the option to have, say, all Python code blocks and inline code executed within a single Python session, so that all variables and data are shared. I've done something similar in the past in LaTeX with PythonTeX, and now want something similar that's based on markdown. I'm also hoping to have optional support for Jupyter kernels to manage the execution.
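The shared-session idea can be sketched in a few lines: every snippet is executed against the same namespace, so variables defined in one block are visible in later ones, and stdout is captured per snippet. This is only an illustration with a hypothetical function name, not the program described above; it ignores stderr, error handling, sandboxing and Jupyter kernels.

```python
import contextlib
import io


def run_in_shared_session(snippets):
    """Execute code snippets in order in one namespace; return stdout per snippet."""
    namespace = {}  # shared by all snippets, like a single interpreter session
    outputs = []
    for code in snippets:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # runs arbitrary code: only for trusted input
        outputs.append(buffer.getvalue())
    return outputs


# The second snippet sees the variable defined by the first.
print(run_in_shared_session(["x = 2", "print(x * 21)"]))
```

With source positions attached to each code block and inline Code element, the captured output could be mapped back to the right place in the Markdown without parsing --trace output.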

@jgm
Owner

jgm commented May 16, 2018

> I thought about doing it as an extension, but the file which has extensions seemed to indicate that extensions are specifically for syntactic changes in the source, and this does not change the Markdown syntax, just the data collected at parse time. I don't really mind implementing it either way.

It would be like the auto_identifiers extension, which doesn't change the syntax but adds additional information to attributes.

Maybe for now adding a code_block_source_lines extension would make sense.

@ehildenb
Author

Ok, I will do it like that.

@mb21
Collaborator

mb21 commented May 17, 2018

btw, this issue is a subset of #4565, right?

@ehildenb
Author

ehildenb commented May 17, 2018

If #4565 were a PR and not an issue, I would agree (though maybe there is an associated PR, see #4659). It looks promising though, so maybe it deserves a bit more discussion. But I don't think we should halt progress to find the perfect solution for this; incremental approaches seem better. Will continue discussion over there.
