Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce convert_chunk_header() to convert fenced chunk head to block chunk option #2149

Merged
merged 21 commits into from
Jul 21, 2022

Conversation

cderv
Copy link
Collaborator

@cderv cderv commented Jul 11, 2022

This PR introduces a new function to allow a user to convert all chunks to new syntax.
https://yihui.org/en/2022/01/knitr-news/

First supported conversion is to multiline R chunk option, but YAML chunk header could be supported next.

To introduce the feature, I refactor some small pieces of already existing parsing code in some utility function.

Target format is one option per line.

@yihui opening this PR so that you can already give some thoughts and feedback

  • You mentioned getParseData() would be useful to get one per line, but it seems I manage to have something working without it. Did I miss anything ?

  • For the label part, do we want unexplicit or explicit ?

#| label = "first-chunk",
#| echo = FALSE

or

#| first-chunk,
#| echo = FALSE

For now, I did the first;

  • What behavior do we want for the function:
    • Overwrite by default ?
    • Output in console ?
    • Ask interactively if overwrite should happen ?

Also for checks, I am aiming the function to .Rmd and .qmd for this. I think that is ok.

I need to test a bit more, and also see if I could derive YAML syntax from this implementation

@cderv cderv linked an issue Jul 11, 2022 that may be closed by this pull request
3 tasks
Copy link
Owner

@yihui yihui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • You mentioned getParseData() would be useful to get one per line, but it seems I manage to have something working without it. Did I miss anything ?
  • For the label part, do we want unexplicit or explicit ?

I see that you parsed the options. In that case, there is no need to use getParseData(). Originally I was thinking of simply extracting the chunk options as a single string, and then applying strwrap(prefix = '#| '). If users want one option per line, we'll need to figure out which commas are true separators, e.g., getParseData(parse(text = 'alist(foo, bar=1, baz="a,b,c")')) will tell you.

I feel you don't have to start with Rmd but make a general converter (Rmd does need a tiny special treatment, though). What I'd do is:

  1. Read the file.
  2. detect_pattern() and determine the regular expression chunk.begin from all_patterns.
  3. Extract the chunk options and strwrap() them.
  4. Append the wrapped #| options and write out the content.

We can consider the two cases later in different PRs: one option per line (comma-separated), and converting to YAML.

  • What behavior do we want for the function:

Have an argument output = NULL, which means writing output to console. Users can specify the output file path, in which case we write to that file. Alternatively, it can take a function that takes the input value and returns a file path, e.g., output = identity means overwriting the original input file.

* detect pattern on file extension
* simplify for now and use strwrap()
* correctly handle output feature: default to console and allow identity for overwriting
@cderv
Copy link
Collaborator Author

cderv commented Jul 12, 2022

Thanks for your feedback. That was mainly my idea at first, but I was not sure if making it generic would be expected.

applying strwrap(prefix = '#| ')

I have redone a solution with that. I did that at first, but what I don't like is that it is completely splitting in the middle of the code. I don't think anyone would use like that - they would split on comma. Example for long chunks

```{verbatim}
#| lang = "markdown", code =
#| stringr::str_trim(knitr::knit_child("examples/py-engine.Rmd",
#| quiet = TRUE))
```

from

```{verbatim, lang = "markdown", code = stringr::str_trim(knitr::knit_child("examples/py-engine.Rmd", quiet = TRUE))}
```

Have an argument output = NULL, which means writing output to console. Users can specify the output file path, in which case we write to that file. Alternatively, it can take a function that takes the input value and returns a file path, e.g., output = identity means overwriting the original input file.

I always forgot about this trick of using a function. So now output can be

  • NULL which will write to console
  • a function which will apply on input
  • a character which will be the output file path.

Copy link
Owner

@yihui yihui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall. Thanks!

R/parser.R Outdated Show resolved Hide resolved
R/parser.R Outdated Show resolved Hide resolved
R/parser.R Outdated Show resolved Hide resolved
@cderv
Copy link
Collaborator Author

cderv commented Jul 15, 2022

@yihui after giving more thoughts and trying the getParseData() way, I switched back to parse_params() with calling deparse(). It seems to me it give either formatting that trying to split the result of getParseData() on commas. I ended selecting some commas in the middle of some function or list, and did not like that.

This way seems cleaner to me. What do you think ?

There is still some room to improve line wrapping for multiline type so that very long option gets split on several lines too with the comment pipe (#|) added.

You can review once more maybe, then I'll document on monday, and do further test. We can add YAML support later.

@cderv
Copy link
Collaborator Author

cderv commented Jul 19, 2022

@yihui I just saw that Rnw file does not have an engine unlike Rmd. Is engine setup only for .Rmd document ?
As we are supposed to handle different pattern with detect_pattern, I guess we need to account for all the syntax.

It seems I need to take this into account:

knitr/R/parser.R

Lines 73 to 80 in ca398c2

parse_block = function(code, header, params.src, markdown_mode = out_format('markdown')) {
params = params.src
engine = 'r'
# consider the syntax ```{engine, opt=val} for chunk headers
if (markdown_mode) {
engine = sub('^([a-zA-Z0-9_]+).*$', '\\1', params)
params = sub('^([a-zA-Z0-9_]+)', '', params)
}

Probably doing

markdown_mode = identical(patterns, all_patterns$md)

right ?

But if no engine, what are we suppose to use inside comment for none markdown document ?
For Rnw, this is %| I believe, (comment_chars$tikz), but what are the other non-Rmd extension supported by knitr.
Rhtml ?

I guess we have all of this to support

names(knitr:::all_patterns)
#> [1] "rnw"      "brew"     "tex"      "html"     "md"       "rst"      "asciidoc"
#> [8] "textile"

Thanks

cderv added 3 commits July 19, 2022 16:03
Chunk does not have an engine. They defaults to `r` engine for the comment char. (based on `partition_chunk` implementation.
@cderv
Copy link
Collaborator Author

cderv commented Jul 19, 2022

Ok. In fact based on partition_chunk I think I got it right by assuming not engine for non markdown doc and assigning engine = 'r' to get associated comment character

@cderv
Copy link
Collaborator Author

cderv commented Jul 19, 2022

Test in knitr-examples went well

files <- fs::dir_ls(regexp = "[0-9].*\\.(brew|R(nw|md|tex|html|rst|asciidoc|textile))")
purrr::walk(files, ~ { message(.x) ; convert_chunk_header(.x, output = identity, type = "multiline")})

and same for type = 'wrap'

I added a powershell script in knitr example to help me test everything. yihui/knitr-examples@f2cc2eb

@cderv cderv marked this pull request as ready for review July 19, 2022 15:13
@cderv cderv requested a review from yihui July 19, 2022 15:19
Copy link
Owner

@yihui yihui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made some cosmetic changes, and I think it's ready to merge now. Thanks!

I might give type = "yaml" a try later.

@yihui yihui merged commit 553b692 into master Jul 21, 2022
@yihui yihui deleted the convert-chunk-header branch July 21, 2022 15:30
@cderv
Copy link
Collaborator Author

cderv commented Jul 21, 2022

Thanks !

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: function to convert chunk options to YAML-style
2 participants