Use knitr::spin() with R and Python code in one document #1773

Open
3 tasks done
fdetsch opened this issue Nov 18, 2019 · 12 comments
Labels
feature Feature requests


@fdetsch

fdetsch commented Nov 18, 2019

Question directly copied from Stack Overflow:

With the advent of reticulate, combining R and Python in a single .Rmd document has become increasingly popular within the R community (myself included). My personal workflow usually starts with an R script. At some point, I create a shareable report by running knitr::spin() on the plain .R document, which avoids code duplication (see also Knitr's best hidden gem: spin for more on the topic).

However, as soon as Python code is involved in my analysis, I have to break this workflow and manually convert (i.e., copy and paste) my initial .R script into .Rmd before compiling the report. Does anybody know whether it is (or, for that matter, will ever be) possible to make knitr::spin() work with both R and Python code chunks in a single .R file without this detour? I mean the same behavior you get in a .Rmd file, where you can mix the two languages and exchange objects between them. To the best of my knowledge, there is currently no way to add something like engine = 'python' to spin documents.


By filing an issue to this repo, I promise that

  • I have fully read the issue guide at https://yihui.name/issue/.
  • I have provided the necessary information about my issue.
    • If I'm asking a question, I have already asked it on Stack Overflow or RStudio Community, waited for at least 24 hours, and included a link to my question there.
    • If I'm filing a bug report, I have included a minimal, self-contained, and reproducible example, and have also included xfun::session_info('knitr'). I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('yihui/knitr').
    • If I have posted the same issue elsewhere, I have also mentioned it in this issue.
  • I have learned the GitHub Markdown syntax and formatted my issue correctly.

I understand that my issue may be closed if I don't fulfill my promises.

@cderv
Collaborator

cderv commented Nov 18, 2019

I find this an interesting question. It made me dig into the spin function.

For now, spin converts an R script into a literate programming document. The key constraint is that an R script can only contain comments and R code. Thus, spin internally relies on base R's parser, parse(), to do its magic:

knitr/R/spin.R, line 69 in dccdad7:

parsed_data = getParseData(parse(text = x, keep.source = TRUE))

and base R's parse() only knows how to deal with R code.

Independently of knitr and spin, I don't think an R script file can contain Python code as well. Mixing languages works in literate programming formats like Rmd, but not in plain script files. You would not put literal R code in a .py script, so it makes sense not to put Python code in an .R script either. Seems fair to me.

All of this makes it difficult for an R script to contain anything other than R code. Technically, knitr lets you change the engine with the engine chunk option (see knitr engines), but that would not work with spin, because spin tries to parse all code as R code before it even looks at the engine.

Example

Here is an example of what is blocking. I put this in test.R:

#+ setup, include = FALSE
library(reticulate)
use_miniconda()

# Using python directly
#+ engine = python
import os
os.getenv("RSTUDIO_PANDOC")

# Using R with reticulate
os <- import("os")
os$getenv("RSTUDIO_PANDOC")

We get this error:

Error in parse(text = x, keep.source = TRUE) : 
  <text>:7:8: unexpected symbol
6: #+ engine = python
7: import os
          ^

This happens because R parses what we know to be Python code as R code, and that code is not valid R.

Solutions?

If we really want an equivalent of spin for multi-engine code (comments and code only, instead of text and code chunks), I think it would require a new format (.Rmix?). That format would need its own parsing logic to identify non-R code in the uncommented parts and process it correctly.
However, I guess it would be very similar to the Rmd format and not easy to maintain 🤔

Out of curiosity, why do you prefer spin and an R script file for a Python + R project instead of using Rmd files directly?
Rmd files are a good container for mixing both languages to produce a report.

Also, the answer you got on SO is really interesting: it uses source_python() in an R script to mix Python and R. This works like child documents in R Markdown, where you split your report into several reusable documents. Pretty clever and really easy!

I am just sharing some thoughts here on this topic to contribute to the discussion.
Hope it helps.

@fdetsch
Author

fdetsch commented Nov 19, 2019

Thanks for sharing your thoughts on that. I fully agree that it's a fair split that R and Python have their own places in .R and .py files, respectively. With the seamless integration of Python in .Rmd at hand (and probably lacking in-depth knowledge of the underlying mechanisms), I just became curious whether a similar integration could be feasible for spin(). Call it wishful thinking 😉

As regards your question: coming from the R side, starting out with a plain .R script feels like a more native approach to me. My files usually contain >95% code (mostly R) and <5% (mostly informal) comments, which makes spin() the ideal solution. In .Rmd, you have to explicitly insert a code chunk whenever you want to do actual coding. Maybe I am just lazy about writing, but this seems a little over the top for my purposes. My results are mostly conveyed via tables and figures, and spin() handles those well.

@yihui
Owner

yihui commented Nov 22, 2019

#+ engine="python" should have worked. PR #1605 broke it. This is a known bug, as I replied at #1605 (comment). Since @Hemken didn't file a new issue, I had completely forgotten about it. Sorry.

That said, I probably won't have time to fix it in the near future...

@cderv
Collaborator

cderv commented Nov 22, 2019

Oh, thanks! I had not noticed that.

It was not intuitive to me that an .R script could contain several languages in the same file using knitr features. I would expect multi-language formats to live in dedicated file types, such as .Rmd, to signal clearly that the file can't be run like an R script (i.e., Rscript my-py-and-Rcode.R) and requires a special tool, here knitr (.Rk, .Rknitr, or something else). So I did not think of it as a bug.

My opinion being shared (☺️), and now that I know this is a bug, I can dig into it in my spare time. Not right now, but maybe in a near future closer than yours 😉

@Hemken
Contributor

Hemken commented Nov 22, 2019 via email

@yihui
Owner

yihui commented Mar 28, 2024

With 74bcff8, using engine = "python" should work now, although I tend to agree with @cderv above that it doesn't feel right to have both Python and R code in the same script.

cat(spin(knit = FALSE, text = '#+ setup, include = FALSE
library(reticulate)
use_miniconda()

# Using python directly
#+ engine = "python"
import os
os.getenv("RSTUDIO_PANDOC")

# Using R with reticulate
os <- import("os")
os$getenv("RSTUDIO_PANDOC")'), sep = '\n')

This prints:

```{r setup, include = FALSE}
library(reticulate)
use_miniconda()

# Using python directly
```
```{r engine = "python"}
import os
os.getenv("RSTUDIO_PANDOC")

# Using R with reticulate
os <- import("os")
os$getenv("RSTUDIO_PANDOC")
```
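This output shows a subtle detail. The `# Using python directly` comment stays in the setup chunk. The `# Using R with reticulate` comment and the two R lines after it end up inside the python chunk. As far as I understand spin, plain `#` comments count as code. A new chunk starts only at a `#+`/`#-` line or after `#'` text. The toy Python model below reproduces that grouping. The names `split_chunks` and `to_rmd` are hypothetical, and the model ignores inline `{{ }}` expressions, `/* */` blocks, and the finer details of spin's regexes. It also shows that turning the R heading into `#'` text gives the R lines their own `{r}` chunk.

```python
import re

# Toy model of spin's grouping rule (hypothetical helper, not knitr's code):
#   "#'" lines are prose, "#+" / "#-" lines open a new chunk with options,
#   and every other line, including plain "#" comments, is chunk code.
DOC = re.compile(r"^#+'[ ]?")
OPTS = re.compile(r"^#[+-]\s*(.*)$")


def split_chunks(lines):
    """Return a list of (kind, header, body) tuples; kind is "text" or "code"."""
    blocks = []
    current = None  # code block currently collecting lines
    for line in lines:
        opts = OPTS.match(line)
        if DOC.match(line):
            current = None  # prose closes the current chunk
            blocks.append(("text", None, [DOC.sub("", line)]))
        elif opts:
            current = ("code", opts.group(1).strip(), [])
            blocks.append(current)
        else:
            if current is None:  # code after prose opens an option-less chunk
                current = ("code", "", [])
                blocks.append(current)
            current[2].append(line)
    return blocks


def to_rmd(blocks):
    """Emit Rmd-style text; every chunk is written as {r ...}, as spin does."""
    out = []
    for kind, header, body in blocks:
        if kind == "text":
            out.extend(body)
        else:
            out.append("```{r" + (" " + header if header else "") + "}")
            out.extend(body)
            out.append("```")
    return "\n".join(out)


script = '''#+ setup, include = FALSE
library(reticulate)
use_miniconda()

# Using python directly
#+ engine = "python"
import os
os.getenv("RSTUDIO_PANDOC")

# Using R with reticulate
os <- import("os")
os$getenv("RSTUDIO_PANDOC")'''

blocks = split_chunks(script.splitlines())
print(blocks[1][1])  # the python chunk's options
print(blocks[1][2])  # its body also holds the two R lines

# Turning the R heading into prose closes the python chunk before the R code.
fixed = script.replace("# Using R with reticulate", "#' Using R with reticulate")
fixed_blocks = split_chunks(fixed.splitlines())
print(to_rmd(fixed_blocks))
```

Under these assumptions, the cheapest fix on the script side is to write section headings as `#'` text (or start the R part with its own `#+` line) rather than as plain `#` comments.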

@katrinabrock

I might be missing something...is there currently a way to spin a fully python script into Rmd?

As in: start with example.py containing

#' Example text

# example comment
print([i for i in 'abcdefg'])

Run something like knitr::spin('example.py', knit = FALSE) and end up with an example.Rmd along these lines:

Example text

```{python}
# example comment
print([i for i in 'abcdefg'])
```
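For reference, here is a toy Python sketch of the requested conversion. `#'` lines become prose, and everything else becomes chunks whose engine is picked from the file extension. The names `spin_like_rmd` and `ENGINES` are hypothetical. This is not how knitr::spin() currently behaves, since spin writes `{r}` chunks. The sketch also ignores `#+` options, inline `{{ }}` expressions, and `/* */` blocks.

```python
import re
from pathlib import Path

DOC = re.compile(r"^#+'[ ]?")            # spin-style prose marker
ENGINES = {".py": "python", ".r": "r"}   # hypothetical extension -> engine map


def spin_like_rmd(text, filename):
    """Toy script -> Rmd converter: #' lines become prose, the rest becomes
    chunks whose engine comes from the file extension (default: r)."""
    engine = ENGINES.get(Path(filename).suffix.lower(), "r")
    out, code = [], []

    def flush():
        # Trim blank lines at the chunk edges, then emit one fenced chunk.
        while code and not code[0].strip():
            code.pop(0)
        while code and not code[-1].strip():
            code.pop()
        if code:
            if out:
                out.append("")  # blank line between prose and the chunk
            out.extend(["```{" + engine + "}", *code, "```"])
        code.clear()

    for line in text.splitlines():
        if DOC.match(line):
            flush()
            out.append(DOC.sub("", line))
        else:
            code.append(line)
    flush()
    return "\n".join(out)


example_py = """#' Example text

# example comment
print([i for i in 'abcdefg'])
"""
print(spin_like_rmd(example_py, "example.py"))
```

Running it on the example.py above prints the desired Rmd. Tying the engine to the extension keeps the .py file free of R code. The trade-off is that each file can then use only one engine.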

The closest I've got is adding this to the top of my python file:

#+ eval=TRUE, include=FALSE
knitr::opts_chunk$set(engine = 'python')

However, this adds some R code to a Python script, so the script can no longer run fully on its own. I know one option is to set the engine at the knit stage, but I would like to produce an Rmd file that runs on its own.

@Hemken
Contributor

Hemken commented Oct 7, 2024 via email

@cderv
Collaborator

cderv commented Oct 7, 2024

@katrinabrock Currently, #+ engine = 'python' has to be set on each code chunk. I don't think there is a way to set it globally. cc @yihui: should spin() default to Python when it runs on a .py file?

#' Example text
#' 

#+ engine='python'
# example comment
print([i for i in 'abcdefg'])

> I think the concept here is that you skip the Rmd file: it is never produced.

@Hemken With spin(), an Rmd file is produced, and then rmarkdown::render() is called on it.

@katrinabrock

katrinabrock commented Oct 8, 2024

@cderv
Indeed, #+ engine = 'python' only applies to the subsequent chunk (and Python ignores it 👍). In contrast, running knitr::opts_chunk$set(engine = 'python') inside a chunk sets the engine for the whole document (or until unset). This behavior is documented here.

I'm looking for a creative way, maybe with multiline quotes and/or by adjusting the spin regexes, to make the Python interpreter ignore that line while spin can still see it.

EDIT: Here's the best I've come up with so far:

# /*
''' # */
#+ eval=TRUE, include=FALSE
knitr::opts_chunk$set(engine = 'python')
# /*
''' # */

It's super ugly, but it avoids a Python syntax error (or any change in behavior). It also adds the following to the .Rmd, which makes knitr interpret subsequent {r} blocks as Python:

```{r eval=TRUE, include=FALSE}
knitr::opts_chunk$set(engine = 'python')
```
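As far as I can tell, this works on both sides for the following reasons. Python reads `# /*` as a comment and `''' # */` as the opening of a triple-quoted string, which the second `'''` closes. The R lines therefore end up in an unused string literal. spin, meanwhile, drops every line range that runs from a match of its comment-start pattern to a match of its comment-end pattern. I believe the defaults of spin's `comment` argument are `^[# ]*/[*]` and `^.*[*]/ *$` (please check `?knitr::spin`). The Python sketch below checks both views. `strip_spin_comments` is a hypothetical re-implementation of that removal step under the assumed patterns, not knitr's code.

```python
import ast
import re

# The workaround header from the comment above, verbatim.
WORKAROUND = """# /*
''' # */
#+ eval=TRUE, include=FALSE
knitr::opts_chunk$set(engine = 'python')
# /*
''' # */
"""

# 1) Python's view: the whole header is a single string-literal statement.
tree = ast.parse(WORKAROUND)
python_sees_only_a_string = (
    len(tree.body) == 1
    and isinstance(tree.body[0], ast.Expr)
    and isinstance(tree.body[0].value, ast.Constant)
    and isinstance(tree.body[0].value.value, str)
)

# 2) spin's view, using the ASSUMED default patterns of its `comment` argument.
START = re.compile(r"^[# ]*/[*]")
END = re.compile(r"^.*[*]/ *$")


def strip_spin_comments(lines):
    """Drop each range from a START line through the next END line
    (hypothetical re-implementation; assumes balanced, non-nested pairs)."""
    kept, skipping = [], False
    for line in lines:
        if not skipping and START.match(line):
            skipping = True
        if not skipping:
            kept.append(line)
        elif END.match(line):
            skipping = False  # the END line itself is dropped too
    return kept


spin_sees = strip_spin_comments(WORKAROUND.splitlines())
print(python_sees_only_a_string)  # True
print(spin_sees)  # only the #+ line and the opts_chunk$set() call survive
```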

@cderv
Collaborator

cderv commented Oct 8, 2024

> running knitr::opts_chunk$set(engine = 'python') inside a chunk sets it for the whole document (or until unset). This behavior is documented here.

I know that, but this is R code, so it leads to R cells in the resulting .Rmd. I thought you did not want that, especially because the script is a .py file.

I think the best approach would be to place all code from a .py script in an engine = "python" code cell.

Unfortunately, that is not how spin() works for now.

@katrinabrock

Yes, with my workaround, the Rmd contains one (real, hidden) R cell that sets the option. All the remaining cells are "R" cells with {r}, but they contain Python code, and that code runs successfully when I knit. For .py files, I find this better than adding #+ engine = "python" to each cell. That approach would force me to sprinkle the line all over my script, and knitting would fail if I missed a spot. With the crazy ''' # */ chunk above, my .py file still runs cleanly by itself, because the R code it contains is sequestered in a string (and released only at the spin stage).

Indeed, I would prefer a "real" fix in which spin recognizes a .py file and inserts {python} instead of {r}. Then neither my workaround nor yours would be needed.
