unicode \u2139 from dplyr causes spin output to fail with latex #2231

Closed
ggrothendieck opened this issue Feb 21, 2023 · 20 comments

@ggrothendieck

ggrothendieck commented Feb 21, 2023

Suppose we have file a.R. If we paste it into R it does result in a dplyr warning saying to use all_of(nms) instead of just nms but it runs and gives correct output.

library(dplyr)
nms <- names(BOD)
BOD %>% mutate(across(nms, scale))

Now suppose we run:

knitr::spin("a.R")
rmarkdown::render("a.md", "pdf_document")

This results in the following error

! LaTeX Error: Unicode character ℹ (U+2139)
               not set up for use with LaTeX.

Error: LaTeX failed to compile a.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See a.log for more info.

The problem is that the a.R source code shown above causes dplyr to issue a warning and that warning message contains unicode \u2139 . MiKTeX, tinytex and texlive all gave the error shown above on Windows 10. The bottom line is that one cannot spin dplyr code that has such warnings and I think all dplyr warnings contain that character.

Note that there was no \u2139 in the a.R source code so this was pretty mysterious until I realized what was going on.
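One way to track down where such a character enters the pipeline is to scan the intermediate file for non-ASCII text. The sketch below is not from the thread: `find_non_ascii()` is a hypothetical helper, and the temporary file merely stands in for `a.md`. Base R's `tools::showNonASCIIfile()` offers a similar check.

```r
# Hypothetical helper (not part of knitr or rmarkdown): list the lines of a
# UTF-8 text file that contain non-ASCII characters.
find_non_ascii <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  # iconv() yields NA (or an altered string) when a line is not pure ASCII
  ascii <- iconv(lines, from = "UTF-8", to = "ASCII")
  bad <- is.na(ascii) | ascii != lines
  data.frame(line = which(bad), text = lines[bad], stringsAsFactors = FALSE)
}

# Stand-in for a.md: the second line mimics a dplyr hint containing U+2139
tmp <- tempfile(fileext = ".md")
writeLines(
  enc2utf8(c("## Warning: illustrative warning text",
             "## \u2139 Please use `all_of()` instead.")),
  tmp, useBytes = TRUE
)

# Line numbers of the offending lines
find_non_ascii(tmp)$line
```

Running it on the real `a.md` before calling `render()` would show whether the character is already present in the Markdown output.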

@DavisVaughan

I actually saw a similar thing when running revdep checks on dplyr with the flexsurv package. It notably uses Rnw based vignettes, and when the dplyr warning is thrown when rendering the vignette it seems to use the unicode based i and the LaTeX output doesn't like that.

My personal notes about this were:

Maybe it is because the cli helper cli::is_latex_output() returns FALSE here but should really be returning TRUE? Which I think is because knitr::is_latex_output() is accidentally returning FALSE?

> knitr:::is_latex_output
function () 
{
    out_format("latex") || pandoc_to(c("latex", "beamer"))
}

knitr::is_latex_output() says it works for Rnw but maybe there is a bug

Here is the check output from when I ran that awhile back

Package: flexsurv
Check: re-building of vignette outputs
New result: WARNING
  Error(s) in re-building vignettes:
    ...
  --- re-building 'standsurv.Rmd' using rmarkdown
  --- finished re-building 'standsurv.Rmd'
  
  --- re-building 'flexsurv.Rnw' using knitr
  --- finished re-building 'flexsurv.Rnw'
  
  --- re-building 'multistate.Rnw' using knitr
  Error: processing vignette 'multistate.Rnw' failed with diagnostics:
  Running 'texi2dvi' on 'multistate.tex' failed.
  LaTeX errors:
  ! LaTeX Error: Unicode character ℹ (U+2139)
                 not set up for use with LaTeX.
  
  See the LaTeX manual or LaTeX Companion for explanation.
  Type  H <return>  for immediate help.
  ! Emergency stop.
   ...                                              
                                                    
  l.1097 ℹ Please use `reframe()` instead.
                                            
  !  ==> Fatal error occurred, no output PDF file produced!
  --- failed re-building 'multistate.Rnw'
  
  --- re-building 'distributions.Rnw' using Sweave
  --- finished re-building 'distributions.Rnw'
  
  --- re-building 'flexsurv-examples.Rnw' using Sweave
  Loading required package: survival
  Forming integrated rmst function...
  Forming integrated mean function...
  Loading required package: TH.data
  Loading required package: MASS
  
  Attaching package: 'TH.data'
  
  The following object is masked from 'package:MASS':
  
      geyser
  
  --- finished re-building 'flexsurv-examples.Rnw'
  
  SUMMARY: processing the following file failed:
    'multistate.Rnw'
  
  Error: Vignette re-building failed.
  Execution halted

@ggrothendieck
Author

ggrothendieck commented Feb 23, 2023

Apparently both MiKTeX and TeXLive include the xetex engine, which supports unicode. I don't know about tinytex. Is there some way to modify this R code to force the use of xetex?

knitr::spin("a.R")
rmarkdown::render("a.md", "pdf_document")

@cderv
Collaborator

cderv commented Feb 27, 2023

@ggrothendieck yes, using xelatex is required for Unicode character support.

You can pass arguments to the pdf_document() format in two ways.

Either pass a complete format object, which overrides any format set in the document's YAML field:

rmarkdown::render("a.md", rmarkdown::pdf_document(latex_engine = "xelatex")) 

or pass output_options, which override or add to the options of whatever format the document specifies:

rmarkdown::render("a.md", "pdf_document", output_options = list(latex_engine = "xelatex"))

Also note that you don't need to call spin on its own. If you call render() on a .R file, it will do spinning for you.

rmarkdown::render("a.R", "pdf_document", output_options = list(latex_engine = "xelatex"))
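If the engine should travel with the script rather than with the render() call, the format options can probably also be declared in the script itself. The following is only a sketch, assuming rmarkdown picks up YAML front matter written in `#'` comment lines when it renders an .R file; the body is the a.R example from the first post.

```r
#' ---
#' output:
#'   pdf_document:
#'     latex_engine: xelatex
#' ---

library(dplyr)
nms <- names(BOD)
# still triggers the dplyr warning, whose text contains U+2139
BOD %>% mutate(across(nms, scale))
```

Under that assumption, a plain rmarkdown::render("a.R") would then use xelatex without any extra arguments.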

@DavisVaughan

Which I think is because knitr::is_latex_output() is accidentally returning FALSE

Did you observe that, or do you think there could be a problem in the knitr function? I re-read the code, and we set the internal option so that out_format("latex") is TRUE when Rnw is used.

l.1097 ℹ Please use reframe() instead.

I see this in your log. This seems to be the same issue reported in #2234, which I believe is cli still outputting some ANSI characters in knitr output. Maybe related to r-lib/cli#581

Is there still a flexsurv issue that we can run to reproduce this and see whether it is something other than what I mentioned just above?

@ggrothendieck
Author

ggrothendieck commented Feb 27, 2023

@cderv, Thanks! I tried it with TeXLive on Windows and it worked great. Also the tip about giving the .R file straight to render is really handy.

@DavisVaughan

@cderv I don't think I reproduced it locally, that was from the revdepcheck result.

I imagine that you can probably reproduce it locally by forking it locally with:

usethis::create_from_github("chjackson/flexsurv-dev", "~/Desktop/r/playground/packages/")

and then running this git command to checkout the commit before the flexsurv author made the necessary changes to fix it

git checkout d369ce5bb41308384046bb20f45b4ff7a2f89ebe -b "testing"

and then running a devtools::check() with dplyr 1.1.0 installed.

I tried but got other errors like ! LaTeX Error: File `xcolor.sty' not found. so I couldn't render the whole thing, but maybe you know how to get past that.

That commit corresponds to chjackson/flexsurv@d369ce5 which was right before these 2 commits which look to be targeted at fixing the UTF-8 issues:

@cderv
Collaborator

cderv commented Feb 27, 2023

Great thank you I'll have a look

@cderv
Collaborator

cderv commented Mar 1, 2023

@ggrothendieck can you try with dev cli package as it could have solved this issue also ( maybe with r-lib/cli#581) ? Thank you !

@ggrothendieck
Author

@cderv, It is not clear to me what you are suggesting. How do I modify the code in my first post in this thread?

@cderv
Collaborator

cderv commented Mar 2, 2023

@ggrothendieck you just need to install development version of cli (pak::pak("r-lib/cli") or remotes::install_github("r-lib/cli"))

Then dplyr should use this new version of cli in any context, including inside R Markdown document. No need to change anything to your code.

@ggrothendieck
Author

install_github failed with non-zero exit status. Will try it once it is released to CRAN.

@yihui
Owner

yihui commented Mar 7, 2023

@ggrothendieck Perhaps try install.packages("cli", repos = "https://r-lib.r-universe.dev")? r-universe.dev provides binaries for dev versions of packages.

@ggrothendieck
Author

The installation from r-universe worked but rendering the code did not.

> knitr::spin("a.R")


processing file: a.Rmd

  |                                                          
  |                                                    |   0%
  |                                                          
  |.................                                   |  33%                  
  |                                                          
  |...................................                 |  67% (unnamed-chunk-1)
  |                                                          
  |....................................................| 100%                  
                                                                                                            
output file: a.md

> rmarkdown::render("a.md", "pdf_document")
"C:/PROGRA~3/CHOCOL~1/bin/pandoc" +RTS -K512m -RTS a.md --to latex --from markdown+autolink_bare_uris+tex_math_single_backslash --output a.tex --lua-filter "C:\Users\Louis\AppData\Local\R\win-library\4.2\rmarkdown\rmarkdown\lua\pagebreak.lua" --lua-filter "C:\Users\Louis\AppData\Local\R\win-library\4.2\rmarkdown\rmarkdown\lua\latex-div.lua" --embed-resources --standalone --highlight-style tango --pdf-engine pdflatex --variable graphics --variable "geometry:margin=1in" 
! LaTeX Error: Unicode character ℹ (U+2139)
               not set up for use with LaTeX.

Error: LaTeX failed to compile a.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See a.log for more info.

As previously in this thread I am using TeXLive on Windows and the following does work.

rmarkdown::render("a.md", "pdf_document", output_options = list(latex_engine = "xelatex"))

@cderv
Collaborator

cderv commented Mar 7, 2023

Thanks I'll have a closer look about the difference with spin() and render().

In the current situation, dplyr shows Unicode in its messages, which requires xelatex or lualatex for PDF output. That is not so much a bug as a configuration matter. We could try to set an option based on latex_engine.

I need to understand what the tidyverse stack is already doing.

Thanks

@cderv cderv self-assigned this Mar 7, 2023
@cderv
Collaborator

cderv commented Mar 7, 2023

So, following what Davis shared above, cli should not use Unicode when knitr::is_latex_output() is TRUE.
https://github.com/r-lib/cli/blob/79119446955972eaadb07397764c8c039ef6e0c5/R/utf8.R#L13-L20

is_utf8_output <- function() {
  opt <- getOption("cli.unicode", NULL)
  if (! is.null(opt)) {
    isTRUE(opt)
  } else {
    l10n_info()$`UTF-8` && !is_latex_output()
  }
}

Unicode should not be used when LaTeX output is detected.
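To make the decision easier to follow, here is a small stand-alone sketch of the same logic with its inputs turned into explicit arguments. It is not cli's code: `would_use_unicode()` and its parameter names are hypothetical, chosen only for illustration.

```r
# Hypothetical stand-in for the check quoted above, with the option value,
# the locale test and the LaTeX test passed in explicitly.
would_use_unicode <- function(opt = NULL, utf8_locale = TRUE, latex_output = FALSE) {
  if (!is.null(opt)) {
    isTRUE(opt)                    # an explicit cli.unicode setting always wins
  } else {
    utf8_locale && !latex_output   # otherwise: UTF-8 locale and not LaTeX output
  }
}

would_use_unicode(latex_output = TRUE)                 # FALSE: LaTeX output detected
would_use_unicode(latex_output = FALSE)                # TRUE: e.g. Markdown output
would_use_unicode(opt = FALSE, latex_output = FALSE)   # FALSE: option overrides detection
```

The second case is the interesting one: whenever the output format is not recognised as LaTeX, a UTF-8 locale is enough for Unicode symbols to be emitted.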

This explains why this works

rmarkdown::render("a.R", "pdf_document")

because the output format is indeed LaTeX when the R code from a.R is evaluated in the internal spin()

But when this is run first

knitr::spin("a.R")

the output is .md. Following the cli detection above, on a UTF-8 platform Unicode will be used (as this is not LaTeX output).

So if you run the spin() operation on its own, you need to set the cli.unicode option to FALSE so that ASCII is used.

withr::with_options(
    list(cli.unicode = FALSE),
    knitr::spin("a.R")
)
rmarkdown::render("a.md", "pdf_document")

I don't think knitr can do much more than that. The .md resulting from spin() can legitimately contain Unicode; it is what is done with it afterwards that does not support it. Moreover, render("a.R", "pdf_document") works as expected.
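For anyone who prefers not to depend on withr, the same scoped setting can be sketched in base R. `with_ascii_cli()` is a hypothetical helper name, not a function from knitr, cli or withr.

```r
# Hypothetical base-R helper: evaluate `expr` with cli.unicode = FALSE,
# then restore whatever value the option had before.
with_ascii_cli <- function(expr) {
  old <- options(cli.unicode = FALSE)
  on.exit(options(old), add = TRUE)
  force(expr)  # the lazy argument is evaluated here, while the option is set
}

# Inside the helper the option is FALSE; afterwards it is back to its old value
with_ascii_cli(getOption("cli.unicode"))

# Intended use (requires knitr, so it is left commented out here):
# with_ascii_cli(knitr::spin("a.R"))
```

This relies on R's lazy evaluation of function arguments, so the expression must be passed directly rather than computed beforehand.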

@DavisVaughan

DavisVaughan commented Mar 7, 2023

@cderv, I guess something similar must happen when the flexsurv Rnw vignette is rendered? Like, it probably converts to some intermediate md first? So it looks like unicode is available? And then that is further converted to LaTeX, but the unicode is already in there by that point.

I do feel that since knitr seems to control that whole process of Rnw->md->LaTeX (assuming that is right), then knitr could still be in charge of ensuring that the intermediate result doesn't have unicode in it (since it knows it is going to be converted to LaTeX eventually)

@cderv
Collaborator

cderv commented Mar 7, 2023

@DavisVaughan yes, for Rnw to LaTeX I agree, if this is indeed mixing formats. That is a different issue from the one here, which is spin() + render(), though.

I'll look into this. Thanks for the input !

@cderv
Collaborator

cderv commented Mar 7, 2023

So the issue is specific to flexsurv vignette multistate.Rnw

In the setup chunk, they are using render_sweave()

This has a side effect of modifying the out.format value in knitr

opts_knit$set(out.format = 'sweave')

which means is_latex_output() will return FALSE because it only detects latex

knitr/R/utils.R

Lines 373 to 375 in db4eafb

is_latex_output = function() {
  out_format('latex') || pandoc_to(c('latex', 'beamer'))
}

I don't know if render_sweave() is supposed to be used in a .Rnw vignette, or if this is quite specific to flexsurv.

@yihui should we add sweave out.format inside is_latex_output() ? Or is it not there for some reason ? Don't know much about Sweave, that is why I am asking.
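To illustrate what is being asked, here is a toy model of the check. It is a sketch only: `state`, `out_format()` and both `is_latex_output_*()` functions are simplified stand-ins written for this example, not knitr's internals, and the pandoc_to() branch is left out.

```r
# Toy stand-ins, NOT knitr's real implementation.
state <- new.env()
state$out.format <- "sweave"  # the value render_sweave() leaves behind, per the above

# TRUE when the current output format is one of the names in `x`
out_format <- function(x) state$out.format %in% x

# Check that only recognises 'latex'
is_latex_output_current <- function() out_format("latex")

# Check that also treats 'sweave' as LaTeX output, as suggested
is_latex_output_proposed <- function() out_format(c("latex", "sweave"))

is_latex_output_current()   # FALSE
is_latex_output_proposed()  # TRUE
```

With the format set to "sweave", only the extended check reports LaTeX output, which is what cli needs in order to fall back to non-Unicode symbols.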

knitr::is_latex_output() says it works for Rnw but maybe there is a bug

I can confirm to you @DavisVaughan that knitting .Rnw will correctly set the out.format to be latex and is_latex_output() will be TRUE. That happens in

knitr/R/output.R

Lines 223 to 226 in db4eafb

opts_knit$set(out.format = switch(
  pattern, rnw = 'latex', tex = 'latex', html = 'html', md = 'markdown',
  rst = 'rst', brew = 'brew', asciidoc = 'asciidoc', textile = 'textile'
))

So the dplyr message should be output fine in a usual Rnw file.

@yihui yihui closed this as completed in 2df02ce Mar 8, 2023
@yihui
Owner

yihui commented Mar 8, 2023

@yihui should we add sweave out.format inside is_latex_output() ?

Yes, and done. Thanks!

@cderv
Collaborator

cderv commented Mar 8, 2023

Awesome !

Thanks @DavisVaughan and @ggrothendieck for the report about all this specific behavior

@github-actions

github-actions bot commented Sep 6, 2023

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue by following the issue guide (https://yihui.org/issue/), and link to this old issue if necessary.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 6, 2023