UTF8 characters in code blocks #963

Open
kellertuer opened this issue Jul 19, 2021 · 14 comments

@kellertuer

In Julia code it is quite common to use UTF-8 characters. The example I stumbled upon was just π, but blackboard-bold letters such as ℝ and similar characters are common too.

When writing a JOSS paper and using these characters within code blocks, one gets an error. For example, with

```julia
a = π
```

one would get an error like

https://github.com/ranocha/SummationByPartsOperators.jl/runs/3105953836?check_suite_focus=true

I have not yet completely checked which LaTeX packages are used to typeset the code, but with `listings` (and, I think, in general) one can resolve this by declaring a LaTeX replacement for each UTF-8 character. This was done, for example, in the `jlcode` package; see https://github.com/wg030/jlcode/blob/df9ba698b37e868bf8debb4fde60b8991bff19c6/jlcode.sty#L754 for π or https://github.com/wg030/jlcode/blob/df9ba698b37e868bf8debb4fde60b8991bff19c6/jlcode.sty#L1175, which defines ℝ.
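For reference, a minimal sketch of the `listings` approach that jlcode takes, assuming pdfLaTeX and code typeset by `listings` (pandoc only does that when run with `--listings`). The `literate` key substitutes each listed UTF-8 character with LaTeX that pdfLaTeX can typeset; only the two characters mentioned here are mapped, so every other non-ASCII character would need its own entry.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc} % UTF-8 is the default since LaTeX 2018; stated for clarity
\usepackage{amssymb}        % provides \mathbb
\usepackage{listings}
% literate entries have the form {input}{{replacement}}width, where width is
% the number of columns listings reserves for the replacement.
\lstset{literate={π}{{$\pi$}}1 {ℝ}{{$\mathbb{R}$}}1}
\begin{document}
\begin{lstlisting}
a = π
M = ℝ
\end{lstlisting}
\end{document}
```

This scales poorly, since every character must be declared by hand, which is one reason a Unicode engine plus a font with good coverage is the more general fix.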

Could something like this maybe also be done for the JOSS paper style?

If I can be of any help with that let me know :) 
@jedbrown
Member

This would be nice to support. Pandoc is converting code such as

```julia
f(x) = π * x
```

to

```latex
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{f(x) }\OperatorTok{=}\NormalTok{ π }\OperatorTok{*}\NormalTok{ x}
\end{Highlighting}
\end{Shaded}
```

This looks like the same issue, solved by using lualatex:
https://tex.stackexchange.com/questions/568030/pandoc-codeblock-containing-unicode-characters

If you can find an acceptable solution using xelatex, I think we'd be happy to add it in the Whedon repo.

@kellertuer
Author

Thanks for the fast answer. Can you see which packages are used for typesetting? I know neither Shaded nor Highlighting as environments. The examples above are for use with listings, but I do not know what Pandoc uses there.

@kellertuer
Author

kellertuer commented Jul 19, 2021

Ah, from the paper-draft Dockerfile it seems it might actually use listings; then the commands linked above should work, too :) https://github.com/openjournals/whedon/blob/919a44c1cae3a44f4475db2cd3bbc17b3b3ff721/paperdraft.Dockerfile#L23

Edit: but the LaTeX template is too complicated for me. It seems to use listings only sometimes; when it does not, I do not know what to do, and when it does, I do not know where to put new definitions.

@jedbrown
Member

I see jlcode is not available through tlmgr, so it would need to be copied in; jlcode.sty could be put in resources/, for example. But I think listings is not used for Markdown code blocks. Indeed, it looks like it's just a font issue.

@jedbrown
Member

Indeed, I copied Hack-Regular.ttf into a paper directory and added the following to the Markdown header:

````yaml
header-includes:
- |
  ```{=latex}
  \setmonofont[Path=./]{Hack-Regular.ttf}
  ```
````


The output looks acceptable. So the question for JOSS is really about adopting a new monospace font. I personally like DejaVu Sans Mono or its close relative Hack. Other strong competitors include JuliaMono and FiraCode.
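Instead of shipping a TTF next to paper.md with `Path=./`, fontspec can also select an installed font by name. A minimal sketch, assuming DejaVu Sans Mono (one of the candidates above) is installed where luaotfload can find it, and that fontspec is already loaded by the template, as the working Hack snippet implies. `Scale=MatchLowercase` scales the monospace font so its x-height matches the main text font, which helps when mixing families.

```latex
% Goes inside the header-includes raw-LaTeX block.
% Assumption: "DejaVu Sans Mono" is installed system-wide or in the TeX tree,
% so luaotfload can resolve it by family name.
\setmonofont{DejaVu Sans Mono}[Scale=MatchLowercase]
```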

@kellertuer
Author

jlcode is indeed not available on CTAN and is also a little preliminary (I just added a few symbols back then). The symbol definitions are only required for (pdf)LaTeX, since there you do not necessarily have UTF-8 capabilities. That should work better with XeLaTeX, and then you can also choose a nice font, yes :)

My personal favourite is, I think, FiraCode, but JuliaMono is also nice; I do not know much about DejaVu Sans Mono or Hack.

jedbrown added a commit to jedbrown/whedon that referenced this issue Jul 19, 2021
This allows unicode in code blocks, as is especially popular in Julia
submissions (openjournals/joss#963).
jedbrown added a commit to jedbrown/inara that referenced this issue Jan 6, 2022
This allows unicode in code blocks, as is especially popular in Julia
submissions (openjournals/joss#963).

Cc: openjournals/whedon#105
tarleb pushed a commit to openjournals/inara that referenced this issue Jan 7, 2022
This allows unicode in code blocks, as is especially popular in Julia
submissions (openjournals/joss#963).

Cc: openjournals/whedon#105
@lrnv

lrnv commented Feb 4, 2024

Hey, I just wanted to report that I have a similar issue over at lrnv/Copulas.jl#121.

The paper is full of glitches from Unicode characters not rendering correctly. While a lot of LaTeX-style characters do render correctly, several that Julia allows do not. In my particular case, I have:

```julia
X₁  X₂  X₃  # Indices 1,2,3 do not render; the \hat does not render.
ϕ           # \phi simply does not render.
ϕ⁽ᵈ⁾        # upper parentheses are OK, but upper d is not (while upper 1 is OK)
```

What can I do as a workaround right now? It looks like the Hack font does not cover everything I need. Is there an option to force the use of another user-supplied font (e.g., JuliaMono in my case)?

@kellertuer
Author

I used a tweaked jlcode successfully for a while, but it would be better to use something like minted (though that requires LaTeX to be allowed to escape to the shell, since minted calls Pygments for the highlighting).

There (with XeLaTeX) you can also easily use JuliaMono.

@lrnv

lrnv commented Feb 4, 2024

> There (with XeLaTeX) you can also easily use JuliaMono.

Do you have a link on how to do that in current JOSS papers?

@kellertuer
Copy link
Author

I think that is more a question of how to tell pandoc about it. No, I have not yet had the time (nor an idea) for a second JOSS paper, and if I remember correctly I did the first one, in 2021, with jlcode (that was a busy time; I had just moved to another country).

@kellertuer
Copy link
Author

Ah, see above in this thread: you can possibly hack the LaTeX header directly in the Markdown.

@lrnv

lrnv commented Feb 4, 2024

Thanks @kellertuer, adding

````yaml
header-includes:
- |
  ```{=latex}
  \setmonofont[Path=./]{JuliaMono-Regular.ttf}
  ```
````

to my header solves the issue perfectly.

@jedbrown
Member

jedbrown commented Feb 5, 2024

Note that JOSS uses lualatex, not xelatex (which is no longer actively developed). Building this paper gives:

```
[WARNING] Missing character: There is no ₁ (U+2081) (U+2081) in font Hack:mode=node;script=latn;l
[WARNING] Missing character: There is no ₂ (U+2082) (U+2082) in font Hack:mode=node;script=latn;l
[WARNING] Missing character: There is no ₃ (U+2083) (U+2083) in font Hack:mode=node;script=latn;l
[WARNING] Missing character: There is no ₁ (U+2081) (U+2081) in font Hack:mode=node;script=latn;l
[WARNING] Missing character: There is no ₂ (U+2082) (U+2082) in font Hack:mode=node;script=latn;l
[WARNING] Missing character: There is no ₃ (U+2083) (U+2083) in font Hack:mode=node;script=latn;l
[WARNING] Missing character: There is no ̂ (U+0302) (U+0302) in font Hack:mode=node;script=latn;la
[WARNING] Missing character: There is no ₁ (U+2081) (U+2081) in font Hack:mode=node;script=latn;l
[WARNING] Missing character: There is no ₂ (U+2082) (U+2082) in font Hack:mode=node;script=latn;l
[WARNING] Missing character: There is no ₃ (U+2083) (U+2083) in font Hack:mode=node;script=latn;l
[WARNING] Missing character: There is no ₁ (U+2081) (U+2081) in font Hack:mode=node;script=latn;l
[WARNING] Missing character: There is no ₂ (U+2082) (U+2082) in font Hack:mode=node;script=latn;l
[WARNING] Missing character: There is no ₃ (U+2083) (U+2083) in font Hack:mode=node;script=latn;l
[WARNING] Missing character: There is no ϕ (U+03D5) (U+03D5) in font Hack:mode=node;script=latn;la
```

Interestingly, all these characters render perfectly in my terminal (which uses Hack) and in the Hack playground, though I think both cases are due to OS font fallback, since `fc-list :charset=03d5` says Hack does not have U+03D5 (GREEK PHI SYMBOL). Note that it does have U+03C6 (GREEK SMALL LETTER PHI). I think we should figure out how to configure a fallback font.
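The same coverage check can be made from inside LuaLaTeX with the e-TeX primitive `\iffontchar`, which tests whether the current font has a glyph for a given code point. A minimal sketch, assuming Hack-Regular.ttf sits next to the document as in the `Path=./` snippet earlier in this thread; the `\checkglyph` macro is a hypothetical helper, and the code points are taken from the warnings above.

```latex
\documentclass{article}
\usepackage{fontspec}
% Assumption: Hack-Regular.ttf is in the current directory.
\setmonofont[Path=./]{Hack-Regular.ttf}
% Hypothetical helper: prints whether the current font has a glyph
% for the code point #1 (TeX hex notation needs uppercase A-F).
\newcommand\checkglyph[1]{%
  \iffontchar\font"#1\relax U+#1: present\else U+#1: missing\fi\par}
\begin{document}
\ttfamily
\checkglyph{03D5}% GREEK PHI SYMBOL
\checkglyph{03C6}% GREEK SMALL LETTER PHI
\checkglyph{2081}% SUBSCRIPT ONE
\checkglyph{0302}% COMBINING CIRCUMFLEX ACCENT
\end{document}
```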

Stylistically, I find subscript numerals in variable names dubious, but that's a subjective choice JOSS need not make. Off-topic for fonts, but on my mind while reading this example: for languages that have standard formatters, I wonder whether the formatter should be applied. (For example, the Copulas paper.md seems to follow no pattern as to whether a comma is followed by a space.)

@jedbrown
Member

jedbrown commented Feb 5, 2024

This approach using luaotfload looks like it should be a solution, but I haven't yet gotten it to work in Inara, and the docs are written for a different context.
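One possible shape of such a fallback, sketched with luaotfload's `add_fallback` mechanism (this may differ from the approach referred to above and has not been tested in Inara). It assumes a luaotfload recent enough to provide `add_fallback`, and JuliaMono installed where luaotfload can find it by name; `monofallback` is an arbitrary, hypothetical name. Glyphs missing from Hack would then be taken from JuliaMono, while everything Hack does cover keeps its look.

```latex
% Goes inside the header-includes raw-LaTeX block.
% Assumptions: luaotfload provides add_fallback, and "JuliaMono" resolves
% by name; "monofallback" is just a label chosen for this example.
\directlua{
  luaotfload.add_fallback("monofallback", {
    "JuliaMono:mode=node;",
  })
}
% Hack stays the primary monospace font; missing glyphs come from the fallback list.
\setmonofont[Path=./, RawFeature={fallback=monofallback}]{Hack-Regular.ttf}
```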
