Render book as PDF in publish.yml workflow #1572

Merged on Jan 15, 2024 (11 commits)

max-heller (Contributor) commented Dec 9, 2023

Renders the book as a PDF and includes it in the published HTML bundle as comprehensive-rust.pdf.

Unfinished business:

  • This doesn't add a link to the PDF from anywhere (see Look into publishing PDFs with mdbook-typst #1543 (comment)).
  • This also may or may not do anything useful yet, thanks yaml scripting. I haven't tested rendering the translated books locally, so that may need extra work.
  • The book contains unicode characters that the default LaTeX font doesn't support (e.g. 🦀), so you may want to swap out the fonts.
    • Use different fonts for translations (e.g., Chinese)
  • Slightly smaller margins
  • Color hyperlinks
  • Do you know of a way to exclude/include content in the PDF? As an example, 1.2. Keyboard Shortcuts should not be shown in the PDF. Similarly, the "In this segment" texts (as seen in 12. Pattern Matching) should also go away.
  • It would also be great if we could apply the formatting in 67. Glossary — it's done via a stylesheet in the HTML, but that is of course lost in the PDF.
    • Ignoring for now

Closes #1543

Rendered

google-cla bot commented Dec 9, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

mgeisler (Collaborator) commented Jan 1, 2024

Hi @max-heller, thanks for creating the PR!

> • This also may or may not do anything useful yet, thanks yaml scripting. I haven't tested rendering the translated books locally, so that may need extra work.

Yeah, I hate that too 😄 Could I get you to enable running the workflows in your fork? That should allow us to see the effect of this PR in your fork (you probably need to merge the main branch into the rendered-pdf branch so that the publish workflow triggers).

> • The book contains unicode characters that the default LaTeX font doesn't support (e.g. 🦀), so you may want to swap out the fonts.

Good point! Looking through the rendered PDF, I don't immediately spot any bad effects of this: the little crab seems to have been ignored and disappeared. I'm thinking of removing the emoji from the course since it's tedious to type...

As for a font, it's a bit ironic that Computer Modern doesn't feel super modern to me any longer. I guess I should experiment a bit with finding a more crisp (sans-serif?) font... perhaps slightly smaller margins would be nice too as well as some color to the hyperlinks.


Do you know of a way to exclude/include content in the PDF? As an example, 1.2. Keyboard Shortcuts should not be shown in the PDF. Similarly, the "In this segment" texts (as seen in 12. Pattern Matching) should also go away.

It would also be great if we could apply the formatting in 67. Glossary — it's done via a stylesheet in the HTML, but that is of course lost in the PDF.

mgeisler (Collaborator) commented Jan 1, 2024

> I don't immediately spot any bad effects of this - the little crab seems to have been ignored and disappeared. I'm thinking of removing the emoji from the course since it's tedious to type...

Now that I've tried this locally, I see that there are a bunch of other places where emojis cause problems:

[WARNING] Missing character: There is no 🌍 (U+1F30D) (U+1F30D) in font [lmmono10-regular]:!
[WARNING] Missing character: There is no α (U+03B1) (U+03B1) in font [lmmono10-regular]:!
[WARNING] Missing character: There is no 🪐 (U+1FA90) (U+1FA90) in font [lmmono10-regular]:!

plus the line drawing characters in the notes of 26.2. Filesystem Hierarchy.
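To get an overview of which characters each font is missing, the warnings can be collected from a saved build log. This is a minimal sketch that assumes the log lines have the shape quoted above; the function name `missing_by_font` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches warnings shaped like the ones quoted above, e.g.
# [WARNING] Missing character: There is no α (U+03B1) (U+03B1) in font [lmmono10-regular]:!
MISSING_RE = re.compile(
    r"Missing character: There is no (?P<char>.) "
    r"\(U\+(?P<code>[0-9A-Fa-f]+)\).* in font \[(?P<font>[^\]]+)\]"
)


def missing_by_font(log_text):
    """Return {font name: ['U+XXXX', ...]} with code points in numeric order."""
    found = defaultdict(set)
    for line in log_text.splitlines():
        match = MISSING_RE.search(line)
        if match:
            found[match.group("font")].add(match.group("code").upper())
    return {
        font: ["U+" + code for code in sorted(codes, key=lambda c: int(c, 16))]
        for font, codes in found.items()
    }
```

Feeding it the captured output of a local build gives one list of code points per font, which should make it easier to decide whether to swap the font or drop the character.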

max-heller (Contributor, Author) commented Jan 2, 2024

> Yeah, I hate that too 😄 Could I get you to enable running the workflows in your fork? That should allow us to see the effect of this PR in your fork (you probably need to merge the main branch into the rendered-pdf branch so that the publish workflow triggers).

Running (rendered English book)

The translated books are built, but some are very broken because of missing fonts.

> Do you know of a way to exclude/include content in the PDF? As an example, 1.2. Keyboard Shortcuts should not be shown in the PDF. Similarly, the "In this segment" texts (as seen in 12. Pattern Matching) should also go away.

There's no built-in way to accomplish this, as far as I can tell. Some options:

  • Run a Pandoc filter (AST transformer) to exclude marked sections or whole files. This would be the most flexible option and doesn't require changes to mdbook-pandoc.
  • Add functionality to mdbook-pandoc to exclude marked sections, as in the previous option. This might be useful functionality to have built in, but it adds complexity.
  • Add a configuration option to mdbook-pandoc to exclude certain files. This is less fine-grained but doesn't require annotating the source.
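A rough sketch of what the first option could look like as a Pandoc JSON filter. It assumes that content to exclude ends up wrapped in a Div carrying a marker class; the class name `no-pdf` and the function names are hypothetical, and how the marker gets into the Markdown source is left open.

```python
import json

SKIP_CLASS = "no-pdf"  # hypothetical marker class, not an existing convention


def is_marked_div(node):
    """True for a Pandoc Div element whose class list contains SKIP_CLASS."""
    if not (isinstance(node, dict) and node.get("t") == "Div"):
        return False
    attr = node["c"][0]  # Attr is [identifier, classes, key-value pairs]
    return SKIP_CLASS in attr[1]


def strip_marked(node):
    """Recursively rebuild the JSON AST, leaving out marked Divs."""
    if isinstance(node, list):
        return [strip_marked(child) for child in node if not is_marked_div(child)]
    if isinstance(node, dict):
        return {key: strip_marked(value) for key, value in node.items()}
    return node


def run_filter(in_stream, out_stream):
    """Read a Pandoc JSON document, strip marked Divs, write the result."""
    json.dump(strip_marked(json.load(in_stream)), out_stream)
```

To use this as an actual filter, the script would end with `run_filter(sys.stdin, sys.stdout)` (after `import sys`) and be passed to Pandoc with `--filter`. Wiring that into the mdbook-pandoc configuration is not shown here.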

> It would also be great if we could apply the formatting in 67. Glossary - it's done via a stylesheet in the HTML, but that is of course lost in the PDF.

Not sure how to accomplish this with LaTeX, unfortunately.

max-heller (Contributor, Author) commented Jan 3, 2024

> It would also be great if we could apply the formatting in 67. Glossary - it's done via a stylesheet in the HTML, but that is of course lost in the PDF.

One option might be to format the glossary with <dl>, <dt>, and <dd> HTML elements, which Pandoc turns into something reasonable when converting from HTML:

<dl>
  <dt>Coffee</dt>
  <dd>Black hot drink</dd>
  <dt>Milk</dt>
  <dd>White cold drink</dd>
</dl>

passed to pandoc -f html -t latex produces

\begin{description}
\tightlist
\item[Coffee]
Black hot drink
\item[Milk]
White cold drink
\end{description}

which renders as

[Screenshot 2024-01-02 at 7 18 10 PM]

However, Pandoc mostly ignores HTML embedded in Markdown, so mdbook-pandoc would need to find a way to force Pandoc to process the embedded HTML like it does with -f html.

mgeisler (Collaborator) commented Jan 3, 2024

> Running (rendered English book)

Yay, great to see that it works!

> The translated books are built, but some are very broken because of missing fonts.

Uh, yeah that looks pretty rough 😄

I did a bunch of testing here and found that I can get nice output for the Chinese, Korean, Japanese, and Russian translations by:

  • Installing the Noto family of fonts (with sudo apt install fonts-noto).
  • Setting the CJK*font variables (CJKmainfont, CJKsansfont, CJKmonofont), but only for the Chinese, Japanese, and Korean translations.

So my book.toml file ended up looking like this for Simplified Chinese (SC):

[output.pandoc.profile.pdf.variables]
mainfont = "Noto Serif"
sansfont = "Noto Sans"
monofont = "Noto Sans Mono"

CJKmainfont = "Noto Serif CJK SC"
CJKsansfont = "Noto Sans CJK SC"
CJKmonofont = "Noto Sans Mono CJK SC"

There are other fonts for the other CJK languages.

The CJKmainfont variable triggers the inclusion of luatexja-fontspec, which I installed with sudo apt install texlive-lang-cjk. This package is not just used for Japanese, but also for the Chinese translations. However, including this package gives weird output for other languages, so it's important that the variable is only set for the CJK languages. I wanted to set it via an environment variable, but it seems that environment variables cannot express a mixed-case key like CJKmainfont.

I've played around quite a bit here, so I guess the packages to install can be trimmed down a lot, but this should be a good starting point.
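Since the CJK variables must only be set for some translations, one way to handle this could be to generate the variables section per language. This is only a sketch: the language codes, the TC/JP/KR font names (only the SC ones were tried above), and the function names are assumptions.

```python
# Latin-script fonts shared by every translation (from the book.toml above).
BASE_FONTS = {
    "mainfont": "Noto Serif",
    "sansfont": "Noto Sans",
    "monofont": "Noto Sans Mono",
}

# Assumed mapping from translation code to the Noto CJK family suffix.
CJK_SUFFIX = {"zh-CN": "SC", "zh-TW": "TC", "ja": "JP", "ko": "KR"}


def pdf_font_variables(lang):
    """Return the Pandoc font variables for one translation.

    The CJK*font keys are only added for CJK languages, because setting
    them pulls in luatexja-fontspec, which gives weird output elsewhere.
    """
    variables = dict(BASE_FONTS)
    suffix = CJK_SUFFIX.get(lang)
    if suffix is not None:
        variables["CJKmainfont"] = f"Noto Serif CJK {suffix}"
        variables["CJKsansfont"] = f"Noto Sans CJK {suffix}"
        variables["CJKmonofont"] = f"Noto Sans Mono CJK {suffix}"
    return variables


def to_toml(variables):
    """Render the variables as a book.toml section, keeping key case intact."""
    lines = ["[output.pandoc.profile.pdf.variables]"]
    lines += [f'{key} = "{value}"' for key, value in variables.items()]
    return "\n".join(lines) + "\n"
```

The output of `to_toml(pdf_font_variables("ja"))` could be appended to book.toml before building that translation, which sidesteps the mixed-case limitation of environment variables.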

> It would also be great if we could apply the formatting in 67. Glossary - it's done via a stylesheet in the HTML, but that is of course lost in the PDF.

> One option might be to format the glossary with <dl>, <dt>, and <dd> HTML elements, which Pandoc turns into something reasonable when converting from HTML:

Right, if we can include HTML, then things should be easier — however, I very much want to avoid HTML since it is hard to edit in the English version and it looks weird and complicated for the translators.

I think we can ignore the glossary formatting for now.
