Maybe use xelatex instead of pdflatex by default #4159

Open
JulienPalard opened this issue Oct 17, 2017 · 19 comments
Labels
builder:latex type:proposal a feature suggestion


@JulienPalard
Contributor

Subject: I built the CPython documentation in French and Japanese, and found it non-trivial to find the right set of options.

Problem

Given that:

  • Non-ASCII characters are increasingly common, even in English (see the Є in https://docs.python.org/3.7/whatsnew/3.7.html#optimizations)
  • Documentation is sometimes written in other languages, such as Japanese, French, and so on.
  • Documentation is sometimes translated from one language to another.

We should expect to find non-ASCII characters everywhere, and they are poorly supported by pdflatex, even when using utf8x, which comes with its own set of issues.

Proposed solution

I eventually found that xelatex handles Unicode characters very well but does not work well with Japanese, whereas platex works well with Japanese.

platex is already the default for Japanese when latex_engine is not explicitly configured, which is nice, but there is no way to configure xelatex for all languages and platex for Japanese (#4150).

This forces everyone to learn a lot about LaTeX and PDF generation, and ultimately to use -D with external logic to switch between working engines, as in https://github.com/python/docsbuild-scripts/pull/34/files.

Also, the documentation is not very explicit about the usage of those engines (see #4149).

What I propose is to switch the default from 'pdflatex' if language != 'ja' else 'platex' to 'xelatex' if language != 'ja' else 'platex', a combination that works without any other modification to build the CPython documentation in English, French, and Japanese.
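Expressed in conf.py terms, the proposal only changes the non-Japanese branch of the implicit default. This is a minimal sketch, not Sphinx's actual code: the helper name default_latex_engine and its proposed flag are hypothetical, and only the final language and latex_engine values are things Sphinx reads.

```python
# conf.py (sketch): the engine Sphinx would pick when latex_engine is unset.
# default_latex_engine() is a hypothetical helper, not a Sphinx API.

def default_latex_engine(language, proposed=True):
    """Return the implicit engine for `language`.

    proposed=False models the current default described in this issue,
    proposed=True models the suggested one. Japanese keeps platex in both.
    """
    if language == 'ja':
        return 'platex'
    return 'xelatex' if proposed else 'pdflatex'


language = 'fr'
latex_engine = default_latex_engine(language)
```

As far as I understand Sphinx's configuration handling, -D overrides are applied after conf.py has run, so an expression like this one sees the language written in conf.py rather than a -D language=... override; that is one reason the choice needs to live in Sphinx's own default.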

@tk0miya
Member

tk0miya commented Oct 17, 2017

@jfbu could you give us comments for this please?

I can't determine whether it is better or not. I don't know which is the better successor to pdfLaTeX, LuaTeX or XeTeX, and I also don't know whether they are stable enough for use with Sphinx.

Of course, I will agree to change the default LaTeX engine if either one is stable enough.

@jfbu
Contributor

jfbu commented Oct 17, 2017

I will make general remarks.

LuaLaTeX is actively maintained and will probably offer more and more features via dedicated packages that achieve things currently impossible in TeX. But these advantages are probably not needed by the vast majority of Sphinx projects. Besides, it appears that LuaLaTeX opens up new security concerns in the TeX world, because the Lua scripting language does not have the restrictions on file opening and writing that exist for pdfTeX binaries as distributed with the major installations (TeX Live and MiKTeX). It may be said that such concerns already exist from running Python scripts... nevertheless this makes one think twice before adopting it on a large scale by default. Thus, I would not recommend using lualatex by default before experience has accumulated elsewhere.

XeLaTeX does not have this issue.

  • but switching to it does not solve all Unicode-related problems: in hand-written documents, authors manually switch languages according to the glyphs needed, and they set up appropriate fonts for each language (at least indirectly, via the polyglossia package)

  • so far the Sphinx LaTeX writer does not support multilingual documents. Even if it did, the author of a Sphinx project would need to manually add mark-up to the source in the case of exotic Unicode characters, to signal the change of language and hence the OpenType font to use: fonts do not support all scripts, although some fonts do support a wide range of scripts.

  • it appears that polyglossia's support for French lags behind babel+french, so if we at Sphinx make xelatex+polyglossia the default, we may raise specific French issues; admittedly they may be relevant only to expert LaTeX users, who will know how to switch back to babel+french.

  • there are issues with xelatex regarding math mode: it has some currently unfixed bugs there, but this is arguably not a very strong deterrent for Sphinx projects.

Making xelatex the default will modify the look of all Sphinx-produced PDFs, because xelatex should be used with OpenType fonts. It can be used with traditional TeX fonts, but then TeX's hyphenation mechanism is broken for some languages. Recently the LaTeX team modified LaTeX's behaviour so that, by default, when used with the xelatex engine it uses the OpenType version of the lmodern font.

So making xelatex the default also requires reviewing the font configuration for all Sphinx-supported languages, and, as I said, it will change the default look of all PDF documentation built by Sphinx.

This looks like quite some work on the Sphinx side... I think the first step is to move Sphinx towards supporting multi-lingual documents, because making xelatex the default engine is not by itself a complete solution to all problems related to Unicode input. It requires extra steps.

@jfbu
Contributor

jfbu commented Oct 17, 2017

One last set of pros and cons:

  • typically, PDFs produced by xelatex are smaller than those produced by pdflatex when using traditional TeX fonts, because xelatex compresses the fonts better; but, as explained already, xelatex should not be used with traditional TeX fonts for optimal results,

  • compilation times with xelatex or lualatex are often significantly increased compared to pdflatex builds.

@JulienPalard
Contributor Author

That's a lot to consider, and I'm no LaTeX expert. I just noticed that the current default (pdflatex/platex) put me in a hard situation when building English, French, and Japanese:

  • With the default configuration, only Japanese succeeds (platex)
  • Adding utf8x fixes the English build (I don't remember whether it breaks the platex Japanese build)
  • Building with xelatex (and removing utf8x) to try to fix French breaks Japanese (platex is no longer the default for Japanese)

So I just can't get a successful build from conf.py alone; I have to use sphinx-build -D flags, driven by external logic, to pass the right latex_engine for each language.
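The "external logic" boils down to choosing an engine per language and passing it to sphinx-build with -D. Below is a minimal sketch of such a wrapper; the function name, the engine table and the source/build directories are hypothetical, and only the standard -b and -D sphinx-build options are used.

```python
# Hypothetical wrapper: pick the LaTeX engine per language outside conf.py
# and hand both values to sphinx-build through -D overrides.
import shlex

ENGINE_BY_LANGUAGE = {'ja': 'platex'}  # assumption: only Japanese needs platex
FALLBACK_ENGINE = 'xelatex'            # assumption: xelatex for everything else


def sphinx_latex_command(language, source='.', build='_build/latex'):
    """Return the argv list for a LaTeX build of the docs in `language`."""
    engine = ENGINE_BY_LANGUAGE.get(language, FALLBACK_ENGINE)
    return [
        'sphinx-build', '-b', 'latex',
        '-D', f'language={language}',
        '-D', f'latex_engine={engine}',
        source, build,
    ]


for lang in ('en', 'fr', 'ja'):
    # Print shell-ready commands; pass the list to subprocess.run() to execute.
    print(shlex.join(sphinx_latex_command(lang)))
```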

It took me some time to find the "right combination", which in fact looks really simple: just replace pdflatex with xelatex as the default engine, but keep the "default to platex for Japanese if the default engine is used" behaviour.

On the one hand I may be short-sighted, since I tested a single project; on the other hand, the Python documentation is huge (230k lines of reST).

@fyears
Contributor

fyears commented Oct 21, 2017

+1 to xelatex

I can confidently say that most Chinese LaTeX users prefer xelatex to pdflatex nowadays, because xelatex has MUCH better support for OpenType fonts, so Chinese users find it WAY easier to display Chinese characters in the generated PDF. The same technology applies to Japanese and Korean characters too (we often refer to their fonts collectively as CJK fonts).

@jfbu In my understanding, sphinx-doc maintains its own default PDF template, so something like a “font issue” should not be a problem (for users)?

@JulienPalard Switching from pdflatex to xelatex for the JP docs is not THAT trivial. At the very least you should set \setCJKmainfont, otherwise JP characters will not be displayed correctly. Still, it’s fairly easy for simple cases; see https://tex.stackexchange.com/questions/139081/cjk-blank-output-for-japanese-characters
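For a Sphinx project, setting \setCJKmainfont means adding to the LaTeX preamble through latex_elements. A minimal sketch, assuming the xeCJK package (which provides \setCJKmainfont) is installed and that a Japanese font named IPAexMincho is visible to xelatex; the font name is a placeholder to replace with one available on your system, and I have not checked how this interacts with Sphinx's own Japanese-specific settings.

```python
# conf.py (sketch): build a Japanese document with xelatex instead of platex.
# Assumptions: xeCJK is installed and CJK_MAIN_FONT names an installed font.
language = 'ja'
latex_engine = 'xelatex'

CJK_MAIN_FONT = 'IPAexMincho'  # hypothetical choice, adjust per system

latex_elements = {
    # xeCJK provides \setCJKmainfont and works on top of fontspec.
    'preamble': (r'\usepackage{xeCJK}' + '\n'
                 + r'\setCJKmainfont{' + CJK_MAIN_FONT + '}' + '\n'),
}
```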

@fyears
Contributor

fyears commented Oct 21, 2017

Some more helpful info here:

  1. xelatex is stable enough to use. luatex is not as popular as xelatex for Chinese users.
  2. Refer to https://www.sharelatex.com/learn/Japanese (also check the pages for Chinese and Korean). The xetex packages are universal for CJK environments if we only need to display some characters and don't consider complicated locale matters (e.g. how dates are rendered). One real issue to consider is how to determine the \setCJKmainfont for different systems (Windows/Linux/OS X ship different fonts) and different languages (sorry, but people in CJK countries develop different fonts).
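The font-selection problem described in point 2 can be sketched as a lookup keyed by platform and language. Everything here is an assumption for illustration: the table, the fallback, and the font names (commonly preinstalled fonts on each system) are not existing Sphinx behaviour.

```python
# Sketch: choose a default CJK main font per operating system and language.
# All font names are assumptions; none of this is current Sphinx behaviour.
import platform

CJK_FONTS = {
    # (platform.system(), language) -> font family name
    ('Windows', 'ja'): 'MS Mincho',
    ('Darwin', 'ja'): 'Hiragino Mincho ProN',
    ('Windows', 'zh_CN'): 'SimSun',
    ('Darwin', 'zh_CN'): 'Songti SC',
}
# Used when no platform-specific entry exists (e.g. on Linux).
FALLBACK = {'ja': 'Noto Serif CJK JP', 'zh_CN': 'Noto Serif CJK SC'}


def cjk_main_font(language, system=None):
    """Return a font name for \\setCJKmainfont, or None if none is known."""
    system = system or platform.system()
    return CJK_FONTS.get((system, language), FALLBACK.get(language))
```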

@jfbu
Contributor

jfbu commented Oct 21, 2017

There is no notion of seamless experience in LaTeX regarding Unicode, although xelatex and lualatex have considerably improved the situation.

Sphinx already does the minimal right thing regarding xelatex, which is to use neither inputenc nor fontenc. With a recent LaTeX this means it will automatically use the Latin Modern OpenType font, which has good coverage of European (in the broad sense) languages:

$ otfinfo -s lmroman10-regular.otf
DFLT		Default
cyrl		Cyrillic
latn		Latin
latn.AZE	Latin/Azeri
latn.CRT	Latin/Crimean Tatar
latn.MOL	Latin/Moldavian
latn.NLD	Latin/Dutch
latn.PLK	Latin/Polish
latn.ROM	Latin/Romanian
latn.TRK	Latin/Turkish

It has no coverage for Chinese or Hebrew, for example. This means a Sphinx user with a project in these languages must customize the LaTeX preamble to use \setmainfont (or \setCJKmainfont, as documented by @fyears) to pick a suitable font. Sphinx loads fontspec, which provides \setmainfont; xelatex also has its own font-loading primitives, which advanced xelatex users use directly, but normal users will use fontspec and will have had to read part of its documentation. Does this include the average Sphinx-doc user?

How this is done depends on the system and on whether the font is provided with TeX itself (on Mac OS X, for instance, one must use different methods depending on whether the OpenType font is a system/user font or lives in the TeX tree).

Even the minimal Sphinx set-up for xelatex contains unsatisfactory elements: the coverage of the French language by polyglossia is far more restricted than what the babel-frenchb module provides; with polyglossia, footnotes and lists do not conform to French typographical rules.

Besides, latex-babel is now (after some years of stagnation) actively maintained and being developed in the direction of xelatex/lualatex support. As a result, it is not clear whether polyglossia will remain preferable to babel in the future.

Regarding French, as I said, it is not. A French Sphinx user of xelatex is now well advised to set the 'babel' key of latex_elements to '\usepackage{babel}'. Sphinx internally sets 'polyglossia' but will obey the 'babel' key if the user has set it:

        # set up multilingual module...
        # 'babel' key is public and user setting must be obeyed
        if self.elements['babel']:
            # this branch is not taken for xelatex/lualatex if default settings
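In a project's conf.py, the override recommended above is a single latex_elements entry. A minimal sketch for a French project; it assumes, as I understand Sphinx's LaTeX writer, that the document language is passed as a class option that babel then picks up, which I have not verified across Sphinx versions.

```python
# conf.py (sketch): French project built with xelatex, using babel (and its
# French module) instead of Sphinx's internal polyglossia default.
language = 'fr'
latex_engine = 'xelatex'
latex_elements = {
    # Setting the public 'babel' key makes Sphinx use it instead of polyglossia.
    'babel': r'\usepackage{babel}',
}
```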

Making xelatex the default makes no sense unless reasonable font defaults are provided for all languages Sphinx covers.

For example, just as we have specific coverage of Japanese [1]_, we can provide specific coverage of Chinese if consensus emerges on how best to set it up with XeLaTeX, and this must be done for Windows, Mac OS X, Unixen... Contributions are most welcome!

.. [1] which, as already mentioned in this thread, currently goes via the platex engine, which does not support Unicode.

And, stressing again, this does not solve the problems one may encounter with stray Unicode characters!

@jfbu
Contributor

jfbu commented Oct 21, 2017

Here is a basic test of Hebrew with xelatex:

\documentclass[hebrew]{article}
\usepackage{polyglossia}
\setmainlanguage{hebrew}
\begin{document}
מבוא
\end{document}

It produces errors:

./testhebrew.tex:4: Package polyglossia Error: The current roman font does not 
contain the Hebrew script!
(polyglossia)                Please define \hebrewfont with \newfontfamily.

See the polyglossia package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.4 \begin{document}
                    
(That was another \errmessage.)

Missing character: There is no מ in font [lmroman10-regular]:mapping=tex-text;!
Missing character: There is no ב in font [lmroman10-regular]:mapping=tex-text;!
Missing character: There is no ו in font [lmroman10-regular]:mapping=tex-text;!
Missing character: There is no א in font [lmroman10-regular]:mapping=tex-text;!

./testhebrew.tex:6: Package polyglossia Error: The current roman font does not 
contain the Hebrew script!
(polyglossia)                Please define \hebrewfont with \newfontfamily.

See the polyglossia package documentation for explanation.

Trying Sphinx on a minimal Hebrew document with xelatex leads to plenty of problems:

.. FOO documentation master file, created by
   sphinx-quickstart on Sat Oct 21 14:57:01 2017.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

תוכן הענייני
============

רשימת הטבלאות

In conf.py:

language = 'he'
latex_engine = 'xelatex'

Package bidi Error: Oops! you have loaded package xcolor after bidi package. Please load package xcolor before bidi package, and then try to run xelatex on your document again.

Package bidi Error: Oops! you have loaded package float after bidi package. Please load package float before bidi package, and then try to run xelatex on your document again.

Package bidi Error: Oops! you have loaded package framed after bidi package. Please load package framed before bidi package, and then try to run xelatex on your document again.

Package bidi Error: Oops! you have loaded package wrapfig after bidi package. Please load package wrapfig before bidi package, and then try to run xelatex on your document again.

etc... etc...

and the one of interest to this thread:

Package polyglossia Error: The current roman font does not contain the Hebrew script!
...

(as above)

This confirms that a Sphinx-doc user will have to know a minimum of LaTeX macros (\newfontfamily) and documentation (fontspec, polyglossia) before reaching a usable state for Hebrew-language documents, even with xelatex as latex_engine.

(we at Sphinx should probably take care of loading polyglossia hence bidi at the right place)
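The font half of the Hebrew problem can be addressed from conf.py by defining \hebrewfont, as the polyglossia error asks. A minimal sketch, assuming GNU FreeFont (which the Sphinx 2.0 plan mentioned later in this thread relies on for Hebrew coverage) is installed and findable by fontspec, and assuming Sphinx loads fontspec before the fontpkg key, as I believe it does for xelatex. It does nothing about the bidi loading-order errors.

```python
# conf.py (sketch) for a Hebrew project built with xelatex.
# Assumptions: GNU FreeFont (FreeSerif/FreeSans/FreeMono) is installed, and
# fontspec is already loaded when the 'fontpkg' key is inserted.
language = 'he'
latex_engine = 'xelatex'
latex_elements = {
    'fontpkg': '\n'.join([
        r'\setmainfont{FreeSerif}',
        r'\setsansfont{FreeSans}',
        r'\setmonofont{FreeMono}',
        # polyglossia uses \hebrewfont for Hebrew text (see the error above).
        r'\newfontfamily\hebrewfont[Script=Hebrew]{FreeSerif}',
    ]),
}
```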

@tk0miya
Member

tk0miya commented Oct 21, 2017

jfbu, thank you for your comment.

As you said, moving to xelatex is not a silver bullet. AFAIK, there are no common settings that work well for all languages.

@fyears For Chinese docs, #3272 has been proposed. It switches to xelatex and ctex only if the language is zh_*.

@tk0miya
Member

tk0miya commented Oct 21, 2017

Note:

This looks like quite some work at Sphinx side... I think first step is to move Sphinx towards supporting multi-lingual documents.

I don't know whether this is really needed; I've never seen such a request. So I think it's okay to support only one language per project at a time.

(edit) Oh, I understand that #4159 requires it...

@jfbu
Contributor

jfbu commented Oct 21, 2017

@tk0miya In the case of the CPython docs (which are big...), the French translation, for example, is currently only at 27.2%.

It could make sense (not only for PDF perhaps, but for PDF it is important because hyphenation depends on the language) to support multilingual documents. Currently only portions of CPython's library.pdf (about 1800 pages) are in French, but the whole is treated as a French document. This means that hyphenation is wrong for all the English text, which is the vast majority of the document.

(I am using make latex SPHINXOPTS="-D locale_dirs=locales -D language='fr' -D gettext_compact=0" to build the CPython French documentation, with Doc/locales/fr/LC_MESSAGES being a symlink to the python-docs-fr repository cloned at the 3.6 branch.)

@tk0miya
Member

tk0miya commented Oct 22, 2017

Ah, I understand. It is indeed a mixture of English and French.
I feel it would be very difficult to support in Sphinx: we would have to mark languages per sentence or word...

@jfbu
Contributor

jfbu commented Oct 22, 2017

@tk0miya But Docutils already does this. Consider this test file:

Welcome to FOO's documentation!
===============================

Hello

.. class:: language-fr

   Bonjour

.. class:: language-de

   Guten Tag

Again English.

Then rst2latex.py index.rst test.tex constructs a LaTeX file that looks like this (irrelevant lines cut):

\documentclass[a4paper]{article}
% generated by Docutils <http://docutils.sourceforge.net/>
\usepackage{cmap} % fix search and cut-and-paste in Acrobat
\usepackage{ifthen}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[french,ngerman,english]{babel}
% Prevent side-effects if French hyphenation patterns are not loaded:
\frenchbsetup{StandardLayout}
\AtBeginDocument{\selectlanguage{english}\noextrasfrench}

[lines cut]

\begin{document}
\maketitle

Hello

\foreignlanguage{french}{Bonjour}

\foreignlanguage{ngerman}{Guten Tag}

Again English.

\end{document}

On further experiment, with multiple paragraphs each one is given as a separate argument to \foreignlanguage. It would probably be better to use \begin{otherlanguage}{french}...\end{otherlanguage} mark-up.
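A writer could emit that otherlanguage mark-up by mapping a Docutils language-XY class to a babel language name. This is a hypothetical helper, not part of Docutils or Sphinx; the tiny mapping table reuses the babel names rst2latex.py produced in the example (french, ngerman, english) and is an assumption, not Docutils' full table.

```python
# Hypothetical helper: wrap a LaTeX body in babel's otherlanguage
# environment when the node carries a "language-XY" class.
BABEL_NAMES = {'en': 'english', 'fr': 'french', 'de': 'ngerman'}


def wrap_language(classes, latex_body):
    """Return `latex_body`, wrapped in otherlanguage if a known language class is set."""
    for cls in classes:
        if cls.startswith('language-'):
            name = BABEL_NAMES.get(cls[len('language-'):])
            if name:
                return ('\\begin{otherlanguage}{%s}\n%s\n\\end{otherlanguage}'
                        % (name, latex_body))
    return latex_body
```

Unlike \foreignlanguage{...}{...}, the environment can contain several paragraphs, so one wrapper per block is enough.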

On the other hand, Sphinx's make latex produces this kind of output:

Hello


\begin{fulllineitems}
\pysigline{\sphinxbfcode{language-fr}}
Bonjour

Un autre paragraphe

\end{fulllineitems}



\begin{fulllineitems}
\pysigline{\sphinxbfcode{language-de}}
Guten Tag

\end{fulllineitems}


Again English.

Possibly related to #4010

@jfbu
Contributor

jfbu commented Oct 22, 2017

HTML output from rst2html.py looks like this:

<p>Hello</p>
<p lang="fr">Bonjour</p>
<p lang="fr">Un autre paragraphe</p>
<p lang="de">Guten Tag</p>
<p>Again English.</p>

@mitya57
Contributor

mitya57 commented Oct 29, 2017

@jfbu: In Sphinx, the .. class:: directive has a different meaning. Use .. rst-class:: language-XY if you want to insert the original Docutils directive. It should work then.

@jfbu
Contributor

jfbu commented Oct 29, 2017

@mitya57: Thanks for the tip, which does indeed work for the html target, producing the same lang attributes as rst2html.py. But it fails for the latex target (as expected from the current writers/latex.py code...): the fulllineitems environments are gone, but the output simply loses all traces of the language tags in the reST sources.

@jfbu
Contributor

jfbu commented Dec 20, 2018

Sphinx 2.0 will use GNU FreeFont with xelatex, providing good coverage of Latin, Cyrillic and Greek scripts (as well as Arabic and Hebrew). This adds a new requirement: fonts-freefont-otf on Ubuntu xenial or, e.g., texlive-gnu-freefont on Fedora 29. Perhaps Sphinx 3.0 can then have 'xelatex' as the default latex_engine for non-Japanese projects.

(edit: and make suitable choice of fonts for Chinese with 'xelatex')

@goyalyashpal

goyalyashpal commented Mar 6, 2024

french and japanese
- @ JulienPalard at #4159 (comment)

I use Hindi, and the same problem will arise with the script of any Indian language (Hindi, Nepali, Tamil, Telugu, Punjabi, Marathi, Gujarati, ...).

compilation times with xelatex or lualatex are often significantly increased compared to pdflatex builds.
- @ jfbu at #4159 (comment)

That's because xelatex outputs PDF directly, and producing the PDF is what takes time. To save on that, latexmk uses xelatex to quickly generate the output of intermediate passes as .xdv files, then converts the result to .pdf via xdvipdfmx only once, at the end.

Ref (abridged by me, original at: latexmk-pdf):

  -pdfxe Generate pdf version of document using xelatex [and xdvipdfmx via
         .xdv intermediate files].  Note that production of a .xdv file by
         xelatex is fast, [but of] a .pdf file can be quite time consuming
         when  document includes  large graphics files. So [this approach]
         can result in substantial gains in procesing time, since the .pdf
         file is produced once rather than on every run of xelatex.
Unabridged verbatim:
  -pdfxe Generate  pdf  version  of document using xelatex.  Note that to
         optimize processing time, latexmk uses xelatex  to  generate  an
         .xdv  file rather than a pdf file directly.  Only after possibly
         multiple runs to generate a fully up-to-date .xdv file does  la-
         texmk then call xdvipdfmx to generate the final .pdf file.

         (Note:  The  reason  why latexmk arranges for xelatex to make an
         .xdv file instead of the xelatex's default of a .pdf file is  as
         follows:  When the document includes large graphics files, espe-
         cially .png files, the production of a .pdf file  can  be  quite
         time consuming, even when the creation of the .xdv file by xela-
         tex  is  fast.  So the use of the intermediate .xdv file can re-
         sult in substantial gains in procesing time, since the .pdf file
         is produced once rather than on every run of xelatex.)

@jfbu
Contributor

jfbu commented Jul 21, 2024

@goyalyashpal

Our docs contain this tip:

Also, if latexmk is at version 4.52b or higher (January 2017) LATEXMKOPTS="-xelatex" speeds up PDF builds via XeLaTeX in case of numerous graphics inclusions.

This -xelatex option is (with current Latexmk) equivalent to -pdfxe -dvi- -ps-.

It is probably time we did this unconditionally, since the required Latexmk version dates from 2017.


6 participants