Accented characters + hyphenation + \usepackage[T1]{fontenc} = parsing error #198

Closed
Bibi opened this issue Feb 2, 2023 · 20 comments

@Bibi

Bibi commented Feb 2, 2023

LaTeX compilation in TextMate sometimes throws a parsing error. Others have experienced the same problem, but AFAIK, no solution has been found yet (see https://stackoverflow.com/questions/74837558/pb-latex-compilation-with-textmate-under-os-ventura-13 or #190 for example). This error is not related to LaTeX: the document compiles just fine using the terminal.

Now, I did investigate a bit, and it seems that the culprits may be accented characters, hyphenation and \usepackage[T1]{fontenc} (which I thought was mandatory for correct hyphenation when using accented characters...).

Here is a MWE, that typesets fine when the fontenc package is not used, but fails when it is.

\documentclass{article}
\usepackage[french]{babel}
\usepackage[T1]{fontenc} % If this line is commented, everything goes smoothly

\begin{document}
\begin{minipage}[t]{3cm} % A narrow minipage to force some hyphenation
   éèàéééééééù éèàéééééééù some text with accented characters
\end{minipage}
\end{document}

The error:

Traceback (most recent call last):
  File "/Users/johndoe/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/bin/texmate.py", line 1054, in <module>
    status = run_latex(command, filename, cache_filename, verbose)
  File "/Users/johndoe/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/bin/texmate.py", line 228, in run_latex
    fatal, errors, warnings = lp.parse_stream()
  File "/Users/johndoe/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/lib/Python/parsing.py", line 611, in parse_stream
    return super(LaTexParser, self).parse_stream()
  File "/Users/johndoe/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/lib/Python/parsing.py", line 195, in parse_stream
    line = self.get_rewrapped_line()
  File "/Users/johndoe/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/lib/Python/parsing.py", line 140, in get_rewrapped_line
    line = to_utf8(self.input_stream.readline())
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 77: invalid continuation byte

Alas, I do not know how to investigate any further. Any idea on why this is happening would be most welcome!

P.S. I'm on macOS 12.6.3, TextMate 2.0.23 with up-to-date bundles, TeX Live 2022 (also up-to-date). Note that I already asked this question on tex.stackexchange (https://tex.stackexchange.com/questions/673756/textmate-fontenc-hyphenation-and-accented-characters-parsing-error), but since this is TextMate-related, this place might be more appropriate.

@vschiavoni

I confirm I also get the error:

restricted \write18 enabled.
entering extended mode
Processing: mwe.tex
LaTeX2e <2022-11-01> patch level 1
L3 programming layer <2023-01-24>
(/usr/local/texlive/2022/texmf-dist/tex/latex/base/article.cls
Document Class: article 2022/07/02 v1.4n Standard LaTeX document class
(/usr/local/texlive/2022/texmf-dist/tex/latex/base/size10.clo))
(/usr/local/texlive/2022/texmf-dist/tex/generic/babel/babel.sty
(/usr/local/texlive/2022/texmf-dist/tex/generic/babel/txtbabel.def)
(/usr/local/texlive/2022/texmf-dist/tex/generic/babel-french/french.ldf))
(/usr/local/texlive/2022/texmf-dist/tex/generic/babel/locale/fr/babel-french.tex) (/usr/local/texlive/2022/texmf-dist/tex/latex/carlisle/scalefnt.sty)
(/usr/local/texlive/2022/texmf-dist/tex/latex/graphics/keyval.sty)
(/usr/local/texlive/2022/texmf-dist/tex/latex/base/fontenc.sty)
(/usr/local/texlive/2022/texmf-dist/tex/latex/l3backend/l3backend-pdftex.def)
Traceback (most recent call last):
  File "/Users/veleno/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/bin/texmate.py", line 1054, in <module>
    status = run_latex(command, filename, cache_filename, verbose)
  File "/Users/veleno/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/bin/texmate.py", line 228, in run_latex
    fatal, errors, warnings = lp.parse_stream()
  File "/Users/veleno/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/lib/Python/parsing.py", line 611, in parse_stream
    return super(LaTexParser, self).parse_stream()
  File "/Users/veleno/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/lib/Python/parsing.py", line 195, in parse_stream
    line = self.get_rewrapped_line()
  File "/Users/veleno/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/lib/Python/parsing.py", line 140, in get_rewrapped_line
    line = to_utf8(self.input_stream.readline())
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 77: invalid continuation byte

@vschiavoni

Interestingly, for me the problem is not necessarily solved by commenting out the fontenc package.

This MWE does not work:

\documentclass{article}
\usepackage[french]{babel}
\usepackage[T1]{fontenc} % If this line is commented, everything goes smoothly

\begin{document}
\begin{minipage}[t]{3cm} % A narrow minipage to force some hyphenation
   éèàééééééé some text with accented characters
\end{minipage}
\end{document}

Instead, this works just fine:

\documentclass{article}
\usepackage[french]{babel}
\usepackage[T1]{fontenc} % If this line is commented, everything goes smoothly

\begin{document}
\begin{minipage}[t]{3cm} % A narrow minipage to force some hyphenation
   éèàéééééé some text with accented characters
\end{minipage}
\end{document}

The difference is that the first one (which fails) contains éèàééééééé, while the one that works contains éèàéééééé.
Not sure whether this gives additional clues, and not sure how to debug further.

@Bibi
Author

Bibi commented Feb 2, 2023

I think it is somehow related to hyphenation when there are accented words in the line, that's why it works in the second case and not in the first. Without accented characters in the line, I think it will always work fine.

@vschiavoni

It seems to be correct. For instance, by wrapping the broken example into an mbox, \mbox{éèàéééééééù}, no more parsing error.

@Bibi
Author

Bibi commented Feb 3, 2023

It might be because pdflatex writes non-UTF-8 characters to the terminal? (See MiKTeX/miktex#581.)
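
To illustrate this hypothesis, here is a minimal Python sketch (not code from the bundle) of why a UTF-8 decoder rejects the bytes named in the tracebacks. It assumes pdflatex emits accented characters as single bytes such as 0xe9 and 0xfc, which is what the error messages report; the sample byte strings and the helper `try_decode` are made up for illustration.

```python
# Hypothetical log fragments: single-byte accented characters, as the
# tracebacks in this thread suggest pdflatex writes them.
samples = [b"\xe9\xe8\xe0 some text", b"f\xfcr die gleich-"]


def try_decode(raw: bytes) -> str:
    """Return the decoded text, or the decoder's complaint if UTF-8 fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as error:
        # error.start is the index of the first byte the decoder rejected.
        return f"byte {raw[error.start]:#x}: {error.reason}"


for raw in samples:
    print(try_decode(raw), "| as Latin-1:", raw.decode("latin-1"))
```

In UTF-8, 0xe9 announces a three-byte sequence, so the decoder complains when the next byte is not a continuation byte; 0xfc can never start a sequence at all. Both bytes are ordinary letters (é and ü) when read as Latin-1.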

@vschiavoni

[image]

That's the case for me at least: with that parameter option, no more crash. On the downside, one does not see the compilation log.

@Bibi
Author

Bibi commented Feb 3, 2023

Yes, when specifying --interaction=batchmode, pdflatex doesn't print anything to the terminal. So it is a workaround (since nothing has to be parsed by TextMate), but not a solution per se...

@Bibi
Author

Bibi commented Feb 6, 2023

Hmm... Reporting an issue on GitHub has not been very conclusive so far. I use TextMate daily and this is becoming a hassle. Maybe I should try other means of communication. What is the best way of reporting issues to TextMate developer(s)?

@sanssecours
Member

What is the best way of reporting issues to TextMate developer(s)?

In my opinion opening an issue here is the best way 👍. The problem is that I usually do not work on the bundle (as actively) as I should anymore. I do have a “real” day job and finding time and especially motivation to work on the bundle is kind of hard for me at the moment. Sorry for the lackluster answer :(.

@Bibi
Author

Bibi commented Feb 6, 2023

Ok I understand, thanks for your answer. I guess I'll be patient and hope for the best!

@andreasloe

andreasloe commented Feb 23, 2023

I have the same problem (which I mentioned in another thread), so I looked further. It does not seem to be an issue with LaTeX. If I run the command inside TextMate, I can see that TextMate is trying to execute

Running 'pdflatex -interaction=nonstopmode -file-line-error-style -interaction=nonstopmode -file-line-error-style -synctex=1 -synctex=1 -interaction=nonstopmode -recorder "Skript.tex"'

If I execute this in the terminal, I do not get an error message at all. If I run this inside TextMate, I get

Traceback (most recent call last):
  File "/Users/user/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/bin/texmate.py", line 1014, in <module>
    status = command_parser.parse_stream()
  File "/Users/user/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/lib/Python/parsing.py", line 771, in parse_stream
    return super(LaTexMkParser, self).parse_stream()
  File "/Users/user/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/lib/Python/parsing.py", line 188, in parse_stream
    function(matching, line)
  File "/Users/user/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/lib/Python/parsing.py", line 790, in start_latex
    fatal_error, number_errors, number_warnings = parser.parse_stream()
  File "/Users/user/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/lib/Python/parsing.py", line 611, in parse_stream
    return super(LaTexParser, self).parse_stream()
  File "/Users/user/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/lib/Python/parsing.py", line 195, in parse_stream
    line = self.get_rewrapped_line()
  File "/Users/user/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/lib/Python/parsing.py", line 140, in get_rewrapped_line
    line = to_utf8(self.input_stream.readline())
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 173: invalid start byte
So we have to look further.

@andreasloe

I did some research and am a little bit closer. In texmate.py (around line 1054), the LaTeX bundle invokes
status = run_latex(command, filename, cache_filename, verbose)
This is where the actual typesetting occurs (OK, it is rerouted to line 224 and executed there). I printed the corresponding command; in my example it was
pdflatex -interaction=nonstopmode -file-line-error-style -synctex=1 -synctex=1 -interaction=nonstopmode Skript.tex .Skript.lb
This command runs through when typed in the terminal, but not in the TextMate window. Strange.

@andreasloe

andreasloe commented Feb 23, 2023

OK, I found the problem (but not a solution yet). The LaTeX document contains a footnote that results in an overfull \hbox. LaTeX then outputs the following message
Overfull \hbox (0.81625pt too wide) in paragraph at lines 383--390
\T1/ppl/m/n/10.95 net man als ewi-ge Ren-te.[][][][][] Bil-det man den Grenz-wert der Bar-wert-for-mel für die gleich-
and this is encoded in Latin-1 (Western), so it cannot be read as UTF-8 (the particular culprit is the ü in "für" above). We need to change line 140 in parsing.py, namely
line = to_utf8(self.input_stream.readline())
and then we have a solution.
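
A small sketch of why the exception surfaces on that line: judging from the traceback, whose last frame is in codecs.py, the error is raised while `readline()` decodes the stream, before `to_utf8` receives any text. The stream below is a stand-in built with `io.BytesIO`, not the bundle's actual `input_stream`, and `errors="replace"` is only one possible policy, not necessarily what the bundle uses.

```python
import io

# Stand-in for pdflatex output containing a Latin-1 encoded "ü" (0xfc).
raw_output = b"Overfull \\hbox in paragraph\nf\xfcr die gleich-\n"


def read_lines(errors: str) -> list:
    """Read all lines through a UTF-8 text wrapper with the given error policy."""
    stream = io.TextIOWrapper(
        io.BytesIO(raw_output), encoding="utf-8", errors=errors
    )
    return stream.readlines()


try:
    read_lines("strict")
except UnicodeDecodeError as error:
    print("strict:", error.reason)

# With errors="replace", the undecodable byte becomes U+FFFD instead of raising.
print("replace:", read_lines("replace"))
```

With the strict policy the read fails as in the traceback; with the replacing policy every line is returned and only the offending character is lost.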

@andreasloe

andreasloe commented Feb 23, 2023

OK, I now have a solution. It is a crude workaround for anyone who simply wants a quick-and-dirty fix; I do not know enough Python to understand what is really going on. Since the issue has nothing to do with LaTeX itself but with how TextMate handles the LaTeX output, I decided to discard the messages that cause the problem. They can still be read in the log file. Then everything goes through.

What I did is the following. Open parsing.py, which can be found at
~/Library/Application Support/TextMate/Managed/Bundles/LaTeX.tmbundle/Support/lib/Python/parsing.py
There, go to line 140 and change the line
line = to_utf8(self.input_stream.readline())
into a try-except block like
try:
    line = to_utf8(self.input_stream.readline())
except UnicodeDecodeError:
    line = 'UTF-8 error'
You can write whatever you want in the last line (instead of 'UTF-8 error'). As can be seen, input_stream.readline is responsible here. This solution did the trick for me.
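
As an alternative to discarding the offending line, one could decode it with a fallback. The sketch below is an illustration under assumptions, not the fix that ended up in the bundle: `decode_log_line` is a hypothetical helper, and it presumes the output can be read as bytes (the bundle's `input_stream` appears to be a text stream, so that would have to change as well).

```python
def decode_log_line(raw: bytes) -> str:
    """Decode a log line as UTF-8, falling back to Latin-1 on failure.

    Latin-1 maps every byte to a character, so the fallback cannot raise.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


print(decode_log_line("für".encode("utf-8")))  # valid UTF-8 passes through
print(decode_log_line(b"f\xfcr die gleich-"))  # Latin-1 bytes are recovered
```

The trade-off is that a line mixing both encodings would be decoded entirely as Latin-1, which garbles its UTF-8 parts but still keeps the message visible.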

@markusweb

I can confirm that this change works.

OK, I now have a solution. ...

@sanssecours sanssecours self-assigned this Mar 5, 2023
@sanssecours sanssecours added this to the 1.8.0 milestone Mar 5, 2023
@sanssecours
Member

Sorry for the super late response and thank you for the very well written bug report 👍. As far as I can tell the problem was fixed in the latest version of the Bundle in the repository. At least I was not able to reproduce the problem using the steps provided in the first post.

Unfortunately, the Bundle index of TextMate (“Settings” → “Bundles”) still seems to contain an old version of the LaTeX Bundle (from April). You should be able to update the Bundle using the steps provided in the Wiki. For more advanced users we also provide steps to update the bundle using Git. In both cases the modified bundle should be updated again as soon as Michael (Bundle maintainer) or Allan (TextMate creator) update the Bundle index.

@andreasloe

andreasloe commented Mar 6, 2023

[Screenshot 2023-03-06 at 11:09:23]

Sounds good, but I get the error message "„check_open“ kann nicht geöffnet werden, da der Entwickler nicht verifiziert werden kann." (“check_open” cannot be opened because the developer cannot be verified). I know this issue from simple programs, where I have to right-click instead of clicking directly, but how do I resolve it here?

@andreasloe

I might have solved it myself. Can someone check No. 11 at https://github.com/textmate/latex.tmbundle/wiki/FAQ

@sanssecours
Member

I might have solved it myself. Can someone check No. 11 at https://github.com/textmate/latex.tmbundle/wiki/FAQ

Thank you for adding the explanation 👍. While the warning window did not show up for me on a macOS VM when I used “Typeset & View (PDF)”, I am pretty sure it would pop up for other LaTeX Bundle commands. I reformulated your text slightly and added some screenshots to the Wiki. Thank you for the help 💖.

@Bibi
Author

Bibi commented Mar 6, 2023

I confirm it solves the issue. Many thanks @sanssecours and @andreasloe !

@Bibi Bibi closed this as completed Mar 6, 2023