"KaTeX parse error: Expected 'EOF', got '<bla>' at position 1" when input pure Chinese character(s) #895

Closed
brlin-tw opened this issue Sep 19, 2017 · 12 comments


brlin-tw commented Sep 19, 2017

[Screenshot]

Control case:

[Screenshot]

Bug Reproducing Instructions

  1. Browse KaTeX – The fastest math typesetting library for the web
  2. Enter any CJK characters (this attempt uses "測試")

Expected Behavior

"測試" is printed in the "See how it renders with KaTeX:" field, presumably in italics just like any regular text

Current Behavior

The "See how it renders with KaTeX: " field spews out the following error:

KaTeX parse error: Expected 'EOF', got '測' at position 1: 測̲試
@kevinbarabash
Member

@Lin-Buo-Ren awesome bug report. You can use CJK characters within \text. I'm curious about the use cases of natural language strings outside of \text?
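A minimal sketch of the \text workaround, assuming the input arrives as a plain TeX string. The helper name `wrapCjkInText` is hypothetical and not part of KaTeX; it wraps each run of CJK characters in \text{...} before the string is handed to the renderer. It does not check whether a run is already inside \text, so such input would be wrapped a second time.

```javascript
// Hypothetical helper (not part of KaTeX): wraps each run of CJK characters
// in \text{...} so that the characters are typeset in text mode.
function wrapCjkInText(tex) {
  // Han, Hiragana, Katakana, and Hangul, matched by Unicode script property.
  const cjkRun = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]+/gu;
  // A replacer function is used so the run is inserted literally.
  return tex.replace(cjkRun, (run) => `\\text{${run}}`);
}

console.log(wrapCjkInText("測試 = x^2")); // prints: \text{測試} = x^2
```

Input without CJK characters passes through unchanged, so the helper can be applied to every expression.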

@brlin-tw
Author

@kevinbarabash

I'm curious about the use cases of natural language strings outside of \text?

I'm not sure, as I'm not really acquainted with LaTeX; I simply use it to write math equations in Markdown documents on GitBook and have never used \text before ;)

@07pepa

07pepa commented Nov 23, 2017

I think this is UTF-8 related. I looked into the KaTeX font sources and they only cover printable English characters, so maybe it is related to the multi-byte nature of non-ASCII characters.

PS. Czech characters like "ěščřžýáíéťňůú" and their capitalized versions "ĚŠČŘŽÝÁÍÉŤŇŮÚ" do not work either, and UTF-8 emoji do not work 😭. The only solution I see (if the input box is UTF-8) is to apply a UTF-8 filter when the code thinks it is processing ASCII.

It could look something like this (a rough sketch, written in a hurry):
// Returns the number of bytes in the UTF-8 sequence that starts with the lead
// byte `input` (0 for plain ASCII), i.e. how many bytes need to be discarded.
function utf8(input) {
  if (input < 128) return 0;
  var count = 0;
  // Count the leading 1 bits of the lead byte.
  for (var mask = 0x80; (input & mask) !== 0; mask >>= 1) count++;
  return count;
}
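To make the multi-byte claim concrete, here is a small sketch that reports how many bytes a few of the characters mentioned above occupy in UTF-8, using the standard `TextEncoder` API. The function name `utf8ByteLength` is made up for illustration. Note that JavaScript strings expose UTF-16 code units rather than UTF-8 bytes, so this only illustrates the encoding sizes; it does not show that byte length is what causes the parse error.

```javascript
// Illustrative helper: number of bytes a string occupies when encoded as UTF-8.
function utf8ByteLength(s) {
  return new TextEncoder().encode(s).length;
}

// Print each character, its UTF-8 byte count, and its length in UTF-16 code units.
for (const ch of ["a", "\u011B", "\u6E2C", "\u{1F62D}"]) {
  console.log(ch, utf8ByteLength(ch), ch.length);
}
```

The escapes stand for "ě", "測", and "😭" respectively; they take 2, 3, and 4 bytes in UTF-8, while the ASCII letter takes 1.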

@edemaine
Member

edemaine commented Nov 23, 2017

To be clear, \text{測試} works fine already in LaTeX (with any CJK character); it's just 測試 in math mode that doesn't work. I don't see any particular reason to forbid CJK characters in math mode, so we should look at adding support for that.

@edemaine
Member

FYI, PR #992 adds support for CJK characters in math mode. We're still resolving some accent-related issues there. If they don't get resolved soon (or the PR gets rejected), I could split off the CJK-in-math-mode part into a separate PR.

@rwbarton

rwbarton commented Jan 8, 2018

It looks like #992 was merged, but without the math mode CJK support; what's the status of this?

@brlin-tw
Author

brlin-tw commented Jan 8, 2018

The issue can still be reproduced via the instructions above; I guess #992 doesn't enable it at the moment.

@edemaine
Member

edemaine commented Jan 8, 2018

The status is issue #1046, which needs discussion before change. I'd appreciate your input on that -- is there a way to get this to work in LaTeX?

@rwbarton

rwbarton commented Jan 8, 2018

I actually haven't used xelatex myself unfortunately--I came across the Zulip issue while investigating something related (which was fixed already in another Unicode-related KaTeX PR).

@kevinbarabash
Member

I'm going to close this as a duplicate of #1046.

edemaine added a commit to edemaine/KaTeX that referenced this issue Jan 30, 2018
* When `unicodeTextInMathMode` is `true`, accented letters from
  `unicodeSymbols.js`, and CJK and other supported languages,
  get added support in math mode (as requested in KaTeX#895).
* When `unicodeTextInMathMode` is `false`, all of these stop working in
  math mode, and are only supported in text mode (matching XeTeX behavior).
  Note that this is a backwards incompatibility with some 0.9.0 alpha/betas.
kevinbarabash pushed a commit that referenced this issue Feb 20, 2018
* unicodeTextInMathMode setting

* When `unicodeTextInMathMode` is `true`, accented letters from
  `unicodeSymbols.js`, and CJK and other supported languages,
  get added support in math mode (as requested in #895).
* When `unicodeTextInMathMode` is `false`, all of these stop working in
  math mode, and are only supported in text mode (matching XeTeX behavior).
  Note that this is a backwards incompatibility with some 0.9.0 alpha/betas.

* Fix handling of Unicode characters ð, Å, å

* Fix double handling of ð (math maps to \eth, not special Unicode character)
* Remove Åå special math handling, thanks to #1125

* Forbid extraLatin when unicodeTextInMathMode is false
@tanjhysj0

I have the same problem. How can I solve it? By changing the text content? No, I don't want to do that.

@edemaine
Member

edemaine commented Apr 9, 2018

@tanjhysj0 On the master branch, you can set the unicodeTextInMathMode option to true, and this should work.
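A minimal sketch of what that could look like from Node, assuming a KaTeX build that includes the `unicodeTextInMathMode` setting described in the commit message above is installed as the `katex` package. The `require` call is wrapped in `try`/`catch` so the snippet still runs when the package is missing or the build rejects the input.

```javascript
// The input from the original report: CJK characters directly in math mode.
const tex = "測試";

// `unicodeTextInMathMode` is the setting named in this thread; whether a given
// KaTeX version honours it is an assumption of this sketch.
const options = { unicodeTextInMathMode: true };

let html = null;
try {
  const katex = require("katex"); // assumes the package is installed
  html = katex.renderToString(tex, options);
} catch (err) {
  console.error("Rendering failed:", err && err.message);
}

// `html` stays null if KaTeX was unavailable or threw a parse error.
console.log(html);
```

With the setting left at `false`, the same input is expected to keep failing in math mode and to need \text instead.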
