Break multi bytes UTF-8 characters when parsing in Node-style #908
Comments
Option 1: using string_decoder in
I'm trying to proactively use the iconv-lite option. Can you check whether this pseudo-implementation is correct? It could also be added to the docs after clean-up. Also, is there a way to get the "meta" field in the streaming API? on('data') only gets you the data part of the result (lines 917 to 918 in 1f2c733).
I assume that's intentional?
If I may, I believe the WHATWG's TextDecoder option would be your best move here.
Any news on this? I tried every method to fix it, but either it doesn't work or it just takes forever to read the stream. What should be done in the meantime to be able to read multi-byte UTF-8 characters when streaming to Papa?
PapaParse breaks multi-byte UTF-8 characters when they are sliced between different chunks of `Buffer`. For example, `ç` would become `��`.

To reproduce:
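A minimal sketch of the failure mode, not the reporter's original script. It assumes each chunk is decoded on its own with `chunk.toString()`, as the issue describes; the variable names `whole`, `chunks`, and `naive` are illustrative only, and PapaParse itself is not involved.

```javascript
// "ç" is encoded in UTF-8 as two bytes: 0xC3 0xA7.
const whole = Buffer.from('ç', 'utf8');

// Simulate a stream that happens to split the character between two chunks.
const chunks = [whole.subarray(0, 1), whole.subarray(1)];

// Decoding each chunk independently cannot reassemble the character:
// each half is an invalid sequence and becomes U+FFFD (the replacement character).
const naive = chunks.map((chunk) => chunk.toString('utf8')).join('');

console.log(naive === 'ç');          // false
console.log(JSON.stringify(naive));  // two replacement characters
```

The same thing can happen with any multi-byte character whenever a chunk boundary falls inside its byte sequence, which is why it shows up only intermittently on real streams.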
A workaround is to ensure UTF-8 decoding with `string_decoder` (internal Node module), WHATWG `TextDecoder`, or `iconv-lite` (user-land dependency).

But a better answer is to use `string_decoder` or `TextDecoder` inside PapaParse, in place of `chunk.toString()`.

Related to #751
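As a rough illustration of the two built-in options named above, the sketch below decodes a split `ç` with `StringDecoder` and with `TextDecoder` in streaming mode. It only shows the decoding step; how the decoded strings would be handed to PapaParse is left out, and the names `bytes`, `parts`, `viaStringDecoder`, and `viaTextDecoder` are hypothetical.

```javascript
const { StringDecoder } = require('node:string_decoder');

// The two bytes of "ç" (0xC3 0xA7), delivered as two separate chunks.
const bytes = Buffer.from('ç', 'utf8');
const parts = [bytes.subarray(0, 1), bytes.subarray(1)];

// Option A: StringDecoder buffers an incomplete trailing sequence
// and emits the character once the remaining bytes arrive.
const stringDecoder = new StringDecoder('utf8');
let viaStringDecoder = '';
for (const part of parts) {
  viaStringDecoder += stringDecoder.write(part);
}
viaStringDecoder += stringDecoder.end(); // flush anything still buffered

// Option B: WHATWG TextDecoder with { stream: true } keeps state between calls.
const textDecoder = new TextDecoder('utf-8');
let viaTextDecoder = '';
for (const part of parts) {
  viaTextDecoder += textDecoder.decode(part, { stream: true });
}
viaTextDecoder += textDecoder.decode(); // final call without stream flushes the decoder

console.log(viaStringDecoder === 'ç', viaTextDecoder === 'ç'); // true true
```

Both decoders are stateful, so one instance has to be reused for every chunk of a given stream; creating a new decoder per chunk would reintroduce the original problem.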