JSON.stringify UTF-8 vs. UTF-16 #2387
It should say that it returns well-formed Unicode strings.
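A minimal TypeScript sketch of what "well-formed" means here, assuming an ES2019+ engine (i.e. one implementing the "Well-formed JSON.stringify" behavior). The helper `isWellFormedUtf16` is hypothetical and written only for this illustration; it reports whether a string contains any unpaired surrogates. A lone surrogate in the input comes out of `JSON.stringify` as a `\uXXXX` escape, so the returned string itself is well-formed, while a proper surrogate pair passes through unchanged.

```typescript
// Hypothetical helper: true when `s` contains no unpaired UTF-16 surrogates,
// i.e. it is well-formed Unicode. (Newer engines also provide
// String.prototype.isWellFormed for the same check.)
function isWellFormedUtf16(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // A lead (high) surrogate must be followed by a trail (low) surrogate.
      const next = s.charCodeAt(i + 1); // NaN when past the end
      if (!(next >= 0xdc00 && next <= 0xdfff)) return false;
      i++; // skip the trail surrogate we just validated
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // A trail surrogate with no preceding lead surrogate.
      return false;
    }
  }
  return true;
}

const loneLead = "\uD800";   // ill-formed: unpaired lead surrogate
const pair = "\uD83D\uDE00"; // well-formed: U+1F600 as a surrogate pair

console.log(isWellFormedUtf16(loneLead));                 // false
console.log(JSON.stringify(loneLead));                    // prints "\ud800" (an escape, 8 characters)
console.log(isWellFormedUtf16(JSON.stringify(loneLead))); // true
console.log(JSON.stringify(pair) === `"${pair}"`);        // true: pairs pass through unescaped
```

So the returned value is still an ordinary JavaScript string; the difference is only that it never contains an unpaired surrogate.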
Thanks for the feedback. Should the …
I think so. There's nothing special about the encoding of the returned string: it's just a JavaScript string, like any other JavaScript string. (And yes, JavaScript does treat strings kind of like UCS-2/UTF-16, but that's not special for …
OK, thanks. I'll preface this by acknowledging that I'm not tremendously well-versed in this topic (though I'm significantly better informed than I was a few days ago, thanks in no small part to your "Well-formed JSON.stringify" proposal and your "JavaScript’s internal character encoding" article). Those are good points, and I'm mostly in agreement with you. Having thought about it further, though, I think it probably does make sense to be more explicit about what it returns than just "JSON string", whether by referencing "UTF-16" or "well-formed Unicode". "6.1.4 The String Type" seems a bit vague. It says:
So I don't read that as saying those operations will necessarily return well-formed UTF-16/Unicode. On another note, the "Well-formed JSON.stringify" proposal says:
Referencing RFC 8259, which says:
(The ES spec references ECMA-404, which doesn't seem to say anything like that.) Taking both of those things into account, I now think the most useful thing to say would be something to the effect that it returns a UTF-16 encoded (or "well-formed Unicode") string regardless of whether the input contains unpaired surrogates, but that the JSON text still represents ill-formed Unicode containing those unpaired surrogates, and that the results of parsing it (other than via …
cc @gibson042
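The second half of that suggested wording (the output is well-formed, yet the JSON text still *represents* a string with an unpaired surrogate) can be seen with a round trip. A minimal sketch, assuming an ES2019+ engine; it only demonstrates JavaScript's own `JSON.parse`, and other JSON parsers may handle such escapes differently.

```typescript
const input = "a\uDEADb"; // contains an unpaired trail surrogate

const json = JSON.stringify(input);
// The JSON text is pure ASCII here: the lone surrogate became an escape.
console.log(json); // "a\udeadb"

// Parsing the escape reproduces the ill-formed string exactly.
const parsed: string = JSON.parse(json);
console.log(parsed === input);                  // true
console.log(parsed.length);                     // 3 (UTF-16 code units)
console.log(parsed.charCodeAt(1).toString(16)); // dead
```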
Hello,
Since ES2019, the Introduction section says:
I don't think that's really what it means to say, though, is it? I think it means to say that it returns well-formed UTF-16 (and that, as a result, the content could be encoded as UTF-8)?
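To make the UTF-8 angle concrete, a small TypeScript sketch, assuming an ES2019+ engine. `canEncodeAsUtf8` is a hypothetical helper for this illustration only. It leans on the standard `encodeURIComponent`, which UTF-8-encodes its argument (as percent escapes) and is specified to throw a `URIError` on an unpaired surrogate, so it serves as a quick "can this be encoded as UTF-8?" probe. The raw ill-formed string fails the probe, while the `JSON.stringify` output passes, which is the sense in which well-formed UTF-16 output "could be encoded as UTF-8".

```typescript
// Hypothetical helper: encodeURIComponent throws URIError on unpaired
// surrogates, so success means the string has a UTF-8 encoding.
function canEncodeAsUtf8(s: string): boolean {
  try {
    encodeURIComponent(s);
    return true;
  } catch (e) {
    if (e instanceof URIError) return false;
    throw e; // anything else is unexpected
  }
}

const raw = "x\uD800y"; // unpaired lead surrogate

console.log(canEncodeAsUtf8(raw));                 // false
console.log(canEncodeAsUtf8(JSON.stringify(raw))); // true
// Well-formed non-ASCII text is left unescaped and becomes real UTF-8 bytes:
console.log(encodeURIComponent(JSON.stringify("\u00e9"))); // %22%C3%A9%22
```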