[v20.5.0] Buffer.allocUnsafe and TextDecoder.decode doesn't play well. #48995
Comments
I can reproduce on macOS.
FWIW this seems to fix the issue:

diff --git a/src/encoding_binding.cc b/src/encoding_binding.cc
index b65a4f868e..747df2c40c 100644
--- a/src/encoding_binding.cc
+++ b/src/encoding_binding.cc
@@ -164,9 +164,9 @@ void BindingData::DecodeUTF8(const FunctionCallbackInfo<Value>& args) {
   size_t length = buffer.length();
   if (has_fatal) {
-    auto result = simdutf::validate_utf8_with_errors(data, length);
+    bool is_valid = simdutf::validate_utf8(data, length);
-    if (result.error) {
+    if (!is_valid) {
       return node::THROW_ERR_ENCODING_INVALID_ENCODED_DATA(
           env->isolate(), "The encoded data was not valid for encoding utf-8");
     }

cc: @lemire
No string that begins with the byte value 0x80 can be valid (does not matter what comes after). I have added tests for this upstream, specifically

    const char bad[1] = {(char)0x80};
    size_t length = 1;
    simdutf::result res = implementation.validate_utf8_with_errors(bad, length);
    ASSERT_TRUE(res.error);

We run tests with sanitizers, so an out-of-bound access or an undefined behavior would be detected. What is the expectation in this routine...?

    if (has_fatal) {
      auto result = simdutf::validate_utf8_with_errors(data, length);
      if (result.error) {
        return node::THROW_ERR_ENCODING_INVALID_ENCODED_DATA(
            env->isolate(), "The encoded data was not valid for encoding utf-8");
      }
    }

Is the expectation that the data is incorrect? If so, I would encourage
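To make the point about 0x80 concrete: bytes in the range 0x80 to 0xBF have the bit pattern 10xxxxxx, which UTF-8 reserves for continuation bytes, so none of them can begin a sequence. The following is only an illustrative sketch in JavaScript; the helper names are hypothetical and are not part of simdutf or Node.js.

```javascript
// Hypothetical helpers for illustration only; not simdutf or Node.js APIs.

// True when the byte has the form 10xxxxxx (a UTF-8 continuation byte).
function isContinuationByte(byte) {
  return (byte & 0xc0) === 0x80;
}

// True when the byte may legally begin a UTF-8 sequence:
// 0x00-0x7F (ASCII) or 0xC2-0xF4 (lead byte of a multi-byte sequence).
function canStartUtf8Sequence(byte) {
  return byte <= 0x7f || (byte >= 0xc2 && byte <= 0xf4);
}

console.log(isContinuationByte(0x80), canStartUtf8Sequence(0x80));
```

Under this check, an input whose first byte is 0x80 is rejected regardless of what follows it.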
I think the expectation is that data is correct. Anyway the reproduction script calls
I am going to be adding tests upstream today. Give me 24 hours to research this, to see whether there is an issue.
Sure, no hurries.
I am going to issue a patch release.
The issue is upstream in simdutf. I have a patch release for it: https://github.com/simdutf/simdutf/releases/tag/v3.2.15. I recommend upgrading to v3.2.15. It is nearly identical to v3.2.14 except for this one issue. cc @anonrig
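One way to check whether a given Node.js binary already bundles the patched simdutf is to compare the reported version against 3.2.15. This is a rough sketch: it assumes the runtime exposes the bundled version as `process.versions.simdutf` (it falls back gracefully if not), and `compareVersions` is a hypothetical helper written for this example.

```javascript
// Hypothetical helper: compares dotted numeric versions such as "3.2.14".
// Returns -1, 0, or 1 when a is lower than, equal to, or higher than b.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// Assumption: this Node.js build reports its bundled simdutf version here.
const bundled = process.versions.simdutf;
if (bundled === undefined) {
  console.log('simdutf version is not reported by this runtime');
} else if (compareVersions(bundled, '3.2.15') < 0) {
  console.log(`simdutf ${bundled} predates the v3.2.15 fix`);
} else {
  console.log(`simdutf ${bundled} is at or above v3.2.15`);
}
```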
PR-URL: nodejs/node#49019
Fixes: nodejs/node#48995
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
Version
v20.5.0
Platform
Linux dell 6.4.0-1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.4.4-1 (2023-07-23) x86_64 GNU/Linux
Subsystem
No response
What steps will reproduce the bug?
Here is some code to reproduce and explore the bug.
Both Buffer.from([0x80]) and Buffer.allocUnsafe(1)[0] = 0x80 will cause unstable behaviour in TextDecoder.decode.
How often does it reproduce? Is there a required condition?
Sometimes I have to run the code snippet a few times before the bug is seen.
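Because the failure is intermittent, running the decode many times in one process makes it easier to observe. The sketch below is not the original reproduction script; it is a minimal approximation, and `countFatalDecodeOutcomes` is a hypothetical helper name. It relies only on the global `TextDecoder` and `Buffer.allocUnsafe` mentioned above.

```javascript
// Hypothetical helper; an approximation, not the original reproduction script.
// Decodes a lone 0x80 byte repeatedly with {fatal: true} and tallies outcomes.
function countFatalDecodeOutcomes(iterations) {
  const decoder = new TextDecoder('utf-8', { fatal: true });
  let threw = 0;
  let notThrew = 0;
  for (let i = 0; i < iterations; i++) {
    const buf = Buffer.allocUnsafe(1);
    buf[0] = 0x80; // lone continuation byte: never valid UTF-8
    try {
      decoder.decode(buf);
      notThrew++; // a fatal decoder should never reach this line
    } catch (err) {
      threw++;
    }
  }
  return { threw, notThrew };
}

console.log(countFatalDecodeOutcomes(10000));
```

On a runtime without the bug, `notThrew` should stay at 0; any nonzero value indicates the behaviour described in this report.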
What is the expected behavior? Why is that the expected behavior?
Since I'm running the TextDecoder with {fatal: true}, I expect the invalid input to ALWAYS throw an error.
What do you see instead?
A few times (pretty much random), instead of throwing an error, it outputs the \uFFFD replacement character. This should not be possible when {fatal: true} is used.
Additional information
I have only tested on Linux Debian with Node v20.5.0.