Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer.from(str, encoding) silently ignores decoding errors #8569

Closed
olalonde opened this issue Sep 17, 2016 · 14 comments
Closed

Buffer.from(str, encoding) silently ignores decoding errors #8569

olalonde opened this issue Sep 17, 2016 · 14 comments
Labels
buffer Issues and PRs related to the buffer subsystem. doc Issues and PRs related to the documentations. stalled Issues and PRs that are stalled.

Comments

@olalonde
Copy link
Contributor

olalonde commented Sep 17, 2016

  • Version: Node v6.5.0
  • Platform: OS X 10.11
  • Subsystem: Buffer

Buffer.from silently ignores decoding errors.

Buffer.from('aGVsbG8=', 'base64') // valid base64 string
// => hello
Buffer.from('aGV     sbG8=', 'base64') // invalid base64 string
// => hello

I know this is now the expected behavior but it would be useful, e.g. when writing parsers for strict protocols, to have a Buffer.safeFrom which throws on decoding errors. It could also be passed as a parameter, e.g.: Buffer.from(str, [encoding, throwOnError]).

@mscdex mscdex added buffer Issues and PRs related to the buffer subsystem. feature request Issues that request new features to be added to Node.js. labels Sep 17, 2016
@ojss
Copy link
Contributor

ojss commented Sep 17, 2016

@olalonde I think we can use regular expressions to check for the validity of the encoding.

@addaleax
Copy link
Member

I think we can use regular expressions to check for the validity of the encoding.

Well, yeah, the user can do that if it’s necessary (and I’d be curious why it would be necessary), so I’m not sure it’s important to have that extra validation in core.

@ojss
Copy link
Contributor

ojss commented Sep 18, 2016

@addaleax Sorry I should have been more clear. That's what I meant.

On 18-Sep-2016 2:22 AM, "Anna Henningsen" notifications@github.com wrote:

I think we can use regular expressions to check for the validity of the
encoding.

Well, yeah, the user can do that if it’s necessary (and I’d be curious why
it would be necessary), so I’m not sure it’s important to have that extra
validation in core.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#8569 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AKL6ATmkH9rTRdmeofmiH8BIquLXt2vTks5qrFMggaJpZM4J_c_9
.

@olalonde
Copy link
Contributor Author

olalonde commented Sep 22, 2016

Whether this belongs core is subjective. Regex would work but even for base64, it's not as straightforward as it may seem, e.g. the regex that https://github.com/kevva/base64-regex/blob/master/index.js uses: '(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)'; or this stackoverflow answer.

One scenario I had in mind was using Buffer.from with the expectation that it will throw if the encoding is invalid, for example when implementing a strict protocol that requires you to reply with an error message if the encoding of some value is invalid. Right now this can't be done with Buffer.from alone.

@bnoordhuis
Copy link
Member

Buffer.from('aGV sbG8=', 'base64')

Real-world base64 frequently has extra whitespace or missing == trailers.

History factoid: one of my first contributions was a series of base64 decoder performance improvements, followed a few days later by a set of fix-ups because people complained I made it too strict. :-)

@addaleax
Copy link
Member

@olalonde It might help if you could lay out why you think this would be useful to have in Node core. I realize it can be useful, but what advantage would that have over userland alternatives (which, as you point out, exist)?

@jasnell
Copy link
Member

jasnell commented Sep 23, 2016

Well, the tricky part to validation here is the performance hit. Validating the encoding can be quite expensive if we're expected to do it every time.

@addaleax
Copy link
Member

I don’t think we really have to worry about that, anything like this would probably have go into its own API anyway?

@olalonde
Copy link
Contributor Author

It could be a a strict option passed to Buffer.from. In that mode, Buffer.from would throw when it sees invalid bytes instead of ignoring/replacing them.

@jasnell
Copy link
Member

jasnell commented Sep 23, 2016

Perhaps yeah. We could, I suppose, add a new option to Buffer.from() that says "Validate this input" but I'm not sure there's enough value in doing so vs. validating when reading the data out of the Buffer instance. For utf8 and utf16 it's trivial to simply validate when reading the data. I guess what I'm trying to say is that a strict option would only really be useful for base64 because the other encodings are easily validated on read... and since it would only be useful on base64, does it make sense to add it to core. I'm really not sure.

@dchest
Copy link

dchest commented Mar 13, 2017

Real-world base64 frequently has extra whitespace or missing == trailers.

It's fine to skip whitespace or padding characters — you can always encode such result back to the correct base64, but returning incorrect results as in:

var x = Buffer.from("вот зе фак", "base64") // -> <Buffer d8 1e f9 0f 4f>
var y = x.toString('base64') // -> '2B75D08='

instead of throwing error is unreasonable and will lead to data corruption and vulnerabilities.

https://tools.ietf.org/html/rfc4648#section-3.3

@jorangreef
Copy link
Contributor

@dchest, I just finished https://github.com/ronomon/base64 as a stop-gap until these and other base64 issues are fixed. The decoder will skip whitespace or missing padding characters but it will detect and throw an exception for corrupt or truncated base64 without additional performance penalty. The decoder is also nearly 2x faster when decoding wrapped base64.

@4miners
Copy link

4miners commented Jul 19, 2017

This issue affects also hex encoding, and documentation is not clear about how it will behave in such cases.

// Node 7.10.0
var x = Buffer.from('a2zza2 even!', 'hex');
var y = x.toString('hex'); // a2

var x = Buffer.from('a2zza2 odd!', 'hex'); // TypeError: Invalid hex string
var y = x.toString('hex');

// Node 8.0.0
var x = Buffer.from('a2zza2 even!', 'hex');
var y = x.toString('hex'); // a2

var x = Buffer.from('a2zza2 odd!', 'hex');
var y = x.toString('hex'); // a2

@targos targos added the stalled Issues and PRs that are stalled. label Nov 4, 2018
jasnell added a commit to jasnell/node that referenced this issue Jul 6, 2020
@jasnell jasnell added doc Issues and PRs related to the documentations. and removed feature request Issues that request new features to be added to Node.js. labels Jul 6, 2020
@jasnell jasnell closed this as completed in 14ac6e4 Jul 9, 2020
MylesBorins pushed a commit that referenced this issue Jul 14, 2020
Fixes: #8569

PR-URL: #34227
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
MylesBorins pushed a commit that referenced this issue Jul 16, 2020
Fixes: #8569

PR-URL: #34227
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
@earonesty
Copy link

This issue is closed, however, the base64 decoding doesn't just "ignore spaces"... in the referenced documentation. It ignores any invalid character. At the very least there should be a "{strict:true}" option to from() that raises exceptions when you : pass base64 stuff to hex decodings, pass an array to base64 decoding, etc.

addaleax pushed a commit that referenced this issue Sep 22, 2020
Fixes: #8569

PR-URL: #34227
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
addaleax pushed a commit that referenced this issue Sep 22, 2020
Fixes: #8569

PR-URL: #34227
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. doc Issues and PRs related to the documentations. stalled Issues and PRs that are stalled.
Projects
None yet
Development

Successfully merging a pull request may close this issue.