Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer: fix Buffer.isEncoding() return value for empty string #12847

Conversation

DavidCai1111
Copy link
Member

@DavidCai1111 DavidCai1111 commented May 5, 2017

For now the return value of Buffer.isEncoding('') is true which seems not the preferred behavior:

> Buffer.isEncoding('')
true

This PR is to fix this.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

@nodejs-github-bot nodejs-github-bot added the buffer Issues and PRs related to the buffer subsystem. label May 5, 2017
lib/buffer.js Outdated
@@ -375,6 +375,7 @@ Buffer.compare = function compare(a, b) {

Buffer.isEncoding = function(encoding) {
return typeof encoding === 'string' &&
encoding !== '' &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer encoding.length > 0 &&

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mscdex Updated, PTAL :=)

@jasnell
Copy link
Member

jasnell commented May 5, 2017

Hmm... given that '' is generally equivalent to undefined, and the behavior is to default to utf-8 when encoding is undefined, I actually think this is expected behavior. It may not be the preferred behavior, tho. Either way, I think this would need to be a semver-major change.

@DavidCai1111 DavidCai1111 force-pushed the buffer/is-encoding-empty-string branch from 3acad90 to 9e55470 Compare May 5, 2017 08:59
@DavidCai1111 DavidCai1111 added the semver-major PRs that contain breaking changes and should be released in the next major version. label May 5, 2017
@@ -17,7 +17,8 @@ const assert = require('assert');
assert.strictEqual(Buffer.isEncoding(enc), true);
});

[ 'utf9',
[ '',
'utf9',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this change but I think it makes sense to add undefined.

Copy link
Member Author

@DavidCai1111 DavidCai1111 May 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lpinca Done :=)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and null, please

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sam-github Done, updated :=)

Copy link
Contributor

@sam-github sam-github May 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that after line 35 the encoding should actually be used (perhaps Buffer.from()?), its important that the isEncoding() should agree with actual behaviour of APIs that use encodings, and its not clear to me ATM that it does, so I think the assert should ensure consistency.

Copy link
Member Author

@DavidCai1111 DavidCai1111 May 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sam-github So i think the problem we are facing now is that those APIs which use encodings identify invalid encodings inconsistently, and we can't tell whose behavior is actually right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea, I just did a quick check, and '' was accepted as an encoding, but you said my node version was not master, so perhaps my check was wrong? I'm not entirely clear on what the purpose of the API is, but if it is intended to find encoding values that don't work, it should agree with usage.

So, are various Buffer APIs inconsistent? If so, that would be a good thing to fix! :-)

Copy link
Contributor

@cjihrig cjihrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I thought this was by design, but I'm not sure. cc: @nodejs/buffer

@DavidCai1111
Copy link
Member Author

@jasnell Hum... so I've edited the PR description (s/expected/preferred/) and added a semver-major label to it 🤔

@DavidCai1111 DavidCai1111 force-pushed the buffer/is-encoding-empty-string branch from 9e55470 to 539dd23 Compare May 5, 2017 09:23
@benjamingr
Copy link
Member

I think this behavior is by-design. isEncoding is described as:

Returns true if encoding contains a supported character encoding, or false otherwise.

Since '' can be passed as an encoding, it should return true.

@benjamingr
Copy link
Member

I'm fine with this change if the CTC signs off on it and CITGM passes without incident. I think the current behavior is marginally better - but I have no strong feelings about that - and I want the change to be an informed decision by the project.

@sam-github
Copy link
Contributor

I have same questions as above, '' appears to be a valid encoding:

> Buffer.from("ab", '')
<Buffer 61 62>
> Buffer.from("ab", 'sam')
TypeError: "encoding" must be a valid string encoding
    at fromString (buffer.js:199:11)
    at Function.Buffer.from (buffer.js:104:12)
    at repl:1:8
    at ContextifyScript.Script.runInThisContext (vm.js:23:33)
    at REPLServer.defaultEval (repl.js:339:29)
    at bound (domain.js:280:14)
    at REPLServer.runBound [as eval] (domain.js:293:12)
    at REPLServer.onLine (repl.js:536:10)
    at emitOne (events.js:101:20)
    at REPLServer.emit (events.js:191:7)
> Buffer.from("ab", '')
<Buffer 61 62>
> Buffer.from("ab", undefined)
<Buffer 61 62>
> Buffer.from("ab", null)
<Buffer 61 62>

@lpinca
Copy link
Member

lpinca commented May 5, 2017

@sam-github so are null and undefined in that case but

> Buffer.isEncoding(undefined)
false
> Buffer.isEncoding(null)
false
>

@DavidCai1111
Copy link
Member Author

DavidCai1111 commented May 5, 2017

@sam-github And when it comes to buf.toString in master branch, '' and null both become to invalid...:

> Buffer.from('foo').toString('')
TypeError: Unknown encoding:
    at stringSlice (buffer.js:558:9)
    at Buffer.toString (buffer.js:594:10)
    // ...
> Buffer.from('foo').toString(null)
TypeError: Unknown encoding: null
    at stringSlice (buffer.js:558:9)
    at Buffer.toString (buffer.js:594:10)
    // ...
> Buffer.from('foo').toString(undefined)
'foo'

@DavidCai1111 DavidCai1111 force-pushed the buffer/is-encoding-empty-string branch from 539dd23 to 78b41da Compare May 5, 2017 14:26
@DavidCai1111
Copy link
Member Author

DavidCai1111 commented May 5, 2017

@@ -17,7 +17,8 @@ const assert = require('assert');
assert.strictEqual(Buffer.isEncoding(enc), true);
});

[ 'utf9',
[ '',
'utf9',
Copy link
Contributor

@sam-github sam-github May 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that after line 35 the encoding should actually be used (perhaps Buffer.from()?), its important that the isEncoding() should agree with actual behaviour of APIs that use encodings, and its not clear to me ATM that it does, so I think the assert should ensure consistency.

@BridgeAR
Copy link
Member

@DavidCai1993 this needs a rebase and there is one open comment

@shinnn
Copy link
Contributor

shinnn commented Sep 19, 2017

Note:

  1. This PR does almost the same change as buffer: empty string is not a valid encoding #9661 does, though buffer: empty string is not a valid encoding #9661 is already closed.

  2. Probably it's better to reach a consensus on the design of Buffer.isEncoding in Buffer.isEncoding regards an empty string as a valid encoding #9654 first of all. The goal of this PR seems different from the most supported suggestion in Buffer.isEncoding regards an empty string as a valid encoding #9654:

    Another alternative is to simply state in the documentation that Buffer.from() accepts them from legacy reasons but that Buffer.isEncoding() is the one source of truth. Not a bad option either.

@BridgeAR
Copy link
Member

Closing due to long inactivity. There is actually little to do besides rebasing and the single comment and it would make sense to reopen or to open a new PR if someone wants to follow up on this.

@BridgeAR BridgeAR closed this Nov 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants