-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zlib: Fix gzip member header/input buffer boundary issue #5883
Conversation
Mostly LGTM but I think this is a semver-major change? Can you remove the (now unused) GZIP_ defines from src/node_zlib.cc? |
@bnoordhuis And regarding semver: Not sure – If so, then #5120 was semver-major in the first place. The current (both master and 5.9.x) behaviour is to throw an error upon encountering trailing garbage only if that starts with So, yes, possibly reverting back to the pre-5.9 behaviour of silently discarding trailing garbage in all cases is a good idea for 5.x, and implementing that should certainly be feasible. But I completely agree with your comment here, throwing an error should be the right behaviour. |
32c7d28
to
c8755e7
Compare
Gray area, I suppose. Maybe one or two people from @nodejs/ctc can weigh in? Change itself LGTM. CI: https://ci.nodejs.org/job/node-test-pull-request/2052/ |
For future reference and to the best of my knowledge:
¹ never actually inspected by any Node.js version, assumed to be null byte padding And for the record, I don’t think trailing garbage is much of a real-world problem, but I’m not sure how relevant that fact is. |
@@ -43,7 +43,6 @@ enum node_zlib_mode { | |||
|
|||
#define GZIP_HEADER_ID1 0x1f | |||
#define GZIP_HEADER_ID2 0x8b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These define
s are unused now right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, but #5884 would start using them again, so I’d leave them in unless it gets decided that that should not get landed.
It's hard to debate the relevance of trailing garbage without any idea of prevalence in the wild. I'm on the fence regarding the behaviour of the 1 member + garbage case, I can see the argument for both sides. Regarding semver-major, #5120 pretty much only changed in a breaking way the behaviour of the last case in your table, EDIT: I was thinking of the case here #5863. @addaleax you're absolutely right regarding semver-major on this change. |
Yep, I can definitely see why one would count this as semver-major, and personally I don’t think I have a strong opionion on that. But keep in mind that this PR still mostly contains a bugfix: For the case that a new member’s header wraps around the input buffer boundary. So even if this does not land in some 5.x version, I’d still argue that a modified patch (which keeps the current trailing-garbage behaviour or restores the old one) should. |
Yes, I agree 100%.
EDIT: turns out this is taken care of. |
@@ -38,3 +41,26 @@ fs.createReadStream(pmmFileGz) | |||
assert.deepStrictEqual(Buffer.concat(pmmResultBuffers), pmmExpected, | |||
'result should match original random garbage'); | |||
})); | |||
|
|||
// test that the next gzip member can wrap around the input buffer boundary | |||
[0, 1, 2, 3, 4, defEncoded.length].forEach((offset) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kthelgason I think testing for 0
should take care of the case where data members fall exactly on input boundaries:
The first unzip.write()
call writes abcEncoded
+ 0 bytes from defEncoded
, and the second unzip.end()
call writes all of defEncoded
. Or did you have something else in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of course, you're absolutely right.
I agree that it's technically a semver-major but this definitely looks like a grey area and I agree with the comments that an error is the right thing to do. I think I'd be ok with this landing as a semver-minor. |
The change LGTM |
Btw, this is what re-allowing trailing garbage would look like – wouldn’t be a big deal. |
Make sure that, even if an `inflate()` call only sees the first few bytes of a following gzip member, all members are decompressed and part of the full output. This change also modifies behaviour for trailing garbage: If there is trailing garbage which happens to start with the gzip magic bytes, it is no longer discarded but rather throws an error, since we cannot reliably tell random garbage from a valid gzip member anyway and have to try and decompress it. (Null byte padding is not affected, since it has been pointed out at various occasions that such padding is normal and discarded by `gzip(1)`, too.) Adds tests for the special case that the first `inflate()` call receives only the first few bytes of a second gzip member but not the whole header (or even just the magic bytes).
c8755e7
to
e6af8b9
Compare
Rebased & squashed this into one commit. |
Thanks Anna, landed in 54a5287. |
Make sure that, even if an `inflate()` call only sees the first few bytes of a following gzip member, all members are decompressed and part of the full output. This change also modifies behaviour for trailing garbage: If there is trailing garbage which happens to start with the gzip magic bytes, it is no longer discarded but rather throws an error, since we cannot reliably tell random garbage from a valid gzip member anyway and have to try and decompress it. (Null byte padding is not affected, since it has been pointed out at various occasions that such padding is normal and discarded by `gzip(1)`, too.) Adds tests for the special case that the first `inflate()` call receives only the first few bytes of a second gzip member but not the whole header (or even just the magic bytes). PR-URL: #5883 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>
@bnoordhuis Thanks! I would actually be more comfortable with having this marked as semver-major and backporting this bugfix in a way that keeps the current (v5.x) behaviour for trailing garbage. Would that be okay? (And if so, would creating a PR against the v5.x branch be the correct way to do that?) |
Okay, I updated the labels. PRing against v5.x sounds good; v5.x-staging is the branch you want. |
Sorry to ask, but there is no v5.x-staging branch here? |
Oh, that's right; I remember being baffled by that a while ago. It should be fine to target v5.x in that case. @jasnell @thealphanerd What's the reason there isn't a v5.x-staging branch like there is for v0.10/v0.12 and v4? |
@bnoordhuis afaik the staging convention has been primarily used for LTS... but tbqh it would be useful for v5 as well and likely allow for quite a bit less pain when cutting releases. Not 100% where an issue should be made for this though, ctc? |
I don't think an issue is necessary. If other @nodejs/release people feel it's a good idea, just go ahead and do it. Maybe file an informational issue afterwards as a memo for contributors. |
Make sure that, even if an `inflate()` call only sees the first few bytes of a following gzip member, all members are decompressed and part of the full output. Adds tests for the special case that the first `inflate()` call receives only the first few bytes of a second gzip member but not the whole header (or even just the magic bytes). This is a backport of nodejs#5883 and contains additional changes to make sure that the behaviour on encountering trailing garbage remains the same (namely to silently discard it if one full member has already been decompressed). #PR-URL: nodejs#5883 #Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> #Reviewed-By: James M Snell <jasnell@gmail.com>
Make sure that, even if an `inflate()` call only sees the first few bytes of a following gzip member, all members are decompressed and part of the full output. Adds tests for the special case that the first `inflate()` call receives only the first few bytes of a second gzip member but not the whole header (or even just the magic bytes). This is a backport of #5883 and contains additional changes to make sure that the behaviour on encountering trailing garbage remains the same (namely to silently discard it if one full member has already been decompressed). PR-URL: #5973 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>
Pull Request check-list
make -j8 test
(UNIX) orvcbuild test nosign
(Windows) pass withthis change (including linting)?
test (or a benchmark) included?
Affected core subsystem(s)
zlib
Description of change
Make sure that, even if an
inflate()
call only sees the first few bytes of a following gzip member, all members are decompressed and part of the full output.This change also modifies behaviour for trailing garbage: If there is trailing garbage which happens to start with the gzip magic bytes, it is no longer discarded but rather throws an error, since we cannot reliably tell random garbage from a valid gzip member anyway and have to try and decompress it.
(Null byte padding is not affected, since it has been pointed out at various occasions that such padding is normal and discarded by
gzip(1)
, too.)The problem resolved by this is mentioned here, but definitely not as likely to occur in the wild, since it does not affect the decompression of gzip files containing only a single member.
/cc @kthelgason