Node's zlib.ungzip does not support concatenated files #4306

yahao87 · 2015-12-16T07:50:32Z

why did incorrect operate zlib.ungzip?

It is operates in C#.net
but don't operate in nodejs.(windows7, node ver. 5.2.0)
Most of the well, but a problem appears in some of the files.

in C# (with ICSharpCode)

using ICSharpCode.SharpZipLib.Core;
using ICSharpCode.SharpZipLib.GZip;
using (GZipInputStream gzipStream = new GZipInputStream(fs))
{
    // operate very well
}

in NodeJs

var fs = require('fs');
var zlib = require("zlib");
var file = fs.readFileSync(sFileFullPath);
var extractBin = zlib.gunzipSync(file);
console.log("Compressed buffer  size" + file.length);      // 134079
console.log("Uncompressed buffer size" + extractBin.length);   // 14

Why it did reduce the size?
Here's the file in question.

http://static.kafra.kr/file/patch.rgz

I'm sorry bad English.

addaleax · 2015-12-17T08:25:29Z

Your example gzip file actually consists of multiple streams, i.e. is the concatenation of more than one “raw” gzip file. See e.g. the gzip(1) manpage for some explanations on how the gzip command line program handles this.

Node.js, however, ignores anything after the first stream, which in your case is only 14 bytes long. A bit more simplified:

var abc = zlib.gzipSync('ABC');
var def = zlib.gzipSync('DEF');
var decompressed = zlib.gunzipSync(Buffer.concat([abc, def]));
console.log(decompressed.toString()); // => ABC (!)

There was a fix for this a while ago, which resulted in a regression and then was reverted again; The story so far (as far as I can tell):
nodejs/node-v0.x-archive#6032
nodejs/node-v0.x-archive#6442
nodejs/node-v0.x-archive#8962
nodejs/node-v0.x-archive#8985

yahao87 · 2015-12-18T02:57:36Z

I understand.
Why not support recursion like other language's library?
hmm...
Thank you for answer.

addaleax · 2015-12-20T10:52:58Z

I’d totally be in favour of supporting concatenated input files. Maybe someone here has the time to look into this; If people here agree that that would be the intended behaviour, I’d be really interested in looking into this, but I’m not sure I can make any promises on when I’d find time to do that.

Fishrock123 · 2016-01-07T17:52:24Z

cc @chrisdickinson, @indutny, @trevnorris, @bnoordhuis -- You all seemed to have dealt with the aforementioned issues, can any of you give more context?

@addaleax Thanks for triaging this! :)

bnoordhuis · 2016-01-07T18:44:16Z

IIRC the problem was that the streams paradigm doesn't work so well for concatenated archives? For the sync versions, we could create an overload or alternative that returns an array or an iterator. For the streaming versions, I'm not so sure; maybe an event for the unconsumed data that you can then feed back into a new zlib stream?

addaleax · 2016-01-07T22:24:36Z

the streams paradigm doesn't work so well for concatenated archives

Not sure whether this helps, but here might be a little confusion between terms here. This is not about providing multiple files to the end user (i.e. it is not about archives in the sense of a single file which in some way contains multiple files), but rather it’s about encoding a single file as the result of multiple, independent compression operations.
One (real-world) example use case would be parallel compression, i.e. splitting a file into multiple parts and compressing them independently, then concatenating the results. When using a command line tool or libraries in other environments (e.g. Python), the decoder then transparently outputs a single stream.
Other compression formats like bzip2 or xz also accept this format (the xz(1) tool has a --single-stream command line option to opt-out of this).

As far as I can tell, the problem with the original patch was that trailing zero bytes (as padding) caused decompression to fail. I’d still say that the default behaviour should be to allow decompression of a single file as concatenated streams, following the conventions set by the command line tools (and besides, it’s cool feature of the compression format).

chrisdickinson · 2016-01-08T00:14:29Z

Ah, I think I remember this. The terms are confusing, but @bnoordhuis has it right, I think.

There was a patch to make the zlib module compatible with concatenated zlib datastreams, but the problem was that it broke existing code. IIRC this was because:

Our current zlib streams stop consuming as soon as they reach the end of the first zlib datastream.
Modules in the ecosystem have made assumptions about this (notably, the tarball handling in npm.) Edit — it looks like the handling was that there were garbage bytes trailing a .tar.gz file, and those bytes were causing node to hang while the impl tried to restart the zlib datastream over and over.

The workaround, if you're interested in taking it on, is to expose either:

An option to the existing JS zlib streams that turns on concatenated datastream support, OR
A second set of JS zlib transform streams that the user can use with concatenated datastreams.

addaleax · 2016-01-08T23:40:32Z

@chrisdickinson Yes, you are correct; However, the underlying assumption in the npm’s tarball handling was not that only the first datastream from a gzip file would be extracted. npm expects the ability to extract .tar.gz files which are padded with trailing \0 bytes, and that is perfectly normal for gzip files. The original patch (1183ba4) did not account for that, but that’s an issue with the implementation of this feature, not with the feature itself.

So:

As already referenced in Add support for concatenated gzip files. node-v0.x-archive#6442, the RFC1952 explicitly mentions multiple streams as part of the format (and therefore I’d agree that this would be a bug fix, not really a feature). If Node.js really wants to diverge from the standard here, this should be documented somewhere.
Yes, 1183ba4 had incorrect error handling for trailing garbage. That can be fixed, though.
I don’t think this should affect decompression of the zlib format, only gzip. In gzip, it’s part of the specs, but for the raw zlib format, no such thing is mentioned anywhere nor do the libraries of e.g. other languages allow concatenated zlib streams.

I agree that it’s a good idea to make this optional, but it should be active by default. And yes, if it’s okay with you, I’d be interested in working on this, but I can’t make any promises on when I’d get this done (probably within the next 1 oder 2 weeks).

chrisdickinson · 2016-01-09T02:14:11Z

I'm definitely 👍 on getting support into core for this, and would love to see a PR for it — I'll note re:

I agree that it’s a good idea to make this optional, but it should be active by default.

The problem while defaulting this way is that the existing behavior has been out in the wild for a long time now, and folks have built programs that expect that behavior — it could potentially cause a lot of breakage without a commensurate upside. We're primarily beholden to the code that exists in the wild by way of our users, and we'd have to have a very good reason to release a version (even a major version) that breaks their code. For something like this we'd have to message changing the default way in advance (i.e., start warning that we're going to do it in the next major version, then actually flip the default in the major version after that.)

addaleax · 2016-01-09T03:42:50Z

Okay, I’ll start working on it! :)

And I get that this would be semver-major, I just can’t really imagine that there’s any application which would rely on the current first-stream-only behaviour. But I do see your point, and if you say this is the way it has to be done, then it’s like that.

kthelgason · 2016-02-01T01:19:41Z

@addaleax have you started working on this issue? If you have difficulty finding the time I can also work on this.

addaleax · 2016-02-01T02:04:26Z

@kthelgason Feel free to do that – I was busier than expected in the last weeks, sorry

kthelgason · 2016-02-01T20:47:19Z

@addaleax Great, I'd love too. You've made it really approachable for me with all the groundwork laid out in this thread 😄

kthelgason · 2016-02-04T21:02:26Z

If I understand correctly an issue with the previous implementation of this feature is that NULL-padded .gz files caused issues. According to rfc1952 the gzip file format is as follows (ephasis mine):

2.2. File format

      A gzip file consists of a series of "members" (compressed data
      sets).  The format of each member is specified in the following
      section.  The members simply appear one after another in the file,
      with no additional information before, between, or **after** them.

Doesn't this explicitly disallow trailing garbage? @addaleax you state above that padding with \0 is perfectly normal, is this a case of implementations diverging from the standard, or am I misunderstanding something?

addaleax · 2016-02-05T11:10:34Z

@kthelgason Yes, I’d agree that trailing \0 are a divergence from quoted section of the standard text.

Then again, the gzip(1) util is also explicitly mentioned in the RFC as a reference implementation, and its behaviour, namely to allow trailing \0 and print a warning about other trailing garbage (but effectively ignoring that too), makes sense to me. There is actually a bit of explanation on it in the manpage.

I have no idea how many .gz files in the wild are actually padded with zero bytes, but if it’s possible, I wouldn’t see any harm in supporting trailing zeroes.

According to the spec gzipped archives can contain more than one compressed member. Previously Node's gzip implementation would only unzip the first member and throw away the rest of the compressed data. Issue nodejs#4306 is an example of this occurring in daily use.

According to the spec gzipped archives can contain more than one compressed member. Previously Node's gzip implementation would only unzip the first member and throw away the rest of the compressed data. Issue #4306 is an example of this occurring in daily use. Fixes: #4306 PR-URL: #5120 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>

mscdex added the zlib Issues and PRs related to the zlib subsystem. label Dec 16, 2015

Fishrock123 added the good first issue Issues that are suitable for first-time contributors. label Jan 7, 2016

Fishrock123 changed the title ~~why did incorrect operate zlib.ungzip?~~ Node's zlib.ungzip does not support concatenated files Jan 7, 2016

Fishrock123 removed the good first issue Issues that are suitable for first-time contributors. label Jan 7, 2016

kthelgason mentioned this issue Feb 6, 2016

zlib: support concatenated archives #5120

Closed

bnoordhuis closed this as completed in f380db2 Mar 15, 2016

addaleax mentioned this issue Apr 8, 2016

ZLIB failing to inflate entire .gz file that contains an extra field in the header #6116

Closed

puzrin mentioned this issue May 30, 2018

Failing for BGZIP'd streaming files nodeca/pako#139

Open

snyk-bot mentioned this issue Oct 16, 2020

[Snyk] Security upgrade mocha from 7.2.0 to 8.2.0 turkdevops/node#22

Merged

This was referenced Oct 17, 2020

[Snyk] Security upgrade mocha from 6.2.3 to 8.2.0 adamlaska/node#40

Open

[Snyk] Security upgrade mocha from 6.2.3 to 8.2.0 turkdevops/node#23

Open

This was referenced Nov 18, 2020

[Snyk] Security upgrade mocha from 7.2.0 to 8.2.0 turkdevops/node#32

Merged

[Snyk] Security upgrade mocha from 6.2.3 to 8.2.0 adamlaska/node#45

Open

[Snyk] Security upgrade mocha from 6.2.3 to 8.2.0 turkdevops/node#33

Open

vmarkovtsev mentioned this issue Aug 30, 2024

Cannot load a gzipped JSON trace with multiple blocks google/perfetto#872

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Node's zlib.ungzip does not support concatenated files #4306

Node's zlib.ungzip does not support concatenated files #4306

yahao87 commented Dec 16, 2015

addaleax commented Dec 17, 2015

yahao87 commented Dec 18, 2015

addaleax commented Dec 20, 2015

Fishrock123 commented Jan 7, 2016

bnoordhuis commented Jan 7, 2016

addaleax commented Jan 7, 2016

chrisdickinson commented Jan 8, 2016

addaleax commented Jan 8, 2016

chrisdickinson commented Jan 9, 2016

addaleax commented Jan 9, 2016

kthelgason commented Feb 1, 2016

addaleax commented Feb 1, 2016

kthelgason commented Feb 1, 2016

kthelgason commented Feb 4, 2016

addaleax commented Feb 5, 2016

Node's zlib.ungzip does not support concatenated files #4306

Node's zlib.ungzip does not support concatenated files #4306

Comments

yahao87 commented Dec 16, 2015

addaleax commented Dec 17, 2015

yahao87 commented Dec 18, 2015

addaleax commented Dec 20, 2015

Fishrock123 commented Jan 7, 2016

bnoordhuis commented Jan 7, 2016

addaleax commented Jan 7, 2016

chrisdickinson commented Jan 8, 2016

addaleax commented Jan 8, 2016

chrisdickinson commented Jan 9, 2016

addaleax commented Jan 9, 2016

kthelgason commented Feb 1, 2016

addaleax commented Feb 1, 2016

kthelgason commented Feb 1, 2016

kthelgason commented Feb 4, 2016

addaleax commented Feb 5, 2016