Add support for zlib data format (RFC 1950) #2236
Comments
If adding |
I am forced to include a Zlib wrapper project and build a local NuGet package to keep my project as portable as possible. We support Windows Server 2016/2019, Ubuntu 16/18/20 LTS, CentOS 8.1, and Debian 9/10 against .NET Core 3.1. Due to high concurrency and throughput requirements I can't use a fully managed solution or streams that allocate. I am working on backend projects that use existing game client protocols, so it would be nice if I could reference the zlib native library directly, or if a native library NuGet package were exposed (like libuv). Lastly, I don't have access to modify the client code, so I can't include headers/footers, nor would I want to. I have been looking forward to a zlib implementation since ~2007 for multiple projects, but it looks like it is still not usable for my use cases. |
@stephentoub unless this is committed for the 5.0 release, we should mark Future. It can still happen for 5.0, but milestone 5.0 == "debt to pay off" to successfully release. |
We have a bug #38022. That needs to be fixed in some way for .NET 5. Implementing this is the right way to fix it. If we choose to punt on this, we can do the short term workaround there (which isn't a fix so much as disabling the functionality), but that's why I marked this as 5.0, to make a decision on the right thing to do. From my perspective, this whole thing can be done in a day. |
Oh that's a good reason for the milestone then. 😸 |
For reference, other than tests, this is what the solution looks like: |
@kamronbatman I've been developing a managed solution that is much faster than any other managed implementation (sometimes faster than DeflateStream) and allocates the same as DeflateStream. Dunno if it's of any interest as the planned implementation here does not expose many of the useful zlib options. (levels + mode) I'm not finished cleaning up yet but will release something soonish. |
Thanks @JimBobSquarePants, I have been wrapping native zlib for my own purposes into a cross-platform compatible nuget. I'll definitely take a look at what you wrote and hopefully it can be used. One advantage of my use case is that the data is small, so streaming is not even needed. @danmosemsft, I hear quite a bit about .NET 5 milestones, how does it work for .NET Core? Is it a separate implementation that is needed? |
.NET 5 is the next version of .NET Core (it's what you might otherwise expect to be named .NET Core 4.0). More info here https://devblogs.microsoft.com/dotnet/introducing-net-5/ ... Milestone is described here #38286 |
namespace System.IO.Compression
{
public class ZLibStream : Stream
{
public ZLibStream(Stream stream, CompressionLevel compressionLevel);
public ZLibStream(Stream stream, CompressionMode mode);
public ZLibStream(Stream stream, CompressionLevel compressionLevel, bool leaveOpen);
public ZLibStream(Stream stream, CompressionMode mode, bool leaveOpen);
public Stream BaseStream { get; }
}
} |
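For readers who want to see the approved surface in use, here is a minimal round-trip sketch. It assumes a runtime where `ZLibStream` is available (.NET 6 or later); the `ZLibRoundTrip` class and its method names are hypothetical and only serve the illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the proposal: round-trips a buffer
// through the ZLibStream constructors listed above.
public static class ZLibRoundTrip
{
    public static byte[] Compress(byte[] data)
    {
        using var output = new MemoryStream();
        // leaveOpen: true keeps 'output' usable after the ZLibStream is disposed.
        using (var zlib = new ZLibStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            zlib.Write(data, 0, data.Length);
        } // Disposing flushes the final deflate block and the zlib trailer.
        return output.ToArray();
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var result = new MemoryStream();
        zlib.CopyTo(result);
        return result.ToArray();
    }
}
```

The compressed buffer is only complete once the `ZLibStream` has been disposed, which is why `Compress` reads `output` after the inner `using` block ends.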
@carlossanlop note that @stephentoub is on leave for a while (just in case you were assuming he'd complete his change above). Perhaps a community member is interested in taking it to PR. |
Question, will this still generate headers/footers? If so, will there be an option to not? |
Looks like I'll be carrying on with my implementation. This misses too many features. |
@JimBobSquarePants, what features specifically? Just "levels + mode" as you cited earlier, or something more / deeper? Thanks. |
@stephentoub I'd consider the lack of compression levels and strategy to be significant missing features. |
I'm not arguing about significance, rather I'm trying to understand the scope of the key things you believe are significant. i.e. Is your primary concern the lack of a constructor like: public ZLibStream(Stream stream, SomeType compressionLevel, SomeOtherType strategy); or does it go beyond that? |
If constructors like the one in your example were available offering the full range of compression levels and strategy I would consider that sufficient for my use case. However, I cannot speak for others who would like to utilize additional zlib stream functionality like flushing strategies. |
Someone also requested the ability to read metadata in another thread, maybe it is related: |
Thanks. Since this issue has had its api approved and the additional surface area you're referring to then would be pure addition on top of that (e.g. new ctors, new flush methods, etc.), we can go ahead and implement this and then discuss additional support separately. Please open a separate issue with the additional APIs proposed you're hoping to see. Thanks! |
@stephentoub How would you suggest compression levels 0-9 be added? You'll now have two separate enumerations that represent sets of the same property? I've watched the API review video and it's painfully obvious that the individuals doing the review have little understanding of zlib yet they identify that the https://www.youtube.com/watch?v=7YpDyRMaDKE&t=1h17m28s Why not implement the API properly from the start? |
In a way that's applicable to DeflateStream and GzipStream and BrotliStream as well. The features being requested here cross the other streams that have a similar API, whether it be additional overloads of Flush that accept a strategy value, or a ctor that takes an integer value for a compression level, etc. I understand you don't like it. But there's a lot to be said for consistency with the existing APIs that have worked well for many people for years. And then extending them all to account for the additional support desired in a consistent fashion.
The enum you cite already exists, and needs to work with the new type. And if we added a new enum, it would need to work with the existing types, too. |
Yes, it already exists and it's a poor representation of the available values in almost every instance it has been forced into, with different meanings for what Optimal actually represents. For example: There was some flipflopping on the implementation in these for Zlib. Of course, the docs say both should represent the best possible compression.
But now there's SmallestSize? For Brotli it appears you can only set the compression to a custom value using the separate I would imagine, for consistency, I would have to suggest something similar. It's not just a case of me not liking something. The leaky abstraction that is |
Thankfully, docs are mutable 😉 If they need to be improved, they can be. Please feel free to open issues and/or PRs in https://github.com/dotnet/dotnet-api-docs.
It's not meant to be a perfect representation of all available values. It's meant to be a high-level option that provides some expression of preference. You're right that it lacks full fidelity to a particular underlying implementation; that was never the goal. And from what I've seen, it's sufficient for the majority of developers / needs, while also being easier than having to understand passing an arbitrary number and knowing what it means.
And they can be revisited as needed. With SmallestSize being added, it's possible changing Optimal is the right answer for Brotli; I doubt there are significant dependencies on the exact behavior, but if there are, well, one of the nice things about .NET Core vs .NET Framework is Core is more accepting of that level of breaking change between major versions. Optimal is meant to mean a good all-around choice.
That is your opinion; I respectfully disagree. From my perspective, these higher-level Stream APIs make it straightforward for someone to either simply not care at all about the level of compression employed and trust in the system to do something reasonable:
new BrotliStream(stream, CompressionMode.Compress);
new DeflateStream(stream, CompressionMode.Compress);
new GZipStream(stream, CompressionMode.Compress);
new ZLibStream(stream, CompressionMode.Compress);
or to exert a preference for speed vs size, with options on the scale from NoCompression to Fastest to Optimal (which may have been better named as Balanced or Default or something like that) to SmallestSize:
new BrotliStream(stream, CompressionLevel.NoCompression);
new DeflateStream(stream, CompressionLevel.Fastest);
new GZipStream(stream, CompressionLevel.Optimal);
new ZLibStream(stream, CompressionLevel.SmallestSize);
Yes, in both cases, the developer is relying on the implementation respecting those preferences, and it does. And yes, the API doesn't expose the nitty-gritty of every possible variation the underlying implementation enables; again, that was by-design, and I do not see how that makes it "wrong" or "a mess".
And yes, additional APIs could be considered that did expose the more fine-grained knobs. Which is exactly what was done when the later, lower-level, more advanced BrotliEncoder and BrotliDecoder were exposed, for scenarios where a developer is more interested in managing more themselves. I don't know if those APIs have proven to be useful or not, but if they have been and the scenarios exist for this, I don't see anything wrong with providing parallel ZLibEncoder and ZLibDecoder structs; that would be reasonable to propose. |
I concur with that approach. Instead of exposing the underlying libraries only through restrictive, opinionated APIs/knobs, providing bare-minimum low-level wrappers for each compression library would allow more fine-grained control. |
A ZLibEncoder with no headers/footers which ran at least close to as fast as the native zlib library would work for my use-case. I can't really justify the use of a stream, especially since most buffers are stackalloc or rented memory of <= 64k. |
The point I'm trying to make is that if I've chosen a specific compression algorithm I am already in the advanced scenario. I've reviewed each format and have chosen one based upon my requirements. I should now be able to use the basic features of that format without having to use a separate struct. It's like choosing jpeg over png, or json over xml. I have no issue at all with having separate structs that work directly against a span for specific high-performance scenarios. I simply consider them a separate concern from a basic I strongly believe that the addition of constructors and properties that allow basic Zlib functionality should not require opening a new issue. So that said I would like to have the following additional constructor:
I've chosen to omit the option for suppressing a header for now as that is considered an advanced use case for Zlib where passing a negative value for Where
public enum ZlibCompressionLevel : int
{
    DefaultCompression = -1,
    Level0 = 0,
    NoCompression = Level0,
    Level1 = 1,
    BestSpeed = Level1,
    Level2 = 2,
    Level3 = 3,
    Level4 = 4,
    Level5 = 5,
    Level6 = 6,
    Level7 = 7,
    Level8 = 8,
    Level9 = 9,
    BestCompression = Level9,
}
public enum ZlibCompressionStrategy : int
{
    DefaultStrategy = 0,
    Filtered = 1,
    HuffmanOnly = 2,
    Rle = 3,
    Fixed = 4
}
There's also an additional property that should be considered:
public ZlibFlushMode FlushMode { get; set; }
Defined as the following. There's already a partial implementation,
public enum ZlibFlushMode : int
{
    NoFlush = 0,
    PartialFlush = 1,
    SyncFlush = 2,
    FullFlush = 3,
    Finish = 4,
}
This property should be passed to every call to the native deflate and inflate methods. Below is the documentation from ZLib regarding the property.
As a side note I would suggest similar additions should be made to |
This issue was opened in Jan. It was approved in July. There is already a PR out implementing it. The additional suggestions, which don't negate any of the existing API, weren't explicitly proposed until just now (you commented previously briefly about desired functionality, but without any specificity or background), will still need to be iterated on, and will need to be reviewed separately. And it should ideally be done in a way where all the compression streams gain similar functionality, even if the respective ctors take slightly different arguments or give slightly different meanings to those arguments. Please open a new issue dedicated to the new surface area being proposed; it is the process we utilize in this repo. Thank you. cc: @terrajobst |
Now you're just being obtuse. Nobody replied to my initial comment. |
I think we'd like to standardize this across all the compression streams ( There is also a desire to unblock the PR for |
@terrajobst @stephentoub Since 3.1 is LTS, is there any chance of backported support for this? EDIT: I would understand if y'all just don't want to spend the extra time on this small feature though. |
Unless there is a super strong business reason we generally don't backport features/new APIs. In this case the answer is most likely no. |
Oh GitHub buttons. One day I will not mess them up. But today is not that day. |
Currently .Net doesn't support the zlib data format despite the majority of the work having been done by DeflateStream. I propose a new class be added similar to GZipStream to support RFC 1950.
Personally I'm encountering the zlib format in two areas. Many games depend on zlib, and save files using this format. We also use zlib in the firmware of one of our hardware products. We save debug data and compress it with zlib. For both of these scenarios, we use C# tooling, on the desktop, to operate on these files whether it is for the purpose of reading, modifying, or writing.
Due to the lack of support in .Net I either have to use lame hacky methods, or rely on a separate third party library for the functionality. This is such a shame when .Net already provides the majority of what's needed, and it should be trivial (as far as I can tell) to include this as part of .Net.
While decompression is quite simple to do by skipping the 2-byte header (6 bytes when a preset dictionary ID is present) and the 4-byte Adler-32 checksum at the end, I have trouble with compression. I wouldn't know what the proper header should be when using DeflateStream to do the compression, not to mention having to include a computation for the Adler32 checksum.
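To make the compression side concrete, here is a minimal sketch of the RFC 1950 framing around `DeflateStream` output. The `ZlibFraming` class is hypothetical, and the fixed header bytes `0x78 0x9C` (deflate, 32 KiB window, no preset dictionary, default-level hint) are an assumption chosen because they are the commonly seen default; the FLEVEL hint in the second byte is informational only.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: wraps raw deflate data in a zlib (RFC 1950) envelope.
public static class ZlibFraming
{
    // Adler-32 as defined in RFC 1950: two running sums modulo 65521.
    public static uint Adler32(ReadOnlySpan<byte> data)
    {
        const uint Mod = 65521;
        uint a = 1, b = 0;
        foreach (byte value in data)
        {
            a = (a + value) % Mod;
            b = (b + a) % Mod;
        }
        return (b << 16) | a;
    }

    public static byte[] Compress(byte[] data)
    {
        using var output = new MemoryStream();

        // CMF = 0x78 (CM = 8 deflate, CINFO = 7), FLG = 0x9C (no FDICT).
        // 0x789C is a multiple of 31, which satisfies the FCHECK rule.
        output.WriteByte(0x78);
        output.WriteByte(0x9C);

        using (var deflate = new DeflateStream(output, CompressionLevel.Optimal, leaveOpen: true))
        {
            deflate.Write(data, 0, data.Length);
        }

        // Trailer: Adler-32 of the uncompressed data, most significant byte first.
        uint checksum = Adler32(data);
        output.WriteByte((byte)(checksum >> 24));
        output.WriteByte((byte)(checksum >> 16));
        output.WriteByte((byte)(checksum >> 8));
        output.WriteByte((byte)checksum);

        return output.ToArray();
    }
}
```

The per-byte modulo keeps the checksum loop simple; a production version would typically defer the modulo across blocks of input for speed.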
Here's a quick untested example of what I currently have to do for decompression to give you an idea of the "lame hacky-ness": (Note: I'm not checking the checksum here. I'm assuming the data is good.)
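The original snippet is not reproduced here. As a stand-in, the following is a sketch of that kind of workaround (not the author's code): it validates the two header bytes, skips them, ignores the 4-byte trailer without verifying it, and hands the remainder to `DeflateStream`. The `ZlibUnwrap` class name is hypothetical, and streams that use a preset dictionary are rejected rather than handled.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: decompresses zlib-framed data using only DeflateStream.
public static class ZlibUnwrap
{
    public static byte[] Decompress(byte[] zlibData)
    {
        // 2 header bytes + 4 trailer bytes is the minimum envelope.
        if (zlibData.Length < 6)
            throw new InvalidDataException("Too short to be a zlib stream.");

        int cmf = zlibData[0];
        int flg = zlibData[1];

        // CM must be 8 (deflate) and the 16-bit header must be a multiple of 31.
        if ((cmf & 0x0F) != 8 || ((cmf << 8) | flg) % 31 != 0)
            throw new InvalidDataException("Invalid zlib header.");

        // FDICT set means a 4-byte dictionary ID follows; not handled in this sketch.
        if ((flg & 0x20) != 0)
            throw new NotSupportedException("Preset dictionaries are not supported.");

        // Expose only the raw deflate body; the Adler-32 trailer is not checked.
        using var body = new MemoryStream(zlibData, 2, zlibData.Length - 6);
        using var deflate = new DeflateStream(body, CompressionMode.Decompress);
        using var result = new MemoryStream();
        deflate.CopyTo(result);
        return result.ToArray();
    }
}
```

Skipping the checksum means corrupted input can decompress to wrong data without any error, which is part of why this approach feels hacky.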
Proposed API