Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plan to migrate to zlib-ng #101465

Closed
18 tasks done
carlossanlop opened this issue Apr 24, 2024 · 33 comments
Closed
18 tasks done

Plan to migrate to zlib-ng #101465

carlossanlop opened this issue Apr 24, 2024 · 33 comments
Assignees
Milestone

Comments

@carlossanlop
Copy link
Member

carlossanlop commented Apr 24, 2024

For .NET 9, we are planning to replace our dependency of madler/zlib and zlib-intel with a dependency to zlib-ng.

The zlib-ng repo offers an alternative implementation of the zlib algorithm that:

  • Is compatible and optimized for next-generation platforms and architectures.
  • Adopted the known zlib-intel and Cloudflare optimizations.
  • Deleted old and obsolete code.
  • Adopted all known CVE fixes.
  • Is actively maintained by multiple people and has a quick turnaround for fixing security issues. Additionally, zlib-intel is out of support already.

Acceptance criteria

Tests

  • All tests must continue to pass, especially tests that read data.
  • If needed, adjust tests that measure specific output characteristics of compression, but they must pass.
  • Inputs which do not yield an Exception must continue to be Exception-free.
  • Add unmanaged tests to directly verify the p/invoked methods.

Threat modeling

  • Have a threat modeling exercise and make adjustments to the code if needed. Do this before collecting performance results.

Elapsed time

  • Elapsed time for optimal compression should not increase more than 10%. "Optimal" should be deferred to the library's inherent notion of default. If the increase is higher than 10%, we should discuss if we need to choose a less agressive value manually to keep results aligned with the previous version.
  • Elapsed time for fastest, smallest, and no-compression have no acceptance criteria, as they're entirely at the mercy of the underlying library. We may still collect and compare results.
  • Elapsed time for optimal decompression should also be verified, but no acceptance target is defined, as very little managed code is involved in the decompression stage, and there are no algorithmic choices to be had. But the measurements can help us determine if we need to invest in zlib-ng's performance.

Output size

  • Output size for optimal compression should not increase more than 10%, priority given to inclusive time. If the number differs due to the optimal compression being faster than baseline, then we need to decide if we should use a new manual value for optimal instead of deferring to the library default.

Managed memory

  • Managed allocations should not increase when compared to madler/zlib or zlib-intel.

Unmanaged memory

  • Peak unmanaged memory must be collected before and after. An acceptance criteria may be decided after comparing results, if needed.
  • Test for unmanaged memory leaks. If any are found, fix locally, then contribute upstream if applicable.

Servicing plan

  • No more than one release of .NET misses a given patch-release of zlib-ng. This takes in consideration the fact that zlib-ng does not have a predictable release cadence.
  • .NET moves to a newer major or minor version of zlib-ng within 3 months of the first patch-release or 6 months of the initial release, whichever is earlier. This aligns with zlib-ng's history of maintaining parallel version support.
  • zlib-ng takes security fixes as public PRs. There's a lag between accepting a fix and releasing it. We must be quick and reliably in taking patch commits, in addition to staying current on releases.
  • Any team member can update to a newer version of zlib-ng in less than 10 minutes.
    • Must test and document the updating process prior to releasing .NET 9.
    • Decide what the quick process will be and implement it, for example, writing a script.

Execution plan

Milestone 1: Migration

Milestone 2: Testing

  • 3 days - Run existing tests, ensure they pass.
  • 1 day - Verify correct results in all supported architectures and platforms in our CI.

Milestone 3: Performance

  • 2 weeks - Collect performance reports for before and after in Windows, Linux and MacOS, in x86, x64 and arm64 (Discuss if more configurations are needed). Compare results and discuss if action needs to be taken with the default values.

Milestone 4: Documentation

@carlossanlop carlossanlop added this to the 9.0.0 milestone Apr 24, 2024
@carlossanlop carlossanlop self-assigned this Apr 24, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

@jkotas
Copy link
Member

jkotas commented Apr 24, 2024

Discuss if we want to add a compat switch to allow users to consume their own zlib DLL

I do not think we should be adding compat switches like this proactively.

@omajid
Copy link
Member

omajid commented Apr 24, 2024

Fedora moved recently from zlib to zlib-ng: https://fedoraproject.org/wiki/Changes/ZlibNGTransition

.NET (VMR) builds in Fedora are already using this (since it's the system zlib now) instead of the original zlib project. And we have used the self-built-VMR (release/8.0) that uses zlib-ng to build itself a few times now. We have not noticed any functional bugs/issues on Linux on arm64, ppc64le, s390x or x64.

@ericstj
Copy link
Member

ericstj commented Apr 24, 2024

I do not think we should be adding compat switches like this proactively.

@jkotas we did this for ICU. We had a folks broken when we previously moved from managed implementation > zlib, and zlib > zlib-intel. I was imagining we'd need something as an escape hatch here. We wouldn't be carrying those alternative implementations but we could let folks bring their own. What worries you about providing a compat switch? Locking down the ABI between S.C.Compression and native?

@stephentoub
Copy link
Member

stephentoub commented Apr 24, 2024

I do not think we should be adding compat switches like this proactively.

@jkotas we did this for ICU. We had a folks broken when we previously moved from managed implementation > zlib, and zlib > zlib-intel. I was imagining we'd need something as an escape hatch here. We wouldn't be carrying those alternative implementations but we could let folks bring their own. What worries you about providing a compat switch? Locking down the ABI between S.C.Compression and native?

But there are actual and significant functional differences between both NLS and ICU and between ICU versions (with different data sets). I would not expect that to be the case with zlib; what specifically are we concerned about?

@carlossanlop
Copy link
Member Author

what specifically are we concerned about?

If something stops working, something that we didn't catch in testing, we would have a workaround to give to customers: enable the compat switch, use your own DLL (it can be the old zlib DLL), while we fix the issue.

@stephentoub
Copy link
Member

what specifically are we concerned about?

If something stops working, something that we didn't catch in testing, we would have a workaround to give to customers: enable the compat switch, use your own DLL (it can be the old zlib DLL), while we fix the issue.

That doesn't seem different from any other improvement we make. If we have blocking bugs, we service them.

@hopperpl

This comment was marked as off-topic.

@stephentoub
Copy link
Member

Can we add support for compression/decompression via Span? This was always missing; zlib required using a stream.

This shouldn't be much of a problem as the native library already has exported functions for it.

This is unrelated to which zlib implementation is used. It is tracked by #39327.

@ericstj
Copy link
Member

ericstj commented Apr 24, 2024

And #44793.

This issue is only about swapping the implementation of zlib.

But there are actual and significant functional differences between both NLS and ICU and between ICU versions (with different data sets). I would not expect that to be the case with zlib; what specifically are we concerned about?

Observable behavior. Size and content compressed payload. Maybe we'd want such a switch if we serviced this change in-place in 8.0 - but since we're doing in a major release we just accept this change as part of the churn.

@stephentoub
Copy link
Member

since we're doing in a major release we just accept this change as part of the churn.

+1. We have such considerations any time we patch our existing zlibs, and we had one commensurate with the move to zlib-ng when we moved to zlib-intel.

@carlossanlop
Copy link
Member Author

Thanks @jkotas @stephentoub and @ericstj for your input. Based on the feedback, I will remove the section for compat switch.

Maybe we'd want such a switch if we serviced this change in-place in 8.0

Speaking of which - I did not add a section to backport the zlib-ng migration to servicing branches. Do we need to consider this too?

@bartonjs
Copy link
Member

I did not add a section to backport the zlib-ng migration to servicing branches. Do we need to consider this too?

I believe the current plan is "no", and that by virtue of having built/redisted we're just on the hook for dealing with any problems in zlib-intel before .NET 8 goes out of support. After we see what the cost of the change is in main then tactics/servicing/etc might want to have a discussion... but we have no plan (to my knowledge) to do it proactively.

@carlossanlop
Copy link
Member Author

@lewing @steveisok @agocke @directhex do we also want to switch mono's dependency to zlib-ng, or do we want mono to keep consuming the old zlib copy?

I ask because we have, under src/native/external, one copy of zlib, and one copy of zlib-intel, and we currently reuse these copies for coreclr and mono.

Do you see any reasons why we should keep using zlib and zlib-intel in mono, or can we proceed to replace them with zlib-ng as well?

@jkotas
Copy link
Member

jkotas commented Apr 26, 2024

We should switch to zlib-ng everywhere.

@lambdageek
Copy link
Member

lambdageek commented Apr 26, 2024

I don't think we do anything particularly fancy with zlib. We should use whatever the rest of the repo uses. (I think we mostly use it to unpack PPDBs for the debugger). /cc @thaystg

@carlossanlop
Copy link
Member Author

We should switch to zlib-ng everywhere.

@jkotas When you say "everywhere", do you also mean we should stop looking for an installed zlib-ng package in unix distros and always consume the native copy that we will be carrying?

@carlossanlop
Copy link
Member Author

I was also made aware that as part of the mobile changes in runtime, we will need to update xamarin/xamarin-macios and xamarin/xamarin-android once they start targeting .NET 9. I'll add this to the work items.

@jkotas
Copy link
Member

jkotas commented May 23, 2024

do you also mean we should stop looking for an installed zlib-ng package in unix distros and always consume the native copy that we will be carrying?

Yes.

We may want to have switch to use system zlib-ng for distro builds, similar to how we have CLR_CMAKE_USE_SYSTEM_BROTLI for Brotli. Introduced by #66462 . cc @omajid

@carlossanlop
Copy link
Member Author

carlossanlop commented May 23, 2024

Ok.

It seems zlib-ng is not available among the default repositories in distros like Ubuntu. So I see two options:

  • Try to find the system zlib-ng like we used to do with zlib. If it's pre-installed in the distro, we proceed to use that. If it's not available, we proceed to use the native copy we carry.
  • Only use the system zlib-ng where it could be available where zlib-ng can be installed as an external package. But this would mean adding zlib-ng as a dependency of runtime, which could also mean adding an extra repository, but I'm not sure we want that restriction.

@jkotas
Copy link
Member

jkotas commented May 23, 2024

It seems zlib-ng is not available among the default repositories in distros like Ubuntu.

@omajid and @dotnet/source-build-contrib should provide the guidance for this.

It is ok to do this part in a follow up change. It does not need to be part of your initial enablement PR.

@omajid
Copy link
Member

omajid commented May 23, 2024

AFAIK, the system copy of zlib is used by default on Linux (so no need for a CLR_CMAKE_USE_SYSTEM_ZLIB on Linux yet). On some distros like Fedora, the system version of zlib is actually zlib-ng. And zlibng is used transparently in those environments by default. On Fedora 40, it looks like this:

$ ldd /usr/lib64/dotnet/shared/Microsoft.NETCore.App/8.0.4/libSystem.IO.Compression.Native.so
        linux-vdso.so.1 (0x00007fffa36dd000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f5505c64000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f5505b81000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f5505994000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5505d6d000)
$ rpm -qf /lib64/libz.so.1
zlib-ng-compat-2.1.6-2.fc40.x86_64

So the scenario that we want to handle is where the system has both zlib and zlib-ng and then we want the runtime to pick zlib-ng over zlib?

@carlossanlop
Copy link
Member Author

AFAIK, the system copy of zlib is used by default on Linux (so no need for a CLR_CMAKE_USE_SYSTEM_ZLIB on Linux yet). On some distros like Fedora, the system version of zlib is actually zlib-ng. And zlibng is used transparently in those environments by default. On Fedora 40, it looks like this:

$ ldd /usr/lib64/dotnet/shared/Microsoft.NETCore.App/8.0.4/libSystem.IO.Compression.Native.so
        linux-vdso.so.1 (0x00007fffa36dd000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f5505c64000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f5505b81000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f5505994000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5505d6d000)
$ rpm -qf /lib64/libz.so.1
zlib-ng-compat-2.1.6-2.fc40.x86_64

So the scenario that we want to handle is where the system has both zlib and zlib-ng and then we want the runtime to pick zlib-ng over zlib?

I had not thought about the scenario where zlib-ng is actually zlib, but yes, that's something we should consider too. And yes, if there's an installed version of zlib-ng, we want to use that one.

@carlossanlop
Copy link
Member Author

carlossanlop commented Jun 19, 2024

We should switch to zlib-ng everywhere.

@jkotas @steveisok @akoeplinger @jkoritzinsky @lambdageek I wanted to ask for some additional feedback from you all regarding making zlib-ng available everywhere, as I'm having ongoing conversations with different (and sometimes opposed) opinions, and I would like to make sure we're all on the same page.

  • In platforms like Android and iOS, I'm hearing an argument in favor of continuing to consume the system zlib instead of consuming our own new copy of zlib-ng, because if a security issue is found in zlib, it would take time to update the zlib-ng copy that we carry, whereas the system zlib would eventually get updated (probably much faster) by the platform owners (Android, Apple) via a system update. It seems this is the approach we follow for openssl.
  • We are currently seeing CI failures in my PR (I'm working on fixing them) in Wasi and Browser, because the llvm binaries we build currently link against system zlib, so the build then gets confused when we try to link our copy of zlib-ng. I was told there's concern that in Android/iOS, third-party libraries could also get build problems if they depend on zlib and they start seeing our new copy of zlib-ng.

In platforms where we have a concern, we could continue consuming the system-provided zlib (like in Android and iOS). That way we don't carry a copy that could have a security issue. It's my understanding that we already do something similar with our cryptographic features.

@jkotas
Copy link
Member

jkotas commented Jun 19, 2024

if a security issue is found in zlib, it would take time to update the zlib-ng copy that we carry, whereas the system zlib would eventually get updated (probably much faster) by the platform owners (Android, Apple) via a system update.

I do not think that this is a strong argument. If system zlibs security benefits outweighed better performance and consistency from carrying our own copy of zlib-ng, we would want to use system zlibs everywhere that it is possible. We are not doing that. We have convinced ourselves that we prefer the performance and consistency of zlib-ng.

I was told there's concern that in Android/iOS, third-party libraries could also get build problems if they depend on zlib and they start seeing our new copy of zlib-ng.

This problem is not limited to mobile. It exists with native AOT on all platforms. I am not particularly worried about it. It affects only a very small fraction of our customers and there are ways to deal with it.

I can buy the argument that it is not worth the trouble to deal with zlib-ng build issues and conflicts with system zlib on mobile and browser. I would be ok with staying with system zlib on these platforms.

@rolfbjarne
Copy link
Member

if a security issue is found in zlib, it would take time to update the zlib-ng copy that we carry, whereas the system zlib would eventually get updated (probably much faster) by the platform owners (Android, Apple) via a system update.

I do not think that this is a strong argument. If system zlibs security benefits outweighed better performance and consistency from carrying our own copy of zlib-ng, we would want to use system zlibs everywhere that it is possible. We are not doing that. We have convinced ourselves that we prefer the performance and consistency of zlib-ng.

One difference on mobile platforms is that size matters much more than on other platforms.

Using system zlib is basically free size-wise.

I was told there's concern that in Android/iOS, third-party libraries could also get build problems if they depend on zlib and they start seeing our new copy of zlib-ng.

This problem is not limited to mobile. It exists with native AOT on all platforms. I am not particularly worried about it. It affects only a very small fraction of our customers and there are ways to deal with it.

I can buy the argument that it is not worth the trouble to deal with zlib-ng build issues and conflicts with system zlib on mobile and browser. I would be ok with staying with system zlib on these platforms.

The way we've dealt with this in the past, has been to rename symbols exposed by Mono (prefixed them with something like _mono_). Note that making symbols private isn't necessarily enough, because we link Mono statically, and then if a third-party static library happens to use the same symbol names, we'll still get symbol clashes.

@rolfbjarne
Copy link
Member

if a security issue is found in zlib, it would take time to update the zlib-ng copy that we carry, whereas the system zlib would eventually get updated (probably much faster) by the platform owners (Android, Apple) via a system update.

I do not think that this is a strong argument. If system zlibs security benefits outweighed better performance and consistency from carrying our own copy of zlib-ng, we would want to use system zlibs everywhere that it is possible. We are not doing that. We have convinced ourselves that we prefer the performance and consistency of zlib-ng.

Note that this is a bit worse on mobile than on desktop, because Mono is shipped with the apps:

  1. Security issue is found in zlib-ng
  2. .NET runtime (and maybe workloads) publishes an update with the security fix
  3. Every one of our customers (app developers) have to update, rebuild their apps, and republish them.
  4. Every one of our customers' customers (app consumers) would have to update the app on their device(s).

Compare to:

  1. Security issue is found in zlib
  2. Apple/Google ships a system update, fixing it for everybody.

@stephentoub
Copy link
Member

How is that different from any other security issue found and fixed anywhere in the runtime, System.* libraries, etc?

@jkotas
Copy link
Member

jkotas commented Jul 3, 2024

One difference on mobile platforms is that size matters much more than on other platforms.
Using system zlib is basically free size-wise.

Yes, system zlib is free size-wise. On the other hand, zlib is fairly small (~100kB) so the binary footprint is not a big deal either.

Note that this is a bit worse on mobile than on desktop, because Mono is shipped with the apps

This concern is not mobile specific. It exists for self-contained applications everywhere - mobile, browser, desktop, server.

@carlossanlop
Copy link
Member Author

FYI all - Here's the perf report showing improvements in Deflate, ZLib and GZip in arm64:
dotnet/perf-autofiling-issues#38054

@prayaas-a
Copy link

FYI all - Here's the perf report showing improvements in Deflate, ZLib and GZip in arm64: dotnet/perf-autofiling-issues#38054

Thanks @carlossanlop. I would love to see the perf comparison on Windows. Is a similar report for Windows in the works?

@ericstj
Copy link
Member

ericstj commented Sep 17, 2024

FYI all - Here's the perf report showing improvements in Deflate, ZLib and GZip in arm64: dotnet/perf-autofiling-issues#38054

Thanks @carlossanlop. I would love to see the perf comparison on Windows. Is a similar report for Windows in the works?

Have a look at this query https://github.com/dotnet/perf-autofiling-issues/issues?q=is%3Aissue+is%3Aopen+Decompress+Compress+label%3Aperf-improvement+

@carlossanlop are we tracking any remaining work with this issue?

@carlossanlop
Copy link
Member Author

All items have been resolved, we can close this.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants