Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tbb: Split into tbb_2020_3 and tbb_2021_8 #217585

Merged
merged 1 commit into from
Mar 3, 2023
Merged

Conversation

davidak
Copy link
Member

@davidak davidak commented Feb 21, 2023

Description of changes

This is based on PR #214762.

For the new release 2021.8, see
https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-threading-building-blocks-release-notes.html
https://github.com/oneapi-src/oneTBB/releases/tag/v2021.5.0
https://github.com/oneapi-src/oneTBB/releases/tag/v2021.6.0
https://github.com/oneapi-src/oneTBB/releases/tag/v2021.7.0

Due to the significant breakage due to the update to TBB 2021.8, instead split the tbb package into tbb_2020_3 and tbb_2021_8, with the default tbb aliased to tbb_2020_3 in order to minimize breakage.

Also fixed build issues and improved code.

The reason I went for this change is that I'd like to package CCTag which depends on TBB 2021. To me this solution seems like a good way to bridge the time until the packages incompatible with TBB 2021.8, but I'm not sure if there are any guidelines regarding splitting packages like this. So please comment if this discouraged for some reason.

I hope I got the split right. In particular, I'm wondering if there's a better solution for sharing meta. Also, I'm open to suggestions regarding the versioned package naming scheme.

Continuing #215689 and #216991

cc co-author @hesiod

Things done
  • Built on platform(s)
    • x86_64-linux
    • i686-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@davidak
Copy link
Member Author

davidak commented Feb 22, 2023

Result of nixpkgs-review pr 217585 run on x86_64-linux 1

2 packages built:
  • tbb_2020_3
  • tbb_2021_8

Content looks good:

[nix-shell:~/.cache/nixpkgs-review/pr-217585-2]$ tree results/*
results/tbb_2020_3
└── lib
    ├── libtbbmalloc_proxy.so
    ├── libtbbmalloc_proxy.so.2
    ├── libtbbmalloc.so
    ├── libtbbmalloc.so.2
    ├── libtbb.so
    └── libtbb.so.2
results/tbb_2021_8
├── lib
│   ├── libtbbmalloc_proxy.so -> libtbbmalloc_proxy.so.2
│   ├── libtbbmalloc_proxy.so.2 -> libtbbmalloc_proxy.so.2.8
│   ├── libtbbmalloc_proxy.so.2.8
│   ├── libtbbmalloc.so -> libtbbmalloc.so.2
│   ├── libtbbmalloc.so.2 -> libtbbmalloc.so.2.8
│   ├── libtbbmalloc.so.2.8
│   ├── libtbb.so -> libtbb.so.12
│   ├── libtbb.so.12 -> libtbb.so.12.8
│   └── libtbb.so.12.8
└── share
    └── doc
        └── TBB
            └── README.md

4 directories, 10 files

@davidak davidak marked this pull request as ready for review February 22, 2023 00:12
@davidak
Copy link
Member Author

davidak commented Feb 22, 2023

Please review and test.

I will build for x86_64-linux, i686-linux and aarch64-linux.

pkgs/top-level/all-packages.nix Outdated Show resolved Hide resolved
pkgs/development/libraries/tbb/common.nix Outdated Show resolved Hide resolved
pkgs/development/libraries/tbb/2021_8.nix Outdated Show resolved Hide resolved
pkgs/development/libraries/tbb/2021_8.nix Outdated Show resolved Hide resolved
pkgs/development/libraries/tbb/2021_8.nix Outdated Show resolved Hide resolved
pkgs/development/libraries/tbb/2021_8.nix Outdated Show resolved Hide resolved
pkgs/development/libraries/tbb/2021_8.nix Outdated Show resolved Hide resolved
pkgs/development/libraries/tbb/default.nix Outdated Show resolved Hide resolved
@davidak
Copy link
Member Author

davidak commented Feb 24, 2023

Result of nixpkgs-review pr 217585 run on x86_64-linux 1

1 package failed to build:
22 packages built:
  • blender
  • bowtie2
  • cloudcompare
  • easyeffects
  • embree
  • embree2
  • ikos
  • mirtk
  • opencascade
  • openimagedenoise
  • openturns
  • openvdb
  • openvino
  • osrm-backend
  • pagmo2
  • paraview
  • potreeconverter
  • prusa-slicer
  • sysdig
  • tiledb
  • tracy
  • vcmi

Built packages where tbb is a direct dependency.

I have run one binary if the package contain any. No crash except easyeffects, but that is expected (#217972).

@davidak
Copy link
Member Author

davidak commented Feb 24, 2023

Somehow the build on i686-linux is blocked by #217976 even tho the package is not requested to be built directly.

@davidak
Copy link
Member Author

davidak commented Mar 1, 2023

Build of both packages work now on i686-linux.

Thanks to @trofi for the great feedback!

Copy link
Contributor

@trofi trofi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good! I added a few more minor comments.

pkgs/development/libraries/tbb/2020_3.nix Show resolved Hide resolved
# Fix/suppress warnings on gcc12.1 from https://github.com/oneapi-src/oneTBB/pull/866
(fetchpatch {
url = "https://patch-diff.githubusercontent.com/raw/oneapi-src/oneTBB/pull/866.patch";
hash = "sha256-zT+tRPnPVGUqKvob0RWjASRm4bPYYLtaaqTzA4EdU/8=";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be that you used hash from the wrong version of the patch? If failed to fetch for me. I had to change the hash locally as:

--- a/pkgs/development/libraries/tbb/default.nix
+++ b/pkgs/development/libraries/tbb/default.nix
@@ -32,7 +32,7 @@ stdenv.mkDerivation rec {
     # Fix/suppress warnings on gcc12.1 from https://github.com/oneapi-src/oneTBB/pull/866
     (fetchpatch {
       url = "https://patch-diff.githubusercontent.com/raw/oneapi-src/oneTBB/pull/866.patch";
-      hash = "sha256-zT+tRPnPVGUqKvob0RWjASRm4bPYYLtaaqTzA4EdU/8=";
+      hash = "sha256-e44Yv84Hfl5xoxWWTnLJLSGeNA1RBbah4/L43gPLS+c=";
     })
   ];

If you still have previous patch locally try to compare what changed if it's an expected difference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also get that hash now.

I guess the only change i made is to use hash = instead of sha256 =, but that shouldn't make a difference?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, hash = vs sha256 = does not make a difference. It's something else: maybe unstable patch generation algorigthm on github's side that fetchpatch can't clean up? or maybe the hash was calculated against fetchurl before it was converted to fetchpatch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fetchurl was also used before. Maybe that made the difference. From looking at the GitHub PR, there should not be a change.

pkgs/development/libraries/tbb/default.nix Outdated Show resolved Hide resolved
Comment on lines 16106 to 16109
tbb = callPackage ../development/libraries/tbb { };
tbb_2020_3 = callPackage ../development/libraries/tbb/2020_3.nix { };
tbb_2021_8 = callPackage ../development/libraries/tbb { };
# many packages fail with latest version, so they need to be updated one-by-one
tbb = tbb_2020_3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a reasonable change. I have only one complain on the one-by-one bit: updating users one-by-one is not really an option for library users due to diamond dependency problems: https://trofi.github.io/posts/248-how-do-shared-library-collisions-break.html. We will likely need to update most of users in a single go later. Otherwise we will have to debug non-trivial linkage failures or even plain SIGSEGVs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About 100 packages build and 100 fail (#214762 (comment)). So i thought we could change the working 100 to the new version and leave the other on the old.

You say that will cause unexpected issues? I did not have time to read the post.

So far i understand that when we have 2 libraries, that use the two versions of tbb and one package that use these two libraries, we will have issues. So i would try to update as much as possible and test the program. I guess in case of an issue, they will not even start?

How does this contradict the feature of Nix to be able to have multiple versions of libraries on the same system? Why is it different here than with other dependencies? I guess it would also not be possible to have multiple glibc? I hope such limitations of Nix are documented somewhere.

The whole point of this change (for me) is to be able to use the new version for the CCTag library, that is needed for Meshroom.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What change do you suggest here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far i understand that when we have 2 libraries, that use the two versions of tbb and one package that use these two libraries, we will have issues. So i would try to update as much as possible and test the program. I guess in case of an issue, they will not even start?

Yeah, if a single address space loads multiple library versions there will likely be troubles. It's hard to say how those will manifest. At least you will get an early crash. At worst it will not crash at all and will corrupt your data. The typical problem is when data structure is constructed in one library version and processed in another. I usually use lddtree -a on a binary to see if there are multiple versions of a library loaded together.

How does this contradict the feature of Nix to be able to have multiple versions of libraries on the same system? Why is it different here than with other dependencies? I guess it would also not be possible to have multiple glibc? I hope such limitations of Nix are documented somewhere.

You can have multiple versions of glibc as long as you guarantee that a single address space (a single process) does not load two versions of glibc at the same time. nixpkgs tries very hard to do it by defining a single glibc package used everywhere (except the bootstrap). That way as long as you don't mix glibc from different nixpkgs instances in a single derivation it should be mostly OK (mostly because there are some exceptions like /run/opengl-driver that expose a particular glibc version via shared plugin, https://github.com/nixpkgs-architecture/issues/issues/15).

Mixing and matching final binaries from different nixpkgs version normally is safer as each of them usually uses their consistent (either disjoint or identical) closure of shared libraries and configs. There are still creative ways of breaking them like incompatible configs in /etc/ or through other non-hermetic paths like plugins or incompatible database formats in `/var/.

The libraries are just a lot simpler to break if a single nixpkgs provides multiple versions of a library and uses different versions of a library in different subtrees of a package tree. qt (for example) tries to solve the mixing problem by using a package set that pin the whole set of qt libraries into a consistent set (libsForQt5.callPackage instead of callPackage). It's still a mess if your package want to use both qt5 and qt6 at the same time (directly or indirectly).

I suggest avoiding one-by-one phrasing: # many packages still fail with latest version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a single address space loads multiple library versions there will likely be troubles

can we solve that by showing an error when evaluating a package that has dependencies that use the same library in different versions? it would be hard to know which package is a library and which is the same library, but in a different version. nix does not even have a concept of a package

we could have metadata for libraries like using outputs = [ "lib" "dev" ]; and define lib as default (out) and a policy to use the same package name and only use different version, so nix can compare them

since nix already knows all dependencies, i think it could be solvable at evaluation time

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think heuristics are possible but might be non-trivial to implement:

  1. nix / nixpkgs does not exactly know it's runtime closures unti the package is built. At eval time we only see built depends in buildInputs / nativeBuildInputs and even that is both a too wide list and an incomplete list. Things like { foo }: stdenv.mkDerivation { .., postInstall = "echo LD_PRELOAD=${foo}/lib/libfoo.so.1 >> something"; } don't have the foo in *Inputs as an expression, but does have it as a derivation input.

  2. There are rare cases when we mix libraries, like cross-compilation, bootstrap and copied libraries (libgcc_s.so is provided by gcc and glibc).

@trofi
Copy link
Contributor

trofi commented Mar 1, 2023

Build of both packages work now on i686-linux.

A note on i686-linux: hydra does not run i686-linux builds. Thus many unrelated failures are expected.

What hydra usually runs instead is a small subset of it as wine dependency via pkgsi686Linux.* namespace. Thus package name that targets i686 is normally pkgsi686Linux.tbb_2021_8 and pkgsi686Linux.tbb. They are identical to i686-linux, but they don't imply that you run it on a i686-linux system (I'm pretty sure your host is running a 64-bit x86_64 kernel and not a 32-bit i686 one).

@davidak
Copy link
Member Author

davidak commented Mar 1, 2023

Build on aarch64-linux fails because the dependency binutils-wrapper-2.40-aarch64-linux does not build.

@davidak
Copy link
Member Author

davidak commented Mar 1, 2023

They are identical to i686-linux, but they don't imply that you run it on a i686-linux system

@trofi what are you suggesting here?

Should i build pkgsi686Linux.tbb instead of tbb --system i686-linux when testing if a package builds on i686-linux? What advantage does that have? (i'm on x86_64-linux)

@trofi
Copy link
Contributor

trofi commented Mar 1, 2023

They are identical to i686-linux, but they don't imply that you run it on a i686-linux system

@trofi what are you suggesting here?

Should i build pkgsi686Linux.tbb instead of tbb --system i686-linux when testing if a package builds on i686-linux? What advantage does that have? (i'm on x86_64-linux)

Yeah, I suggest only testing pkgsi686Linux.tbb and pkgsi686Linux.tbb_2021_8 unless you really want to get i686-linux into a better shape.

@ofborg ofborg bot added the 8.has: package (new) This PR adds a new package label Mar 1, 2023
@ofborg ofborg bot requested review from hesiod and thoughtpolice March 1, 2023 08:28
@davidak
Copy link
Member Author

davidak commented Mar 1, 2023

Result of nixpkgs-review pr 217585 run on aarch64-linux 1

2 packages built:
  • tbb_2021_8
  • tbb_2021_8.dev (tbb_2021_8.dev.dev)

@trofi
Copy link
Contributor

trofi commented Mar 1, 2023

@ofborg build tbb_2020_3 tbb_2021_8 tbb pkgsi686Linux.tbb_2020_3 pkgsi686Linux.tbb_2021_8 pkgsi686Linux.tbb

@trofi
Copy link
Contributor

trofi commented Mar 1, 2023

Result of nixpkgs-review pr 217585 run on x86_64-linux 1

2 packages built:
  • tbb_2020_3
  • tbb_2021_8

@trofi
Copy link
Contributor

trofi commented Mar 1, 2023

aarch64-darwin fails as:

error: unknown warning option '-Werror=stringop-overflow'; did you mean '-Werror=shift-overflow'? [-Werror,-Wunknown-warning-option]

Can you have a look? Probably needs to drop our -Werror=stringop-overflow hack. Or make it conditional on gcc.

This is based on PR NixOS#214762.

For the new release 2021.8, see
https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-threading-building-blocks-release-notes.html
https://github.com/oneapi-src/oneTBB/releases/tag/v2021.5.0
https://github.com/oneapi-src/oneTBB/releases/tag/v2021.6.0
https://github.com/oneapi-src/oneTBB/releases/tag/v2021.7.0

Due to the significant breakage due to the update to TBB 2021.8, instead
split the tbb package into tbb_2020_3 and tbb_2021_8, with the default
tbb aliased to tbb_2020_3 in order to minimize breakage.

Also fixed build issues and improved code.

Co-Authored-By: Martin Weinelt <hexa@darmstadt.ccc.de>
Co-Authored-By: davidak <git@davidak.de>
@davidak
Copy link
Member Author

davidak commented Mar 2, 2023

Can you have a look? Probably needs to drop our -Werror=stringop-overflow hack. Or make it conditional on gcc.

Yes, before i only edited it in the gcc cmake file.

@davidak
Copy link
Member Author

davidak commented Mar 2, 2023

Result of nixpkgs-review pr 217585 run on aarch64-linux 1

2 packages built:
  • tbb_2021_8
  • tbb_2021_8.dev (tbb_2021_8.dev.dev)

@trofi
Copy link
Contributor

trofi commented Mar 2, 2023

https://github.com/ofborg build tbb_2020_3 tbb_2021_8 tbb pkgsi686Linux.tbb_2020_3 pkgsi686Linux.tbb_2021_8 pkgsi686Linux.tbb

@davidak
Copy link
Member Author

davidak commented Mar 2, 2023

ofborg build failed

error: building of '/nix/store/srvihjp36ww37lcl83ylxns1s73cxsay-llvm-11.1.0.drv!lib,out' from .drv file timed out after 3600 seconds

error: building of '/nix/store/jg5aciin9lvh2m0264y4dr5hksld14nx-llvm-11.1.0.drv!lib,out' from .drv file timed out after 3600 seconds

@ofborg build tbb

@trofi
Copy link
Contributor

trofi commented Mar 2, 2023

nix build --no-link github:NixOS/nixpkgs/pull/217585/merge#pkgsLLVM.tbb{,_2020_3,_2021_8} succeded for me. I think it's a good proxy as a clang build test.

@trofi trofi merged commit a7a1966 into NixOS:staging Mar 3, 2023
@trofi
Copy link
Contributor

trofi commented Mar 3, 2023

Let's declare it good enough for staging and sort the possible darwin failures if those come up in the follow-up PRs.

Pulled as is. Thank you!

@davidak davidak deleted the tbb-split branch March 3, 2023 10:01
@davidak
Copy link
Member Author

davidak commented Mar 3, 2023

I think a goal for tbb is to update the default to the latest version, like @mweinelt tried in #214762. But half the packages currently fail.

@trofi can you recommend a stretegy for that? Should we just wait for now for more packages to become compatible? Maybe monitor how other distros handle it.

@trofi
Copy link
Contributor

trofi commented Mar 3, 2023

That's a very good question!

It's a challenge to update commonly used libraries even with minor breakages.

Major breakages are especially hard and would require some porting effort. In any case we would need at least:

  • a PR with the package update attempt
  • a few build failure examples (logs) collected there
  • a few build fixes queued for the PR

Perfect fixes would work for both old and new tbb versions and would go upstream first. Picking fixes from other downstreams should be reasonable as well.

To offload some work from doing all the fixes myself I usually do:

  • post a topic on https://discourse.nixos.org/ with details on how breakages usually look like and how to fix it (sometimes upstreams already provide similar guides)
  • file bug reports to upstreams with some hints how to do porting to the new version
  • (once fixed most known breakages) request a hydra jobset against your OR to collect all the nixpkgs failures

Sometimes it works, sometimes it does not :)

A few past examples:

This was referenced Mar 9, 2023
@NickCao NickCao mentioned this pull request Apr 24, 2023
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants