stdenv: provide a deterministically built gcc #112928

Merged
merged 1 commit into from
Apr 26, 2021


baloo
Member

@baloo baloo commented Feb 12, 2021

Motivation for this change

This is a proposal to fix one of the last issues on the road to reproducibility of NixOS.
There is some background information here: #108475
There is also some discussion on: #445

gcc, when built, runs multiple stages. It uses performance data and profiling from one of those compilation stages to inject optimizations in a later stage. The purpose of this is to optimize performance.
This renders the build nondeterministic and impure, since it injects local behavior of the builder. I believe this is contrary to the principles of nix, and to @edolstra's thesis.

Furthermore, I'm not sure how optimizations made for a Hydra builder would affect performance on any other machine. To that end, I chose to make the gcc used by stdenv deterministic but keep a profile-built gcc in the default packages: derivations are built deterministically, albeit a bit slower, but you can still run an optimized gcc in your nix-shell if you're doing development.
Note: This could probably use some benchmarks here.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@ofborg ofborg bot added the 6.topic: stdenv Standard environment label Feb 12, 2021
@Mic92
Member

Mic92 commented Feb 12, 2021

How do other distros handle this, e.g. Debian?

@baloo
Member Author

baloo commented Feb 12, 2021

From what I can read, they run the profiledbootstrap, but the automated reproducibility test times out.

buildlogs: https://buildd.debian.org/status/fetch.php?pkg=gcc-10&arch=amd64&ver=10.2.1-6&stamp=1610315062&raw=0
reproducibility test: https://tests.reproducible-builds.org/debian/unstable/arm64/index_all_abc.html

@tomberek
Contributor

tomberek commented Feb 12, 2021

Looks like Arch doesn't use profiledbootstrap, in favor of reproducibility: https://bugs.archlinux.org/task/56856

Nix provides an incredibly robust means to have the default be reproducible and allow opt-in for optimization. This means we have more options in this arena.

@baloo
Member Author

baloo commented Feb 12, 2021

Running the following benchmark:
https://gist.github.com/baloo/45cebcaa2057d1cbb8a875338ce4ecf3

Trying to compare build performance before/after this change, I get the following results:

==================
building: bc
==== baseline ====
/nix/store/dczxyx8skghfmkjd5ilmq7gcx4bbwzfz-bc-1.07.1.drv
0.01user 0.01system 0:11.62elapsed 0%CPU (0avgtext+0avgdata 21188maxresident)k
0inputs+0outputs (0major+1334minor)pagefaults 0swaps
==== target ====
/nix/store/9a6w14gwmv7rdk3cyv9kg72wb0w9qxpw-bc-bench-target-1.07.1.drv
0.01user 0.00system 0:12.31elapsed 0%CPU (0avgtext+0avgdata 21020maxresident)k
0inputs+0outputs (0major+1334minor)pagefaults 0swaps

==================
building: zsh
==== baseline ====
/nix/store/lyxm5b7bk2ijlfijy74y2mpvn2bwvv2x-zsh-bench-baseline-5.8.drv
0.02user 0.02system 1:05.51elapsed 0%CPU (0avgtext+0avgdata 21144maxresident)k
0inputs+0outputs (0major+1331minor)pagefaults 0swaps
==== target ====
/nix/store/3dpvmgx0l2p3bjv468k9vaqi20w55awn-zsh-bench-target-5.8.drv
0.02user 0.02system 1:10.77elapsed 0%CPU (0avgtext+0avgdata 21148maxresident)k
0inputs+0outputs (0major+1333minor)pagefaults 0swaps

==================
building: linux_5_10
==== baseline ====
/nix/store/h3bv595jk0912ji7mifyi0wk9shhiw4d-linux_5_10-bench-baseline-5.10.15.drv
0.60user 0.51system 17:08.15elapsed 0%CPU (0avgtext+0avgdata 21096maxresident)k
0inputs+0outputs (0major+1331minor)pagefaults 0swaps
==== target ====
/nix/store/xqldn8pwps75wrjwb56bd2ydwjcb8qfy-linux_5_10-bench-target-5.10.15.drv
0.61user 0.50system 19:16.01elapsed 0%CPU (0avgtext+0avgdata 21248maxresident)k
0inputs+0outputs (0major+1334minor)pagefaults 0swaps

==================
building: ceph
==== baseline ====
/nix/store/610sml58pp6m3lqk4ymzckzixj8ks2sz-ceph-bench-baseline-15.2.8.drv
0.08user 0.05system 23:32.81elapsed 0%CPU (0avgtext+0avgdata 21232maxresident)k
0inputs+0outputs (0major+1340minor)pagefaults 0swaps
==== target ====
/nix/store/s7hc4h7dpcqhx9m8jpm52sv6xx4fikjp-ceph-bench-target-15.2.8.drv
0.08user 0.05system 25:44.84elapsed 0%CPU (0avgtext+0avgdata 21296maxresident)k
72inputs+0outputs (0major+1344minor)pagefaults 0swaps

So there is definitely a performance impact from this change: a 7-12% slowdown.
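The slowdown per package can be recomputed from the `elapsed` fields above. This is a minimal sketch: the helper names are made up for illustration, and it assumes the `elapsed` column uses the `[h:]m:ss.ss` layout printed by GNU time.

```python
def elapsed_to_seconds(elapsed: str) -> float:
    """Convert an 'elapsed' field such as '17:08.15' or '1:02:03' to seconds."""
    seconds = 0.0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def slowdown_percent(baseline: str, target: str) -> float:
    """Relative increase of the target wall-clock time over the baseline, in percent."""
    base = elapsed_to_seconds(baseline)
    return (elapsed_to_seconds(target) - base) / base * 100


# (baseline, target) elapsed times copied from the benchmark output above.
RUNS = {
    "bc": ("0:11.62", "0:12.31"),
    "zsh": ("1:05.51", "1:10.77"),
    "linux_5_10": ("17:08.15", "19:16.01"),
    "ceph": ("23:32.81", "25:44.84"),
}

for name, (baseline, target) in RUNS.items():
    print(f"{name}: {slowdown_percent(baseline, target):.1f}% slower")
```

With these numbers the script reports roughly 5.9% for bc, 8.0% for zsh, 12.4% for linux_5_10 and 9.3% for ceph. Each figure comes from a single wall-clock run, so short builds such as bc are likely dominated by noise.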

@ofborg ofborg bot added the 10.rebuild-linux-stdenv This PR causes stdenv to rebuild label Feb 12, 2021
@ofborg ofborg bot requested review from lovek323, andir, np and edolstra February 12, 2021 23:20
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild 10.rebuild-linux: 501+ 10.rebuild-linux: 5001+ labels Feb 12, 2021
@tomberek
Contributor

Result of nixpkgs-review pr 112928 run on x86_64-linux

26 packages built:
  • gcc
  • gcc-arm-embedded (gcc-arm-embedded-10)
  • gcc-arm-embedded-6
  • gcc-arm-embedded-7
  • gcc-arm-embedded-8
  • gcc-arm-embedded-9
  • gccForLibs (gcc-unwrapped)
  • gcc10 (gcc_latest)
  • gcc10Stdenv
  • gcc48
  • gcc49
  • gcc49Stdenv
  • gcc6
  • gcc6Stdenv
  • gcc7
  • gcc7Stdenv
  • gcc8
  • gcc8Stdenv
  • gcc9
  • gcc9Stdenv
  • gccMultiStdenv
  • gccStdenv
  • gccStdenvNoLibs
  • gcc_debug
  • gcc_multi
  • gccgo (gccgo6)

@tomberek
Contributor

tomberek commented Feb 13, 2021

Result of nixpkgs-review pr 112928 run on x86_64-linux

0 packages failed to build:
7 packages built:
  • gfortran48
  • gfortran49
  • gfortran6
  • gfortran (gfortran9)
  • gfortran10
  • gfortran7
  • gfortran8

Edit: Failures seem to just come from nixpkgs-review building multiple copies of things at once and running out of resources.

@baloo
Member Author

baloo commented Feb 13, 2021

@tomberek Yes I hit the same issues as well, I'm not too sure what went wrong with the gfortran evaluation.

@tomberek
Contributor

Running @baloo's benchmark I get around 2-5% slowdown.

An interesting question is how much slowdown is acceptable. If reproducible builds meant a 100% slowdown, we'd have a much harder time justifying it. Having robust reproducible builds allows a kind of parallelism in the package building ecosystem that would provide an order of magnitude speed up in how fast channels update, allow for greater coordination between mutually distrusting builders, easier regression testing, etc. So where would we draw the line? 10%? Has anyone in the reproducible-builds community set rules of thumb for this?

@baloo baloo changed the base branch from staging to staging-next February 16, 2021 22:21
@baloo baloo mentioned this pull request Feb 17, 2021
@baloo baloo changed the base branch from staging-next to staging February 17, 2021 01:09
@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/prs-already-reviewed/2617/361

@sternenseemann
Member

sternenseemann commented Mar 29, 2021

If you look at the gzip example linked in https://github.com/bmwiedemann/theunreproduciblepackage/tree/master/pgo you find that there is a way to make profile-guided optimization (PGO) deterministic without shipping pre-recorded profiling data.
You 'only' need to ensure that everything that happens in the profiling run is 100% deterministic.
gcc's profiling run includes compiling gcc itself, so it might be easier to replace that with a carefully crafted script that covers most of the relevant code in a fully deterministic way.

We have something like that in nixpkgs actually: The foot derivation supports PGO, but the build is made deterministic by generating the profiling data in the same way each time (by using a fixed seed for the random inputs generated for the profiling information collection). From my testing this makes the build deterministic as PGO doesn't seem to take timing information into account. I'm not sure however if the PGO build can also be reproducible or if the output would (deterministically) differ between different machines.
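The fixed-seed idea can be sketched in a few lines. This is only an illustration of the principle, not the generator foot ships: the function name, the seed value and the input size are all hypothetical.

```python
import hashlib
import random


def generate_stimulus(seed: int, length: int) -> bytes:
    """Return `length` pseudo-random bytes derived only from `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return bytes(rng.getrandbits(8) for _ in range(length))


# Hypothetical values, chosen for illustration only.
SEED = 2342
first = generate_stimulus(SEED, 4096)
second = generate_stimulus(SEED, 4096)

# Same seed, same bytes: the training input is identical on every run.
print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
# A different seed gives a different training input.
print(first == generate_stimulus(SEED + 1, 4096))
```

If the program being profiled is itself deterministic for a given input, feeding it the same bytes on every build should yield the same branch and call counts, which is what the instrumented build records.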

@baloo
Member Author

baloo commented Mar 29, 2021

@sternenseemann just to see if I got things right.
foot is a terminal emulator, and the stimulusGenerator generates seeded random data as input to the emulator, then feeds the resulting profile of the terminal back into the PGO build. Is that correct?

The way gcc does PGO is by compiling gcc itself and gathering profiling information from that build. I don't even know why there is entropy here.
Is that the non-deterministic build ordering (build is run in parallel, so different gcc calls are run out of order according to scheduling)? Is there any other factor?
Filesystem ordering? (gosh I fear the disorderfs now!)

@sternenseemann
Member

@baloo You got that right, yeah. stimulusGenerator generates a series of random inputs which are meant to be fed to the foot VTE parser for profiling purposes. Then an intermediate version of foot is compiled (with just the VTE parser so we don't need wayland) and that is fed with the inputs and generates profiling data.

That is then in turn used to compile the actual output binary.

I also need to correct myself, I checked again and the build is actually reproducible as well: I can reproduce the exact same foot as our hydra produced by using nix-build --check -A foot on two different x86_64-linux machines.

The upshot from this is: if we could track down the entropy in gcc's build we could have a reproducible gcc build which has PGO (unless I'm missing something). This is however probably quite the task, I wonder if upstream has some interest in this as well?

@baloo
Member Author

baloo commented Mar 29, 2021

I'll try a build with a patch like:

diff --git a/Makefile.in b/Makefile.in
index 36e369df6e7..67b0a56070d 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -57699,8 +57699,8 @@ stagetrain-bubble:: stageprofile-bubble
        if test -f stagetrain-lean || test -f stageprofile-lean ; then \
          echo Skipping rebuild of stagetrain; \
        else \
-         $(MAKE) stagetrain-start; \
-         $(MAKE) $(RECURSE_FLAGS_TO_PASS) all-stagetrain; \
+         $(MAKE) -j1 stagetrain-start; \
+         $(MAKE) -j1 $(RECURSE_FLAGS_TO_PASS) all-stagetrain; \
        fi

 .PHONY: all-stagetrain clean-stagetrain

@baloo
Member Author

baloo commented Mar 29, 2021

Compiling twice on the same machine gets me the same result, but I still get variation between builds on two different machines. So it's not the compilation ordering.

@@ -7,16 +7,16 @@
 5418c6e9b16872c94077c51f8fffc6f4  /nix/store/14qcfyb4fqyl4fdd67i61vpv13dvg01g-gcc-10.2.0-man/share/man/man7/fsf-funding.7.gz
 44773935bd03199cabedce33f481d62f  /nix/store/14qcfyb4fqyl4fdd67i61vpv13dvg01g-gcc-10.2.0-man/share/man/man7/gfdl.7.gz
 08a97967837e2b817a7a090ca0fd0e3d  /nix/store/14qcfyb4fqyl4fdd67i61vpv13dvg01g-gcc-10.2.0-man/share/man/man7/gpl.7.gz
-bd773bd1f77408ccfb196e8acd340006  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/cpp
-d489619d5604a50b83be582753f55425  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/g++
-837dac4a06244fc591412b4cae685e16  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/gcc
+d0107a50dd52b173ee136f56e6ae22be  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/cpp
+f58b5911985b1955c03c60b3f09dbe9a  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/g++
+e3978a69e19add0244aaaa28c11c2aa0  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/gcc
 1c8c29b23933b639f29d8a60d3e8fbdd  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/gcc-ar
 177b5c8570e3b2be03c35825adb0f0e4  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/gcc-nm
 9192478676c7d168e09ea52f1ed2d965  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/gcc-ranlib
 83aad200c7002d5b25c6f8c085b4ea75  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/gcov
 bd6e8fa18ad8301c36b6273fcce60b36  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/gcov-dump
 628963456c5f62b5e4d9ea5d7efb5c53  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/gcov-tool
-d2963e7b1ef32db3506c125cd49f1b78  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/lto-dump
+1a08464d05552d92c3864ad9e2379329  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/lto-dump
 1c8c29b23933b639f29d8a60d3e8fbdd  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/x86_64-unknown-linux-gnu-gcc-ar
 177b5c8570e3b2be03c35825adb0f0e4  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/x86_64-unknown-linux-gnu-gcc-nm
 9192478676c7d168e09ea52f1ed2d965  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/x86_64-unknown-linux-gnu-gcc-ranlib
@@ -790,12 +790,12 @@
 4049626146d982e0b4bb4ca7eb3d029e  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/include/c++/10.2.0/x86_64-unknown-linux-gnu/bits/stdtr1c++.h
 9e811aa097fbb67ab7b709f12b412f52  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/include/c++/10.2.0/x86_64-unknown-linux-gnu/bits/time_members.h
 48ec1f1f6ca7ee862cd97e01b78debfe  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/include/c++/10.2.0/x86_64-unknown-linux-gnu/ext/opt_random.h
-89d48d0c90a02a02f886457fd7324c77  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/libexec/gcc/x86_64-unknown-linux-gnu/10.2.0/cc1
-bca4f6d7507f5cb732c8c6cf1bf4eab0  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/libexec/gcc/x86_64-unknown-linux-gnu/10.2.0/cc1plus
+857ff8f9b1971032178ff63dacb38f72  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/libexec/gcc/x86_64-unknown-linux-gnu/10.2.0/cc1
+7f7ebddcdc3f96fcafc43940cdccc23e  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/libexec/gcc/x86_64-unknown-linux-gnu/10.2.0/cc1plus
 34291d97d43308d6857d7f5b5284dda1  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/libexec/gcc/x86_64-unknown-linux-gnu/10.2.0/collect2
 addb968757f44f2461cb55194dcecb08  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/libexec/gcc/x86_64-unknown-linux-gnu/10.2.0/liblto_plugin.la
 f41f309dc0e87b71e8404280d4c1b917  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/libexec/gcc/x86_64-unknown-linux-gnu/10.2.0/liblto_plugin.so.0.0.0
-4a256e609c07774dcd5c0adde731dc19  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/libexec/gcc/x86_64-unknown-linux-gnu/10.2.0/lto1
+08461888e2b4b96df4ec8c89c23a23ea  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/libexec/gcc/x86_64-unknown-linux-gnu/10.2.0/lto1
 f64bbbf8281863028861298a19d4c65e  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/libexec/gcc/x86_64-unknown-linux-gnu/10.2.0/lto-wrapper
 d504c1030ebf4a192adf95209a53c5f4  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/libexec/gcc/x86_64-unknown-linux-gnu/10.2.0/plugin/gengtype
 b425a30eb59e4ab0e653d03208464a82  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/lib/gcc/x86_64-unknown-linux-gnu/10.2.0/crtbegin.o
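A listing like the one above can be produced by comparing two `md5sum`-style manifests, one per machine. The following is a minimal sketch with hypothetical helper names, using two lines copied from the diff as sample data.

```python
def parse_manifest(text: str) -> dict[str, str]:
    """Map path -> digest for lines of the form '<md5>  <path>'."""
    digests = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, path = line.split(maxsplit=1)
        digests[path.strip()] = digest
    return digests


def differing_paths(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Paths present in either manifest whose digests are not equal."""
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))


# Sample lines taken from the diff above (one changed file, one unchanged).
MACHINE_A = """\
837dac4a06244fc591412b4cae685e16  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/gcc
1c8c29b23933b639f29d8a60d3e8fbdd  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/gcc-ar
"""
MACHINE_B = """\
e3978a69e19add0244aaaa28c11c2aa0  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/gcc
1c8c29b23933b639f29d8a60d3e8fbdd  /nix/store/7yz83gl80araz40d3zc44hfbaz0ic86l-gcc-10.2.0/bin/gcc-ar
"""

changed = differing_paths(parse_manifest(MACHINE_A), parse_manifest(MACHINE_B))
for path in changed:
    print(path)
```

Only `bin/gcc` is reported for this sample. In the full listing, the files that differ are cpp, g++, gcc, lto-dump, cc1, cc1plus and lto1.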

@raboof
Member

raboof commented Apr 7, 2021

Compiling twice on the same machine gets me the same result, but I still get variation between builds on two different machines. So it's not the compilation ordering

Making sure I'm understanding you correctly: you were testing with -j1 and without the changes proposed in this PR, right?

So your experiment was intended to see if adding -j1 would make gcc reproducible without the changes proposed in this PR, and it showed that -j1 is not sufficient, and apparently there's still entropy coming from somewhere?

(I would like to see this PR merged, continuing work on adding PGO back in a deterministic way in parallel)

@tomberek
Contributor

tomberek commented Apr 7, 2021

[tom@tom:~/nixpkgs]$ nix-store --dump $(nix-build -A gcc10.cc) | md5sum
aa08b5e886bdd673feb9767e5f9bee99  -
[tom@tom:~/nixpkgs]$ nix build .#legacyPackages.x86_64-linux.gcc10.cc && nix hash path $(nix path-info ./result)
sha256-MgPJsdox3ErU7uSYxnbMbg2MDrSDbs/Li/tjcYc7mXk=

Not sure what the best way to do the comparison is.
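Either command works for comparing outputs, as long as both sides use the same one, because each hashes a canonical serialization of the whole output tree. The sketch below illustrates that idea in Python. It is not the NAR format and will not reproduce the hashes above. The function name is hypothetical, and empty directories are ignored for brevity.

```python
import hashlib
import os
import stat


def tree_digest(root: str) -> str:
    """SHA-256 over a sorted walk of `root`: relative path, kind, content."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix traversal order regardless of filesystem ordering
        # Symlinks to directories show up in dirnames but are not descended into.
        links = [d for d in dirnames if os.path.islink(os.path.join(dirpath, d))]
        for name in sorted(filenames + links):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            st = os.lstat(path)
            if stat.S_ISLNK(st.st_mode):
                kind, payload = b"symlink", os.readlink(path).encode()
            else:
                executable = bool(st.st_mode & stat.S_IXUSR)
                kind = b"exec" if executable else b"file"
                with open(path, "rb") as f:
                    payload = f.read()
            for field in (rel.encode(), kind, payload):
                # Length-prefix each field so concatenation is unambiguous.
                h.update(len(field).to_bytes(8, "big"))
                h.update(field)
    return h.hexdigest()
```

Calling `tree_digest("./result")` on two machines and comparing the strings would then play the same role as comparing the `md5sum` or `nix hash path` output. Timestamps and ownership are deliberately left out, since they are not part of what the build produced.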

@raboof
Member

raboof commented Apr 7, 2021

[tom@tom:~/nixpkgs]$ nix-store --dump $(nix-build -A gcc10.cc) | md5sum
aa08b5e886bdd673feb9767e5f9bee99  -

I'm arriving at aa08b5e886bdd673feb9767e5f9bee99 as well when building from this pr (specifically a961aea) 🎉

@baloo
Member Author

baloo commented Apr 7, 2021

So your experiment was intended to see if adding -j1 would make gcc reproducible without the changes proposed in this PR, and it showed that -j1 is not sufficient, and apparently there's still entropy coming from somewhere?

Yes, exactly

(I would like to see this PR merged, continuing work on adding PGO back in a deterministic way in parallel)

100% agree. I still think we can do better, it just needs a ton of work.

I need to keep digging into PGO. I still don't get how entropy is injected.
Ideally we need to "step" in between the different compilation stages and compare the output of each stage.

As far as I can tell, there is noise/entropy generated in the "stagetrain" but I can't tell where it's coming from (yet).

@grahamc
Member

grahamc commented Apr 8, 2021

Does the profiling actually make a reasonable difference today? Aren't the benchmarks, for most users, actually run on a very powerful datacenter-grade machine that is also (generally) heavily loaded? Are these benchmarks useful for end users, who are most likely running GCC in a very different environment?

@sternenseemann
Member

sternenseemann commented Apr 8, 2021

Does the profiling actually make a reasonable difference today? Aren't the benchmarks, for most users, actually run on a very powerful datacenter-grade machine that is also (generally) heavily loaded? Are these benchmarks useful for end users, who are most likely running GCC in a very different environment?

Even though the profiling in principle has limited merit for the actual machines that we are building gcc for, PGO improves performance significantly. @baloo tested this further up in the thread: #112928 (comment).

As far as I understand it, this is because profiling mostly reveals to the compiler which code paths are taken more often than others in actual invocations, and thus allows further optimization.

@raboof
Member

raboof commented Apr 26, 2021

Looking through the last few months of discussion, it seems the rough consensus is that the performance hit looks acceptable (esp. with the availability of fastStdenv), so I think it's reasonable to merge this now.

Further research on how to "eat our cake and have it, too" will still be valuable, of course.

@raboof raboof merged commit 48c952c into NixOS:staging Apr 26, 2021
@vcunat
Member

vcunat commented Apr 26, 2021

I don't think fastStdenv really helps. The majority of the extra cost will be paid by hydra.nixos.org, I believe. I'm not sure if paying 7–12% extra is worth the benefit, but I personally provide for just an insignificant bit of the build farm...

@vcunat
Member

vcunat commented Apr 26, 2021

As for PGO generally, I believe the main benefit is in better estimates of which code paths are hot and which are cold. That should be mostly independent of the machine and more sensitive to the "training inputs". The rule of thumb is to optimize hot paths for speed and cold paths for code size (which can also improve speed by putting less pressure on CPU instruction caches).

EDIT: I forgot to add that PGO and LTO have a good synergy, i.e. using both at once can improve performance by more than the sum of using each alone.

@raboof
Member

raboof commented Apr 26, 2021

That should be mostly independent of the machine and more sensitive to the "training inputs".

That's my expectation as well - if we can find a way to remove the current nondeterminism (adding -j1 as tried in #112928 (comment) was not sufficient...) from the training we should definitely re-enable it.

@edolstra
Member

I'm not sure if paying 7–12% extra is worth the benefit, but I personally provide for just an insignificant bit of the build farm...

It's not just a cost for our build farm, it's a cost for every user of gcc. Slowing down every C/C++ build by 7-12% is a substantial price to pay...

@edolstra
Member

and we are oh so close to a fully reproducible ISO :)

For a reproducible ISO, there is a simpler solution: don't include GCC in the ISO.

However from the perspective of content-addressable Nix, a deterministic GCC is certainly a lot better.
