Make Bazel more responsive and use less memory when --jobs is high #17120

Commits on Jan 9, 2023

  1. Make Bazel more responsive and use less memory when --jobs is high

    When using Bazel in combination with a larger remote execution cluster,
    it's not uncommon to call it with something like --jobs=512. We have
    observed that this is currently problematic for a few reasons:
    
    1. It causes Bazel to launch 512 local threads, each being responsible
       for running one action remotely. All of these local threads may spend
       a lot of time in buildRemoteAction(), generating input roots in the
       form of Merkle trees.
    
       As the local system tends to have fewer than 512 CPUs, all of these
       threads will unnecessarily compete with each other. One practical
       downside of that is that interrupting Bazel using ^C takes a very
       long time, as it first wants to complete the computation of all 512
       Merkle trees.
    
       Let's put a semaphore in place, limiting the number of concurrent
       Merkle tree computations to the number of CPU cores available.
    
    2. Related to the above, Bazel will end up keeping 512 Merkle trees in
       memory throughout all stages of execution. This makes sense, as we
       may get cache misses, requiring us to upload the input root
       afterwards. Or the execution of a remote action may fail, requiring
       us to upload the input root.
    
       That said, generally speaking these cases are fairly uncommon. Most
       builds have relatively high cache hit rates and execution retries
       only happen rarely. It is therefore not worth keeping these Merkle
       trees in memory constantly. We only need them when computing the
       action digest for GetActionResult(), and while uploading them into
       the CAS.
    
    3. AbstractSpawnStrategy.getInputMapping() has some smartness to memoize
       its results. This makes a lot of sense for local execution, where the
       input mapping is used in a couple of places. For remote
       caching/execution it is not evident that this is a good idea.
       Assuming you end up having a remote cache hit, you don't need it.
    
       Let's make the memoization optional, only using it in cases where we
       do local execution (which may also happen when you get a cache miss
       when doing remote caching).
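
    The semaphore from item 1 could look roughly like the sketch below. It
    is an illustration only, not the code in this change: the class
    BoundedTreeBuilder and its methods are hypothetical names, and the
    sleeping task stands in for the real Merkle tree computation. The idea
    is that many job threads may exist, but only as many as there are CPU
    cores run the CPU-bound step at once, and a thread waiting for a permit
    can still be interrupted.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration; not part of Bazel. */
public class BoundedTreeBuilder {
  private final Semaphore permits;
  private final AtomicInteger active = new AtomicInteger();
  private final AtomicInteger peak = new AtomicInteger();

  public BoundedTreeBuilder(int maxConcurrent) {
    this.permits = new Semaphore(maxConcurrent);
  }

  /** Runs a CPU-bound computation while holding one permit. */
  public <T> T compute(Callable<T> work) throws Exception {
    // acquire() throws InterruptedException, so a waiting thread
    // reacts to interruption instead of starting more work.
    permits.acquire();
    try {
      int now = active.incrementAndGet();
      peak.accumulateAndGet(now, Math::max);
      return work.call();
    } finally {
      active.decrementAndGet();
      permits.release();
    }
  }

  /** Highest number of computations observed running at the same time. */
  public int peakConcurrency() {
    return peak.get();
  }

  public static void main(String[] args) throws Exception {
    int cores = Runtime.getRuntime().availableProcessors();
    BoundedTreeBuilder builder = new BoundedTreeBuilder(cores);

    // 64 threads stand in for a high --jobs value.
    ExecutorService jobs = Executors.newFixedThreadPool(64);
    for (int i = 0; i < 64; i++) {
      final int id = i;
      Callable<String> task =
          () -> builder.compute(() -> {
            Thread.sleep(5); // placeholder for building a Merkle tree
            return "tree-" + id;
          });
      jobs.submit(task);
    }
    jobs.shutdown();
    jobs.awaitTermination(1, TimeUnit.MINUTES);

    System.out.println(
        "peak concurrent computations: " + builder.peakConcurrency()
            + " (limit " + cores + ")");
  }
}
```

    The permit is released in a finally block so that a failing
    computation cannot permanently shrink the pool of permits.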
    
    Similar changes against Bazel 5.x have allowed me to successfully
    build a large monorepo with --jobs=512 under the default heap size
    limits, whereas I would previously see occasional OOM behaviour even
    when providing --host_jvm_args=-Xmx64g.
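
    To illustrate item 2, here is a minimal sketch of keeping only the
    digest of an input root and rebuilding the tree on demand. All names
    (LazyInputRoot, Tree, rebuild) are hypothetical and the digest is a
    placeholder rather than a real content hash; this is not how the
    change itself is implemented. It trades extra CPU time in the uncommon
    cache miss or retry case for lower steady-state memory use.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical illustration; not part of Bazel. */
public class LazyInputRoot {

  /** Stand-in for a Merkle tree: a potentially large object with a digest. */
  public static final class Tree {
    private final List<String> files;

    public Tree(List<String> files) {
      this.files = files;
    }

    /** Placeholder digest; a real tree would hash file contents. */
    public String digest() {
      return Integer.toHexString(files.hashCode());
    }
  }

  private final Supplier<Tree> builder;
  private final String digest;

  public LazyInputRoot(Supplier<Tree> builder) {
    this.builder = builder;
    // Build once to obtain the digest, then drop the reference so the
    // tree can be garbage collected while the action is in flight.
    this.digest = builder.get().digest();
  }

  /** Cheap: enough to look up a cached action result. */
  public String digest() {
    return digest;
  }

  /** Expensive: only called when the tree must actually be uploaded. */
  public Tree rebuild() {
    return builder.get();
  }

  public static void main(String[] args) {
    LazyInputRoot root =
        new LazyInputRoot(() -> new Tree(Arrays.asList("a.c", "b.h")));
    System.out.println("digest kept in memory: " + root.digest());
    // Simulated cache miss: rebuild the tree for upload.
    Tree tree = root.rebuild();
    System.out.println("rebuilt digest matches: "
        + tree.digest().equals(root.digest()));
  }
}
```
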
    EdSchouten committed Jan 9, 2023
    a66590d