When using Bazel in combination with a larger remote execution cluster,
it's not uncommon to call it with something like --jobs=512. We have
observed that this is currently problematic for three reasons:
1. It causes Bazel to launch 512 local threads, each being responsible
for running one action remotely. All of these local threads may spend
a lot of time in buildRemoteAction(), generating input roots in the
form of Merkle trees.
As the local system tends to have fewer than 512 CPUs, all of these
threads will unnecessarily compete with each other. One practical
downside of that is that interrupting Bazel using ^C takes a very
long time, as it first wants to complete the computation of all 512
Merkle trees.
Let's put a semaphore in place, limiting the number of concurrent
Merkle tree computations to the number of CPU cores available.
2. Related to the above, Bazel will end up keeping 512 Merkle trees in
memory throughout all stages of execution. This makes sense, as we
may get a cache miss, or the execution of a remote action may fail,
both of which require us to upload the input root afterwards.
That said, generally speaking these cases are fairly uncommon. Most
builds have relatively high cache hit rates and execution retries
only happen rarely. It is therefore not worth keeping these Merkle
trees in memory constantly. We only need a tree when computing the
action digest for GetActionResult(), and while uploading it into the
CAS.
3. AbstractSpawnStrategy.getInputMapping() memoizes its results. This
makes a lot of sense for local execution, where the input mapping is
used in a couple of places. For remote caching/execution it is not
evident that this is a good idea. If you end up with a remote cache
hit, you don't need the input mapping again.
Let's make the memoization optional, only using it in cases where we
do local execution (which may also happen when you get a cache miss
when doing remote caching).
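The semaphore from point 1 can be illustrated with a minimal sketch.
This is not the actual patch: the class BoundedComputation and its
methods are hypothetical names chosen for illustration, and the
Callable stands in for whatever code builds the Merkle tree. It only
shows the general technique of capping concurrent CPU-heavy work at
the number of available cores, regardless of how many threads --jobs
creates.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

/** Hypothetical helper (not a Bazel class): caps concurrent CPU-heavy work. */
final class BoundedComputation {
  private final Semaphore permits;

  BoundedComputation(int maxConcurrent) {
    this.permits = new Semaphore(maxConcurrent);
  }

  /** One permit per CPU core reported by the JVM. */
  static BoundedComputation forAvailableProcessors() {
    return new BoundedComputation(Runtime.getRuntime().availableProcessors());
  }

  /**
   * Runs the computation while holding a permit. acquire() is
   * interruptible, so a thread that is still waiting for a permit can
   * be cancelled without first doing the work.
   */
  <T> T run(Callable<T> computation) throws Exception {
    permits.acquire();
    try {
      return computation.call();
    } finally {
      // Always hand the permit back, even if the computation throws.
      permits.release();
    }
  }
}
```

A single shared instance created with
BoundedComputation.forAvailableProcessors() would be used by all
action threads, so that with --jobs=512 on a 16-core machine at most
16 trees are being computed at any moment while the remaining threads
wait for a permit.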
Similar changes against Bazel 5.x have allowed me to successfully
build a large monorepo with --jobs=512 and the default heap size
limits, whereas I would previously see occasional OOM behaviour even
when providing --host_jvm_args=-Xmx64g.