Memory usage regression in bazel 0.14.0 and 0.14.1 #5389
Comments
Please provide the contents of jvm.out. |
I ssh'd into CircleCI when this failure occurred and checked jvm.out; it is an empty file after the crash. The failure looks like this:
It doesn't always occur in the same place (there are a few steps in CI that I've seen it fail at) and on occasion it passes. Is there anywhere else besides jvm.out that might give a hint on the crash? |
If the file is empty then that's a sign that the gRPC client/server connection crashed. We've seen a similar case before where we didn't implement flow control correctly for file uploads to remote machines. Do you have remote caching or execution enabled? If yes, please provide the flags you are using. |
If these are the right flags, then it doesn't look like it:
|
Hmm, that's odd - this seems to be a null build. Here's a more complete snippet from a passing run:
|
The failing build fails to run //src:prodserver, but it's still just two actions. |
I think the reference to #3645 is a red herring. |
If I understand correctly, you're running the containers with a 4 GB limit. --local_resources does not actually restrict Bazel's own memory usage, but maybe that's the intent? IIRC, Bazel is set to 4 GB max memory, which is unaffected by --local_resources. Maybe Bazel is trying to allocate too much memory and the container is killing it? Technically, this doesn't imply that Bazel's memory use has actually increased: we might be using a slightly different GC configuration or allocating memory more rapidly, and that might push it over the limit. If that's correct, then you should try adding --host_jvm_args=-Xmx2G to the Bazel invocation, like this:
or like this in the bazelrc:
|
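The two snippets referred to in the comment above did not survive in this copy of the thread. As a rough sketch of what the two forms could look like: `--host_jvm_args` is a Bazel startup option, so it goes before the command name on the command line, or on a `startup` line in a bazelrc file. The target `//src:prodserver` is borrowed from earlier in the thread, and the scratch directory is a hypothetical stand-in for a real workspace; the command is printed rather than executed so the sketch runs without Bazel installed.

```shell
# Sketch only: the scratch directory is a hypothetical stand-in for a workspace.
workspace="$(mktemp -d)"
bazelrc="$workspace/.bazelrc"

# Form 1: startup option on the command line. Startup options go before the
# command name (build, test, run). Printed here instead of being executed.
cmd='bazel --host_jvm_args=-Xmx2G build //src:prodserver'
echo "$cmd"

# Form 2: the same option persisted on a "startup" line in a bazelrc file.
echo 'startup --host_jvm_args=-Xmx2G' >> "$bazelrc"
cat "$bazelrc"
```

Either form caps the heap of the Bazel server JVM itself, which is separate from the resources that `--local_resources` budgets for build actions.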
Thanks for investigating, Ulf! Your simple explanation is right. The build passes three times in a row with the 2G heap limit for the Bazel JVM. https://circleci.com/gh/alexeagle/workflows/angular-bazel-example/tree/test-bazel-0.14.1 I also have a PR out to update Angular to use 0.14 again. This should fix it for us. I'm not sure what else we could do on this issue, other than improve the "guard rails" here so it's harder to get the wrong memory limit, or easier to debug. Feel free to close if you don't want to take further action. |
On Angular and related projects, we run builds on CI in a Docker container. We use the --local_resources flag to work around #3645, which causes random failures where Bazel tries to allocate too much memory, since it asks the OS how much RAM is available rather than the containerization host.
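For readers unfamiliar with the flag: in Bazel releases of this era, `--local_resources` took a comma-separated triple of available RAM (in MB), CPU (in cores), and I/O (1.0 being an average workstation). A minimal sketch of assembling such a bazelrc line follows; the three values are illustrative assumptions chosen to sit under a 4 GB container, not the settings the Angular projects actually use.

```shell
# Illustrative values only (assumptions, not the project's real settings).
ram_mb=3072     # RAM budget for local actions, in MB
cpu_cores=2.0   # CPU budget, in cores
io=1.0          # I/O budget; 1.0 is an "average workstation"

# Assemble the flag and print it as it would appear on a bazelrc "build" line.
flag="--local_resources=${ram_mb},${cpu_cores},${io}"
echo "build ${flag}"
```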
After updating to 0.14.0 (also observed in 0.14.1), we have the problem again: angular/angular#24484. It affects Angular users whose PRs are failing on CI.
An example failure:
https://circleci.com/gh/gregmagolan/angular-bazel-example/442?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
There is not much indication of what is going wrong, but it looks the same as the errors we got before adding the --local_resources flag. We are rolling back Bazel in Angular and related projects to 0.13.