Remote cache fetching jobs aren't counting towards --jobs #18439

Closed
Ryang20718 opened this issue May 17, 2023 · 3 comments
Labels
more data needed team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug untriaged

Comments

@Ryang20718

Description of the bug:

I'm using local execution with a remote cache, and almost all tests are cached. With --jobs set to 4, I see the following (currently on Bazel 6.0.0):

21s remote-cache, linux-sandbox ... (8 actions running)
29s remote-cache, linux-sandbox ... (8 actions running)

With --jobs set to 8, I see a maximum of 16 actions. This suggests that actions fetching from the remote cache are not counted towards the --jobs parallelism limit. Is there a way to configure them to be counted? I want cache fetches to count towards --jobs because of memory constraints (we run out of memory when running 16 actions in parallel).
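To check a captured build log for this behaviour, a small helper along the following lines could be used. It is only a sketch: the regular expression is based solely on the two progress lines quoted above, the function names are made up for illustration, and other Bazel versions or UI settings may print the counter differently.

```python
import re

# Matches the trailing counter of a progress line such as
# "21s remote-cache, linux-sandbox ... (8 actions running)".
# Assumption: the counter always looks like the lines quoted in this report.
RUNNING_RE = re.compile(r"\((\d+) actions? running\)")


def peak_running_actions(log_lines):
    """Return the largest 'N actions running' value found, or 0 if none."""
    peak = 0
    for line in log_lines:
        match = RUNNING_RE.search(line)
        if match:
            peak = max(peak, int(match.group(1)))
    return peak


def exceeds_jobs(log_lines, jobs):
    """True if the log ever reports more running actions than --jobs allows."""
    return peak_running_actions(log_lines) > jobs


# The two lines from the report, observed with --jobs=4.
sample = [
    "21s remote-cache, linux-sandbox ... (8 actions running)",
    "29s remote-cache, linux-sandbox ... (8 actions running)",
]

if __name__ == "__main__":
    print(peak_running_actions(sample), exceeds_jobs(sample, jobs=4))
```

Feeding it the lines of a saved Bazel log together with the --jobs value used for that run gives a quick yes/no answer on whether the limit was exceeded.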

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Start up a remote cache:

# quay.io example:
$ docker pull quay.io/bazel-remote/bazel-remote
$ docker run -u 1000:1000 -v /path/to/cache/dir:/data \
	-p 9090:8080 -p 9092:9092 quay.io/bazel-remote/bazel-remote

Run Bazel tests with --jobs=8 --remote_cache=grpc://127.0.0.1:9092 to fill the cache.

Now run the Bazel tests again with --jobs=8 --remote_cache=grpc://127.0.0.1:9092, and you'll see 16 actions running.

Which operating system are you running Bazel on?

Linux (Debian), x86_64

What is the output of bazel info release?

release 6.0.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

Related thread #6394 (comment)

Any other information, logs, or outputs that you want to share?

No response

@sgowroji sgowroji added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label May 18, 2023
@tjgq tjgq self-assigned this May 30, 2023
@tjgq tjgq removed the untriaged label May 30, 2023
@tjgq
Contributor

tjgq commented Oct 17, 2023

@Ryang20718 I think there might be something very specific about your setup, because I can't repro this. Here's my attempt, based on your description:

test.sh

#!/bin/bash
echo "$@"

BUILD

COUNT = 1000

[sh_test(
  name = "test%s" % i,
  srcs = ["test.sh"],
  args = [str(i)],
) for i in range(COUNT)]

test_suite(
  name = "tests",
  tests = ["test%s" % i for i in range(COUNT)],
)

.bazelrc

build --remote_instance_name=projects/bazel-untrusted/instances/default_instance
build --remote_cache=grpcs://remotebuildexecution.googleapis.com
build --remote_default_exec_properties=container-image=docker://gcr.io/bazel-public/ubuntu1804-bazel-java11@sha256:2d50853a7edbe59a99bc4141d7a03cb1068157b9766077302b46c4ec94eef151
build --remote_timeout=600
build --google_default_credentials
build --jobs=4

I ran it once to populate the cache, then did a clean build against the cache to download the results for all 1k parallel tests. I never saw the counters in the "X actions, Y actions running" progress line exceed 4. I tried both Linux and macOS, with both Bazel 6.0.0 and Bazel at head. (The machines where I'm running Bazel have many more than 4 cores.)

Are you setting any other flags?

@tjgq
Contributor

tjgq commented Oct 17, 2023

Are you by any chance running Bazel with a custom JDK? (Recent JDK releases have broken --jobs, which we fixed at head by adding the --experimental_use_semaphore_for_jobs flag; the flag is not available on 6.x, because the JDK bundled with 6.x binaries should not exhibit that issue.)
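For readers unfamiliar with the idea, the sketch below shows in general terms how a counting semaphore caps concurrency. It is a generic Python illustration, not Bazel's implementation of --experimental_use_semaphore_for_jobs (which is not shown in this thread); the function name and the dummy sleep-based task are assumptions made for the example.

```python
import threading
import time


def run_with_limit(num_tasks, limit):
    """Run num_tasks dummy tasks, at most `limit` at once; return the observed peak."""
    slots = threading.Semaphore(limit)  # one permit per allowed concurrent task
    lock = threading.Lock()             # protects the counters below
    state = {"running": 0, "peak": 0}

    def task():
        with slots:  # blocks until one of the `limit` permits is free
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            time.sleep(0.01)  # stand-in for real work
            with lock:
                state["running"] -= 1

    threads = [threading.Thread(target=task) for _ in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    # With limit=4, the observed peak never exceeds 4.
    print(run_with_limit(num_tasks=20, limit=4))
```

Because every task must hold a permit while it runs, the number of tasks running at once cannot exceed the number of permits, regardless of how many threads exist.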

@joeleba
Member

joeleba commented Oct 31, 2023

Closing now. Please reopen if you have a repro. Thanks.

@joeleba joeleba closed this as not planned Oct 31, 2023