Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define thread QoS priorities on macOS #7446

Closed
jmmv opened this issue Feb 15, 2019 · 11 comments
Closed

Define thread QoS priorities on macOS #7446

jmmv opened this issue Feb 15, 2019 · 11 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) type: feature request z-team-Apple Deprecated. Send to rules_apple, or label team-Rules-CPP + platform:apple

Comments

@jmmv
Copy link
Contributor

jmmv commented Feb 15, 2019

macOS has a concept of thread QoS service classes. Bazel at the moment uses the default class when run from the command line. The documentation strongly recommends defining QoS classes for proper operation instead of relying on the default one. See https://developer.apple.com/library/archive/documentation/Performance/Conceptual/power_efficiency_guidelines_osx/PrioritizeWorkAtTheTaskLevel.html

We have found that the use of the default class causes two problems: first, Bazel can make the system unresponsive; and, second, Bazel can become very slow if it relies on system services that run at a lower service class (e.g. think of FUSE file systems).

I have been experimenting with this by explicitly declaring Bazel to run at the Utility level and this makes both problems go away. I think we should do this change, but then account for the many threads we run. In particular, it'd be nice if we defaulted to Utility but made the UI thread and the Bazel client higher priority to ensure snappier console output. (This latter detail is nice, but not a requirement in my opinion.) Of course, we must measure if this causes a performance regression.

I also tried setting Bazel to Background level. This made the system even more snappy but also prevented Bazel from fully utilizing all CPUs.

One last thing to note is that changing the QoS class of a program after it has started, even if it has just one thread, does not modify the class for the main thread. This means that any further spawned threads or subprocesses do not respect the class change and instead use whatever was set at the main thread level. The only way I've found so far of changing the class of the main level is by using posix_spawnattr_set_qos_class_np, which means we have to exec a process using posix_spawn. This makes the change slightly more complicated, but given that the Bazel client is already spawning the server, we can probably fit this in that code path.

@jmmv jmmv added type: feature request P2 We'll consider working on this in future. (Assignee optional) z-team-Apple Deprecated. Send to rules_apple, or label team-Rules-CPP + platform:apple labels Feb 15, 2019
@jmmv jmmv self-assigned this Feb 15, 2019
bazel-io pushed a commit that referenced this issue Feb 21, 2019
This introduces a new CharPP class that implements the conversion of
C++ data structures to their corresponding char** representation
needed for execv-like calls.

As a side-effect of this change, we fix two issues: first, a memory
leak; and second, a theoretical const-correctness violation.

I am doing this change to provide room for implementing a similar
conversion function for envp arrays, which I will need to switch to
using posix_spawn.

Prerequisite cleanup to address
#7446.  While this does
fix an issue with a memory leak, this change should have no visible
effects.

RELNOTES: None.
PiperOrigin-RevId: 234968622
bazel-io pushed a commit that referenced this issue Feb 21, 2019
…les.

The map returned by PrepareEnvironmentForJvm() previously represented a
"delta" to apply to the current environment.  This was used to modify the
environment surrounding fork/execv calls because we could not pass an envp
to execv.

However, when using the resulting map to create the envp value of a call
like posix_spawn, we must be exhaustive and list all variables we expect
the subprocess to have.  And as we are going to be using posix_spawn, we
need to make this happen, so change PrepareEnvironmentForJvm() to make
the returned map exhaustive.

Prerequisite change to address
#7446 because of the need to
switch to posix_spawn.

RELNOTES: None.
PiperOrigin-RevId: 234971246
bazel-io pushed a commit that referenced this issue Feb 21, 2019
Introduce a new "daemonize" helper tool that takes the heavy-lifting
of starting the Bazel server process as a daemon.  This tool is pure
C and does not have any big dependencies like gRPC so we can ensure
it's thread-free.  As a result, the code in this tool doesn't have
to take a lot of care when running fork(), which results in easier
to read code overall.

Then, change the Bazel client to simply use posix_spawn to invoke
this helper tool, which in turn runs the Bazel server as a daemon.
The code in the Bazel client becomes simpler as well as we don't
have to worry about the implications of using fork() from a threaded
process.

The switch to posix_spawn is necessary to address
#7446 because the only way
to customize the QoS of a process in macOS is by using this API.
Such a change will come separately.

This change is intended to be a no-op.  However, this fixes a bug
introduced in unknown commit by which we violated the requirement of
not doing any memory management from a child process started with
fork().  (I don't think this bug ever manifested itself, but it was
there -- which is funny because this piece of code went to great
extents to not do the wrong thing... and a one-line change let
hell break loose.)

RELNOTES: None.
PiperOrigin-RevId: 234991943
@jmmv
Copy link
Contributor Author

jmmv commented Feb 22, 2019

I think we can consider this done for now. There are a couple of things that could be looked into to make this better, but I don't think they are critical:

  • The changes so far only modify the QoS of the server when started in the background, but have no effect on batch mode, the client, or whatever is run via bazel run.

  • Set QoS levels on a thread basis within the Bazel server. This would likely require some JNI glue and an audit of what each threadpool does and what it deserves. Or maybe just make sure that whatever thread is handling the UI is of a higher priority and leave everything else happen at the utility level.

@jmmv
Copy link
Contributor Author

jmmv commented Mar 13, 2019

I had to roll back the default QoS change. I'm now adding support for this feature behind a flag and will evaluate behavior on different classes (Utility vs. User-initiated).

bazel-io pushed a commit that referenced this issue Mar 14, 2019
This adds a new flag to set the QoS class of the Bazel server and accepts
selecting any of the possible classes exposed by the system.  We will use
this flag to evaluate the behavior of Bazel under different classes because
it is not clear which one we should be using.  (We tried forcing the class
to be utility, but that caused regressions in some cases.)

This is essentially a retry of 0877340 but gated by a flag (which
is how it should have been done in the first place).

Note that the flag exists on all platforms to ensure that a bazelrc file
shared across different people works in all cases, but this flag is ignored
in non-macOS systems.

Addresses #7446.

RELNOTES: None.
PiperOrigin-RevId: 238440116
@mpassell
Copy link
Contributor

mpassell commented Apr 3, 2019

Too bad about the rollback. Do you think that this is likely to make it into 0.25.0?

@jmmv
Copy link
Contributor Author

jmmv commented Apr 3, 2019

The default change was rolled back, but I added a --macos_qos_class startup flag that you can pass to re-enable this feature and customize Bazel to whichever QoS level you want.

At this point I don't think we should change the default for the whole process given that this resulted in slower builds for some people. But you can change it based on your observations, or on whether you prefer faster builds vs. a more responsive system.

@mpassell
Copy link
Contributor

mpassell commented Apr 3, 2019

Thanks. I see that's in the master version of the command-line reference, so I'll try it out with 0.25 or just build master if I get impatient. :)

@jmmv
Copy link
Contributor Author

jmmv commented Apr 3, 2019

Something I'd like us to do is have a "performance tuning" page where we document the various knobs in Bazel to adjust performance characteristics -- and documenting the impact of this new flag on system responsiveness would nicely fit there.

FTR most of the trouble we observed with QoS settings was because the services Bazel was using internally (think FUSE daemons) were running at a lower class than Bazel. We fixed this by raising the priority of those services instead.

That said, as I don't foresee us doing anything else in this area right now, I'm closing this bug.

@jmmv jmmv closed this as completed Apr 3, 2019
@mpassell
Copy link
Contributor

mpassell commented Apr 3, 2019

This page isn't exactly what you're referring to, but it still might be good to add a reference here - https://docs.bazel.build/versions/master/skylark/performance.html

Edit: maybe not, actually...

@jmmv
Copy link
Contributor Author

jmmv commented Apr 3, 2019

That page is seemingly focused on rule authors (based on the path and the contents). We need something that's user-facing. @meisterT

@cgruber
Copy link
Contributor

cgruber commented May 22, 2019

I take it that change didn't make it into 0.25.2? I can't seem to find that flag:

ERROR: Unrecognized option: --macos_qos_class=utility

bazel version returns:

Build label: 0.25.2
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri May 10 20:50:40 2019 (1557521440)
Build timestamp: 1557521440
Build timestamp as int: 1557521440

@keith
Copy link
Member

keith commented May 22, 2019

It appears to be in the tag https://github.com/bazelbuild/bazel/blob/0.25.2/src/main/java/com/google/devtools/build/lib/runtime/BlazeServerStartupOptions.java#L487-L499 note that it is a startup option, not a build option

@cgruber
Copy link
Contributor

cgruber commented May 22, 2019

Gah. That's likely my mistake. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) type: feature request z-team-Apple Deprecated. Send to rules_apple, or label team-Rules-CPP + platform:apple
Projects
None yet
Development

No branches or pull requests

4 participants