Define thread QoS priorities on macOS #7446
This introduces a new CharPP class that implements the conversion of C++ data structures to their corresponding char** representation needed for execv-like calls. As a side-effect of this change, we fix two issues: first, a memory leak; and second, a theoretical const-correctness violation. I am doing this change to provide room for implementing a similar conversion function for envp arrays, which I will need to switch to using posix_spawn. Prerequisite cleanup to address #7446. While this does fix an issue with a memory leak, this change should have no visible effects. RELNOTES: None. PiperOrigin-RevId: 234968622
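A minimal sketch of the kind of conversion this commit describes, assuming the input is a `std::vector<std::string>`. This is not Bazel's actual CharPP class: the constructor signature, the `get()` accessor and the storage layout are assumptions made for illustration. The wrapper owns mutable copies of the strings, so nothing leaks and no `const` is cast away.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch, not Bazel's real CharPP: owns a NULL-terminated
// char** array suitable for execv-like calls.
class CharPP {
 public:
  explicit CharPP(const std::vector<std::string>& args) {
    storage_.reserve(args.size());
    pointers_.reserve(args.size() + 1);
    for (const std::string& arg : args) {
      // Copy each string, including its NUL terminator, into a mutable
      // buffer owned by this object.
      storage_.emplace_back(arg.c_str(), arg.c_str() + arg.size() + 1);
      pointers_.push_back(storage_.back().data());
    }
    pointers_.push_back(nullptr);  // execv-like calls expect a NULL sentinel.
  }

  // Copying would leave pointers_ aimed at another object's buffers.
  CharPP(const CharPP&) = delete;
  CharPP& operator=(const CharPP&) = delete;

  // Valid only while this object is alive.
  char** get() {
    assert(!pointers_.empty() && pointers_.back() == nullptr);
    return pointers_.data();
  }

 private:
  std::vector<std::vector<char>> storage_;  // Owned, mutable string copies.
  std::vector<char*> pointers_;             // Views into storage_ plus NULL.
};
```

A caller would build `CharPP argv(args);` and pass `argv.get()` to the exec or spawn call; all memory is released when `argv` goes out of scope.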
…les. The map returned by PrepareEnvironmentForJvm() previously represented a "delta" to apply to the current environment. This was used to modify the environment surrounding fork/execv calls because we could not pass an envp to execv. However, when using the resulting map to create the envp value of a call like posix_spawn, we must be exhaustive and list all variables we expect the subprocess to have. And as we are going to be using posix_spawn, we need to make this happen, so change PrepareEnvironmentForJvm() to make the returned map exhaustive. Prerequisite change to address #7446 because of the need to switch to posix_spawn. RELNOTES: None. PiperOrigin-RevId: 234971246
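To illustrate the difference between a "delta" and an exhaustive map, here is a rough sketch that starts from the current environment and applies overrides on top. `ExhaustiveEnvironment` is a hypothetical helper, not the real `PrepareEnvironmentForJvm()`, and the use of `std::map` is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

extern "C" char** environ;  // POSIX: the current process environment.

// Hypothetical helper: returns every variable the child should see, i.e. the
// current environment with `overrides` applied on top.
std::map<std::string, std::string> ExhaustiveEnvironment(
    const std::map<std::string, std::string>& overrides) {
  std::map<std::string, std::string> env;
  for (char** entry = environ; entry != nullptr && *entry != nullptr; ++entry) {
    const std::string pair(*entry);
    const std::string::size_type eq = pair.find('=');
    if (eq == std::string::npos) continue;  // Skip malformed entries.
    env[pair.substr(0, eq)] = pair.substr(eq + 1);
  }
  for (const auto& kv : overrides) {
    // Variable names cannot contain '='.
    assert(kv.first.find('=') == std::string::npos);
    env[kv.first] = kv.second;  // Overrides win over inherited values.
  }
  return env;
}
```

Each entry of the resulting map can then be rendered as a `NAME=value` string to build the `envp` array passed to `posix_spawn`.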
Introduce a new "daemonize" helper tool that takes the heavy-lifting of starting the Bazel server process as a daemon. This tool is pure C and does not have any big dependencies like gRPC so we can ensure it's thread-free. As a result, the code in this tool doesn't have to take a lot of care when running fork(), which results in easier to read code overall. Then, change the Bazel client to simply use posix_spawn to invoke this helper tool, which in turn runs the Bazel server as a daemon. The code in the Bazel client becomes simpler as well as we don't have to worry about the implications of using fork() from a threaded process. The switch to posix_spawn is necessary to address #7446 because the only way to customize the QoS of a process in macOS is by using this API. Such a change will come separately. This change is intended to be a no-op. However, this fixes a bug introduced in unknown commit by which we violated the requirement of not doing any memory management from a child process started with fork(). (I don't think this bug ever manifested itself, but it was there -- which is funny because this piece of code went to great extents to not do the wrong thing... and a one-line change let hell break loose.) RELNOTES: None. PiperOrigin-RevId: 234991943
I think we can consider this done for now. There are a couple of things that could be looked into to make this better, but I don't think they are critical:
I had to roll back the default QoS change. I'm now adding support for this feature behind a flag and will evaluate behavior on different classes (Utility vs. User-initiated).
This adds a new flag to set the QoS class of the Bazel server and accepts selecting any of the possible classes exposed by the system. We will use this flag to evaluate the behavior of Bazel under different classes because it is not clear which one we should be using. (We tried forcing the class to be utility, but that caused regressions in some cases.) This is essentially a retry of 0877340 but gated by a flag (which is how it should have been done in the first place). Note that the flag exists on all platforms to ensure that a bazelrc file shared across different people works in all cases, but this flag is ignored in non-macOS systems. Addresses #7446. RELNOTES: None. PiperOrigin-RevId: 238440116
Too bad about the rollback. Do you think that this is likely to make it into 0.25.0?
The default change was rolled back, but I added a flag to select the QoS class. At this point I don't think we should change the default for the whole process given that this resulted in slower builds for some people. But you can change it based on your observations, or on whether you prefer faster builds vs. a more responsive system.
Thanks. I see that's in the master version of the command-line reference, so I'll try it out with 0.25 or just build master if I get impatient. :)
Something I'd like us to do is have a "performance tuning" page where we document the various knobs in Bazel to adjust performance characteristics -- and documenting the impact of this new flag on system responsiveness would nicely fit there. FTR most of the trouble we observed with QoS settings was because the services Bazel was using internally (think FUSE daemons) were running at a lower class than Bazel. We fixed this by raising the priority of those services instead. That said, as I don't foresee us doing anything else in this area right now, I'm closing this bug.
This page isn't exactly what you're referring to, but it still might be good to add a reference here - https://docs.bazel.build/versions/master/skylark/performance.html Edit: maybe not, actually...
That page is seemingly focused on rule authors (based on the path and the contents). We need something that's user-facing. @meisterT
I take it that change didn't make it into 0.25.2? I can't seem to find that flag:
bazel version returns:
It appears to be in the tag: https://github.com/bazelbuild/bazel/blob/0.25.2/src/main/java/com/google/devtools/build/lib/runtime/BlazeServerStartupOptions.java#L487-L499
Note that it is a startup option, not a build option.
Gah. That's likely my mistake. Thanks!
macOS has a concept of thread QoS service classes. Bazel at the moment uses the default class when run from the command line. The documentation strongly recommends defining QoS classes for proper operation instead of relying on the default one. See https://developer.apple.com/library/archive/documentation/Performance/Conceptual/power_efficiency_guidelines_osx/PrioritizeWorkAtTheTaskLevel.html
We have found that the use of the default class causes two problems: first, Bazel can make the system unresponsive; and, second, Bazel can become very slow if it relies on system services that run at a lower service class (e.g. think of FUSE file systems).
I have been experimenting with this by explicitly declaring Bazel to run at the Utility level and this makes both problems go away. I think we should make this change, but then account for the many threads we run. In particular, it'd be nice if we defaulted to Utility but made the UI thread and the Bazel client higher priority to ensure snappier console output. (This latter detail is nice, but not a requirement in my opinion.) Of course, we must measure whether this causes a performance regression.
I also tried setting Bazel to Background level. This made the system even snappier but also prevented Bazel from fully utilizing all CPUs.
One last thing to note is that changing the QoS class of a program after it has started, even if it has just one thread, does not modify the class for the main thread. This means that any further spawned threads or subprocesses do not respect the class change and instead use whatever was set at the main thread level. The only way I've found so far of changing the class of the main level is by using posix_spawnattr_set_qos_class_np, which means we have to exec a process using posix_spawn. This makes the change slightly more complicated, but given that the Bazel client is already spawning the server, we can probably fit this in that code path.
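As a rough sketch of the approach described above, the following helper spawns a child with the Utility class set through the spawn attributes. `SpawnAtUtilityQos` is a hypothetical name and this is not Bazel's code. The QoS call is compiled only on Apple platforms, so elsewhere the helper degrades to a plain `posix_spawn`. Which classes `posix_spawnattr_set_qos_class_np` accepts should be checked against Apple's manual page.

```cpp
#include <cassert>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

#ifdef __APPLE__
#include <sys/qos.h>
#endif

extern "C" char** environ;  // POSIX: the current process environment.

// Hypothetical helper: starts `path` with `argv`, requesting the Utility QoS
// class on macOS. Returns 0 on success or an errno-style value on failure.
int SpawnAtUtilityQos(const char* path, char* const argv[], pid_t* pid) {
  assert(path != nullptr && argv != nullptr && pid != nullptr);

  posix_spawnattr_t attr;
  int err = posix_spawnattr_init(&attr);
  if (err != 0) return err;

#ifdef __APPLE__
  // The class is applied to the new process from its very first thread,
  // which is what a post-startup change cannot achieve.
  err = posix_spawnattr_set_qos_class_np(&attr, QOS_CLASS_UTILITY);
#endif

  if (err == 0) {
    // Inherit the parent's environment; no file actions are needed here.
    err = posix_spawn(pid, path, nullptr, &attr, argv, environ);
  }
  posix_spawnattr_destroy(&attr);
  return err;
}
```

The caller remains responsible for reaping the child with `waitpid`, or for daemonizing it as the Bazel client does for the server.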