Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve "action.prepare" performance #19266

Open
chiragramani opened this issue Aug 16, 2023 · 5 comments
Open

Improve "action.prepare" performance #19266

chiragramani opened this issue Aug 16, 2023 · 5 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Performance Issues for Performance teams type: bug

Comments

@chiragramani
Copy link
Contributor

chiragramani commented Aug 16, 2023

Description of the bug:

action.prepare step - 4.4 seconds
ActionContinuation.execute - 10.4 seconds

Since all of this is in critical path where every second savings matter, we would like to explore if there are opportunities for improving the performance of the action.prepare step.

From the documentation here,

// This call generally deletes any files at locations that are declared outputs of the
// action, although some actions perform additional work, while others intentionally
// keep previous outputs in place.
Screenshot 2023-08-16 at 4 27 23 PM

Which category does this issue belong to?

Performance

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No response

Which operating system are you running Bazel on?

macOS

What is the output of bazel info release?

Bazel 6.4 e34112f

@chiragramani chiragramani changed the title Improving "action.prepare" performance Improve "action.prepare" performance Aug 17, 2023
@brentleyjones
Copy link
Contributor

brentleyjones commented Aug 17, 2023

I believe most of that time is deleting previous artifacts. In the case of tree artifacts it would probably be a lot more efficient to move the directory to a temp directory and then asynchronously delete that directory (and since it's in a temp directory, if the process is cancelled, the OS will eventually clean it up).

I believe

/**
* Deletes all directory trees recursively beneath the given path and removes that path as well.
*
* @param path the directory hierarchy to remove
* @throws IOException if the hierarchy cannot be removed successfully
*/
protected void deleteTree(PathFragment path) throws IOException {
deleteTreesBelow(path);
delete(path);
}

is what is used right now, and if so, yeah, it would work a lot better if it worked like I described.

@Pavank1992 Pavank1992 added the team-Rules-Java Issues for Java rules label Aug 17, 2023
@Pavank1992 Pavank1992 added team-Performance Issues for Performance teams and removed team-Rules-Java Issues for Java rules labels Aug 17, 2023
@tjgq
Copy link
Contributor

tjgq commented Aug 17, 2023

I believe most of that time is deleting previous artifacts.

Yes, deleting large tree artifacts is known to be slow, especially on MacOS where filesystem performance leaves a lot to be desired. I've also observed large delays when deleting stale sandboxes left over by previous builds, which would likely also benefit from your proposed solution (although, to be clear, these would manifest themselves differently in a profile).

move the directory to a temp directory and then asynchronously delete that directory (and since it's in a temp directory, if the process is cancelled, the OS will eventually clean it up).

Moving to an OS-designated temporary directory (e.g. /tmp) might cross filesystems and end up being just as inefficient. I think I might prefer to move them into a designated subdirectory of the output base, and have them reaped by a background thread.

One thing I want to be careful with is to wait for any pending deletions to finish before we consider the invocation finished, as otherwise they could interfere with the performance of subsequent invocations in a benchmarking context.

@brentleyjones
Copy link
Contributor

One thing I want to be careful with is to wait for any pending deletions to finish before we consider the invocation finished, as otherwise they could interfere with the performance of subsequent invocations in a benchmarking context.

Looks like Bazel already thought of all of that, but only in the context of sandbox trees:

/**
* Executes file system tree deletions asynchronously.
*
* <p>The number of threads used to process the backlog of tree deletions can be configured at any
* time via {@link #setThreads(int)}. While a build is running, this number should be low to not use
* precious resources that could otherwise be used for the build itself. But when the build is
* finished, this number should be raised to quickly go through any pending deletions.
*/
class AsynchronousTreeDeleter implements TreeDeleter {

I bet that could be massaged to work for your case as well.

@tjgq
Copy link
Contributor

tjgq commented Aug 17, 2023

Great spelunking!

It looks like there's both a synchronous and an asynchronous TreeDeleter implementation, but the default is synchronous and you have to set --experimental_sandbox_async_tree_delete_idle_threads to make it asynchronous.

@larsrc-google, do you know why asynchronous isn't the default? (I vaguely remember discussing this with you, but I don't recall the conclusion.)

@larsrc-google
Copy link
Contributor

I think we discussed it in connection with sandbox cleanup for reuse, where I suspected async deletion of causing rare errors. But I don't know why it isn't turned on for non-reusing sandboxes, it was created before my time on Bazel (16af94c).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Performance Issues for Performance teams type: bug
Projects
None yet
Development

No branches or pull requests

9 participants