Fix possible race condition in the caching of request graph #9675

Merged
4 commits merged into v2 on Apr 29, 2024

Conversation

benjervis (Contributor)

TL;DR: To fix some existing race conditions in the way the request graph is cached, the old cached graph is now deleted from disk at the start of the cache write to prevent stale references, and the request graph and snapshot are now written after all the node blob writes have completed, rather than being queued alongside them.


As part of the incremental cache write system, we add all the nodes to be cached onto the queue, followed by the request graph itself and the cache snapshot.
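
Roughly, the existing write path looks like this (a sketch only: the function, variable, and cache key names here are assumptions for illustration, not Parcel's exact internals):

// Illustrative sketch of the pre-existing write order; names are assumptions.
import {PromiseQueue} from '@parcel/utils';

async function writeCacheSketch(cache, nodeChunks, serialisedGraph, writeSnapshot) {
  let queue = new PromiseQueue({maxConcurrent: 32});

  // Each chunk of serialised graph nodes (a "node blob") is queued first...
  for (let [i, chunk] of nodeChunks.entries()) {
    queue.add(() => cache.setLargeBlob(`requestGraph-nodes-${i}`, chunk));
  }

  // ...followed by the request graph itself and the cache snapshot.
  queue.add(() => cache.setLargeBlob('requestGraph', serialisedGraph));
  queue.add(() => writeSnapshot());

  // Everything shares one concurrent queue, so the graph and snapshot writes
  // are not guaranteed to finish after all of the node blob writes.
  await queue.run();
}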

We're currently having some problems with "hanging builds" in our application, which means that a number of users are killing the Parcel process by more drastic means once their Ctrl+C has taken too long.

There are two possible scenarios that we think are causing issues in our application at the moment.

Problems

1. Missing nodes

The process is force quit before the nodes are finished being written, but after the request graph cache has been written.

Given that the promise queue used in the cache system has built-in concurrency, it's possible for the request graph write to finish before the last couple of node blob writes.

This means that the request graph written to the cache would contain references to nodes that don't actually exist in the cache. Since the node IDs are index-based, this causes the graph to look up the wrong nodes.

2. Old request graph cache

The process is force quit after the nodes are finished being written, but before the request graph cache has been written.

This would result in all the new node blobs being written to disk, but the old request graph remaining in place. This means that all of its index-based IDs now, again, point to different nodes than expected.

Solutions

This PR contains fixes for both of these issues.

To fix problem number 1, the writes of the request graph and snapshot file have been removed from the queue system, and now happen only after the queue has been flushed. This means that the request graph cache file won't be written until all of the node blobs are in place.
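
In sketch form (continuing the illustrative names from the sketch above), the write path becomes:

// Illustrative sketch of the fix for problem 1; names are assumptions.
import {PromiseQueue} from '@parcel/utils';

async function writeCacheSketch(cache, nodeChunks, serialisedGraph, writeSnapshot) {
  let queue = new PromiseQueue({maxConcurrent: 32});

  // Node blob writes are still queued and run concurrently...
  for (let [i, chunk] of nodeChunks.entries()) {
    queue.add(() => cache.setLargeBlob(`requestGraph-nodes-${i}`, chunk));
  }
  await queue.run();

  // ...but the request graph and snapshot are only written once every queued
  // node blob write has flushed, so a force-quit can no longer leave a cached
  // graph that references node blobs which were never written.
  await cache.setLargeBlob('requestGraph', serialisedGraph);
  await writeSnapshot();
}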

To fix problem number 2, the cached request graph is deleted once we start writing a new cache. To achieve this, a new deleteLargeBlob method has been added to the parent Cache type, and implemented for each different cache variant. This means that if a user force-quits the process before the new request graph cache has been written, there will be no cached graph at all, eliminating the stale references problem.
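
On top of the previous sketch, that amounts to one extra step before anything is queued (again illustrative; the actual cache key used for the request graph is an assumption):

// Illustrative addition for problem 2; the cache key is an assumption.
async function writeCacheSketch(cache, nodeChunks, serialisedGraph, writeSnapshot) {
  // Remove the previously cached graph up front, so a force-quit part-way
  // through the write can never leave an old graph on disk pointing at a
  // fresh (and differently indexed) set of node blobs.
  await cache.deleteLargeBlob('requestGraph');

  // ...then queue the node blob writes, flush the queue, and finally write
  // the new request graph and snapshot, exactly as in the sketch above.
}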

@mattcompiles (Contributor) left a comment:

Nice job. Would be great to be able to test this behaviour somehow though 🤔

benjervis merged commit 03983cb into v2 on Apr 29, 2024 (16 of 17 checks passed).
benjervis deleted the fix-request-cache-race-cond branch on Apr 29, 2024 at 22:24.

Review thread on the diff:

let i = 0;
let filePath = this.#getFilePath(key, i);

while (await this.fs.exists(filePath)) {
A reviewer (Contributor) commented:

/nit: fs.exists (as opposed to existsSync) prior to performing a file operation is "deprecated", as the existence could change between the two operations.

It's an edge case, but you should probably use existsSync here. I believe rimraf itself will never "fail" if the file doesn't exist.
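
In sketch form, the suggested change would swap the asynchronous check for the synchronous one, continuing the excerpt above inside the same method (assuming this.fs exposes existsSync alongside exists; deletePromises is an assumed local for collecting the deletions):

// Sketch of the reviewer's suggestion; deletePromises is an assumed local.
let deletePromises = [];

let i = 0;
let filePath = this.#getFilePath(key, i);

// A synchronous existence check avoids the deprecated async check-then-act pattern.
while (this.fs.existsSync(filePath)) {
  deletePromises.push(this.fs.rimraf(filePath));
  i += 1;
  filePath = this.#getFilePath(key, i);
}

await Promise.all(deletePromises);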

benjervis (Contributor, Author) replied:

Good call, I'll fix it up in my next PR
