Fix possible race condition in the caching of request graph #9675

Merged
4 commits merged into v2 on Apr 29, 2024

Conversation

benjervis (Contributor)

TL;DR: To fix some existing race conditions in the way the request graph is cached, the old cached graph is now deleted from disk at the start of the cache write to prevent stale references, and the request graph and snapshot are now written after all the node blob writes have completed, rather than being queued alongside them.


As part of the incremental cache write system, we add all the nodes to be cached onto the queue, followed by the request graph itself and the cache snapshot.
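
Roughly, the existing write path looks like this (a sketch only: the function, variable, and cache key names here are assumptions for illustration, not Parcel's exact internals):

// Illustrative sketch of the pre-existing write order; names are assumptions.
import {PromiseQueue} from '@parcel/utils';

async function writeCacheSketch(cache, nodeChunks, serialisedGraph, writeSnapshot) {
  let queue = new PromiseQueue({maxConcurrent: 32});

  // Each chunk of serialised graph nodes (a "node blob") is queued first...
  for (let [i, chunk] of nodeChunks.entries()) {
    queue.add(() => cache.setLargeBlob(`requestGraph-nodes-${i}`, chunk));
  }

  // ...followed by the request graph itself and the cache snapshot.
  queue.add(() => cache.setLargeBlob('requestGraph', serialisedGraph));
  queue.add(() => writeSnapshot());

  // Everything shares one concurrent queue, so the graph and snapshot writes
  // are not guaranteed to finish after all of the node blob writes.
  await queue.run();
}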

We're currently having some problems with "hanging builds" in our application, which means that a number of users are killing the Parcel process by more drastic means once their Ctrl+C has taken too long.

There are two possible scenarios that we think are causing issues in our application at the moment.

Problems

1. Missing nodes

The process is force quit before the nodes are finished being written, but after the request graph cache has been written.

Given that the promise queue used in the cache system has built-in concurrency, it's possible for the request graph write to finish before the last couple of node blob writes.

This means that the request graph written to the cache would contain references to nodes that don't actually exist in the cache. Since the node IDs are index-based, this causes the graph to look up the wrong nodes.

2. Old request graph cache

The process is force quit after the nodes are finished being written, but before the request graph cache has been written.

This would result in all the new node blobs being written to disk, but the old request graph remaining in place. This means that all of its index-based IDs now, again, point to different nodes than expected.

Solutions

This PR contains fixes for both of these issues.

To fix problem number 1, the writes of the request graph and snapshot file have been removed from the queue system, and now happen only after the queue has been flushed. This means that the request graph cache file won't be written until all of the node blobs are in place.
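
In sketch form (continuing the illustrative names from the sketch above), the write path becomes:

// Illustrative sketch of the fix for problem 1; names are assumptions.
import {PromiseQueue} from '@parcel/utils';

async function writeCacheSketch(cache, nodeChunks, serialisedGraph, writeSnapshot) {
  let queue = new PromiseQueue({maxConcurrent: 32});

  // Node blob writes are still queued and run concurrently...
  for (let [i, chunk] of nodeChunks.entries()) {
    queue.add(() => cache.setLargeBlob(`requestGraph-nodes-${i}`, chunk));
  }
  await queue.run();

  // ...but the request graph and snapshot are only written once every queued
  // node blob write has flushed, so a force-quit can no longer leave a cached
  // graph that references node blobs which were never written.
  await cache.setLargeBlob('requestGraph', serialisedGraph);
  await writeSnapshot();
}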

To fix problem number 2, the cached request graph is deleted once we start writing a new cache. To achieve this, a new deleteLargeBlob method has been added to the parent Cache type, and implemented for each different cache variant. This means that if a user force-quits the process before the new request graph cache has been written, there will be no cached graph at all, eliminating the stale references problem.
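
On top of the previous sketch, that amounts to one extra step before anything is queued (again illustrative; the actual cache key used for the request graph is an assumption):

// Illustrative addition for problem 2; the cache key is an assumption.
async function writeCacheSketch(cache, nodeChunks, serialisedGraph, writeSnapshot) {
  // Remove the previously cached graph up front, so a force-quit part-way
  // through the write can never leave an old graph on disk pointing at a
  // fresh (and differently indexed) set of node blobs.
  await cache.deleteLargeBlob('requestGraph');

  // ...then queue the node blob writes, flush the queue, and finally write
  // the new request graph and snapshot, exactly as in the sketch above.
}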

@mattcompiles (Contributor) left a comment:

Nice job. Would be great to be able to test this behaviour somehow though 🤔

benjervis merged commit 03983cb into v2 on Apr 29, 2024 (16 of 17 checks passed).
benjervis deleted the fix-request-cache-race-cond branch on Apr 29, 2024 at 22:24.

Review thread on the diff:

let i = 0;
let filePath = this.#getFilePath(key, i);

while (await this.fs.exists(filePath)) {
A reviewer (Contributor) commented:

/nit: fs.exists (as opposed to existsSync) prior to performing a file operation is "deprecated", as the existence could change between the two operations.

It's an edge case, but you should probably use existsSync here. I believe rimraf itself will never "fail" if the file doesn't exist.
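
In sketch form, the suggested change would swap the asynchronous check for the synchronous one, continuing the excerpt above inside the same method (assuming this.fs exposes existsSync alongside exists; deletePromises is an assumed local for collecting the deletions):

// Sketch of the reviewer's suggestion; deletePromises is an assumed local.
let deletePromises = [];

let i = 0;
let filePath = this.#getFilePath(key, i);

// A synchronous existence check avoids the deprecated async check-then-act pattern.
while (this.fs.existsSync(filePath)) {
  deletePromises.push(this.fs.rimraf(filePath));
  i += 1;
  filePath = this.#getFilePath(key, i);
}

await Promise.all(deletePromises);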

benjervis (Contributor, Author) replied:

Good call, I'll fix it up in my next PR
