"Cache already exists" due to concurrency #537
I created my own action and I get a warning caused by this error quite reliably on every build (if the cache was not already present before I started it). It is annoying.
If we look at the // https://github.com/actions/cache/blob/d29c1df198dd38ac88e0ae23a2881b99c2d20e68/src/save.ts#L43
} catch (error) {
    if (error.name === cache.ValidationError.name) {
        throw error;
    } else if (error.name === cache.ReserveCacheError.name) {
        core.info(error.message);
    } else {
        utils.logWarning(error.message);
    }
} This // https://github.com/actions/toolkit/blob/9ad01e4fd30025e8858650d38e95cfe9193a3222/packages/cache/src/cache.ts#L146
const cacheId = await cacheHttpClient.reserveCache(key, paths, {
  compressionMethod
})
if (cacheId === -1) {
  throw new ReserveCacheError(
    `Unable to reserve cache with key ${key}, another job may be creating this cache.`
  )
} Unfortunately, another generic error seems to be thrown before we reach that point. I think it is initially thrown by the PS: Something like |
Then maybe Btw. I copied the exception handling from |
Yes, returning a boolean instead of throwing an exception might be a better API. Anyway, I would already be happy if |
Thanks for reporting this. There's definitely an issue with handling non-successful status codes. It looks like there were a series of changes that led up to this. Caching used to call the If the cache already exists, the server returns a 409 Conflict so |
Before fixing the cache module, will need to fix |
Is there any timeline for when this will be fixed? Currently, I'm using the following workaround: try {
    await cache.saveCache(paths, key);
} catch (err) {
    if (err.message.includes("Cache already exists")) {
        core.info(`Cache entry ${key} has already been created by another workflow`);
    } else {
        throw err;
    }
} This seems to work to catch and neutralise the error. The error is detected through the error message because the type of the error seems to be just |
In the previous code there was a potential race condition when two or more workflows checked the cache at the same time:
1. Workflow 1: cache miss; Workflow 2: cache miss
2. Workflow 1: install; Workflow 2: install
3. Workflow 1: save cache; Workflow 2: save cache <<< RACE CONDITION
This resulted in the step failing with the following error:
Error: reserveCache failed: Cache already exists. Scope: refs/heads/master, Key: <key>, Version: <...>
It's a bug in the @actions/cache package and it's discussed in the following issue: actions/toolkit#537
The correct behaviour would be for Workflow 2 to simply drop its attempt to save the cache entry when this situation occurs. For now, this code uses a workaround that detects and catches the above error based on its error message.
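The message-based workaround can be combined with the "return a boolean instead of throwing" idea mentioned earlier in the thread. The following is only a sketch: `isCacheConflict` and `trySaveCache` are hypothetical names, not part of @actions/cache, and the only detail taken from this issue is the "Cache already exists" text in the error message. The save operation is passed in as a callback so the helper does not depend on any particular version of the cache module.

```typescript
// Hypothetical helpers for illustration; not part of @actions/cache.
type SaveFn = () => Promise<unknown>;

/** True when the error message says another job already created the entry. */
function isCacheConflict(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  // Assumption: the conflict is only recognisable by its message text,
  // as reported in this issue.
  return message.includes("Cache already exists");
}

/**
 * Runs `save` and resolves to true if the entry was saved, or false if
 * another job won the race. Any other error is rethrown unchanged.
 */
async function trySaveCache(
  save: SaveFn,
  log: (message: string) => void = () => {}
): Promise<boolean> {
  try {
    await save();
    return true;
  } catch (err) {
    if (isCacheConflict(err)) {
      log("Cache entry was already created by another job; skipping save.");
      return false;
    }
    throw err;
  }
}
```

In an action this might be called as `await trySaveCache(() => cache.saveCache(paths, key), core.info)`, so the caller decides what a lost race means instead of handling an exception.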
@weibeld Thanks for the ping. The changes to |
PR for the cache module: #558 Once this is merged and published, we'll also need to bump the version used by the |
If you have a look at this workflow run: https://github.com/Vampire/setup-wsl/runs/938421512?check_suite_focus=true you see the problem.
Here is the timeline of the problem:
The problem is clear: it is the typical problem you get with lazy initialization when double-checked locking is not done properly.
Both jobs see the cache is not present, prepare the thing to be cached and then try to cache it, while the second fails at caching.
Is there a clean way to solve this, or can something be added like for example a method cache.exists() that only checks whether the cache exists? Doing a full restore before doing the save seems a bit overkill just to check whether in the meantime another job already filled the cache. And just catching the exception feels like "exception for flow control" if there could maybe be a simple test method to check up front.
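To make the race concrete, here is a small in-memory simulation. It is a sketch under stated assumptions: `FakeCacheService` and `runJobs` are made up for illustration and do not call the real cache service. The only behaviour borrowed from the thread is that a reservation for an already reserved key yields -1. All jobs check for the entry before any of them saves, which is exactly the interleaving described above.

```typescript
// In-memory stand-in for the cache service; NOT the real @actions/cache API.
class FakeCacheService {
  private reserved = new Set<string>();
  private nextId = 1;

  /** Returns a fresh id for the first caller, -1 for later callers of the same key. */
  reserve(key: string): number {
    if (this.reserved.has(key)) {
      return -1;
    }
    this.reserved.add(key);
    return this.nextId++;
  }

  /** Hypothetical existence check, used here only to model the cache lookup. */
  has(key: string): boolean {
    return this.reserved.has(key);
  }
}

type Outcome = "restored" | "saved" | "lost-race";

/** Simulates `jobCount` jobs that all look up the cache before any of them saves. */
function runJobs(service: FakeCacheService, key: string, jobCount: number): Outcome[] {
  // Step 1: every job checks the cache first, so all of them see the same state.
  const misses = Array.from({ length: jobCount }, () => !service.has(key));
  // Steps 2 and 3: each job that missed "installs" and then tries to reserve the key.
  return misses.map((miss) => {
    if (!miss) {
      return "restored";
    }
    return service.reserve(key) === -1 ? "lost-race" : "saved";
  });
}
```

With two jobs on an empty service, the first one saves and the second one loses the race. A job started afterwards finds the entry and restores it. A losing job has nothing left to do, which is why treating the conflict as informational rather than as a failure is reasonable.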