Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use a single file for the emit cache to eliminate multiple cache files getting out of sync #13303

Closed
wants to merge 5 commits into from

Conversation

dsherret
Copy link
Member

@dsherret dsherret commented Jan 7, 2022

Solves part of #13302.

This PR combines the version hash (.meta), emit (.js), source map (.map), and declaration (.d.ts) cache files into a single file (.emitinfo) to eliminate race conditions between the four files on the file system. This should now make it impossible for a version hash to be stored for an emit that doesn't match.

This does not completely solve #13302 because our emit process still relies on using cache data from the file system that could be out of date with the data stored in the graph if another deno process edits it. We should fix this part in a future PR.

At the current moment, this will be less performant because it will be loading all the data from the file when checking the version hash. This can be essentially eliminated when implementing the rest of #13302 because we'll be able to keep everything in memory instead of visiting the cache multiple times. There will only be a performance penalty when the cache is out of date instead of also when it's also not out of date.

map,
// todo(dsherret): deno_graph should not have multiple files for emit and map
emit: emit.clone(),
map: emit,
})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this CacheInfo used for? I forget and didn't figure it out on a quick glance as it doesn't seem to be used (unless my editor is failing me).

@@ -350,3 +340,195 @@ impl CacherLoader for MemoryCacher {
self
}
}

fn get_cache_filename(url: &Url) -> Option<PathBuf> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function and the one below were moved from disk_cache.rs into here. I didn't change the contents, only moved them off the struct and made it private here.

.dir
.gen_cache
.get_cache_filename_with_extension(&found, "js")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had multiple ways for getting this data from the cache. Here it was accessing it via DiskCache and other places it was FetchCache. I've combined them to one api (.get_emit_data)

v == get_version(module.source.as_bytes(), &config_bytes)
});
.get_emit_data(&module.specifier)
.map_or(false, |d| d.version_hash == version);
Copy link
Member Author

@dsherret dsherret Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the more inefficient code (I haven't measured it, though looking briefly at the benchmarks I don't see a regression), but fixing the todo outlined on line 733 and the issue I mentioned in the PR description should make it so that we don't get any performance penalty for an up to date cache.

@dsherret dsherret changed the title fix: use a single file for the emit cache to eliminate race conditions fix: use a single file for the emit cache to eliminate cached data getting out of sync Jan 7, 2022
@dsherret dsherret changed the title fix: use a single file for the emit cache to eliminate cached data getting out of sync fix: use a single file for the emit cache to eliminate multiple cache files getting out of sync Jan 7, 2022
pub(crate) enum CacheType {
Declaration,
Emit,
SourceMap,
TypeScriptBuildInfo,
Copy link
Member Author

@dsherret dsherret Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR actually currently panics on old cache dir files (just tried it now. Edit: now it errors with an actionable message, but still not ideal). The old TS build info just needs to be invalidated somehow.

I think maybe we should store the tsbuildinfo versioned beyond what the ts compiler does in a JSON object like:

{"version":1,"data": "..."}

Then we could really easily invalidate the cache by bumping the version value. Thoughts @kitsonk?

match emit.media_type {
MediaType::JavaScript | MediaType::Mjs | MediaType::Cjs => {
let version = get_version(source_bytes, &config_bytes);
cache.set(CacheType::Version, &specifier, version)?;
cache.set(CacheType::Emit, &specifier, emit.data)?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example of a previous bug would be if the process got killed right before entering this function call.

#[derive(Debug, Deserialize, Serialize)]
pub struct EmitMetadata {
pub version_hash: String,
}
Copy link
Member Author

@dsherret dsherret Jan 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative solution is to store hashes of all the cached files here and then leave the original cache files as-is.

This would be more complex though and would still mean the files could get out of sync though the code here could be changed to tell if so. Also, I think it would be slower because four files would need to be loaded/checked for instead of just one and these files would need to be hashed and then compared to against the hashes here. Also it would up the complexity of any other tools that would read the cache thereby increasing the chance of them not doing it correctly and getting out of date information for one of the files.

@bartlomieju
Copy link
Member

I verified that this PR works with Chrome Devtools; we should also check that VSCode debugger still works (preferable JetBrains IDE too).

@dsherret
Copy link
Member Author

I verified that this PR works with Chrome Devtools; we should also check that VSCode debugger still works (preferable JetBrains IDE too).

@bartlomieju I tested in VSCode and it works fine and I don't see why it wouldn't—nothing knows about the location of those files on the disk except the code in Deno that goes to load them (unless there's some other code I'm missing that accesses this filepath).

@dsherret
Copy link
Member Author

I'm going to close this PR. I'm going to open a new one that uses an sqlite db instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants