Use stat() to avoid checking content hashes for repository up-to-dateness checks #21044
Comments
Just want to note that Bazel doesn't check just mtime, but also ctime and other attributes, before computing a digest: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/vfs/DigestUtils.java;l=42;bpv=1;bpt=1
You're indeed right -- I used "mtime" as a shorthand for "whatever can be gleaned about a file short of looking at its contents".
@SalmaSamy AFAIU the ctime always changes if mtime does, so using it instead of the mtime would at worst cause suboptimal caching, which is perfectly fine (if somewhat inconsistent with action inputs, but that's just a minor gripe).
There is the (probably very theoretical) concern of the filesystem backing a path changing. By chance, the ctime could remain the same while the file contents differ. Bazel uses more stat properties, such as file size and inode, for the digest cache key, but I could see that being overkill for repo rules.
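For illustration, a stat-derived cache key of the kind discussed in these comments might combine several fields. This is a minimal Python sketch, not a transcription of Bazel's `DigestUtils`; the names `StatKey` and `stat_cache_key` and the exact field selection are assumptions.

```python
import os
from typing import NamedTuple


class StatKey(NamedTuple):
    """Hypothetical cache key built only from stat() metadata."""
    mtime_ns: int
    ctime_ns: int
    size: int
    inode: int
    device: int


def stat_cache_key(path: str) -> StatKey:
    # os.stat follows symlinks, matching what a reader of the file's contents sees.
    st = os.stat(path)
    return StatKey(
        mtime_ns=st.st_mtime_ns,
        ctime_ns=st.st_ctime_ns,
        size=st.st_size,
        inode=st.st_ino,   # changes if a different file now lives at this path
        device=st.st_dev,  # changes if a different filesystem backs the path
    )
```

If any field differs from the stored key, the file is treated as possibly changed and its contents are re-hashed. Including the inode and device number is what guards against the "different filesystem behind the same path" case; an mtime-only key would be cheaper but would not catch it.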
Good point! TBH I think the best approach here is to mimic what
@bazel-io fork 7.1.0
Note that timestamps on HFS+ have a granularity of one second.
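One way to cope with coarse timestamps, borrowed from Git's "racy git" handling rather than from anything proposed in this issue, is to distrust recorded stat() data whenever the file's mtime falls in the same timestamp tick as (or later than) the moment that data was recorded. A minimal sketch, assuming a one-second worst-case granularity; `is_stat_trustworthy` and `GRANULARITY_NS` are hypothetical names.

```python
GRANULARITY_NS = 1_000_000_000  # assumed worst case, e.g. HFS+ (one second)


def is_stat_trustworthy(file_mtime_ns: int, recorded_at_ns: int,
                        granularity_ns: int = GRANULARITY_NS) -> bool:
    """Return True if an unchanged mtime can count as evidence of unchanged content.

    If the file was last modified in the same timestamp tick in which its stat()
    data was recorded (or later), another write within that tick could leave the
    mtime unchanged, so the contents must be re-hashed instead.
    """
    # Compare whole ticks, since the filesystem cannot distinguish within one.
    file_tick = file_mtime_ns // granularity_ns
    recorded_tick = recorded_at_ns // granularity_ns
    return file_tick < recorded_tick
```

A file that fails this check is not necessarily changed; it simply has to take the slower content-hash path.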
FWIW, that's what
I think using mtime alone provides sufficient information for our needs, since it directly reflects changes to the file's content, which is our primary concern. Storing ctime or the inode number may add unnecessary complexity without offering much additional value: the extra information they capture would either already invalidate the label (e.g. a change in file location or name) or changes very rarely (e.g. the file owner).
- Added `rctx.watch_tree()` to watch a directory tree, which includes all transitive descendants' names, and if they're files, their contents.
- In the future we could add glob patterns to this method.
- Added a new SkyFunction DirectoryTreeDigestFunction to do the heavy lifting.
- In the future, for performance, we could try to get this skyfunction to have a mode where it only digests stat(), to use as heuristics (similar to #21044).

Work towards #20952.

Closes #21362.

PiperOrigin-RevId: 608667062
Change-Id: Ibacbb7af4cf4d7628fe8fcf06e2c4fa50e811e4e
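The digest described in this commit covers every transitive descendant's name and, for regular files, its contents. The following is a rough Python approximation of that idea, not the actual `DirectoryTreeDigestFunction`; `digest_tree` and its `stat_only` flag are hypothetical, with `stat_only=True` standing in for the stat()-only heuristic mode mentioned as possible future work.

```python
import hashlib
import os


def digest_tree(root: str, stat_only: bool = False) -> str:
    """Hash the names of all descendants of `root` and, for files, their contents.

    With stat_only=True, each file's contents are replaced by (size, mtime_ns),
    a cheaper heuristic that never reads file data.
    Note: symlinks to directories are not descended into (os.walk's default).
    """
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place makes os.walk visit subdirectories deterministically.
        dirnames.sort()
        rel_dir = os.path.relpath(dirpath, root)
        h.update(b"D" + os.fsencode(rel_dir) + b"\0")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h.update(b"F" + os.fsencode(name) + b"\0")
            if stat_only:
                st = os.stat(path)
                h.update(f"{st.st_size}:{st.st_mtime_ns}".encode() + b"\0")
            else:
                with open(path, "rb") as f:
                    # A fixed-length per-file hash keeps file boundaries unambiguous.
                    h.update(hashlib.sha256(f.read()).digest())
    return h.hexdigest()
```

Renaming, adding, or deleting any entry changes the result, as does editing a file's contents; in `stat_only` mode a same-size edit is only detected if the mtime also moves.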
Lowering the priority of this one due to the uncertainty of its correctness (and we're out of time for 7.1.0 anyway).
Description of the feature request:
Bazel verifies the up-to-dateness of source files as follows: if selected data from `stat()` is unchanged, it trusts that the file is unchanged; if not, it does a secondary check by checksumming the file, so that actions are not re-run after a simple `touch`.

Repository rules don't work like this: they simply checksum every file the repository depends on, which makes their up-to-dateness checks much slower than they would otherwise be.
It would be nice to make repository rules do the same thing as actions.
What underlying problem are you trying to solve with this feature?
Make the repository up-to-dateness check more efficient.