Add rctx.watch_tree() to watch a directory tree #21362
The two main questions in my head:
mostly just being conservative -- only introducing this in the context we know it's needed in right now. we can add it to mctx later if need be. (whereas 'watch' had to be added to the base context because 'read' was already in the base context)
i think i addressed this somewhat in the other thread, but just to the API point, I don't have a strong feeling as of right now. Do you think glob patterns are potentially useful enough in this api that we should e.g. change rctx.watch_tree to path.glob instead?
I think the important part is not renaming Symmetry is a nice bonus, but what leads me think it's better that way is that |
I mentioned "renaming So my current thinking is that we can add support for glob patterns to
One major difference is that |
So I did some simple benchmarking using the Bazel project itself (just the bazelbuild/bazel git repo). With a simple repo rule that watches the Bazel source tree in its entirety (including the convenience symlinks going into output base, and the .git directory), totalling ~12K files, on my 2021 M1 macbook pro:
so the recursive digesting takes about ~5 seconds. This is probably already an improvement for @ismell since you do benefit from hot builds unlike the |
The fact that And now, on to the review of the actual code!
src/main/java/com/google/devtools/build/lib/skyframe/DirectoryTreeDigestFunction.java
The code itself looks quite reasonable, modulo some nits. I have a question about semantics to both @Wyverald and @ismell though: AFAICT this implementation is agnostic to the symlink structure. I.e. if you have files Also, if there is a dangling symlink @Wyverald : Are these statements correct? @Wyverald @ismell : are these properties desirable? Working on Blaze, I'd be more comfortable with being conservative and making the symlink structure affect the checksum, although that'd add quite a bit of complexity since it's not immediately obvious what differences in the symlink tree should matter (e.g. if a tree has an absolute symlink, but to within itself, should the checksum depend on the location of the tree in the file system?)
@Wyverald do you know what takes 5 seconds? Initially, I thought that it was because the checksumming and tree traversal run on one thread, but now I know that it's parallelized. My Bazel tree (according to |
Indeed; there's even a test case for that!
what do you mean by 'content'? as in, the path of the symlink target? then yes, the digest only contains the fact that it's dangling.
how does that help with migration, actually? adding the |
not my sharpest moment
Yep, that's exactly it. TBH I'd be happier if in that case, there was a refetch on the theory that it's better to be slow than to be incorrect. Do you think that's feasible? (if @ismell says that this is OK for him, I'll relent, grudgingly)
Not a lot, just a tiny bit My line of reasoning was then you don't need to default |
Could we guard the |
In our implementation we follow symlinks and hash the underlying file. If a symlink can't be resolved we hash the symlink contents instead. This way the digest changes if the missing file gets created and the symlink becomes valid.
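For readers following along, here is a rough Python sketch of the per-path strategy described in the comment above: follow the symlink and hash the file behind it, and fall back to hashing the link's stored target string when it cannot be resolved. The function name `digest_path` and the `file:`/`dangling:` prefixes are made up for illustration; this is not the commenter's actual code, and it assumes a POSIX file system.

```python
import hashlib
import os


def digest_path(path: str) -> str:
    """Hash what `path` resolves to, or the link text if the link dangles.

    Hypothetical helper for illustration only.
    """
    h = hashlib.sha256()
    if os.path.islink(path) and not os.path.exists(path):
        # os.path.exists() follows symlinks, so this branch is a dangling link.
        # Hash the stored target string rather than any file contents.
        h.update(b"dangling:" + os.fsencode(os.readlink(path)))
    else:
        # Regular file, or a symlink that resolves: hash the underlying bytes.
        with open(path, "rb") as f:
            h.update(b"file:" + f.read())
    return h.hexdigest()
```

Once the missing target is created, the second branch is taken and the digest changes, which is the property the comment is after.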
Not completely sure, but just noting that
I think that's as small as deals get, haha...
Not totally against it, but I feel like we've gone through most of the concerns already and this API and its behavior are malleable enough that we can add to it comfortably. Happy to hear @lberki's thoughts on this.
For the record, you don't need to hash the symlink contents for that -- just storing the fact that it's dangling is enough to cause the digest to change if the missing file gets created. (which is what I'm doing in this PR.)
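To make the point above concrete, here is a minimal Python sketch of a tree digest that covers entry names and file contents, follows symlinks, and records only the fact that a symlink is dangling. It is an illustration under assumptions, not the Java logic in DirectoryTreeDigestFunction: the names `tree_digest` and `_add_dir` and the byte markers are hypothetical, it assumes a POSIX file system, and it does not guard against symlink cycles or special files.

```python
import hashlib
import os


def tree_digest(root: str) -> str:
    """Digest all descendant names under `root`, plus file contents."""
    h = hashlib.sha256()
    _add_dir(h, root)
    return h.hexdigest()


def _add_dir(h, directory: str) -> None:
    # Sort entries so the digest does not depend on listing order.
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        h.update(os.fsencode(name) + b"\0")
        if os.path.islink(path) and not os.path.exists(path):
            # Dangling symlink: record only that it dangles, not its target.
            h.update(b"dangling\0")
        elif os.path.isdir(path):
            # Directories (including symlinks to directories) are descended into.
            h.update(b"dir\0")
            _add_dir(h, path)
            h.update(b"end\0")
        else:
            with open(path, "rb") as f:
                data = f.read()
            # Length prefix keeps adjacent entries from running together.
            h.update(b"file\0" + str(len(data)).encode() + b"\0" + data)
```

With this scheme, pointing a dangling link at a different missing path leaves the digest unchanged, while creating the missing file flips the entry from the dangling branch to the file branch and changes the digest.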
- Added `rctx.watch_tree()` to watch a directory tree, which includes all transitive descendants' names, and if they're files, their contents.
  - In the future we could add glob patterns to this method.
- Added a new SkyFunction DirectoryTreeDigestFunction to do the heavy lifting.
  - In the future, for performance, we could try to get this skyfunction to have a mode where it only digests stat(), to use as heuristics (similar to #21044)

Work towards #20952.
@Wyverald -- the question is, if you feel comfortable supporting this API forever, or else pay the migration costs. Let's game out what possible changes we might want to make:
So I think we are fine marking this as non-experimental. @fmeum can you come up with a change to
Good point -- I actually ran
@lberki Agreed, the changes I would reasonably expect could just be new parameters or, in the worst case, Bazel flags with a smaller scope.
import javax.annotation.Nullable;

/** A {@link SkyFunction} for {@link DirectoryTreeDigestValue}s. */
public final class DirectoryTreeDigestFunction implements SkyFunction {
The fact that adding this feature turned out to be pretty manageable with decent performance makes me think that we should reevaluate the state of BAZEL_TRACK_SOURCE_DIRECTORIES. It's essentially an undocumented "forever experimental" flag and maybe, as we saw here, stabilizing it wouldn't have to be that much effort.
@lberki What do you think?
- Added `rctx.watch_tree()` to watch a directory tree, which includes all transitive descendants' names, and if they're files, their contents.
  - In the future we could add glob patterns to this method.
- Added a new SkyFunction DirectoryTreeDigestFunction to do the heavy lifting.
  - In the future, for performance, we could try to get this skyfunction to have a mode where it only digests stat(), to use as heuristics (similar to #21044)

Work towards #20952.

Closes #21362.

PiperOrigin-RevId: 608667062
Change-Id: Ibacbb7af4cf4d7628fe8fcf06e2c4fa50e811e4e