Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ctime in file digest cache key #18003

Closed
wants to merge 7 commits into from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Apr 6, 2023

File digests are now additionally keyed by ctime for supported file system implementations. Since Bazel has a non-zero default for --cache_computed_file_digests, this may be required for correctness in cases where different files have identical mtime and inode number. For example, this can happen on Linux when files are extracted from a tar file with fixed mtime and are then replaced with mv, which preserves inodes.

Since Java (N)IO doesn't have support for reading file ctimes on Windows, a new method backed by a native implementation is added to WindowsFileOperation. Adding a call to this function to stat uncovered previously silent bugs where Unix-style PathFragments were created on Windows:

  1. Bzlmod's createLocalRepoSpec did not correctly extract the path from a registry's file:// URI on Windows.
  2. --package_path isn't usable with absolute paths on Windows as it splits on :. Since the flag is deprecated, this commit fixes the tests rather than the implementation.

Fixes #14723

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 6, 2023

@janakdr Could you review this fix?

test_cache_computed_file_digests_behavior is failing as it incurs two cache misses rather than getting the expected cache hit for stable-status.txt. This is surprising to me as BazelWorkspaceStatusAction only recreates stable-status.txt if the content changed, which it shouldn't in this test.

Edit: Is it possible that ctime is changed when the action inputs are made read-only? That could make the digest cache useless for output files with this change.

Edit: Can confirm that the failure goes away if I comment out https://cs.opensource.google/bazel/bazel/+/9800ffd1a471582fb42fd782e2e2eeb39dba6c9b:src/main/java/com/google/devtools/build/lib/skyframe/ActionMetadataHandler.java;l=731, so this probably requires more extensive changes.

Edit: The Windows file system implementation does not support ctimes, which means that the issue isn't fixed by this PR on Windows. However, ctime could be obtained with native calls.

@fmeum fmeum force-pushed the 14723-ctime-in-cache branch 2 times, most recently from ad913d0 to 26c0be6 Compare April 6, 2023 09:25
@meteorcloudy meteorcloudy added the team-Core Skyframe, bazel query, BEP, options parsing, bazelrc label Apr 6, 2023
@meteorcloudy
Copy link
Member

/cc @justinhorvitz can you take a look at this?

@justinhorvitz
Copy link
Contributor

test_cache_computed_file_digests_behavior is failing as it incurs two cache misses rather than getting the expected cache hit for stable-status.txt. This is surprising to me as BazelWorkspaceStatusAction only recreates stable-status.txt if the content changed, which it shouldn't in this test.

Does the latest change in 1affa3e fix this?

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 6, 2023

@justinhorvitz Yes, at least as far as the tests are concerned. I don't know whether the cache thrashing is only relevant for stable-status.txt because it's generating actions runs on every build or whether this would affect performance more broadly. If it could, then tree artifact files may still be affected.

Regarding ctime on Windows: I could add native functions that get it, but this requires an additional stat call per file. I have no intuition for performance on Windows, would that still be better than disabling the digest cache?

@justinhorvitz
Copy link
Contributor

@justinhorvitz Yes, at least as far as the tests are concerned. I don't know whether the cache thrashing is only relevant for stable-status.txt because it's generating actions runs on every build or whether this would affect performance more broadly. If it could, then tree artifact files may still be affected.

Regarding ctime on Windows: I could add native functions that get it, but this requires an additional stat call per file. I have no intuition for performance on Windows, would that still be better than disabling the digest cache?

I don't have much personal stake in external bazel or Windows performance so let's ask someone else. @tjgq do you see any potential scenario where tree artifact outputs would have a changed ctime but the same mtime?

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 6, 2023

I don't have much personal stake in external bazel or Windows performance so let's ask someone else. @tjgq do you see any potential scenario where tree artifact outputs would have a changed ctime but the same mtime?

Just want to note that these are separate issues:

  1. ActionMetadataHandler may still thrash the digest cache for tree artifacts by unconditionally chmodding them on all OSes.
  2. Windows doesn't provide an actual ctime, which means that this PR can't fix the actual issue.

@justinhorvitz
Copy link
Contributor

I don't have much personal stake in external bazel or Windows performance so let's ask someone else. @tjgq do you see any potential scenario where tree artifact outputs would have a changed ctime but the same mtime?

Just want to note that these are separate issues:

  1. ActionMetadataHandler may still thrash the digest cache for tree artifacts by unconditionally chmodding them on all OSes.
  2. Windows doesn't provide an actual ctime, which means that this PR can't fix the actual issue.

Yup, understood.

@@ -120,6 +120,22 @@ public long getNodeId() {
return System.identityHashCode(this);
}

@Override
public synchronized int getPermissions() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see what this needs to be synchronized with. Also, please make it final.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it final and removed synchronized. I am not entirely sure about the intended behavior of getPermissions when the file permissions are updated concurrently, but I agree that synchronized doesn't help as the setters aren't synchronized.

Copy link
Contributor

@justinhorvitz justinhorvitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks good to me, but I think we should get a second opinion specifically for the tree artifact question.

@fmeum fmeum force-pushed the 14723-ctime-in-cache branch 7 times, most recently from beac457 to 8043408 Compare April 9, 2023 14:11
@meteorcloudy meteorcloudy added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Apr 11, 2023
@fmeum
Copy link
Collaborator Author

fmeum commented Apr 12, 2023

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Apr 12, 2023
@keertk
Copy link
Member

keertk commented Apr 12, 2023

@bazel-io fork 6.2.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Apr 12, 2023
@fmeum
Copy link
Collaborator Author

fmeum commented Apr 12, 2023

@keertk @meteorcloudy I think that we should probably fork this for 5.4.1 and 6.1.2 as well as it fixes a correctness issue.

@keertk
Copy link
Member

keertk commented Apr 12, 2023

@bazel-io fork 5.4.1

@keertk
Copy link
Member

keertk commented Apr 12, 2023

@bazel-io fork 6.1.2

@meteorcloudy
Copy link
Member

meteorcloudy commented Apr 13, 2023

@fmeum Some update: we are importing this CL, but one internal test was broken, looking into a fix.

@@ -727,7 +727,8 @@ private static FileArtifactValue fileArtifactValueFromStat(
}

private void setPathPermissionsIfFile(Path path) throws IOException {
if (path.isFile(Symlinks.NOFOLLOW)) {
FileStatus stat = path.stat(Symlinks.NOFOLLOW);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use statIfFound? All of the other stats in this file use that to tolerate a nonexistent file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, it should to preserve the original semantics. I pushed a commit to do this. I am a bit surprised that no test failed due to this change - do all missing files at this point result in a missing outputs error down the line anyway?

@justinhorvitz
Copy link
Contributor

@fmeum Some update: we are importing this CL, but one internal test was broken, looking into a fix.

The broken test was counting the number of stats, but wasn't doing so in a very robust way. I've updated it which should unblock the import. Meanwhile, while looking into this, I found I have one more question which I asked in ActionMetadataHandler.

@ulrfa
Copy link
Contributor

ulrfa commented Apr 14, 2023

I'm thinking about https://bazel.build/reference/command-line-reference#flag--experimental_use_hermetic_linux_sandbox that populates sandboxes by creating hard links.

Ctime is changed when additional hard links are created to existing inodes (link count changes).

How is this change affecting the file digest cache for that use case?

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 14, 2023

I'm thinking about https://bazel.build/reference/command-line-reference#flag--experimental_use_hermetic_linux_sandbox that populates sandboxes by creating hard links.

Ctime is changed when additional hard links are created to existing inodes (link count changes).

How is this change affecting the file digest cache for that use case?

I assume it would make it so that digests are recomputed every time a hard link is created, which doesn't sound good. Could you maybe try this out with Bazel built from this PR?

I don't know enough about the hermetic sandbox to think of ways to improve digest caching for it. Since Bazel doesn't really care about changes to metadata, we would ideally like to have a "modification time, but not overrideable by users", for which ctime is really just a proxy, but I don't see a way to get this more restricted information efficiently.

@fmeum fmeum deleted the 14723-ctime-in-cache branch April 14, 2023 15:31
@keertk keertk removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Apr 14, 2023
keertk added a commit that referenced this pull request Apr 14, 2023
File digests are now additionally keyed by ctime for supported file system implementations. Since Bazel has a non-zero default for `--cache_computed_file_digests`, this may be required for correctness in cases where different files have identical mtime and inode number. For example, this can happen on Linux when files are extracted from a tar file with fixed mtime and are then replaced with `mv`, which preserves inodes.

Since Java (N)IO doesn't have support for reading file ctimes on Windows, a new method backed by a native implementation is added to `WindowsFileOperation`. Adding a call to this function to `stat` uncovered previously silent bugs where Unix-style `PathFragment`s were created on Windows:

1. Bzlmod's `createLocalRepoSpec` did not correctly extract the path from a registry's `file://` URI on Windows.
2. `--package_path` isn't usable with absolute paths on Windows as it splits on `:`. Since the flag is deprecated, this commit fixes the tests rather than the implementation.

Fixes #14723

Closes #18003.

PiperOrigin-RevId: 524297459
Change-Id: I96bfc0210e2f71bf8603c7b7cc5eb06a04048c85

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
keertk added a commit that referenced this pull request Apr 14, 2023
File digests are now additionally keyed by ctime for supported file system implementations. Since Bazel has a non-zero default for `--cache_computed_file_digests`, this may be required for correctness in cases where different files have identical mtime and inode number. For example, this can happen on Linux when files are extracted from a tar file with fixed mtime and are then replaced with `mv`, which preserves inodes.

Since Java (N)IO doesn't have support for reading file ctimes on Windows, a new method backed by a native implementation is added to `WindowsFileOperation`. Adding a call to this function to `stat` uncovered previously silent bugs where Unix-style `PathFragment`s were created on Windows:

1. Bzlmod's `createLocalRepoSpec` did not correctly extract the path from a registry's `file://` URI on Windows.
2. `--package_path` isn't usable with absolute paths on Windows as it splits on `:`. Since the flag is deprecated, this commit fixes the tests rather than the implementation.

Fixes #14723

Closes #18003.

PiperOrigin-RevId: 524297459
Change-Id: I96bfc0210e2f71bf8603c7b7cc5eb06a04048c85

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
fmeum added a commit to fmeum/bazel that referenced this pull request Apr 17, 2023
File digests are now additionally keyed by ctime for supported file system implementations. Since Bazel has a non-zero default for `--cache_computed_file_digests`, this may be required for correctness in cases where different files have identical mtime and inode number. For example, this can happen on Linux when files are extracted from a tar file with fixed mtime and are then replaced with `mv`, which preserves inodes.

Since Java (N)IO doesn't have support for reading file ctimes on Windows, a new method backed by a native implementation is added to `WindowsFileOperation`. Adding a call to this function to `stat` uncovered previously silent bugs where Unix-style `PathFragment`s were created on Windows:

1. Bzlmod's `createLocalRepoSpec` did not correctly extract the path from a registry's `file://` URI on Windows.
2. `--package_path` isn't usable with absolute paths on Windows as it splits on `:`. Since the flag is deprecated, this commit fixes the tests rather than the implementation.

Fixes bazelbuild#14723

Closes bazelbuild#18003.

PiperOrigin-RevId: 524297459
Change-Id: I96bfc0210e2f71bf8603c7b7cc5eb06a04048c85
keertk pushed a commit that referenced this pull request Apr 17, 2023
File digests are now additionally keyed by ctime for supported file system implementations. Since Bazel has a non-zero default for `--cache_computed_file_digests`, this may be required for correctness in cases where different files have identical mtime and inode number. For example, this can happen on Linux when files are extracted from a tar file with fixed mtime and are then replaced with `mv`, which preserves inodes.

Since Java (N)IO doesn't have support for reading file ctimes on Windows, a new method backed by a native implementation is added to `WindowsFileOperation`. Adding a call to this function to `stat` uncovered previously silent bugs where Unix-style `PathFragment`s were created on Windows:

1. Bzlmod's `createLocalRepoSpec` did not correctly extract the path from a registry's `file://` URI on Windows.
2. `--package_path` isn't usable with absolute paths on Windows as it splits on `:`. Since the flag is deprecated, this commit fixes the tests rather than the implementation.

Fixes #14723

Closes #18003.

PiperOrigin-RevId: 524297459
Change-Id: I96bfc0210e2f71bf8603c7b7cc5eb06a04048c85
fweikert pushed a commit to fweikert/bazel that referenced this pull request May 25, 2023
File digests are now additionally keyed by ctime for supported file system implementations. Since Bazel has a non-zero default for `--cache_computed_file_digests`, this may be required for correctness in cases where different files have identical mtime and inode number. For example, this can happen on Linux when files are extracted from a tar file with fixed mtime and are then replaced with `mv`, which preserves inodes.

Since Java (N)IO doesn't have support for reading file ctimes on Windows, a new method backed by a native implementation is added to `WindowsFileOperation`. Adding a call to this function to `stat` uncovered previously silent bugs where Unix-style `PathFragment`s were created on Windows:

1. Bzlmod's `createLocalRepoSpec` did not correctly extract the path from a registry's `file://` URI on Windows.
2. `--package_path` isn't usable with absolute paths on Windows as it splits on `:`. Since the flag is deprecated, this commit fixes the tests rather than the implementation.

Fixes bazelbuild#14723

Closes bazelbuild#18003.

PiperOrigin-RevId: 524297459
Change-Id: I96bfc0210e2f71bf8603c7b7cc5eb06a04048c85
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Core Skyframe, bazel query, BEP, options parsing, bazelrc team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Action outputs with fixed timestamps cause problems
7 participants