Vendor mode: move the external repo instead of copying #22668

Closed
wants to merge 4 commits into from
6 changes: 3 additions & 3 deletions site/en/external/vendor.md
@@ -81,8 +81,8 @@ repository cache.
Therefore, you should be able to check in the vendored source and build the same
targets offline on another machine.

Note: If you build different targets or change the external dependencies, build
configuration, or Bazel version, you may need to re-vendor.
Note: If you make changes to the targets to build, the external dependencies, the build
configuration, or the Bazel version, you may need to re-vendor to make sure offline build still works.

## Vendor all external dependencies {:#vendor-all-dependencies}

@@ -139,7 +139,7 @@ always excluded from vendoring.
## Understand how vendor mode works {:#how-vendor-mode-works}

Bazel fetches external dependencies of a project under `$(bazel info
output_base)/external`. Vendoring external dependencies means copying out
output_base)/external`. Vendoring external dependencies means moving out
relevant files and directories to a given vendor directory and use the vendored
source for later builds.

@@ -57,25 +57,36 @@ public void vendorRepos(Path externalRepoRoot, ImmutableList<RepositoryName> rep
}

for (RepositoryName repo : reposToVendor) {
// Only re-vendor the repository if it is not up-to-date.
if (!isRepoUpToDate(repo, externalRepoRoot)) {
Path markerUnderVendor = vendorDirectory.getChild(repo.getMarkerFileName());
Path repoUnderVendor = vendorDirectory.getRelative(repo.getName());

// 1. Clean up existing marker file and repo vendor directory
markerUnderVendor.delete();
repoUnderVendor.deleteTree();
repoUnderVendor.createDirectory();

// 2. Copy over the repo source.
FileSystemUtils.copyTreesBelow(
externalRepoRoot.getRelative(repo.getName()), repoUnderVendor, Symlinks.NOFOLLOW);
Path repoUnderExternal = externalRepoRoot.getChild(repo.getName());
Path repoUnderVendor = vendorDirectory.getChild(repo.getName());
// This could happen when running the vendor command twice without changing anything.
if (repoUnderExternal.isSymbolicLink() && repoUnderExternal.resolveSymbolicLinks().equals(repoUnderVendor)) {
continue;
}

// 3. Copy the marker file atomically
Path tMarker = vendorDirectory.getChild(repo.getMarkerFileName() + ".tmp");
FileSystemUtils.copyFile(externalRepoRoot.getChild(repo.getMarkerFileName()), tMarker);
tMarker.renameTo(markerUnderVendor);
// At this point, the repo should exist under external dir, but check if the vendor src is already up-to-date.
Path markerUnderExternal = externalRepoRoot.getChild(repo.getMarkerFileName());
Path markerUnderVendor = vendorDirectory.getChild(repo.getMarkerFileName());
if (isRepoUpToDate(markerUnderVendor, markerUnderExternal)) {
continue;
}

// Actually vendor the repo:
// 1. Clean up existing marker file and vendor dir.
markerUnderVendor.delete();
repoUnderVendor.deleteTree();
repoUnderVendor.createDirectory();
// 2. Move the marker file to a temporary one under vendor dir.
Path tMarker = vendorDirectory.getChild(repo.getMarkerFileName() + ".tmp");
FileSystemUtils.moveFile(markerUnderExternal, tMarker);
// 3. Move the external repo to vendor dir. It's fine if this step fails or is interrupted, because the marker
// file under external is gone anyway.
FileSystemUtils.moveTreesBelow(repoUnderExternal, repoUnderVendor);
Collaborator

How does this behave if a repo symlinks files from another repo and one is vendored while the other is not? It looks like it may be necessary to follow relative symlinks but not absolute symlinks.

@meteorcloudy (Member Author), Jun 10, 2024

moveTreesBelow doesn't follow any symlinks. Judging from the code here, it's actually impossible to create a relative symlink with the ctx.symlink API.

I tested with

ctx.symlink("/tmp/foo", "path_abs")
ctx.symlink("data", "path_rel")
ctx.symlink(ctx.path(Label("@bar//:data")), "path_bar")
ctx.symlink("../_main~ext~bar~/data", "path_bar_2")

and it resulted in

path_abs@ -> /tmp/foo
path_bar@ -> /private/var/tmp/_bazel_pcloudy/d278f827a729facdbfb1ff0fc0002042/external/_main~ext~bar/data
path_bar_2@ -> /private/var/tmp/_bazel_pcloudy/d278f827a729facdbfb1ff0fc0002042/external/_main~ext~bar~/data
path_rel@ -> /private/var/tmp/_bazel_pcloudy/d278f827a729facdbfb1ff0fc0002042/external/_main~ext~foo/data

in both the external and the vendor dir.

This is fine if only foo is vendored, since eventually <output_base>/external/_main~ext~bar would exist and point to the right location. However, I noticed there is a problem if the output base is changed after vendoring.

@fmeum (Collaborator), Jun 10, 2024

While this is the current behavior, wouldn't we have to change it so that symlinks in vendored repos do not contain absolute paths? I think there was another issue about this filed recently.

@meteorcloudy (Member Author), Jun 10, 2024

To deal with a potential output base change, maybe we could

  1. Create a symlink under the vendor dir that points to the external repo root.
  2. Rewrite every symlink that points to a path under the external repo root into a relative path that goes through the symlink created in step 1.

I have an experimental implementation in meteorcloudy@bf0ec69, which results in

$ ll vendor_src/_bazel-external
lrwxr-xr-x  1 pcloudy  primarygroup  73 Jun 10 15:17 vendor_src/_bazel-external@ -> /private/var/tmp/_bazel_pcloudy/d278f827a729facdbfb1ff0fc0002042/external
pcloudy@pcloudy-macbookpro2:~/workspace/my_tests/simple_cpp_test (master)
$ ll vendor_src/_main~ext~foo/
total 8
drwxr-xr-x  9 pcloudy  primarygroup  288 Jun 10 15:17 ./
drwxr-xr-x  7 pcloudy  primarygroup  224 Jun 10 15:17 ../
-rwxr-xr-x  1 pcloudy  wheel           0 Jun 10 15:17 BUILD*
-rwxr-xr-x  1 pcloudy  wheel           0 Jun 10 15:17 REPO.bazel*
-rwxr-xr-x  1 pcloudy  wheel          15 Jun 10 15:17 data*
lrwxr-xr-x  1 pcloudy  wheel           8 Jun 10 15:17 path_abs@ -> /tmp/foo
lrwxr-xr-x  1 pcloudy  primarygroup   37 Jun 10 15:17 path_bar@ -> ../_bazel-external/_main~ext~bar/data
lrwxr-xr-x  1 pcloudy  primarygroup   38 Jun 10 15:17 path_bar_2@ -> ../_bazel-external/_main~ext~bar/data2
lrwxr-xr-x  1 pcloudy  primarygroup   37 Jun 10 15:17 path_rel@ -> ../_bazel-external/_main~ext~foo/data

Please let me know what you think. I'd prefer to do this in another PR.
/cc @Wyverald @fmeum
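
The two-step proposal above can be sketched in plain Java with `java.nio.file`. This is only an illustration under stated assumptions, not the experimental implementation referenced in the comment: the class `SymlinkRewriteSketch` and the method `rewriteSymlinks` are hypothetical names, the anchor symlink name `_bazel-external` is taken from the listing above, and paths are compared textually, so an external root that is itself reached through a symlinked prefix would need extra normalization.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of the symlink-rewriting idea; not Bazel's actual code. */
final class SymlinkRewriteSketch {
  // Name of the anchor symlink, as shown in the listing above.
  static final String ANCHOR_NAME = "_bazel-external";

  /** Rewrites absolute symlinks into externalRoot as relative links through the anchor. */
  static void rewriteSymlinks(Path vendorDir, Path externalRoot) throws IOException {
    Path vendor = vendorDir.toAbsolutePath().normalize();
    Path external = externalRoot.toAbsolutePath().normalize();

    // Step 1: always re-create <vendor>/_bazel-external -> <external root>.
    Path anchor = vendor.resolve(ANCHOR_NAME);
    Files.deleteIfExists(anchor);
    Files.createSymbolicLink(anchor, external);

    // Collect the symlinks first so the tree is not modified while it is walked.
    // Files.walk does not follow symlinks unless asked to.
    List<Path> links;
    try (Stream<Path> walk = Files.walk(vendor)) {
      links =
          walk.filter(Files::isSymbolicLink)
              .filter(p -> !p.equals(anchor))
              .collect(Collectors.toList());
    }

    // Step 2: rewrite only links whose target lies under the external root.
    for (Path link : links) {
      Path target = Files.readSymbolicLink(link);
      if (!target.isAbsolute() || !target.startsWith(external)) {
        continue; // e.g. /tmp/foo stays untouched
      }
      Path viaAnchor = anchor.resolve(external.relativize(target));
      Path relative = link.getParent().relativize(viaAnchor);
      Files.delete(link);
      Files.createSymbolicLink(link, relative);
    }
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("symlink-sketch");
    Path external = base.resolve("external");
    Path vendor = base.resolve("vendor");
    Files.createDirectories(external.resolve("bar"));
    Files.write(external.resolve("bar").resolve("data"), "payload".getBytes());
    Files.createDirectories(vendor.resolve("foo"));
    Path link = vendor.resolve("foo").resolve("path_bar");
    Files.createSymbolicLink(link, external.resolve("bar").resolve("data"));
    Files.createSymbolicLink(vendor.resolve("foo").resolve("path_abs"), Paths.get("/tmp/foo"));

    rewriteSymlinks(vendor, external);
    System.out.println(link + " -> " + Files.readSymbolicLink(link));
  }
}
```

With this layout only the anchor symlink holds a machine-specific absolute path, which is why it would have to be re-created on each machine rather than checked in.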

Member

> While this is the current behavior, wouldn't we have to change it so that symlinks in vendored repos do not contain absolute paths? I think there was another issue about this filed recently.

#22303, probably

Member

> To deal with a potential output base change, maybe we could
>
>   1. Create a symlink under the vendor dir that points to the external repo root.
>   2. Rewrite every symlink that points to a path under the external repo root into a relative path that goes through the symlink created in step 1.

This is quite clever! But what will version-control systems do with this special symlink? People usually put the bazel-* symlinks in the workspace root in .gitignore, so presumably this new special symlink will also need to be ignored? And the symlink is generated on demand if it's not there, etc.? (I agree that this should be done in a separate PR)

Either way, some sort of symlink rewriting will need to happen, and we'll probably need to do something similar for the true repo cache.

@meteorcloudy (Member Author), Jun 11, 2024

> so presumably this new special symlink will also need to be ignored? And the symlink is generated on demand if it's not there, etc.? (I agree that this should be done in a separate PR)

Yes, I also think it should be gitignored since it's machine-specific. To keep the code simple, we can just always re-create the symlink, since that's quite cheap.

// 4. Rename the temporary marker file to its final name after the move is done.
tMarker.renameTo(markerUnderVendor);
// 5. Leave a symlink in external dir.
repoUnderExternal.deleteTree();
FileSystemUtils.ensureSymbolicLink(repoUnderExternal, repoUnderVendor);
}
}
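
The five steps in the new loop body can be tried outside Bazel with a minimal sketch built on `java.nio.file`. This is an illustration under assumptions rather than the PR's code: `VendorMoveSketch`, `vendorRepo`, and the `markerName` parameter are hypothetical, `Files.move` stands in for `FileSystemUtils.moveFile` and `moveTreesBelow`, the marker content comparison (`isRepoUpToDate`) is left out, and the external and vendor directories are assumed to live on the same file system so that a directory can be moved with a rename.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of "move, then symlink back"; not Bazel's actual code. */
final class VendorMoveSketch {

  /** Moves external/<repo> into vendor/<repo> and leaves a symlink in its place. */
  static void vendorRepo(Path externalRoot, Path vendorDir, String repo, String markerName)
      throws IOException {
    Path repoUnderExternal = externalRoot.resolve(repo);
    Path repoUnderVendor = vendorDir.resolve(repo);
    Path markerUnderExternal = externalRoot.resolve(markerName);
    Path markerUnderVendor = vendorDir.resolve(markerName);

    // Nothing to do if external/<repo> already links to vendor/<repo>,
    // for example when vendoring twice without changing anything.
    if (Files.isSymbolicLink(repoUnderExternal)
        && Files.exists(repoUnderExternal)
        && Files.exists(repoUnderVendor)
        && Files.isSameFile(repoUnderExternal, repoUnderVendor)) {
      return;
    }

    // 1. Clean up any stale marker file and vendored copy.
    Files.createDirectories(vendorDir);
    Files.deleteIfExists(markerUnderVendor);
    deleteTree(repoUnderVendor);

    // 2. Move the marker to a temporary name under the vendor dir. The PR notes
    //    that a failure after this point is fine because the external marker is gone.
    Path tmpMarker = vendorDir.resolve(markerName + ".tmp");
    Files.move(markerUnderExternal, tmpMarker, StandardCopyOption.REPLACE_EXISTING);

    // 3. Move the repo itself (assumption: same file system, so this is a rename).
    Files.move(repoUnderExternal, repoUnderVendor);

    // 4. Give the marker its final name only after the move succeeded.
    Files.move(tmpMarker, markerUnderVendor, StandardCopyOption.ATOMIC_MOVE);

    // 5. Leave a symlink so that paths under external/ keep resolving.
    Files.createSymbolicLink(repoUnderExternal, repoUnderVendor.toAbsolutePath());
  }

  /** Deletes a directory tree without following symlinks; a no-op if it is absent. */
  private static void deleteTree(Path root) throws IOException {
    if (!Files.exists(root, LinkOption.NOFOLLOW_LINKS)) {
      return;
    }
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(root)) {
      // Reverse order puts children before their parent directories.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("vendor-sketch");
    Path external = base.resolve("external");
    Path vendor = base.resolve("vendor");
    Files.createDirectories(external.resolve("foo"));
    Files.write(external.resolve("foo").resolve("data"), "hello".getBytes());
    Files.write(external.resolve("foo.marker"), "v1".getBytes());

    vendorRepo(external, vendor, "foo", "foo.marker");
    System.out.println("symlink left behind: " + Files.isSymbolicLink(external.resolve("foo")));
  }
}
```

The ordering is the point of the sketch: the marker leaves the external directory before the sources do and only reaches its final name in the vendor directory after the move, so an interrupted run never leaves a final marker next to a partially moved repo.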

@@ -131,20 +142,17 @@ public byte[] readRegistryUrl(URL url, Checksum checksum) throws IOException {
* one under <output_base>/external. This function assumes the marker file under
* <output_base>/external exists and is up-to-date.
*
* @param repo The name of the repository.
* @param externalPath The root directory of the external repositories.
* @param markerUnderVendor The marker file path under vendor dir
* @param markerUnderExternal The marker file path under external dir
* @return true if the repository is up-to-date, false otherwise.
* @throws IOException if an I/O error occurs.
*/
private boolean isRepoUpToDate(RepositoryName repo, Path externalPath) throws IOException {
Path vendorMarkerFile = vendorDirectory.getChild(repo.getMarkerFileName());
if (!vendorMarkerFile.exists()) {
private boolean isRepoUpToDate(Path markerUnderVendor, Path markerUnderExternal) throws IOException {
if (!markerUnderVendor.exists()) {
return false;
}

Path externalMarkerFile = externalPath.getChild(repo.getMarkerFileName());
String vendorMarkerContent = FileSystemUtils.readContent(vendorMarkerFile, UTF_8);
String externalMarkerContent = FileSystemUtils.readContent(externalMarkerFile, UTF_8);
String vendorMarkerContent = FileSystemUtils.readContent(markerUnderVendor, UTF_8);
String externalMarkerContent = FileSystemUtils.readContent(markerUnderExternal, UTF_8);
return Objects.equals(vendorMarkerContent, externalMarkerContent);
}
