Skip to content

Commit

Permalink
Add the ability to emit *.runfiles_manifest as JSON
Browse files Browse the repository at this point in the history
The *.runfiles_manifest files currently use a pretty simple format: two
pathnames separated by a space character. This can be problematic at
times. On Windows, usernames may contain spaces, meaning home
directories with spaces are not uncommon. If you look at the macOS
ecosystem, it's also not uncommon to have applications or packages
containing files with spaces in their names. The goal of this change is
to start addressing this issue, by moving *.runfiles_manifest to newline
delimited JSON.

Runfiles libraries for each programming language will need to be
adjusted to support this new format. Because not all runfiles libraries
are bundled/versioned together with Bazel, those libraries will need to
seamlessly support both formats. Tuples are encoded as lists, JSON
entries will always start with an opening square bracket. Because paths
are generally prefixed with a workspace name, it's extremely unlikely
that this causes any conflict. Runfiles libraries can thus distinguish
both formats as follows:

    if (line[0] == '[') {
      // Parse as JSON.
    } else {
      // Parse as legacy format.
    }

To prevent any breakage, we place this feature behind a flag named
--incompatible_json_source_manifests. Even if this flag is disabled, we
emit JSON entries for paths that are not representable in the legacy
format. This should at least make it possible to build targets and
instantiate their runfiles directories with build-runfiles, regardless
of whether the manifest can be parsed at runtime.

Before considering JSON, I experimented with CSV instead. Though easier
to parse, RFC 4180 requires the use of carriage return newlines. Because
both Bazel and runfiles libraries tend to open runfiles manifests in
text mode, this turned out to be hard to get right.

Given the fact that Bazel does not permit non-printable characters and
backslashes in pathnames, parsing the resulting JSON turns out to be
remarkably easy. If it's not realistic to let a given runfiles library
depend on a third-party library to do the JSON parsing, it's most likely
acceptable to handroll a simplified parser.

This PR already contains updates to build-runfiles, the Bash runfiles
library and the Python runfiles library to support the new format.
Updates to the C++ and Java runfiles libraries remain to be done.

Issue: #4327
  • Loading branch information
EdSchouten committed Jan 31, 2022
1 parent 464bac3 commit 2ec1149
Show file tree
Hide file tree
Showing 13 changed files with 236 additions and 38 deletions.
1 change: 1 addition & 0 deletions src/main/java/com/google/devtools/build/lib/analysis/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -446,6 +446,7 @@ java_library(
"//third_party:auto_value",
"//third_party:caffeine",
"//third_party:flogger",
"//third_party:gson",
"//third_party:guava",
"//third_party:jsr305",
"//third_party/protobuf:protobuf_java",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -348,7 +348,8 @@ private static Artifact createRunfilesAction(
context.getActionOwner(),
inputManifest,
runfiles,
context.getConfiguration().remotableSourceManifestActions()));
context.getConfiguration().remotableSourceManifestActions(),
context.getConfiguration().jsonSourceManifests()));

if (!createSymlinks) {
// Just return the manifest if that's all the build calls for.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
import static java.nio.charset.StandardCharsets.ISO_8859_1;

import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.CharMatcher;
import com.google.devtools.build.lib.actions.ActionExecutionContext;
import com.google.devtools.build.lib.actions.ActionKeyContext;
import com.google.devtools.build.lib.actions.ActionOwner;
Expand All @@ -28,6 +29,7 @@
import com.google.devtools.build.lib.events.EventHandler;
import com.google.devtools.build.lib.util.Fingerprint;
import com.google.devtools.build.lib.vfs.PathFragment;
import com.google.gson.stream.JsonWriter;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
Expand Down Expand Up @@ -71,7 +73,7 @@ interface ManifestWriter {
* @param symlink (optional) symlink that resolves the above path
*/
void writeEntry(Writer manifestWriter, PathFragment rootRelativePath,
@Nullable Artifact symlink) throws IOException;
@Nullable Artifact symlink, boolean useJson) throws IOException;

/**
* Fulfills {@link com.google.devtools.build.lib.actions.AbstractAction#getMnemonic()}
Expand Down Expand Up @@ -101,6 +103,7 @@ void writeEntry(Writer manifestWriter, PathFragment rootRelativePath,
private final Runfiles runfiles;

private final boolean remotableSourceManifestActions;
private final boolean jsonSourceManifests;

/**
* Creates a new AbstractSourceManifestAction instance using latin1 encoding to write the manifest
Expand All @@ -114,7 +117,13 @@ void writeEntry(Writer manifestWriter, PathFragment rootRelativePath,
@VisibleForTesting
SourceManifestAction(
ManifestWriter manifestWriter, ActionOwner owner, Artifact primaryOutput, Runfiles runfiles) {
this(manifestWriter, owner, primaryOutput, runfiles, /*remotableSourceManifestActions=*/ false);
this(
manifestWriter,
owner,
primaryOutput,
runfiles,
/*remotableSourceManifestActions=*/ false,
/*jsonSourceManifests=*/ false);
}

/**
Expand All @@ -131,11 +140,13 @@ public SourceManifestAction(
ActionOwner owner,
Artifact primaryOutput,
Runfiles runfiles,
boolean remotableSourceManifestActions) {
boolean remotableSourceManifestActions,
boolean jsonSourceManifests) {
super(owner, NestedSetBuilder.emptySet(Order.STABLE_ORDER), primaryOutput, false);
this.manifestWriter = manifestWriter;
this.runfiles = runfiles;
this.remotableSourceManifestActions = remotableSourceManifestActions;
this.jsonSourceManifests = jsonSourceManifests;
}

@VisibleForTesting
Expand Down Expand Up @@ -170,7 +181,7 @@ private void writeFile(OutputStream out, Map<PathFragment, Artifact> output) thr
Collections.sort(sortedManifest, ENTRY_COMPARATOR);

for (Map.Entry<PathFragment, Artifact> line : sortedManifest) {
manifestWriter.writeEntry(manifestFile, line.getKey(), line.getValue());
manifestWriter.writeEntry(manifestFile, line.getKey(), line.getValue(), jsonSourceManifests);
}

manifestFile.flush();
Expand All @@ -193,9 +204,13 @@ protected void computeKey(
Fingerprint fp) {
fp.addString(GUID);
fp.addBoolean(remotableSourceManifestActions);
fp.addBoolean(jsonSourceManifests);
runfiles.fingerprint(fp);
}

private static final CharMatcher LEGACY_MANIFEST_UNSAFE_CHAR_MATCHER =
CharMatcher.anyOf(" \n").precomputed();

/**
* Supported manifest writing strategies.
*/
Expand All @@ -212,13 +227,28 @@ public enum ManifestType implements ManifestWriter {
*/
SOURCE_SYMLINKS {
@Override
public void writeEntry(Writer manifestWriter, PathFragment rootRelativePath, Artifact symlink)
throws IOException {
manifestWriter.append(rootRelativePath.getPathString());
// This trailing whitespace is REQUIRED to process the single entry line correctly.
manifestWriter.append(' ');
if (symlink != null) {
manifestWriter.append(symlink.getPath().getPathString());
public void writeEntry(Writer manifestWriter, PathFragment rootRelativePath, Artifact symlink,
boolean useJson) throws IOException {
String source = rootRelativePath.getPathString();
String target = symlink == null ? "" : symlink.getPath().getPathString();

// Either emit the entry as newline delimited JSON or the legacy space separated format.
// Entries that cannot be represented safely in the legacy format are always emitted in JSON
// format, so that we're at least capable of building the associated target and
// instantiating its runfiles directory.
if (useJson || source.startsWith("[") ||
LEGACY_MANIFEST_UNSAFE_CHAR_MATCHER.matchesAnyOf(source) ||
LEGACY_MANIFEST_UNSAFE_CHAR_MATCHER.matchesAnyOf(target)) {
JsonWriter jsonWriter = new JsonWriter(manifestWriter);
jsonWriter.beginArray();
jsonWriter.value(source);
jsonWriter.value(target);
jsonWriter.endArray();
} else {
manifestWriter.append(source);
// This trailing whitespace is REQUIRED to process the single entry line correctly.
manifestWriter.append(' ');
manifestWriter.append(target);
}
manifestWriter.append('\n');
}
Expand Down Expand Up @@ -251,8 +281,8 @@ public boolean isRemotable() {
*/
SOURCES_ONLY {
@Override
public void writeEntry(Writer manifestWriter, PathFragment rootRelativePath, Artifact symlink)
throws IOException {
public void writeEntry(Writer manifestWriter, PathFragment rootRelativePath, Artifact symlink,
boolean useJson) throws IOException {
manifestWriter.append(rootRelativePath.getPathString());
manifestWriter.append('\n');
manifestWriter.flush();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -799,6 +799,10 @@ public boolean remotableSourceManifestActions() {
return options.remotableSourceManifestActions;
}

public boolean jsonSourceManifests() {
return options.jsonSourceManifests;
}

/**
* Returns a modified copy of {@code executionInfo} if any {@code executionInfoModifiers} apply to
* the given {@code mnemonic}. Otherwise returns {@code executionInfo} unchanged.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -776,6 +776,19 @@ public OutputPathsConverter() {
help = "Whether to make source manifest actions remotable")
public boolean remotableSourceManifestActions;

@Option(
name = "incompatible_json_source_manifests",
defaultValue = "false",
documentationCategory = OptionDocumentationCategory.OUTPUT_SELECTION,
effectTags = {OptionEffectTag.AFFECTS_OUTPUTS},
metadataTags = {OptionMetadataTag.INCOMPATIBLE_CHANGE},
help =
"If true, write source manifests in newline delimited JSON format. Regardless of whether "
+ "this option is set, entries for pathnames containing spaces or leading square "
+ "brackets are encoded in the JSON format, as the resulting output would be "
+ "ambiguous or unrepresentable otherwise.")
public boolean jsonSourceManifests;

@Option(
name = "flag_alias",
converter = Converters.FlagAliasConverter.class,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -456,7 +456,8 @@ public ConfiguredTarget create(RuleContext ruleContext)
// This matches the code below in collectDefaultRunfiles.
.addTransitiveArtifactsWrappedInStableOrder(common.getRuntimeClasspath())
.build(),
true));
true,
false));
filesBuilder.add(runtimeClasspathArtifact);

// Pass the artifact through an environment variable in the coverage environment so it
Expand Down
82 changes: 66 additions & 16 deletions src/main/tools/build-runfiles.cc
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,39 @@ class RunfilesCreator {
}
}

void ReadJSONWhitespace(const char **bufptr) {
*bufptr += strspn(*bufptr, " \n\r\t");
}

void ReadJSONSingleChar(const char **bufptr, char ch, int lineno,
const char *buf) {
if (**bufptr != ch) {
DIE("expected '%c' at line %d: '%s'\n", ch, lineno, buf);
}
++*bufptr;
}

std::string ReadJSONString(const char **bufptr, int lineno, const char *buf) {
ReadJSONWhitespace(bufptr);
ReadJSONSingleChar(bufptr, '\"', lineno, buf);
std::string unquoted_value;
while (**bufptr != '\"') {
if (**bufptr == '\0') {
DIE("missing string terminator at line %d: '%s'\n", lineno, buf);
} else if (**bufptr == '\\') {
++*bufptr;
if (strchr("\"\\/", **bufptr) == nullptr) {
DIE("unsupported escape sequence at line %d: '%s'\n", lineno, buf);
}
}
unquoted_value += **bufptr;
++*bufptr;
}
ReadJSONSingleChar(bufptr, '\"', lineno, buf);
ReadJSONWhitespace(bufptr);
return unquoted_value;
}

void ReadManifest(const std::string &manifest_file, bool allow_relative,
bool use_metadata) {
FILE *outfile = fopen(temp_filename_.c_str(), "w");
Expand Down Expand Up @@ -143,23 +176,40 @@ class RunfilesCreator {
// dependency checking.
if (use_metadata && lineno % 2 == 0) continue;

int n = strlen(buf)-1;
if (!n || buf[n] != '\n') {
DIE("missing terminator at line %d: '%s'\n", lineno, buf);
}
buf[n] = '\0';
if (buf[0] == '/') {
DIE("paths must not be absolute: line %d: '%s'\n", lineno, buf);
}
const char *s = strchr(buf, ' ');
if (!s) {
DIE("missing field delimiter at line %d: '%s'\n", lineno, buf);
} else if (strchr(s+1, ' ')) {
DIE("link or target filename contains space on line %d: '%s'\n",
lineno, buf);
std::string link, target;
if (buf[0] == '[') {
// Entry in JSON format.
const char *bufptr = buf;
ReadJSONWhitespace(&bufptr);
ReadJSONSingleChar(&bufptr, '[', lineno, buf);
link = ReadJSONString(&bufptr, lineno, buf);
ReadJSONSingleChar(&bufptr, ',', lineno, buf);
target = ReadJSONString(&bufptr, lineno, buf);
ReadJSONSingleChar(&bufptr, ']', lineno, buf);
ReadJSONWhitespace(&bufptr);
if (*bufptr != '\0') {
DIE("trailing garbage at line %d: '%s'\n", lineno, buf);
}
} else {
// Entry in legacy format: two fields separated with a space character.
int n = strlen(buf)-1;
if (!n || buf[n] != '\n') {
DIE("missing terminator at line %d: '%s'\n", lineno, buf);
}
buf[n] = '\0';
if (buf[0] == '/') {
DIE("paths must not be absolute: line %d: '%s'\n", lineno, buf);
}
const char *s = strchr(buf, ' ');
if (!s) {
DIE("missing field delimiter at line %d: '%s'\n", lineno, buf);
} else if (strchr(s+1, ' ')) {
DIE("link or target filename contains space on line %d: '%s'\n",
lineno, buf);
}
link = std::string(buf, s-buf);
target = s+1;
}
std::string link(buf, s-buf);
const char *target = s+1;
if (!allow_relative && target[0] != '\0' && target[0] != '/'
&& target[1] != ':') { // Match Windows paths, e.g. C:\foo or C:/foo.
DIE("expected absolute path at line %d: '%s'\n", lineno, buf);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -122,7 +122,7 @@ public MockManifestWriter() {

@Override
public void writeEntry(Writer manifestWriter, PathFragment rootRelativePath,
@Nullable Artifact symlink) throws IOException {
@Nullable Artifact symlink, boolean useJson) throws IOException {
assertWithMessage("Expected manifest input to be exhausted").that(expectedSequence)
.isNotEmpty();
Map.Entry<PathFragment, Artifact> expectedEntry = expectedSequence.remove(0);
Expand Down
70 changes: 70 additions & 0 deletions src/test/shell/bazel/runfiles_test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -180,4 +180,74 @@ EOF
[[ -f bazel-bin/world.runfiles/MANIFEST ]] || fail "expected output manifest world to exist"
}

# The *.runfiles_manifest file traditionally stores the source and
# target paths separated with a space. This means pathnames containing
# spaces could not be encoded.
#
# This is being addressed by switching to newline delimited JSON.
# Because not all runfiles libraries support this yet, we only use this
# format for entries that cannot be represented otherwise. Only if
# --incompatible_json_source_manifests is set, we enable it for all
# entries.
function test_runfiles_json_source_manifests() {
touch \
"with space" \
"with,comma" \
"with\"quotes" \
"with all,special\"characters"
create_workspace_with_default_repos WORKSPACE foo

cat > hello.sh <<'EOF'
#!/bin/bash
exit 0
EOF
chmod 755 hello.sh
cat > BUILD <<'EOF'
sh_binary(
name = "hello",
srcs = ["hello.sh"],
data = [
"with space",
"with,comma",
"with\"quotes",
"with all,special\"characters",
],
)
EOF

bazel build --spawn_strategy=local //:hello \
|| fail "Building //:hello failed"

[[ -f "bazel-bin/hello.runfiles/foo/with space" ]] || fail "expected runfile \"with space\" to exist"
[[ -f "bazel-bin/hello.runfiles/foo/with,comma" ]] || fail "expected runfile \"with,comma\" to exist"
[[ -f "bazel-bin/hello.runfiles/foo/with\"quotes" ]] || fail "expected runfile \"with,quotes\" to exist"
[[ -f "bazel-bin/hello.runfiles/foo/with all,special\"characters" ]] || fail "expected runfile \"with all,special\"characters\" to exist"
[[ -f "bazel-bin/hello.runfiles/MANIFEST" ]] || fail "expected output manifest to exist"

bindir="$(cd bazel-bin && pwd -P)"
srcdir="$(pwd -P)"
cat > expected_runfiles_manifest << EOF
foo/hello ${bindir}/hello
foo/hello.sh ${srcdir}/hello.sh
["foo/with all,special\\"characters","${srcdir}/with all,special\\"characters"]
["foo/with space","${srcdir}/with space"]
foo/with"quotes ${srcdir}/with"quotes
foo/with,comma ${srcdir}/with,comma
EOF
diff -w expected_runfiles_manifest bazel-bin/hello.runfiles_manifest || fail "runfiles manifest does not match expected contents"

bazel build --spawn_strategy=local --incompatible_json_source_manifests //:hello \
|| fail "Building //:hello failed"

cat > expected_runfiles_manifest << EOF
["foo/hello","${bindir}/hello"]
["foo/hello.sh","${srcdir}/hello.sh"]
["foo/with all,special\\"characters","${srcdir}/with all,special\\"characters"]
["foo/with space","${srcdir}/with space"]
["foo/with\\"quotes","${srcdir}/with\\"quotes"]
["foo/with,comma","${srcdir}/with,comma"]
EOF
diff -w expected_runfiles_manifest bazel-bin/hello.runfiles_manifest || fail "runfiles manifest does not match expected contents"
}

run_suite "runfiles tests"
Loading

0 comments on commit 2ec1149

Please sign in to comment.