From 8c3ea99c2e65edbf400f739cc5cb3115900ac4eb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tristan=20Dani=C3=ABl=20Maat?= <tm@tlater.net>
Date: Mon, 31 Jan 2022 15:33:10 +0000
Subject: [PATCH] rules/python: Rework built-in coveragepy support

[Coveragepy recently gained support for lcov
output][nedbat/coveragepy#1289], which allows implementing full
support for python coverage without relying on a downstream fork of
that project

Coveragepy actually must be invoked twice; One generating a
`.coverage` database file, the other exporting the data. This means
that it is incompatible with the old implementation.

The fork of coveragepy previously intended for use with bazel
circumvented this by just changing how coveragepy works, and never
outputting that database - just outputting the lcov directly
instead. If we'd like to use upstream coveragepy, this is of course
not possible.

The stub_template seems to be written with the idea of supporting
other coverage tooling in mind, however it still hard-codes arguments
specific to coveragepy. Instead, we think it makes sense to properly
support one of them for now, and to rethink a more generic interface
later - it will probably take specific scripting for each
implementation of coverage in python anyway.

As such, this patch rewrites the python stub template to fully support
upstream coveragepy as a coverage tool, and reworks some of the logic
around invoking python to do so more cleanly.

Additional notes:

  - Python coverage will only work with Python 3.7+ with upstream
    coveragepy, since the first release with lcov support does not
    support earlier Python versions - this is unfortunate, but there
    is not much we can do downstream short of forking to resolve
    that. The stub template itself should still work with Python 2.4+.

  - Comments in the code claim to use `os.execv` for performance
    reasons. There may be a small overhead to `subprocess.call`, but
    it shouldn't be too impactful, especially considering the overhead
    in logic (written in Python) this involves - if this is indeed for
    performance reasons, this is probably a somewhat premature
    optimization.

    A colleauge helped dig through some history, finding
    3bed4af4f27bbd22b5531454095f0eda30bfac9f as the source of this -
    if that commit is to believed, this is actually to resolve issues
    with signal handling, however that seems odd as well, since this
    calls arbitrary python applications, which in turn may use
    subprocesses again as well, and therefore break what that commit
    seems to attempt to fix.

    It's completely opaque to me why we put so much effort into trying
    to ensure we use `os.execv`. I've replicated the behavior and
    comments assuming it was correct previously, but the patch
    probably shouldn't land as-is - the comment explaining the use of
    `os.execv` is most likely misleading.

---

[nedbat/coveragepy#1289]: https://github.com/nedbat/coveragepy/pull/1289

Co-authored-by: Bradley Burns <bradley.burns@codethink.co.uk>
---
 .../rules/python/python_stub_template.txt     | 130 +++++++++++++-----
 1 file changed, 96 insertions(+), 34 deletions(-)

diff --git a/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt b/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt
index c389af01c2df1f..1bc9e295d62c2b 100644
--- a/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt
+++ b/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt
@@ -281,6 +281,71 @@ def Deduplicate(items):
           seen.add(it)
           yield it
 
+def ExecuteFile(python_program, main_filename, args, env, module_space,
+                coverage_tool=None, workspace=None):
+  """Executes the given python file using the various environment settings.
+
+  This will not return, and acts much like os.execv, except is much
+  more restricted, and handles bazel-related edge cases.
+
+  Args:
+    python_program: Path to the python binary to use for execution
+    main_filename: The python file to execute
+    args: Additional args to pass to the python file
+    env: A dict of environment variables to set for the execution
+    module_space: The module space/runfiles tree
+    coverage_tool: The coverage tool to execute with
+    workspace: The workspace to execute in. This is expected to be a
+               directory under the runfiles tree, and will recursively
+               delete the runfiles directory if set.
+  """
+  # We want to use os.execv instead of subprocess.call, which causes
+  # problems with signal passing (making it difficult to kill
+  # bazel). However, these conditions force us to run via
+  # subprocess.call instead:
+  #
+  # - On Windows, os.execv doesn't handle arguments with spaces
+  #   correctly, and it actually starts a subprocess just like
+  #   subprocess.call.
+  # - When running in a workspace (i.e., if we're running from a zip),
+  #   we need to clean up the workspace after the process finishes so
+  #   control must return here.
+  # - If we may need to emit a host config warning after execution, we
+  #   can't execv because we need control to return here. This only
+  #   happens for targets built in the host config.
+  # - For coverage targets, at least coveragepy requires running in
+  #   two invocations, which also requires control to return here.
+  #
+  if not (IsWindows() or workspace or %enable_host_version_warning% or coverage_tool):
+    os.environ.update(env)
+    os.execv(python_program, [python_program, main_filename] + args)
+
+  if coverage_tool is not None:
+    # Coveragepy wants to frst create a .coverage database file, from
+    # which we can then export lcov.
+    subprocess.call(
+      [python_program, coverage_tool, "run", "--append", "--branch", main_filename] + args,
+      env=env,
+      cwd=workspace
+    )
+    output_filename = os.environ.get('COVERAGE_DIR') + '/pylcov.dat'
+    ret_code = subprocess.call(
+      [python_program, coverage_tool, "lcov", "-o", output_filename] + args,
+      env=env,
+      cwd=workspace
+    )
+  else:
+    ret_code = subprocess.call(
+      [python_program, main_filename] + args,
+      env=env,
+      cwd=workspace
+    )
+
+  if workspace:
+    shutil.rmtree(os.path.dirname(module_space), True)
+  MaybeEmitHostVersionWarning(ret_code)
+  sys.exit(ret_code)
+
 def Main():
   args = sys.argv[1:]
 
@@ -332,13 +397,21 @@ def Main():
   if python_program is None:
     raise AssertionError('Could not find python binary: ' + PYTHON_BINARY)
 
-  cov_tool = os.environ.get('PYTHON_COVERAGE')
-  if cov_tool:
-    # Inhibit infinite recursion:
-    del os.environ['PYTHON_COVERAGE']
+  # COVERAGE_DIR is set iff the instrumentation is configured for the
+  # file and coverage is enabled.
+  if os.environ.get('COVERAGE_DIR'):
+    if 'PYTHON_COVERAGE' in os.environ:
+      cov_tool = os.environ.get('PYTHON_COVERAGE')
+    else:
+      raise EnvironmentError(
+        'No python coverage tool set, '
+        'set PYTHON_COVERAGE '
+        'to configure the coverage tool'
+      )
+
     if not os.path.exists(cov_tool):
       raise EnvironmentError('Python coverage tool %s not found.' % cov_tool)
-    args = [python_program, cov_tool, 'run', '-a', '--branch', main_filename] + args
+
     # coverage library expects sys.path[0] to contain the library, and replaces
     # it with the directory of the program it starts. Our actual sys.path[0] is
     # the runfiles directory, which must not be replaced.
@@ -346,40 +419,29 @@ def Main():
     #
     # Update sys.path such that python finds the coverage package. The coverage
     # entry point is coverage.coverage_main, so we need to do twice the dirname.
-    new_env['PYTHONPATH'] = \
-        new_env['PYTHONPATH'] + ':' + os.path.dirname(os.path.dirname(cov_tool))
-    new_env['PYTHON_LCOV_FILE'] = os.environ.get('COVERAGE_DIR') + '/pylcov.dat'
+    new_env['PYTHONPATH'] = (
+      new_env['PYTHONPATH'] + ':' + os.path.dirname(os.path.dirname(cov_tool))
+    )
   else:
-    args = [python_program, main_filename] + args
+    cov_tool = None
 
-  os.environ.update(new_env)
+  new_env.update(os.environ)
+
+  workspace = None
+  if IsRunningFromZip():
+    # If RUN_UNDER_RUNFILES equals 1, it means we need to
+    # change directory to the right runfiles directory.
+    # (So that the data files are accessible)
+    if os.environ.get('RUN_UNDER_RUNFILES') == '1':
+      workspace = os.path.join(module_space, '%workspace_name%')
 
   try:
     sys.stdout.flush()
-    if IsRunningFromZip():
-      # If RUN_UNDER_RUNFILES equals 1, it means we need to
-      # change directory to the right runfiles directory.
-      # (So that the data files are accessible)
-      if os.environ.get('RUN_UNDER_RUNFILES') == '1':
-        os.chdir(os.path.join(module_space, '%workspace_name%'))
-      ret_code = subprocess.call(args)
-      shutil.rmtree(os.path.dirname(module_space), True)
-      MaybeEmitHostVersionWarning(ret_code)
-      sys.exit(ret_code)
-    else:
-      # On Windows, os.execv doesn't handle arguments with spaces correctly,
-      # and it actually starts a subprocess just like subprocess.call.
-      #
-      # If we may need to emit a host config warning after execution, don't
-      # execv because we need control to return here. This only happens for
-      # targets built in the host config, so other targets still get to take
-      # advantage of the performance benefits of execv.
-      if IsWindows() or %enable_host_version_warning%:
-        ret_code = subprocess.call(args)
-        MaybeEmitHostVersionWarning(ret_code)
-        sys.exit(ret_code)
-      else:
-        os.execv(args[0], args)
+    ExecuteFile(
+      python_program, main_filename, args, new_env, module_space,
+      cov_tool, workspace
+    )
+
   except EnvironmentError:
     # This works from Python 2.4 all the way to 3.x.
     e = sys.exc_info()[1]