Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use upstream coveragepy's new lcov support #14677

Closed

Conversation

TLATER
Copy link

@TLATER TLATER commented Jan 31, 2022

As explained in 8c3ea99, coveragepy recently added lcov support. This patch set intends to tackle #10660 (and bazelbuild/rules_python#43), and make python coverage fully usable out-of-the box. It builds on work by @ulfjack in #10433, but now with actual support from upstream.

This also incidentally fixes a bug where bazel test would no longer work if the PYTHON_COVERAGE environment variable was set (can report/fix separately if desired), as well as #14436 (this should perhaps be raised in a separate PR, but made much more sense after refactoring here).

See the commit messages for more detailed comments.

This is a draft, but I would like some review of the overall approach, and whether this is workable in this state. I'm raising this as a draft for these reasons:

  • A commit to allow coveragepy resolution by label is included. It's not implemented correctly at the moment (see commit message), and I'd like feedback on how this could be achieved better.
  • The comments regarding os.execv usage seem to have a strange history (see commit message), and I would like to figure out what happened here before continuing the legacy.
  • More consideration for other coverage tools might be more important than I think.
  • Testing for this is slim because it depends on coveragepy and I'm unsure how/if I should integrate that into CI.
  • The python coverage documentation should be updated.

@c-mita and @lfpino may also be interested.

@TLATER
Copy link
Author

TLATER commented Jan 31, 2022

Hm, interesting, I'll have to look into those test failures.

@TLATER TLATER force-pushed the tlater/coveragepy-lcov-support branch from d923f21 to da31bce Compare February 1, 2022 12:59
@TLATER
Copy link
Author

TLATER commented Feb 1, 2022

The failing test is //src/test/shell/bazel:python_version_test, the culprit is an accidental f-string. New habits die hard, it seems.

@gregestren gregestren added the team-Rules-Python Native rules for Python label Feb 1, 2022
@comius
Copy link
Contributor

comius commented Feb 2, 2022

cc @rickeylev for Python

@comius comius assigned comius and unassigned comius Feb 2, 2022
TLATER and others added 2 commits February 2, 2022 16:27
[Coveragepy recently gained support for lcov
output][nedbat/coveragepy#1289], which allows implementing full
support for python coverage without relying on a downstream fork of
that project

Coveragepy actually must be invoked twice; One generating a
`.coverage` database file, the other exporting the data. This means
that it is incompatible with the old implementation.

The fork of coveragepy previously intended for use with bazel
circumvented this by just changing how coveragepy works, and never
outputting that database - just outputting the lcov directly
instead. If we'd like to use upstream coveragepy, this is of course
not possible.

The stub_template seems to be written with the idea of supporting
other coverage tooling in mind, however it still hard-codes arguments
specific to coveragepy. Instead, we think it makes sense to properly
support one of them for now, and to rethink a more generic interface
later - it will probably take specific scripting for each
implementation of coverage in python anyway.

As such, this patch rewrites the python stub template to fully support
upstream coveragepy as a coverage tool, and reworks some of the logic
around invoking python to do so more cleanly.

Additional notes:

  - Python coverage will only work with Python 3.7+ with upstream
    coveragepy, since the first release with lcov support does not
    support earlier Python versions - this is unfortunate, but there
    is not much we can do downstream short of forking to resolve
    that. The stub template itself should still work with Python 2.4+.

  - Comments in the code claim to use `os.execv` for performance
    reasons. There may be a small overhead to `subprocess.call`, but
    it shouldn't be too impactful, especially considering the overhead
    in logic (written in Python) this involves - if this is indeed for
    performance reasons, this is probably a somewhat premature
    optimization.

    A colleauge helped dig through some history, finding
    3bed4af as the source of this -
    if that commit is to believed, this is actually to resolve issues
    with signal handling, however that seems odd as well, since this
    calls arbitrary python applications, which in turn may use
    subprocesses again as well, and therefore break what that commit
    seems to attempt to fix.

    It's completely opaque to me why we put so much effort into trying
    to ensure we use `os.execv`. I've replicated the behavior and
    comments assuming it was correct previously, but the patch
    probably shouldn't land as-is - the comment explaining the use of
    `os.execv` is most likely misleading.

---

[nedbat/coveragepy#1289]: nedbat/coveragepy#1289

Co-authored-by: Bradley Burns <bradley.burns@codethink.co.uk>
This resolves bazelbuild#14436 by adding a new environment variable that will
perform the coverage label resolution inside the python_stub_template
directly, which resolves the python coverage tool by label rather than
path.

Currently this resolves the path in the runfiles directory by guessing
the path the label should resolve to - this of course does not work in
general, even just defining an alias breaks it. Since labels appear to
only be resolved in the analysis phase, there does not seem to be an
easy way around this, however.

This is a draft, showing the behavior I would like - suggestions on
how to correctly implement this would be appreciated.

Co-authored-by: Bradley Burns <bradley.burns@codethink.co.uk>
@TLATER TLATER force-pushed the tlater/coveragepy-lcov-support branch from da31bce to f4f1c23 Compare February 2, 2022 16:28
@comius comius removed their assignment Mar 23, 2022
@comius comius requested a review from c-mita April 6, 2022 06:23
@adam-azarchs
Copy link
Contributor

Not to distract from what this PR is trying to accomplish, but it's only part of the puzzle when it comes to hermetic python toolchains. In order to support those, we'd need to also add an attribute to py_runtime to supply a target for coveragepy, and I guess plumb that through to PYTHON_COVERAGE. That sort of thing is particularly important in a remote build scenario where the developer may have no control over whether there exists a system-wide installation of coverage on the remote builder.

@TLATER
Copy link
Author

TLATER commented Apr 30, 2022

That sounds like a good idea. We were using rules_python to provide it instead, but it also currently lacks toolchain support (and did things rather more impurely than I'm comfortable with).

@rickeylev
Copy link
Contributor

we'd need to also add an attribute to py_runtime to supply a target for coveragepy, and I guess plumb that through to PYTHON_COVERAGE.

I'm not too familiar with how coverage works, so forgive me: why does the py_runtime rule need a coverage attribute? That would imply that a particular runtime and coverage are intrinsically linked (e.g. a particular coverage implementation only works with a particular runtime, and vice versa). IIUC, coverage isn't so strictly tied to a runtime, so it seems more like something that would go on the toolchain or rules (which wire together the runtime with other parts).

@adam-azarchs
Copy link
Contributor

py_runtime is how you set up the python toolchain. A python toolchain setup looks something like

py_runtime(
    name = "python3_runtime",
    files = [":python"],
    interpreter = ":python_interpreter_exe",
    python_version = "PY3",
)

py_runtime_pair(
    name = "python_runtime_pair",
    py3_runtime = ":python3_runtime",
)

toolchain(
    name = "python_toolchain",
    exec_compatible_with = ["@platforms//os:linux"],
    target_compatible_with = ["@platforms//os:linux"],
    toolchain = ":python_runtime_pair",
    toolchain_type = "@bazel_tools//tools/python:toolchain_type",
    visibility = ["//visibility:public"],
)

While one could just put coverage into the files attribute of py_runtime, making it a separate attribute would have a few advantages:

  1. No need for the user to muck around with environment variables (which always seems a little but "un-bazely" to me).
  2. Only depending on the coverage library when running with code coverage.
  3. Being able to supply different coverage libraries to different runtimes, e.g. probably an older version for python 2, if anyone is still using that.

@rickeylev
Copy link
Contributor

Ahhhh, huh. Thanks. I didn't realize py_runtime was, in effect, the toolchain (we don't have toolchains enabled for Python at Google, so this isn't an area I'm very familiar with). Well, that's kind of unfortunate naming, but yeah, putting the coverage settings on the toolchain could make sense. TBH, I don't fully understand the pros/cons of a toolchain over an implicit attribute of the rule, but both are pretty transparent to the user, so I'm not sure if it matters much.

@adam-azarchs
Copy link
Contributor

The advantage of a toolchain is that it can be configured. An implicit attribute of the rule has to be a fixed target. In our use case we very specifically cannot use the automatically-configured python toolchain based on the system-wide installation, because that is usually not the right version of python.

@sgowroji sgowroji added the awaiting-user-response Awaiting a response from the author label May 5, 2022
@adam-azarchs
Copy link
Contributor

Adding an attribute to py_runtime doesn't look like it would be that hard to do; I'd probably have already made a PR to do it except it's still in the java side of things, with which I have a lot less experience. But also, as I said earlier, it's kind of orthogonal to this PR which, as far as I understand it, would still be required in order to make things actually work.

copybara-service bot pushed a commit that referenced this pull request Sep 23, 2022
This allows users to specify a target providing the coveragepy tool (and its dependencies).  This is essential for hermetic python builds, where an absolute path will not really work.  It's also superior to other potential methods using environment variables because the runfiles dependency on the coverage tool and its files is only incurred when building with coverage enabled.

This also builds on the work @TLATER began with #14677 to integrate with `coveragepy`'s `lcov` support, with an additional step of at least attempting to convert the absolute paths which `coveragepy` uses in the lcov output into the relative paths which the rest of bazel can actually consume.

This is my first time touching Java code professionally, so I'll admit to mostly cargo-culting those parts, and would welcome any feedback on how to improve things there.  I also would have no objections to someone else taking over this PR to get it over the finish line.  I've tested this out with our own team's internal monorepo, and have successfully generated a full combined coverage report for most of our python and go code.  There's still a bunch of things which don't quite work, in particular when it comes to compiled extension modules or executables run from within python tests, but those will need to be addressed separately, and this is already a giant step forward for our team.

Closes #14436.

Closes #15590.

PiperOrigin-RevId: 476314433
Change-Id: I4be4d10e0af741f4ba1a7b5367c6f7a338a3c43d
aiuto pushed a commit to aiuto/bazel that referenced this pull request Oct 12, 2022
This allows users to specify a target providing the coveragepy tool (and its dependencies).  This is essential for hermetic python builds, where an absolute path will not really work.  It's also superior to other potential methods using environment variables because the runfiles dependency on the coverage tool and its files is only incurred when building with coverage enabled.

This also builds on the work @TLATER began with bazelbuild#14677 to integrate with `coveragepy`'s `lcov` support, with an additional step of at least attempting to convert the absolute paths which `coveragepy` uses in the lcov output into the relative paths which the rest of bazel can actually consume.

This is my first time touching Java code professionally, so I'll admit to mostly cargo-culting those parts, and would welcome any feedback on how to improve things there.  I also would have no objections to someone else taking over this PR to get it over the finish line.  I've tested this out with our own team's internal monorepo, and have successfully generated a full combined coverage report for most of our python and go code.  There's still a bunch of things which don't quite work, in particular when it comes to compiled extension modules or executables run from within python tests, but those will need to be addressed separately, and this is already a giant step forward for our team.

Closes bazelbuild#14436.

Closes bazelbuild#15590.

PiperOrigin-RevId: 476314433
Change-Id: I4be4d10e0af741f4ba1a7b5367c6f7a338a3c43d
@TLATER
Copy link
Author

TLATER commented Nov 10, 2022

Superseded by @adam-azarchs ' #15590, thanks :)

@TLATER TLATER closed this Nov 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-user-response Awaiting a response from the author team-Rules-Python Native rules for Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants