Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leverage a multiprocess-safe strategy for MyPy's cache #16290

Merged
merged 8 commits into from
Jul 28, 2022
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 18 additions & 14 deletions docs/markdown/Python/python-goals/python-check-goal.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ updatedAt: "2022-02-09T00:27:23.086Z"
Activating MyPy
---------------

To opt-in, add `pants.backend.python.typecheck.mypy` to `backend_packages` in your config file.
To opt-in, add `pants.backend.python.typecheck.mypy` to `backend_packages` in your config file.

```toml pants.toml
[GLOBAL]
Expand All @@ -27,11 +27,11 @@ $ ./pants check ::
```

> 👍 Benefit of Pants: typecheck Python 2-only and Python 3-only code at the same time
>
>
> MyPy determines which Python version to use based on its `python_version` option. If that's undefined, MyPy uses the interpreter the tool is run with. Because you can only use one config file at a time with MyPy, you cannot normally say to use `2.7` for part of your codebase but `3.6` for the rest; you must choose a single version.
>
>
> Instead, Pants will group your targets based on their [interpreter constraints](doc:python-interpreter-compatibility), and run all the Python 2 targets together and all the Python 3 targets together. It will automatically set `python_version` to the minimum compatible interpreter, such as a constraint like `["==2.7.*", ">3.6"]` using `2.7`.
>
>
> To turn this off, you can still set `python_version` in `mypy.ini` or `--python-version`/`--py2` in `--mypy-args`; Pants will respect the value you set.

### Hook up a MyPy config file
Expand Down Expand Up @@ -80,9 +80,9 @@ python_sources(
When you run `./pants check ::`, Pants will skip any files belonging to skipped targets.

> 🚧 MyPy may still try to check the skipped files!
>
> The `skip_mypy` field only tells Pants not to provide the skipped files as direct input to MyPy. But MyPy, by default, will still try to check files that are [dependencies of the direct inputs](https://mypy.readthedocs.io/en/stable/running_mypy.html#following-imports). So if your skipped files are dependencies of unskipped files, they may still be checked.
>
>
> The `skip_mypy` field only tells Pants not to provide the skipped files as direct input to MyPy. But MyPy, by default, will still try to check files that are [dependencies of the direct inputs](https://mypy.readthedocs.io/en/stable/running_mypy.html#following-imports). So if your skipped files are dependencies of unskipped files, they may still be checked.
>
> To change this behavior, use MyPy's [`--follow-imports` option](https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-follow-imports), typically by setting it to `silent`. You can do so either by adding it to the [`args` option](https://www.pantsbuild.org/docs/reference-mypy#section-args) in the `[mypy]` section of your Pants config file, or by setting it in [`mypy.ini`](https://mypy.readthedocs.io/en/stable/config_file.html).

### First-party type stubs (`.pyi` files)
Expand All @@ -98,7 +98,7 @@ When writing stubs for third-party libraries, you may need the set up the `[sour
root_patterns = ["mypy-stubs", "src/python"]
```
```python mypy-stubs/colors.pyi
# Because we set `mypy-stubs` as a source root, this file will be
# Because we set `mypy-stubs` as a source root, this file will be
# stripped to be simply `colors.pyi`. MyPy will look at this file for
# imports of the `colors` module.

Expand Down Expand Up @@ -182,7 +182,7 @@ python_source(name="django_settings", source="django_settings.py")
```

> 📘 MyPy Protobuf support
>
>
> Add `mypy_plugin = true` to the `[python-protobuf]` scope. See [Protobuf](doc:protobuf-python) for more information.

### Add a first-party plugin
Expand All @@ -207,7 +207,7 @@ plugins =
python_source(name="plugin", source="change_return_type.py")
```
```python pants-plugins/mypy_plugins/change_return_type.py
"""A contrived plugin that changes the return type of any
"""A contrived plugin that changes the return type of any
function ending in `__overriden_by_plugin` to return None."""

from typing import Callable, Optional, Type
Expand Down Expand Up @@ -236,7 +236,7 @@ Because this is a `python_source` or `python_sources` target, Pants will treat t

### Reports

MyPy can generate [various report files](https://mypy.readthedocs.io/en/stable/command_line.html#report-generation).
MyPy can generate [various report files](https://mypy.readthedocs.io/en/stable/command_line.html#report-generation).

For Pants to properly preserve the reports, instruct MyPy to write to the `reports/` folder by updating its config file or `--mypy-args`. For example, in your pants.toml:

Expand All @@ -250,11 +250,15 @@ Pants will copy all reports into the folder `dist/check/mypy`.
Known limitations
-----------------

### Performance is often slower than normal
### Performance is sometimes slower than normal

Pants does not yet leverage MyPy's caching mechanism and daemon, so a typical run with Pants will likely be slower than using MyPy directly.
Pants 2.14 added support for leveraging MyPy's cache, making subsequent runs of MyPy extremely performant.
The support, however, requires features that were added to MyPy in version `0.700`, and requires that
`python_version` isn't set in MyPy's config or in `[mypy].args`.

If you're using a version of MyPy older than `0.700`, consider upgrading to unlock super-speedy subsequent runs of MyPy.
Additionally consider not providing `python_version` in your config or args.

We are [working to figure out](https://github.com/pantsbuild/pants/issues/10864) how to leverage MyPy's cache in a way that is safe and allows for things like remote execution.

Tip: only run over changed files and their dependees
----------------------------------------------------
Expand Down
87 changes: 69 additions & 18 deletions src/python/pants/backend/python/typecheck/mypy/rules.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,11 @@

from __future__ import annotations

import dataclasses
import itertools
from dataclasses import dataclass
from hashlib import sha256
from textwrap import dedent
from typing import Iterable, Optional, Tuple

import packaging
Expand Down Expand Up @@ -36,9 +38,10 @@
from pants.base.build_root import BuildRoot
from pants.core.goals.check import REPORT_DIR, CheckRequest, CheckResult, CheckResults
from pants.core.util_rules.source_files import SourceFiles, SourceFilesRequest
from pants.core.util_rules.system_binaries import CpBinary, MkdirBinary, MvBinary
from pants.engine.collection import Collection
from pants.engine.fs import CreateDigest, Digest, FileContent, MergeDigests, RemovePrefix
from pants.engine.process import FallibleProcessResult
from pants.engine.process import FallibleProcessResult, Process
from pants.engine.rules import Get, MultiGet, collect_rules, rule, rule_helper
from pants.engine.target import CoarsenedTargets, FieldSet, Target
from pants.engine.unions import UnionRule
Expand Down Expand Up @@ -89,7 +92,7 @@ async def _generate_argv(
file_list_path: str,
python_version: Optional[str],
) -> Tuple[str, ...]:
args = [f"--python-executable={venv_python}", *mypy.args]
args = [pex.pex.argv0, f"--python-executable={venv_python}", *mypy.args]
if mypy.config:
args.append(f"--config-file={mypy.config}")
if python_version:
Expand All @@ -98,8 +101,12 @@ async def _generate_argv(
mypy_pex_info = await Get(PexResolveInfo, VenvPex, pex)
mypy_info = mypy_pex_info.find("mypy")
assert mypy_info is not None
if mypy_info.version > packaging.version.Version("0.700"):
if mypy_info.version > packaging.version.Version("0.700") and python_version is not None:
# Skip mtime checks because we don't propogate mtime when materialzing the sandbox, so the
# mtime checks will always fail otherwise.
args.append("--skip-cache-mtime-check")
# See "__run_wrapper.sh" below for explanation
args.append("--sqlite-cache") # Added in v 0.660
args.extend(("--cache-dir", cache_dir))
else:
# Don't bother caching
Expand Down Expand Up @@ -135,6 +142,9 @@ async def mypy_typecheck_partition(
build_root: BuildRoot,
mypy: MyPy,
python_setup: PythonSetup,
mkdir: MkdirBinary,
cp: CpBinary,
mv: MvBinary,
) -> CheckResult:
# MyPy requires 3.5+ to run, but uses the typed-ast library to work with 2.7, 3.4, 3.5, 3.6,
# and 3.7. However, typed-ast does not understand 3.8+, so instead we must run MyPy with
Expand Down Expand Up @@ -227,6 +237,56 @@ async def mypy_typecheck_partition(
requirements_venv_pex_request, file_list_digest_request
)

py_version = config_file.python_version_to_autoset(
partition.interpreter_constraints, python_setup.interpreter_versions_universe
)
named_cache_dir = f".cache/mypy_cache/{sha256(build_root.path.encode()).hexdigest()}"
run_cache_dir = ".tmp_cache/mypy_cache"
argv = await _generate_argv(
mypy,
pex=mypy_pex,
venv_python=requirements_venv_pex.python.argv0,
cache_dir=run_cache_dir,
file_list_path=file_list_path,
python_version=py_version,
)

script_runner_digest = await Get(
Digest,
CreateDigest(
[
FileContent(
"__mypy_runner.sh",
dedent(
f"""\
# We want to leverage the MyPy cache for fast incremental runs of MyPy.
# Pants exposes "append_only_caches" we can leverage, but with the caveat
# that it requires either only appending files, or multiprocess-safe access.
#
# MyPy guaruntees neither, but there's workarounds!
thejcannon marked this conversation as resolved.
Show resolved Hide resolved
#
# By default, MyPy uses 2 cache files persource file, which introduces a
# whole slew of race conditions. To workaround this we enable MyPy's SQLite cache,
# which enables us to run MyPy on a temporary copy of the cache, then mv
# the modified cache back to the real location. This is multiprocess-safe
# as mv on the same filesystem is an atomic "rename", and any processes
# copying the "old" file will still have valid file descriptors for the
# "old" file.

{mkdir.path} -p {run_cache_dir}/{py_version} 2>&1 > /dev/null
{cp.path} {named_cache_dir}/{py_version}/cache.db {run_cache_dir}/{py_version}/cache.db 2>&1 > /dev/null || true
thejcannon marked this conversation as resolved.
Show resolved Hide resolved
{' '.join(argv)}
EXIT_CODE=$?
{mv.path} {run_cache_dir}/{py_version}/cache.db {named_cache_dir}/{py_version}/cache.db 2>&1 > /dev/null || true
exit $EXIT_CODE
"""
).encode(),
is_executable=True,
thejcannon marked this conversation as resolved.
Show resolved Hide resolved
)
]
),
)

merged_input_files = await Get(
Digest,
MergeDigests(
Expand All @@ -236,6 +296,7 @@ async def mypy_typecheck_partition(
closure_sources.source_files.snapshot.digest,
requirements_venv_pex.digest,
config_file.digest,
script_runner_digest,
]
),
)
Expand All @@ -248,30 +309,20 @@ async def mypy_typecheck_partition(
"MYPYPATH": ":".join(all_used_source_roots),
}

cache_dir = f".cache/mypy_cache/{sha256(build_root.path.encode()).hexdigest()}"

result = await Get(
FallibleProcessResult,
process = await Get(
Process,
VenvPexProcess(
mypy_pex,
argv=await _generate_argv(
mypy,
pex=mypy_pex,
venv_python=requirements_venv_pex.python.argv0,
cache_dir=cache_dir,
file_list_path=file_list_path,
python_version=config_file.python_version_to_autoset(
partition.interpreter_constraints, python_setup.interpreter_versions_universe
),
),
input_digest=merged_input_files,
extra_env=env,
output_directories=(REPORT_DIR,),
description=f"Run MyPy on {pluralize(len(python_files), 'file')}.",
level=LogLevel.DEBUG,
append_only_caches={"mypy_cache": cache_dir},
append_only_caches={"mypy_cache": named_cache_dir},
),
)
process = dataclasses.replace(process, argv=("__mypy_runner.sh",))
result = await Get(FallibleProcessResult, Process, process)
report = await Get(Digest, RemovePrefix(result.output_digest, REPORT_DIR))
return CheckResult.from_fallible_process_result(
result,
Expand Down
8 changes: 5 additions & 3 deletions src/python/pants/backend/python/typecheck/mypy/subsystem.py
Original file line number Diff line number Diff line change
Expand Up @@ -200,9 +200,11 @@ def check_and_warn_if_python_version_configured(self, config: FileContent | None
({doc_url('python-interpreter-compatibility')}). Instead, it will
use what you set.

(Automatically setting the option allows Pants to partition your targets by their
constraints, so that, for example, you can run MyPy on Python 2-only code and
Python 3-only code at the same time. This feature may no longer work.)
(Allowing Pants to automatically set the option allows Pants to partition your
targets by their constraints, so that, for example, you can run MyPy on
Python 2-only code and Python 3-only code at the same time. It also allows Pants
to leverage MyPy's cache, making subsequent runs of MyPy very fast.
In the future, this feature may no longer work.)
"""
)
)
Expand Down
24 changes: 24 additions & 0 deletions src/python/pants/core/util_rules/system_binaries.py
Original file line number Diff line number Diff line change
Expand Up @@ -326,6 +326,14 @@ class MkdirBinary(BinaryPath):
pass


class CpBinary(BinaryPath):
pass


class MvBinary(BinaryPath):
pass


class ChmodBinary(BinaryPath):
pass

Expand Down Expand Up @@ -687,6 +695,22 @@ async def find_mkdir() -> MkdirBinary:
return MkdirBinary(first_path.path, first_path.fingerprint)


@rule(desc="Finding the `cp` binary", level=LogLevel.DEBUG)
thejcannon marked this conversation as resolved.
Show resolved Hide resolved
async def find_cp() -> CpBinary:
request = BinaryPathRequest(binary_name="cp", search_path=SEARCH_PATHS)
paths = await Get(BinaryPaths, BinaryPathRequest, request)
first_path = paths.first_path_or_raise(request, rationale="copy files")
return CpBinary(first_path.path, first_path.fingerprint)


@rule(desc="Finding the `mv` binary", level=LogLevel.DEBUG)
async def find_mv() -> MvBinary:
request = BinaryPathRequest(binary_name="mv", search_path=SEARCH_PATHS)
paths = await Get(BinaryPaths, BinaryPathRequest, request)
first_path = paths.first_path_or_raise(request, rationale="move files")
return MvBinary(first_path.path, first_path.fingerprint)


@rule(desc="Finding the `chmod` binary", level=LogLevel.DEBUG)
async def find_chmod() -> ChmodBinary:
request = BinaryPathRequest(binary_name="chmod", search_path=SEARCH_PATHS)
Expand Down