diff --git a/docs/markdown/Python/python-goals/python-check-goal.md b/docs/markdown/Python/python-goals/python-check-goal.md index acbda4b9af4..7a0bf878a04 100644 --- a/docs/markdown/Python/python-goals/python-check-goal.md +++ b/docs/markdown/Python/python-goals/python-check-goal.md @@ -9,7 +9,7 @@ updatedAt: "2022-02-09T00:27:23.086Z" Activating MyPy --------------- -To opt-in, add `pants.backend.python.typecheck.mypy` to `backend_packages` in your config file. +To opt-in, add `pants.backend.python.typecheck.mypy` to `backend_packages` in your config file. ```toml pants.toml [GLOBAL] @@ -27,11 +27,11 @@ $ ./pants check :: ``` > 👍 Benefit of Pants: typecheck Python 2-only and Python 3-only code at the same time -> +> > MyPy determines which Python version to use based on its `python_version` option. If that's undefined, MyPy uses the interpreter the tool is run with. Because you can only use one config file at a time with MyPy, you cannot normally say to use `2.7` for part of your codebase but `3.6` for the rest; you must choose a single version. -> +> > Instead, Pants will group your targets based on their [interpreter constraints](doc:python-interpreter-compatibility), and run all the Python 2 targets together and all the Python 3 targets together. It will automatically set `python_version` to the minimum compatible interpreter, such as a constraint like `["==2.7.*", ">3.6"]` using `2.7`. -> +> > To turn this off, you can still set `python_version` in `mypy.ini` or `--python-version`/`--py2` in `--mypy-args`; Pants will respect the value you set. ### Hook up a MyPy config file @@ -80,9 +80,9 @@ python_sources( When you run `./pants check ::`, Pants will skip any files belonging to skipped targets. > 🚧 MyPy may still try to check the skipped files! -> -> The `skip_mypy` field only tells Pants not to provide the skipped files as direct input to MyPy. But MyPy, by default, will still try to check files that are [dependencies of the direct inputs](https://mypy.readthedocs.io/en/stable/running_mypy.html#following-imports). So if your skipped files are dependencies of unskipped files, they may still be checked. -> +> +> The `skip_mypy` field only tells Pants not to provide the skipped files as direct input to MyPy. But MyPy, by default, will still try to check files that are [dependencies of the direct inputs](https://mypy.readthedocs.io/en/stable/running_mypy.html#following-imports). So if your skipped files are dependencies of unskipped files, they may still be checked. +> > To change this behavior, use MyPy's [`--follow-imports` option](https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-follow-imports), typically by setting it to `silent`. You can do so either by adding it to the [`args` option](https://www.pantsbuild.org/docs/reference-mypy#section-args) in the `[mypy]` section of your Pants config file, or by setting it in [`mypy.ini`](https://mypy.readthedocs.io/en/stable/config_file.html). ### First-party type stubs (`.pyi` files) @@ -98,7 +98,7 @@ When writing stubs for third-party libraries, you may need the set up the `[sour root_patterns = ["mypy-stubs", "src/python"] ``` ```python mypy-stubs/colors.pyi -# Because we set `mypy-stubs` as a source root, this file will be +# Because we set `mypy-stubs` as a source root, this file will be # stripped to be simply `colors.pyi`. MyPy will look at this file for # imports of the `colors` module. @@ -182,7 +182,7 @@ python_source(name="django_settings", source="django_settings.py") ``` > 📘 MyPy Protobuf support -> +> > Add `mypy_plugin = true` to the `[python-protobuf]` scope. See [Protobuf](doc:protobuf-python) for more information. ### Add a first-party plugin @@ -207,7 +207,7 @@ plugins = python_source(name="plugin", source="change_return_type.py") ``` ```python pants-plugins/mypy_plugins/change_return_type.py -"""A contrived plugin that changes the return type of any +"""A contrived plugin that changes the return type of any function ending in `__overriden_by_plugin` to return None.""" from typing import Callable, Optional, Type @@ -236,7 +236,7 @@ Because this is a `python_source` or `python_sources` target, Pants will treat t ### Reports -MyPy can generate [various report files](https://mypy.readthedocs.io/en/stable/command_line.html#report-generation). +MyPy can generate [various report files](https://mypy.readthedocs.io/en/stable/command_line.html#report-generation). For Pants to properly preserve the reports, instruct MyPy to write to the `reports/` folder by updating its config file or `--mypy-args`. For example, in your pants.toml: @@ -250,11 +250,15 @@ Pants will copy all reports into the folder `dist/check/mypy`. Known limitations ----------------- -### Performance is often slower than normal +### Performance is sometimes slower than normal -Pants does not yet leverage MyPy's caching mechanism and daemon, so a typical run with Pants will likely be slower than using MyPy directly. +Pants 2.14 added support for leveraging MyPy's cache, making subsequent runs of MyPy extremely performant. +The support, however, requires features that were added to MyPy in version `0.700`, and requires that +`python_version` isn't set in MyPy's config or in `[mypy].args`. + +If you're using a version of MyPy older than `0.700`, consider upgrading to unlock super-speedy subsequent runs of MyPy. +Additionally consider not providing `python_version` in your config or args. -We are [working to figure out](https://github.com/pantsbuild/pants/issues/10864) how to leverage MyPy's cache in a way that is safe and allows for things like remote execution. Tip: only run over changed files and their dependees ---------------------------------------------------- diff --git a/src/python/pants/backend/python/typecheck/mypy/rules.py b/src/python/pants/backend/python/typecheck/mypy/rules.py index 8088ef03cb6..4dc6aa67618 100644 --- a/src/python/pants/backend/python/typecheck/mypy/rules.py +++ b/src/python/pants/backend/python/typecheck/mypy/rules.py @@ -3,9 +3,11 @@ from __future__ import annotations +import dataclasses import itertools from dataclasses import dataclass from hashlib import sha256 +from textwrap import dedent from typing import Iterable, Optional, Tuple import packaging @@ -36,15 +38,16 @@ from pants.base.build_root import BuildRoot from pants.core.goals.check import REPORT_DIR, CheckRequest, CheckResult, CheckResults from pants.core.util_rules.source_files import SourceFiles, SourceFilesRequest +from pants.core.util_rules.system_binaries import CpBinary, MkdirBinary, MvBinary from pants.engine.collection import Collection from pants.engine.fs import CreateDigest, Digest, FileContent, MergeDigests, RemovePrefix -from pants.engine.process import FallibleProcessResult +from pants.engine.process import FallibleProcessResult, Process from pants.engine.rules import Get, MultiGet, collect_rules, rule, rule_helper from pants.engine.target import CoarsenedTargets, FieldSet, Target from pants.engine.unions import UnionRule from pants.util.logging import LogLevel from pants.util.ordered_set import FrozenOrderedSet, OrderedSet -from pants.util.strutil import pluralize +from pants.util.strutil import pluralize, shell_quote @dataclass(frozen=True) @@ -89,7 +92,7 @@ async def _generate_argv( file_list_path: str, python_version: Optional[str], ) -> Tuple[str, ...]: - args = [f"--python-executable={venv_python}", *mypy.args] + args = [pex.pex.argv0, f"--python-executable={venv_python}", *mypy.args] if mypy.config: args.append(f"--config-file={mypy.config}") if python_version: @@ -98,8 +101,12 @@ async def _generate_argv( mypy_pex_info = await Get(PexResolveInfo, VenvPex, pex) mypy_info = mypy_pex_info.find("mypy") assert mypy_info is not None - if mypy_info.version > packaging.version.Version("0.700"): + if mypy_info.version > packaging.version.Version("0.700") and python_version is not None: + # Skip mtime checks because we don't propogate mtime when materialzing the sandbox, so the + # mtime checks will always fail otherwise. args.append("--skip-cache-mtime-check") + # See "__run_wrapper.sh" below for explanation + args.append("--sqlite-cache") # Added in v 0.660 args.extend(("--cache-dir", cache_dir)) else: # Don't bother caching @@ -135,6 +142,9 @@ async def mypy_typecheck_partition( build_root: BuildRoot, mypy: MyPy, python_setup: PythonSetup, + mkdir: MkdirBinary, + cp: CpBinary, + mv: MvBinary, ) -> CheckResult: # MyPy requires 3.5+ to run, but uses the typed-ast library to work with 2.7, 3.4, 3.5, 3.6, # and 3.7. However, typed-ast does not understand 3.8+, so instead we must run MyPy with @@ -227,6 +237,74 @@ async def mypy_typecheck_partition( requirements_venv_pex_request, file_list_digest_request ) + py_version = config_file.python_version_to_autoset( + partition.interpreter_constraints, python_setup.interpreter_versions_universe + ) + named_cache_dir = f".cache/mypy_cache/{sha256(build_root.path.encode()).hexdigest()}" + run_cache_dir = ".tmp_cache/mypy_cache" + argv = await _generate_argv( + mypy, + pex=mypy_pex, + venv_python=requirements_venv_pex.python.argv0, + cache_dir=run_cache_dir, + file_list_path=file_list_path, + python_version=py_version, + ) + + script_runner_digest = await Get( + Digest, + CreateDigest( + [ + FileContent( + "__mypy_runner.sh", + dedent( + f"""\ + # We want to leverage the MyPy cache for fast incremental runs of MyPy. + # Pants exposes "append_only_caches" we can leverage, but with the caveat + # that it requires either only appending files, or multiprocess-safe access. + # + # MyPy guarantees neither, but there's workarounds! + # + # By default, MyPy uses 2 cache files per source file, which introduces a + # whole slew of race conditions. We can minimize the race conditions by + # using MyPy's SQLite cache. MyPy still has race conditions when using the + # db, as it issues at least 2 single-row queries per source file at different + # points in time (therefore SQLite's own safety guarantees don't apply). + # + # To workaround this we make a copy of the db from the append_only_cache, + # run MyPy on it, then move the updated cache back to the append_only_cache. + # This is multiprocess-safe as mv on the same filesystem is an atomic "rename", + # and any processes copying the "old" file will still have valid file + # descriptors for the "old" file. + # + # There is a chance of multiple processes thrashing on the cache, leaving + # it in a state that doesn't reflect reality at the current point in time, + # and forcing other processes to do potentially done work. This strategy + # still provides a net benefit because the cache is generally _mostly_ + # valid (it includes entries for the standard library, and 3rdparty deps, + # among 1stparty sources), and even in the worst case + # (every single file has changed) the overhead of missing the cache each + # query should be small when compared to the work being done of typechecking. + # + # Lastly, we expect that since this is run through Pants which attempts + # to partition MyPy runs by python version (which the DB is independent + # for different versions) and uses a one-process-at-a-time daemon by default, + # multuple MyPy processes operating on a single db cache should be rare. + + {mkdir.path} -p {run_cache_dir}/{py_version} 2>&1 > /dev/null + {cp.path} {named_cache_dir}/{py_version}/cache.db {run_cache_dir}/{py_version}/cache.db 2>&1 > /dev/null || true + {' '.join((shell_quote(arg) for arg in argv))} + EXIT_CODE=$? + {mv.path} {run_cache_dir}/{py_version}/cache.db {named_cache_dir}/{py_version}/cache.db 2>&1 > /dev/null || true + exit $EXIT_CODE + """ + ).encode(), + is_executable=True, + ) + ] + ), + ) + merged_input_files = await Get( Digest, MergeDigests( @@ -236,6 +314,7 @@ async def mypy_typecheck_partition( closure_sources.source_files.snapshot.digest, requirements_venv_pex.digest, config_file.digest, + script_runner_digest, ] ), ) @@ -248,30 +327,20 @@ async def mypy_typecheck_partition( "MYPYPATH": ":".join(all_used_source_roots), } - cache_dir = f".cache/mypy_cache/{sha256(build_root.path.encode()).hexdigest()}" - - result = await Get( - FallibleProcessResult, + process = await Get( + Process, VenvPexProcess( mypy_pex, - argv=await _generate_argv( - mypy, - pex=mypy_pex, - venv_python=requirements_venv_pex.python.argv0, - cache_dir=cache_dir, - file_list_path=file_list_path, - python_version=config_file.python_version_to_autoset( - partition.interpreter_constraints, python_setup.interpreter_versions_universe - ), - ), input_digest=merged_input_files, extra_env=env, output_directories=(REPORT_DIR,), description=f"Run MyPy on {pluralize(len(python_files), 'file')}.", level=LogLevel.DEBUG, - append_only_caches={"mypy_cache": cache_dir}, + append_only_caches={"mypy_cache": named_cache_dir}, ), ) + process = dataclasses.replace(process, argv=("__mypy_runner.sh",)) + result = await Get(FallibleProcessResult, Process, process) report = await Get(Digest, RemovePrefix(result.output_digest, REPORT_DIR)) return CheckResult.from_fallible_process_result( result, diff --git a/src/python/pants/backend/python/typecheck/mypy/subsystem.py b/src/python/pants/backend/python/typecheck/mypy/subsystem.py index 9a8211ea065..0082717b89f 100644 --- a/src/python/pants/backend/python/typecheck/mypy/subsystem.py +++ b/src/python/pants/backend/python/typecheck/mypy/subsystem.py @@ -200,9 +200,11 @@ def check_and_warn_if_python_version_configured(self, config: FileContent | None ({doc_url('python-interpreter-compatibility')}). Instead, it will use what you set. - (Automatically setting the option allows Pants to partition your targets by their - constraints, so that, for example, you can run MyPy on Python 2-only code and - Python 3-only code at the same time. This feature may no longer work.) + (Allowing Pants to automatically set the option allows Pants to partition your + targets by their constraints, so that, for example, you can run MyPy on + Python 2-only code and Python 3-only code at the same time. It also allows Pants + to leverage MyPy's cache, making subsequent runs of MyPy very fast. + In the future, this feature may no longer work.) """ ) ) diff --git a/src/python/pants/core/util_rules/system_binaries.py b/src/python/pants/core/util_rules/system_binaries.py index afe8a827612..f5958a2ce40 100644 --- a/src/python/pants/core/util_rules/system_binaries.py +++ b/src/python/pants/core/util_rules/system_binaries.py @@ -326,6 +326,14 @@ class MkdirBinary(BinaryPath): pass +class CpBinary(BinaryPath): + pass + + +class MvBinary(BinaryPath): + pass + + class ChmodBinary(BinaryPath): pass @@ -687,6 +695,22 @@ async def find_mkdir() -> MkdirBinary: return MkdirBinary(first_path.path, first_path.fingerprint) +@rule(desc="Finding the `cp` binary", level=LogLevel.DEBUG) +async def find_cp() -> CpBinary: + request = BinaryPathRequest(binary_name="cp", search_path=SEARCH_PATHS) + paths = await Get(BinaryPaths, BinaryPathRequest, request) + first_path = paths.first_path_or_raise(request, rationale="copy files") + return CpBinary(first_path.path, first_path.fingerprint) + + +@rule(desc="Finding the `mv` binary", level=LogLevel.DEBUG) +async def find_mv() -> MvBinary: + request = BinaryPathRequest(binary_name="mv", search_path=SEARCH_PATHS) + paths = await Get(BinaryPaths, BinaryPathRequest, request) + first_path = paths.first_path_or_raise(request, rationale="move files") + return MvBinary(first_path.path, first_path.fingerprint) + + @rule(desc="Finding the `chmod` binary", level=LogLevel.DEBUG) async def find_chmod() -> ChmodBinary: request = BinaryPathRequest(binary_name="chmod", search_path=SEARCH_PATHS)