`TCH` setting to autofix with conversion to string #5559

smackesey · 2023-07-06T11:46:36Z

Currently the TCH (flake8-type-checking) rules require annotations to be stored as strings (i.e. not evaluated). Annotations are stored a strings if:

The annotations are string literals (i.e. are quoted)
The annotations are in a module with from __future__ import annotations (which causes evaluation of all annotations in the module to be skipped)
The annotations are of local variables (the Python interpreter does not evaluate local variable annotations)

One approach to getting TCH autofix working is to inject from __future__ import annotations into all your source modules (using I002). This is not ideal because string annotations unavoidably break code that meets these conditions:

annotations are being created in a local scope
an annotation references a symbol only available in that local scope
the annotation must be operated on at runtime

Here is an example of code meeting these conditions:

def make_bar():
    class Bar:
        ...

    def bar(x: "Bar"):
        ...
    return bar

bar = make_bar()

# `bar` has a string annotation-- if we want to operate on it, we need to evaluate it. This requires
# calling `get_type_hints`, but this fails because "Bar" can't be dereferenced since referent `Bar`
# vanished with local scope
get_type_hints(bar)  # => NameError: name 'Bar' is not defined.

We have plenty of code meeting these conditions in Dagster (mostly in tests), so my experiment in injecting from __future__ import annotations to enable TCH autofix has failed. Other projects that use runtime type annotation introspection are likely to encounter the same issues.

Therefore, a significant enhancement to TCH autofix would be to detect where annotations could be stored as strings, and perform a targeted conversion for you using quoting. Example:

from threading import Lock

def foo(lock: Lock):
    ...

After applying TCH autofix:

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from threading import Lock

def foo(lock: "Lock"):
    ...

This would make TCH autofix a lot more useful for us and probably many other projects.

The text was updated successfully, but these errors were encountered:

charliermarsh · 2023-07-06T13:54:22Z

Yeah, I agree this would be useful. I'm not yet certain how hard it is to get right, but I agree that it would be useful... flake8-type-checking has rules for this, which we could look at for reference: https://pypi.org/project/flake8-type-checking/.

annotations are being created in a local scope.

Can you give an example of this, just to illustrate the problem?

smackesey · 2023-07-06T14:04:34Z

Can you give an example of this, just to illustrate the problem?

I updated OP with an example.

charliermarsh · 2023-07-14T20:28:02Z

Thanks. I think it will be difficult to reliably determine whether an annotation is required at runtime 😓

smackesey · 2023-07-14T20:48:54Z

I think it will be difficult to reliably determine whether an annotation is required at runtime

Haven't thought it all the way through, but can't you just see if the only uses of a symbol are inside non-local type annotations, while also applying the existing settings used to special case e.g. Pydantic? Maybe determining scoping is harder than I think.

charliermarsh · 2023-07-14T21:02:30Z

Can you explain what you mean in terms of this example? (I.e., why x: Bar can't be quoted.)

def make_bar():
    class Bar:
        ...

    def bar(x: Bar):
        ...

    return bar

bar = make_bar()

print(get_type_hints(bar))

smackesey · 2023-07-14T21:08:55Z

Can you explain what you mean in terms of this example? (I.e., why x: Bar can't be quoted.)

With x: Bar (unquoted case), bar.__annotations__["x"] stores a direct reference to Bar.
When x: "Bar" (quoted case), bar.__annotations__["x"] stores the string "Bar".
When get_type_hints(bar) is applied, Python tries to resolve each annotation
- If it's a type, just return it.
- If it's as string, dereference it.
When Python tries to dereference the string "Bar", it comes up empty and throws a NameError.

I'm not sure why Python fails in exactly this scenario. I guess the defining module for the object holding the annotations is known, so Python is able to look up the referent of string annotations if they exist in the defining module's __dict__.

charliermarsh · 2023-07-14T21:10:06Z

But, does "whether we can quote it" depend on whether and when get_type_hints is called? Or is there a more robust and general methodology we can follow?

smackesey · 2023-07-14T21:15:11Z

But, does "whether we can quote it" depend on whether and when get_type_hints is called? Or is there a more robust and general methodology we can follow?

I would say just be conservative and always assume that get_type_hints might be called-- so if it references a local scope symbol, don't quote it.

In practice referencing local scope symbols is going to be pretty rare in 99% of Python code anyway.

charliermarsh · 2023-07-14T21:16:35Z

Ah I see -- so we would avoid quoting annotations that rely on symbols defined outside of the module scope.

charliermarsh · 2023-07-14T22:30:51Z

It's also a little less critical for the TCH rules in particular since we're only dealing with imported symbols. But I suppose if the import itself is within a function scope...

@op

…g import autosort) (#14675) ## Summary & Motivation Internal companion PR: dagster-io/internal#5847 - Turn on TID251 (banned imports) and configure it to ban `from __future__ import annotations`. - Turn on TCH suite of rules, which allows auto-sort to automatically place typing-only imports in a `TYPE_CHECKING` block. This PR was a journey. The goal was to get automatic sorting of imports only used in type annotations into `TYPE_CHECKING` blocks. Ruff offers the `TCH` set of rules for this, for which autofix was recently added. Unfortunately, it turned out to be more complicated than just turning on these rules, because the way things currently work, ruff will only autosort the imports if the annotations are already all strings, which most of the time means `from __future__ import annotations` is imported. The initial plan was actually opposite the current state of this PR-- the idea was to _inject_ `from __future__ import annotations` (using `I002`) into all Python modules for maximum `TCH` power. However, for reasons explained below, `from __future__ import annotations` (a) does not work well with Dagster in particular; (b) is not a good thing to add everywhere because it's not part of the future of Python. Therefore, I think the best thing to do for simplicity and future-proofing of the codebase is to ban use of `from __future__ import annotations`. It was imported in ~15 of our 2500 source files-- this PR removes them, adds needed quotes to affected type annotations, and turns on `TID251` to flag any future use of `from __future__ import annotations`. Without `from __future__ import annotations`, typing-only import auto-sort (`TCH`) is relatively limited. However, I have submitted a [ruff feature request](astral-sh/ruff#5559) to get a new ruff setting that will make it much more powerful for codebases not using `from __future__ import annotations`. --- The issues with string annotations, `TYPE_CHECKING`, and `from __future__ import annotations` are complex and merit a thorough explanation. Suppose I have the following module: ``` from dagster._core.definitions.events import AssetKey def do_something_to_asset(key: AssetKey): ... ``` `AssetKey` above is only used in a type annotation. However, if we move it to a `TYPE_CHECKING` block with no other changes, then our code will crash. That's because, by default, Python evaluates type annotations (in function sigs and module-scope variables, but _not_ local var annotations) as standard expressions at runtime. If the `AssetKey` import is in a `TYPE_CHECKING` block, then it won't execute at runtime, and when the `AssetKey` annotation is encountered, Python will be unable to evaluate the symbol `AssetKey` and will error. Therefore we can only move the `AssetKey` import to a typing block under two conditions (1) we turn it into a string annotation; (2) the evaluated expression never needs to be accessed. Let's consider both of these separately: ## 1. Turning annotation into a string There are two ways to do this: 1. Quote the `AssetKey` annotation. 2. Add `from __future__ import annotations` to the file. This will cause _all_ annotations in the file to be interpreted it as strings. It is equivalent to quoting all annotations in the file. Option (2) is smoother and eliminates the extra visual complexity of having some arguments quoted and others not. It also improves import time. (Improvement for Dagster is 10-20%; I performed a rough experiment timing `import dagster` on this branch vs master) This was one of the motivations for PEP 562, which introduced `from __future__ import annotations` and _was_ scheduled to become the standard way annotations are evaluated in Python 3.11-- but that plan has been changed, see below. Turning _all_ annotations into strings does not play nicely with most libs that do runtime processing of annotations. This includes Pydantic and... Dagster. The reason is that these libs need access to the result of the evaluated expression, not just a string: ``` from dagster import op from pandas import DataFrame @op def foo(bar: "DataFrame"): # This is currently an error ... ``` This problem can be somewhat mitigated by using `typing.get_type_hints` on functions that might have string annotations-- `get_type_hints` evaluates the annotations and returns them, providing the necessary runtime values. The problem is that it doesn't always work-- if string annotations refer to symbols in local scope, `get_type_hints` can't resolve them: ``` from typing import get_type_hints class Foo: ... def foo(x: "Foo"): ... # works to dereference "Foo" since referent `Foo` is in module-level scope get_type_hints(foo) # => {'x': <class '__main__.Foo'>} def make_bar(): class Bar: ... def bar(x: "Bar"): ... return bar bar = make_bar() # does not work to dereference "Bar" since referent `Bar` vanished with local scope get_type_hints(bar) # => NameError: name 'Bar' is not defined. ``` So string annotations, either explicit or via `from __future__ import annotations`, are flatly incompatible with any situation where (1) annotations are being created in a local scope; (2) an annotation references a symbol only available in that local scope; (3) the annotation must be operated on at runtime (and therefore evaluated with `get_type_hints`). For the above reasons (and because of the popularity of Pydantic), the Python Steering Committee nixed plans to adopt string annotations in Python 3.11. The new plan is described in [PEP 649](https://peps.python.org/pep-0649/) and will instead package all annotations in a lazily evaluated function. This both enables forward references _and_ ensures that the evaluated value of an annotation can always be accessed. Unfortunately, PEP 649 won't become part of Python until 3.12, and it's unclear to me whether it will be backportable via another `from __future__` import (but I suspect no). Since we need to support Python <3.12 for many years, that means we probably won't have access to this solution. ## 2. Evaluated expression never needs to be accessed Even if we had access to PEP 649 the problems discussed in the preceding section were solved, the ability to move "annotation-only" imports to a `TYPE_CHECKING` block is still limited by whether the evaluated value _ever_ needs to be accessed. If it does, you can't move an import to a `TYPE_CHECKING` block. For us, this mostly means that any type used in an annotation for a Dagster decorator should never be moved to a `TYPE_CHECKING` block: ``` # This is only used in an annotation but it will break dagster if it is moved to a `TYPE_CHECKING` class from foo import SomeDagsterConfigClass @asset def an_asset(config: SomeDagsterConfigClass): ... ``` Fortunately ruff allows configuring `TCH` rules to detect special decorators like `@asset`, so I don't think this limitation should block us. ## How I Tested These Changes ruff

charliermarsh · 2023-07-23T04:44:06Z

I have something that I've been testing against Dagster here: #6001. The outputs look correct to me -- one example:

--- a/examples/assets_smoke_test/assets_smoke_test/pure_python_assets.py
+++ b/examples/assets_smoke_test/assets_smoke_test/pure_python_assets.py
@@ -1,5 +1,9 @@
+from typing import TYPE_CHECKING
+
 from dagster import SourceAsset, TableSchema, asset
-from pandas import DataFrame
+
+if TYPE_CHECKING:
+    from pandas import DataFrame

 raw_country_populations = SourceAsset(
     "raw_country_populations",
@@ -19,7 +23,7 @@


 @asset
-def country_populations(raw_country_populations) -> DataFrame:
+def country_populations(raw_country_populations) -> "DataFrame":
     country_populations = raw_country_populations.copy()
     country_populations["change"] = (
         country_populations["change"]
@@ -32,13 +36,13 @@ def country_populations(raw_country_populations) -> DataFrame:


 @asset
-def continent_stats(country_populations: DataFrame) -> DataFrame:
+def continent_stats(country_populations: "DataFrame") -> "DataFrame":
     result = country_populations.groupby("continent").agg({"pop2019": "sum", "change": "mean"})
     return result


 @asset
-def country_stats(country_populations: DataFrame, continent_stats: DataFrame) -> DataFrame:
+def country_stats(country_populations: "DataFrame", continent_stats: "DataFrame") -> "DataFrame":
     result = country_populations.join(continent_stats, on="continent", lsuffix="_continent")
     result["continent_pop_fraction"] = result["pop2019"] / result["pop2019_continent"]
     return result

But not sure how best to check whether the rule is having any unintended effects on runtime behavior.

tylerlaprade · 2023-07-23T16:04:18Z

If there's an easy way for me to locally run a preview version of Ruff, I'm happy to run your enhanced rule on my private codebase and verify the runtime behavior. Obviously would be easier if someone else has a large open-source codebase to test on, though.

charliermarsh · 2023-07-23T16:08:24Z

Thanks! I can kick off a release build dry-run which will generate wheel files that you can download and pip install. I'll post here with the link.

charliermarsh · 2023-07-23T16:43:21Z

If you'd like, you can click here, download the "Wheels" at the bottom, unzip, and pip install the wheel that matches your platform.

tylerlaprade · 2023-07-23T16:52:24Z

This looks much more comprehensive than when I made from __future__ import annotations required (which barely modified anything)! I really like the looks of it. However, a runtime error has been introduced that doesn't get detected by my typechecker:

tylerlaprade · 2023-07-23T17:15:25Z

I did a global find-and-replace to convert "(\w+)" \| None to $1 | None. This fixed all my runtime issues (at least until I re-ran Ruff).

EDIT: The above find-and-replace loses some of the benefits of this rule. Replacing with "$1 | None" is much better for two reasons:

It retains the performance benefits of this rule.
It does not get undone when re-running Ruff.

charliermarsh · 2023-07-23T17:31:00Z

Oh yes, that case makes sense, we need handle it. Thanks so much for testing it out!

smackesey · 2023-07-24T13:16:32Z

The above outputs look good to me, as long as there's a setting to avoid applying the rule to functions with certain decorators (and IIRC this already exists).

But not sure how best to check whether the rule is having any unintended effects on runtime behavior.

Potential unintended runtime effects are unavoidable since people could always be accessing .__annotations__ and performing arbitrary logic with it. But as long as those annotations aren't coming from local scope, they can be easily worked around by changing direct .__annotations__ access to use typing.get_type_hints.

charliermarsh · 2023-07-26T20:09:50Z

I'm blocked on supporting unions (e.g., when quoting DataFrame in DataFrame | int, you need to quote the entire union, like "DataFrame | int"). There are just a bunch of edge cases and we don't have access to the full AST when constructing the fix (so we need to do manual lexing). This might require some new infrastructure to be fully robust.

tylerlaprade · 2023-07-26T20:50:13Z

Totally understandable. For now, is there value in shipping it with an exclusion to not add quotes to any types involving unions? It still seemed like a major improvement over from __future__ import annotations when I tested the preview.

…luated references (#6001) ## Summary This allows us to fix usages like: ```python from pandas import DataFrame def baz() -> DataFrame: ... ``` By quoting the `DataFrame` in `-> DataFrame`. Without quotes, moving `from pandas import DataFrame` into an `if TYPE_CHECKING:` block will fail at runtime, since Python tries to evaluate the annotation to add it to the function's `__annotations__`. Unfortunately, this does require us to split our "annotation kind" flags into three categories, rather than two: - `typing-only`: The annotation is only evaluated at type-checking-time. - `runtime-evaluated`: Python will evaluate the annotation at runtime (like above) -- but we're willing to quote it. - `runtime-required`: Python will evaluate the annotation at runtime (like above), and some library (like Pydantic) needs it to be available at runtime, so we _can't_ quote it. This functionality is gated behind a setting (`flake8-type-checking.quote-annotations`). Closes #5559.

smackesey mentioned this issue Jul 6, 2023

[ruff] Ban from __future__ import annotations and enable TCH (typing import autosort) dagster-io/dagster#14675

Merged

charliermarsh added the core Related to core functionality label Jul 6, 2023

smackesey mentioned this issue Jul 14, 2023

[feature-request] Detect invalid unquoted imports #5753

Open

charliermarsh mentioned this issue Jul 23, 2023

Allow flake8-type-checking rules to automatically quote runtime-evaluated references #6001

Merged

charliermarsh closed this as completed in #6001 Dec 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`TCH` setting to autofix with conversion to string #5559

`TCH` setting to autofix with conversion to string #5559

smackesey commented Jul 6, 2023 •

edited

Loading

charliermarsh commented Jul 6, 2023

smackesey commented Jul 6, 2023

charliermarsh commented Jul 14, 2023

smackesey commented Jul 14, 2023

charliermarsh commented Jul 14, 2023

smackesey commented Jul 14, 2023 •

edited

Loading

charliermarsh commented Jul 14, 2023

smackesey commented Jul 14, 2023 •

edited

Loading

charliermarsh commented Jul 14, 2023

charliermarsh commented Jul 14, 2023

charliermarsh commented Jul 23, 2023

tylerlaprade commented Jul 23, 2023

charliermarsh commented Jul 23, 2023

charliermarsh commented Jul 23, 2023

tylerlaprade commented Jul 23, 2023

tylerlaprade commented Jul 23, 2023 •

edited

Loading

charliermarsh commented Jul 23, 2023

smackesey commented Jul 24, 2023

charliermarsh commented Jul 26, 2023

tylerlaprade commented Jul 26, 2023

TCH setting to autofix with conversion to string #5559

TCH setting to autofix with conversion to string #5559

Comments

smackesey commented Jul 6, 2023 • edited Loading

charliermarsh commented Jul 6, 2023

smackesey commented Jul 6, 2023

charliermarsh commented Jul 14, 2023

smackesey commented Jul 14, 2023

charliermarsh commented Jul 14, 2023

smackesey commented Jul 14, 2023 • edited Loading

charliermarsh commented Jul 14, 2023

smackesey commented Jul 14, 2023 • edited Loading

charliermarsh commented Jul 14, 2023

charliermarsh commented Jul 14, 2023

charliermarsh commented Jul 23, 2023

tylerlaprade commented Jul 23, 2023

charliermarsh commented Jul 23, 2023

charliermarsh commented Jul 23, 2023

tylerlaprade commented Jul 23, 2023

tylerlaprade commented Jul 23, 2023 • edited Loading

charliermarsh commented Jul 23, 2023

smackesey commented Jul 24, 2023

charliermarsh commented Jul 26, 2023

tylerlaprade commented Jul 26, 2023

`TCH` setting to autofix with conversion to string #5559

`TCH` setting to autofix with conversion to string #5559

smackesey commented Jul 6, 2023 •

edited

Loading

smackesey commented Jul 14, 2023 •

edited

Loading

smackesey commented Jul 14, 2023 •

edited

Loading

tylerlaprade commented Jul 23, 2023 •

edited

Loading