Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

False positive for possibly unbound variable #496

Closed
hauntsaninja opened this issue Oct 15, 2020 · 19 comments
Closed

False positive for possibly unbound variable #496

hauntsaninja opened this issue Oct 15, 2020 · 19 comments
Labels
fixed in next version (main) A fix has been implemented and will appear in an upcoming version

Comments

@hauntsaninja
Copy link

Language Server version: Pylance language server 2020.10.0 (pyright d1f950ac)
Python version (& distribution if applicable, e.g. Anaconda): 3.8

Screen Shot 2020-10-15 at 4 49 47 PM

Expected behaviour

Pylance should treat the else statement as being unreachable and shouldn't complain about the possibility of a being unbound.

Actual behaviour

Pylance doesn't treat the else statement as being unreachable and does complain about the possibility of a being unbound.

Code Snippet / Additional information

from typing import Union

def f(x: Union[int, str, list]) -> None:
    if isinstance(x, int):
        a = 1
    elif isinstance(x, str):
        a = 2
    elif isinstance(x, list):
        a = 3
    else:
        reveal_type(x)
    print(a)
@erictraut
Copy link
Contributor

erictraut commented Oct 16, 2020

I agree that it should theoretically be possible to determine that a is not unbound in this instance, but I can't think of a practical way to determine this using the lazy evaluation technique employed in the pyright architecture. The TypeScript compiler (upon which the pyright architecture is based) does not handle this case either.

Screen Shot 2020-10-15 at 9 18 10 PM

Here are several workarounds:

  1. Assign a a dummy value in the else clause.
  2. Remove the unnecessary elif isinstance(x, list) statement and replace it with an else.
  3. Use a # type: ignore comment to suppress the error on that line.
  4. Use a # pyright: reportUnboundVariable=false comment to suppress the error in that file.

@savannahostrowski savannahostrowski added the waiting for user response Requires more information from user label Oct 19, 2020
@github-actions github-actions bot removed the triage label Oct 19, 2020
@savannahostrowski
Copy link
Contributor

This issue has been open for 30 days waiting for information. I'm going to close this issue but if the above guidance doesn't work for you, please let us know by reopening the issue.

@hauntsaninja
Copy link
Author

I don't think this is a waiting-for-info (if it is, let me know what more information I can give!); looks to me like Eric considers this a won't-fix given the limitations of pyright's architecture.

@jakebailey
Copy link
Member

We marked this as waiting-for-info to see if you would respond with any feedback for the workarounds, or if the original code is the only way to do it as intended.

@jakebailey jakebailey reopened this Nov 19, 2020
@erictraut
Copy link
Contributor

Here's another better workaround. I've seen this technique recommended in some mypy discussions. It involves the use of an assert_never function.

def assert_never(a: NoReturn) -> NoReturn:
    raise RuntimeError("Should not get here")

def f(x: Union[int, str, list]) -> None:
    if isinstance(x, int):
        a = 1
    elif isinstance(x, str):
        a = 2
    elif isinstance(x, list):
        a = 3
    else:
        assert_never(x)

    print(a)

@hauntsaninja
Copy link
Author

hauntsaninja commented Nov 19, 2020

Yeah, I used an else that had raise AssertionError, i.e. the runtime equivalent of the assert_never trick. assert_never is a good fit for this kind of issue, thanks for reminding me of it!
Feedback for the other workarounds: 1) is kind of ugly / didn't work well with my actual code; 2) reads less explicitly / isn't as robust in the face of change; 3,4) type ignores are always an acceptable workaround for false positives :-)

There's also the question of the false negative for the unreachable code (which obviously doesn't need a workaround / isn't as big a deal). For what it's worth, mypy --warn-unreachable catches this (and also correctly avoids flagging assert_never / raise AssertionError).

@mcrumiller
Copy link

mcrumiller commented Jan 5, 2021

Edit: I'm super wrong, sorry. Keeping what I wrote for posterity.


Yeah, there's no way pylance could do this. isinstance might be overloaded in the current module, in which case you could get a runtime error once you got to print(a). Imagine the following:

def isinstance(x, type):
    return False

from typing import Union

def f(x: Union[int, str, list]) -> None:
    if isinstance(x, int):
        a = 1
    elif isinstance(x, str):
        a = 2
    elif isinstance(x, list):
        a = 3
    else:
        print(type(x))

    print(a)

f(3)

This gives error:

Exception has occurred: UnboundLocalError
local variable 'a' referenced before assignment
  File "C:\code\blah.py", line 16, in f
    print(a)
  File "C:\code\blah.py", line 18, in 
    f(3)

pylance would have to be able to interpret and validate isinstance, which would probably violate the halting problem.

@erictraut
Copy link
Contributor

@mcrumiller, that's not a problem. Pylance knows the difference between builtins.isinstance and a locally-defined version. Narrowing logic is applied only for the builtins version.

@hauntsaninja
Copy link
Author

That's not really relevant — the same argument argument applies to any type narrowing involving isinstance. It's easy for static type checkers to detect the overloading you describe and all of them do.

@sylann
Copy link

sylann commented Jan 29, 2021

I am not sure about the first example but there is one other case that I encounter a lot, in many code bases, and it seems a bit more problematic in my opinion since it is a basic specificity of python:

Here is an example:

def best_unit(dt: float) -> Tuple[float, str]:
    units = (
        (1.0, "sec"),
        (1e-3, "msec"),
        (1e-6, "usec"),
        (1e-9, "nsec"),
    )
    for scale, unit in units:
        if dt >= scale:
            break
    return scale, unit  # "unit" is possibly unbound Pylance(reportUnboundVariable)

In this case the warning message is semantically incorrect because units is obviously not empty.

@erictraut
Copy link
Contributor

A type checker has no way of knowing whether an iterable type is going to iterate zero or more times. It simply knows that it's iterable and what types it will return. In this case, units is an iterable type, and the for statement is iterating over it. You happen to know that units is a non-empty tuple, and when a non-empty tuple is iterated over, it will not iterate zero times. A type checker does not have this information or knowledge. If it assumes that all iterables iterate at least once in all cases, it will fail to detect an entire class of errors.

A tuple is the only iterable type where a type checker sometimes (but not always) knows the length. All other iterable types (like lists, dictionaries, etc.) have lengths that are unknown to a static type checker. It's possible that in the special case of the tuple (as shown in your example), a type checker could avoid this false positive, but it would be complicated because the control flow graph would need to vary based on the type of the iterable. I'll give more thought to this and see if I can come up with a way to handle this case without introducing other false positives or greatly slowing down analysis.

In the meantime, the workaround is straightforward. If you're going to use a variable after a loop, always initialize it before the loop.

unit = ""
for scale, unit in units:
    ...
return scale, unit

Or add a # type: ignore comment.

@erictraut erictraut added the fixed in next version (main) A fix has been implemented and will appear in an upcoming version label Jan 31, 2021
@jakebailey jakebailey removed the waiting for user response Requires more information from user label Feb 1, 2021
@jakebailey
Copy link
Member

This issue has been fixed in version 2021.2.0, which we've just released. You can find the changelog here: https://github.com/microsoft/pylance-release/blob/main/CHANGELOG.md#202120-3-february-2021

@localrobot
Copy link

I can still see reproduce this in a try-except-else-finally block in version 2021.2.0, python 3.9.0, VS Code 1.53.0

try:
    some_fn_that_can_throw_an_exception()
except Exception:
    success = False
else:
    success = True
finally:
    print(success)  # "success" is possibly unbound

Or, am I missing something?

@erictraut
Copy link
Contributor

The try/except/else/finally clause is different from this issue. Please refer to this issue instead.

@hauntsaninja
Copy link
Author

hauntsaninja commented Feb 17, 2022

Thank you for the talk earlier today!

I was reminded of this issue and thought it might be interesting to understand why lazy type evaluation means pyright can't determine that that branch is unreachable and "a" is unbound (although maybe I just need to start reading pyright code :-) Feel free to ignore this comment, it's basically just me rambling to myself.

From the slides, it sounds like: pyright walks backward through the code flow graph. It sees a, tries to evaluate it, and sees a branch that doesn't have a assigned. But it doesn't yet know that that branch is unreachable — because you need to go forward from narrowed variables to deduce this — so it complains.

Trying out some stuff, pyright doesn't complain about:

def g():
    if True:
        a = 0
    else:
        pass
    print(a)  # no complaint

So clearly pyright is able to eliminate parts of the code flow graph as unreachable. But:

def h(truth: Literal[True]):
    if truth:
        a = 0
    else:
        pass
    print(a)  # pyright complains

def i():
    truth: Final = True
    if truth:
        a = 0
    else:
        pass
    print(a)  # pyright complains

So seems like whatever reachability checking of the code flow graph pyright does happens before we get a chance to infer local types?

mypy has some reachability analysis like if TYPE_CHECKING happen early in semantic analysis, before we know any types, but most of it happens at type check time, where it's eager and forward and knows about everything. It knows when it's done some impossible narrowing and marks statements in scope as unreachable, with some silencing logic for other diverging statements like raise Exception and assert_never

Anyway, I think I see why it's not really feasible for pyright to determine branch reachability. There's just no good time to do it! So (idiomatic use of) assert_never actually gives pyright information, whereas with mypy it's usually telling mypy something mypy already knows.

def j():
    truth: Final = True
    if truth:
        a = 0
    else:
        assert_never(truth)
    print(a)  # no complaints

Here's an interesting example:

def k():
    truth: Final = True
    if truth:
        a = 0
    else:
        def inner():
            return truth
        inner()  # pyright knows that this returns Never, but there are no reachability implications
        1 + 1
    print(a)  # pyright complains

That sort of squares with "pyright determines reachability when building a code flow graph, at which point it doesn't have access to inferences made from code flow".

Okay, but maybe pyright could still avoid the false positive for a being unbound? There is still a good time to do this: do extra checks when we'd otherwise complain about it being unbound (e.g. traverse the problematic code paths in the forward direction and see if anything gets narrowed to Never). That sounds kind of gross, but not too performance killing.

(I've already rambled longer than I thought I would, so I'm cutting myself off. Feel free to ignore, but I wouldn't mind code pointers or commentary)

@erictraut
Copy link
Contributor

Like mypy, pyright is able to statically evaluate certain expression forms within the binder. This includes expressions like True or TYPE_CHECKING or sys.version_info >= (3, 10). The binder takes into account these expressions when it builds the code flow graph.

The other examples above involve expressions that cannot be evaluated by the binder. They involve the type evaluator, which depends on the code flow graph.

All of the examples you provided could theoretically be handled by pyright in the type evaluator, but they would be expensive in terms of analysis performance. In response to this bug report, I added support for "implied else" narrowing. When I initially added this support, we saw a significant spike in analysis and completion times. I mitigated this by excluding the "falsy/truthy" type narrowing pattern, which is not typically used in cases where type exhaustion is the goal.

The "implied else" support doesn't work in the case where the "else" is explicit, as in a couple of your cases above.

I'm wondering whether you're raising this primarily because you're curious about a theoretical edge case, or is this a real case that you think is important to handle? Like I said, there are ways to handle this in pyright, but I question whether that would be the right tradeoff if it were to have a performance impact that affected the vast majority of pylance and pyright users.

@hauntsaninja
Copy link
Author

I don't think anything here is particularly important to handle and I trust you to make the correct tradeoffs.
My comment ^ was just me trying to understand what can be done performantly with lazy type evaluation and why pyright can or cannot do certain things :-) Sorry for not making that clear!

Oh neat, "implied else" is very similar to these examples. It looks like it might be implemented here: https://github.com/microsoft/pyright/blob/b9e33caee9823fc59a4ee53103ce5fcbdca1f71b/packages/pyright-internal/src/analyzer/binder.ts#L1125 I'll play around with it and see if I start to understand pyright's impl better.

I also noticed this maybe-missing-feature?

def func1(x: Literal[True]):
    reveal_type(x)  # Literal[True]
    if x is True:
        y = True
    
    print(y) # no error


def func2():
    x: Final = True
    reveal_type(x)  # Literal[True]
    if x is True:
        y = True
    
    print(y) # error

@erictraut
Copy link
Contributor

This difference in behavior highlights another limitation of the "implied else narrowing" implementation. It requires that the reference expression for the condition (in this case, the reference expression is x) has a declared type. If its type were not declared, it would need to be inferred, and that's expensive. That's because we're already in the process of inferring the type of y (which is an expensive code flow analysis operation to begin with), and we'd need to also infer the type of x. This is especially expensive in code that is untyped.

So your second example works if you annotate x.

def func2():
    x: Literal[True] = True
    if x is True:
        y = True

    reveal_type(y)

Here's the code that imposes that limitation: https://github.com/microsoft/pyright/blob/561ec5b093cc09c3b6216c51a161388447b551b4/packages/pyright-internal/src/analyzer/codeFlowEngine.ts#L721

My main motivation for adding "implied else narrowing" was to support type exhaustion checks like the sample at the top of this issue. And in those cases, the type of the reference expression is typically declared.

@gonzalogjuka
Copy link

I agree that it should theoretically be possible to determine that a is not unbound in this instance, but I can't think of a practical way to determine this using the lazy evaluation technique employed in the pyright architecture. The TypeScript compiler (upon which the pyright architecture is based) does not handle this case either.

Screen Shot 2020-10-15 at 9 18 10 PM

Here are several workarounds:

  1. Assign a a dummy value in the else clause.
  2. Remove the unnecessary elif isinstance(x, list) statement and replace it with an else.
  3. Use a # type: ignore comment to suppress the error on that line.
  4. Use a # pyright: reportUnboundVariable=false comment to suppress the error in that file.

Excelent workaround!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fixed in next version (main) A fix has been implemented and will appear in an upcoming version
Projects
None yet
Development

No branches or pull requests

8 participants