Recursion error when using UDFs in pyspark #709

Closed
skasai5296 opened this issue Dec 10, 2020 · 3 comments
Labels
  • bug: Something isn't working
  • fixed in next version (main): A fix has been implemented and will appear in an upcoming version


@skasai5296

Environment data

  • Language Server version: v2020.12.1
  • OS and version: macOS Catalina 10.15.7
  • Python version (& distribution if applicable, e.g. Anaconda): 3.6.9, in pipenv

Expected behaviour

Pylance works

Actual behaviour

The script runs without errors, but Pylance stops working on the file that uses the UDF: type checking fails with the internal error below.
Commenting out the get_ids function instead produces a reportUndefinedVariable error where the commented-out function is used, without the RecursionError, which is expected.

Code Snippet

import pyspark
from pyspark import sql
from pyspark.sql import functions as F, types as T


@F.udf(returnType=T.ArrayType(T.StringType()))
def get_ids():
    return ["a", "b", "c", "d"]


def main():
    spark = pyspark.SparkContext(appName="sample")
    session = sql.SparkSession(spark)
    df = session.createDataFrame([("hoge", 10), ("fuga", 100)])
    df = df.select(get_ids().alias("ids"))
    df.show(5)


if __name__ == "__main__":
    main()

An internal error occurred while type checking file "/Users/xxx/sample.py": RangeError: Maximum call stack size exceeded
    at canAssignType (/Users/xxx/.vscode/extensions/ms-python.vscode-pylance-2020.12.1/dist/pyright-internal/src/analyzer/typeEvaluator.ts:15557:14)
    at assignTypeToTypeVar (/Users/xxx/.vscode/extensions/ms-python.vscode-pylance-2020.12.1/dist/pyright-internal/src/analyzer/typeEvaluator.ts:15478:21)
    at canAssignType (/Users/xxx/.vscode/extensions/ms-python.vscode-pylance-2020.12.1/dist/pyright-internal/src/analyzer/typeEvaluator.ts:15634:24)
    at verifyTypeArgumentsAssignable (/Users/xxx/.vscode/extensions/ms-python.vscode-pylance-2020.12.1/dist/pyright-internal/src/analyzer/typeEvaluator.ts:15286:30)
    at canAssignClassWithTypeArgs (/Users/xxx/.vscode/extensions/ms-python.vscode-pylance-2020.12.1/dist/pyright-internal/src/analyzer/typeEvaluator.ts:15160:18)
    at canAssignClass (/Users/xxx/.vscode/extensions/ms-python.vscode-pylance-2020.12.1/dist/pyright-internal/src/analyzer/typeEvaluator.ts:14998:24)
    at canAssignType (/Users/xxx/.vscode/extensions/ms-python.vscode-pylance-2020.12.1/dist/pyright-internal/src/analyzer/typeEvaluator.ts:15961:22)
    at assignTypeToTypeVar (/Users/xxx/.vscode/extensions/ms-python.vscode-pylance-2020.12.1/dist/pyright-internal/src/analyzer/typeEvaluator.ts:15521:18)
    at canAssignType (/Users/xxx/.vscode/extensions/ms-python.vscode-pylance-2020.12.1/dist/pyright-internal/src/analyzer/typeEvaluator.ts:15634:24)
    at zn (/Users/xxx/.vscode/extensions/ms-python.vscode-pylance-2020.12.1/dist/pyright-internal/src/analyzer/typeEvaluator.ts:17268:25)

@erictraut
Contributor

erictraut commented Dec 10, 2020

I'm not able to repro the problem with the above sample with just pyspark installed. I was able to repro it with pyspark-stubs installed. I presume you've installed the stubs.

The cause of the overflow is what is arguably a bug in the pyspark stubs. The Column class is annotated with a __getattr__ method that returns a Column object. When you use a Column object as the left side of a call expression (as in get_ids()), Pylance looks for a __call__ attribute on the object. There is no __call__ attribute, but there is a __getattr__. It returns a Column object, which Pylance then tries to call in turn, so the lookup repeats forever. This results in infinite recursion.
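A minimal runtime sketch of the pattern described above, assuming a simplified stand-in for the stubbed Column class (a plain Python class written for illustration, not pyspark's real Column). Explicit lookup of __call__ falls through to __getattr__ and yields another Column, whose own __call__ is again a Column, with no end to the chain. The interpreter itself looks __call__ up on the type and bypasses __getattr__, so it treats the instance as not callable.

```python
# Hypothetical stand-in for the stubbed Column class (not pyspark's real
# implementation): any attribute that is not found falls back to __getattr__,
# which returns another Column.
class Column:
    def __getattr__(self, item):
        return Column()


col = Column()

# Explicit attribute lookup of __call__ goes through __getattr__ and yields a
# Column, whose own __call__ is again a Column, and so on without end.
proxy = col.__call__
print(type(proxy).__name__)           # Column
print(type(proxy.__call__).__name__)  # Column
print(hasattr(col, "__call__"))       # True

# Calling an instance uses the type's __call__ slot and bypasses __getattr__,
# so at runtime the object is simply not callable.
print(callable(col))                  # False
try:
    col()
except TypeError as exc:
    print(exc)                        # 'Column' object is not callable
```

The mismatch between the explicit lookup and what the interpreter actually does is one way to see why a __getattr__ that returns Column for every name is arguably too broad for a stub.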

I've added protection against this case by ignoring any __call__ attribute (or its proxy returned by __getattr__) that is not a function. That will avoid the infinite recursion in the type analyzer. This fix will be in the next release.
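The fix can be modeled with a toy call resolver. This is a sketch in Python, not Pyright's TypeScript implementation: ClassInstance, FunctionType, lookup_member, call_result_naive and call_result_guarded are hypothetical names chosen for illustration. The naive resolver keeps "calling" whatever the __call__ lookup returns, so a Column-like type whose __getattr__ returns Column recurses until Python raises RecursionError. The guarded resolver applies the rule described above and ignores a __call__ member (or __getattr__ proxy) that is not a function.

```python
from typing import Dict, Optional, Union


class FunctionType:
    """Toy function type: only its return type matters for this sketch."""

    def __init__(self, returns: "TypeRef") -> None:
        self.returns = returns


class ClassInstance:
    """Toy instance type: maps declared member names to their types."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.members: Dict[str, "TypeRef"] = {}


TypeRef = Union[FunctionType, ClassInstance]


def lookup_member(obj: ClassInstance, name: str) -> Optional[TypeRef]:
    """Return a declared member's type, else fall back to what __getattr__ returns."""
    if name in obj.members:
        return obj.members[name]
    fallback = obj.members.get("__getattr__")
    if isinstance(fallback, FunctionType):
        return fallback.returns
    return None


def call_result_naive(callee: TypeRef) -> Optional[TypeRef]:
    """Unguarded: 'calls' whatever the __call__ lookup yields, even a non-function."""
    if isinstance(callee, FunctionType):
        return callee.returns
    call_attr = lookup_member(callee, "__call__")
    if call_attr is None:
        return None  # no __call__ at all: not callable
    return call_result_naive(call_attr)


def call_result_guarded(callee: TypeRef) -> Optional[TypeRef]:
    """Guarded: a __call__ member or __getattr__ proxy that is not a function is ignored."""
    if isinstance(callee, FunctionType):
        return callee.returns
    call_attr = lookup_member(callee, "__call__")
    if not isinstance(call_attr, FunctionType):
        return None  # treated as not callable instead of being called again
    return call_attr.returns


# A Column-like type whose __getattr__ is declared to return Column itself.
column = ClassInstance("Column")
column.members["__getattr__"] = FunctionType(returns=column)

try:
    call_result_naive(column)
except RecursionError:
    print("naive resolver: RecursionError")

print(call_result_guarded(column))  # None
```

In this toy model, None just means "not callable"; the thread does not say exactly how Pylance reports such a call after the fix.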

erictraut added the bug and fixed in next version (main) labels and removed the triage label on Dec 10, 2020.
@skasai5296
Author

Thank you, @erictraut, for your rocket-fast response. I'll look forward to the next release.

@jakebailey
Member

This issue has been fixed in version 2020.12.2, which we've just released. You can find the changelog here: https://github.com/microsoft/pylance-release/blob/master/CHANGELOG.md#2020122-11-december-2020
