[BUG]: static code analysis fails with SystemError: AST constructor recursion depth mismatch (before=90, after=65) #2976

Closed
1 task done
nfx opened this issue Oct 16, 2024 · 0 comments · Fixed by #3000
Labels
migrate/code Abstract Syntax Trees and other dark magic

Comments

Collaborator

nfx commented Oct 16, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

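Tree.normalize_and_parse(code) fails with an unhandled SystemError (AST constructor recursion depth mismatch), which fails the entire linting task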

Expected Behavior

Tree.normalize_and_parse(code) must fail with a clear explanation of the parse problem instead of an unhandled SystemError

Steps To Reproduce

No response

Cloud

AWS

Operating System

macOS

Version

latest via Databricks CLI

Relevant log output

21:31:55 ERROR [d.l.blueprint.parallel][linting_workflows_0] linting workflows(543887818684385) task failed: AST constructor recursion depth mismatch (before=90, after=65): Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/blueprint/parallel.py", line 158, in inner
    return func(*args, **kwargs), None
           ^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/source_code/jobs.py", line 423, in lint_job
    problems, dfsas, tables = self._lint_job(job)
                              ^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/source_code/jobs.py", line 463, in _lint_job
    for dfsa in task_dfsas:
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/source_code/jobs.py", line 521, in _collect_task_dfsas
    for dfsa in DfsaCollectorWalker(graph, set(), self._path_lookup, session_state, self._migration_index):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/source_code/graph.py", line 601, in __iter__
    yield from self._iter_one(dependency, self._graph, root_path)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/source_code/graph.py", line 618, in _iter_one
    yield from self._iter_one(child_dependency, child_graph, root_path)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/source_code/graph.py", line 618, in _iter_one
    yield from self._iter_one(child_dependency, child_graph, root_path)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/source_code/graph.py", line 618, in _iter_one
    yield from self._iter_one(child_dependency, child_graph, root_path)
  [Previous line repeated 4 more times]
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/source_code/graph.py", line 612, in _iter_one
    yield from self._process_dependency(dependency, path_lookup, inherited_tree)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/source_code/jobs.py", line 609, in _process_dependency
    yield from self._collect_from_source(source, cell_language, dependency.path, inherited_tree)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/source_code/jobs.py", line 647, in _collect_from_source
    for item in iterable:
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/source_code/jobs.py", line 661, in _collect_from_python
    yield from collector.collect_dfsas(source)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/source_code/base.py", line 447, in collect_dfsas
    tree = self._parse_and_append(source_code)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/source_code/base.py", line 423, in _parse_and_append
    tree = Tree.normalize_and_parse(code)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/databricks/labs/ucx/source_code/python/python_ast.py", line 46, in normalize_and_parse
    root = parse(code)
           ^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/astroid/builder.py", line 300, in parse
    return builder.string_build(code, modname=module_name, path=path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/astroid/builder.py", line 151, in string_build
    module, builder = self._data_build(data, modname, path)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/astroid/builder.py", line 181, in _data_build
    node, parser_module = _parse_string(
                          ^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/astroid/builder.py", line 477, in _parse_string
    parsed = parser_module.parse(
             ^^^^^^^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/astroid/_ast.py", line 30, in parse
    return ast.parse(string, type_comments=type_comments)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SystemError: AST constructor recursion depth mismatch (before=90, after=65)
@nfx nfx added bug migrate/code Abstract Syntax Trees and other dark magic labels Oct 16, 2024
nfx added a commit that referenced this issue Oct 17, 2024
… the entire job

This PR adds more deterministic, Go-style error handling for parsing Python code.

Fix #2976
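
A minimal sketch of what "Go-style" error handling for parsing could look like, assuming the idea is to return a `(result, failure)` pair instead of letting the exception propagate. This is not the code from the PR: `ParseFailure`, `try_parse`, and the `python-parse-error` code are hypothetical names chosen for illustration, and the standard-library `ast.parse` stands in for the astroid `parse` call seen in the traceback.

```python
import ast
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ParseFailure:
    """Hypothetical record describing why a snippet could not be parsed."""

    code: str
    message: str


def try_parse(source: str) -> Tuple[Optional[ast.Module], Optional[ParseFailure]]:
    """Return (tree, None) on success, or (None, failure) when parsing fails."""
    try:
        return ast.parse(source), None
    except SyntaxError as e:
        # Ordinary syntax problems: report the parser message and line number.
        return None, ParseFailure("python-parse-error", f"{e.msg} (line {e.lineno})")
    except (SystemError, RecursionError, MemoryError, ValueError) as e:
        # SystemError is the exception type at the bottom of the traceback above;
        # the other types are assumptions about what else a parser may raise.
        return None, ParseFailure("python-parse-error", f"{type(e).__name__}: {e}")


# The caller checks the failure value explicitly instead of using try/except.
tree, failure = try_parse("x = (1 +")
if failure is not None:
    print(f"{failure.code}: {failure.message}")
```

With this shape, a caller that iterates over many notebooks or files can record the failure for one source and continue with the next, rather than having a single unparsable snippet abort the whole task.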
@nfx nfx closed this as completed in #3000 Oct 17, 2024
nfx added a commit that referenced this issue Oct 17, 2024
… the entire job (#3000)

This PR adds more deterministic, Go-style error handling for parsing
Python code.

Fix #2976
nfx added a commit that referenced this issue Oct 17, 2024
* Added `lazy_loader` to known list ([#2991](#2991)). With this commit, the `lazy_loader` module has been added to the known list in the configuration file, addressing a portion of issue [#193](#193), which may have been caused by the discovery or loading of this module. The `lazy_loader` is a package or module that, once added to the known list, will be recognized and loaded by the system. This change does not affect any existing functionality or introduce new methods. The commit solely updates the known.json file to include `lazy_loader` with an empty list, indicating that it is ready for use. This modification will enable the correct loading and recognition of the `lazy_loader` module in the system.
* Added `librosa` to known list ([#2992](#2992)). In this update, we have added several open-source libraries to the known list in the configuration file, including `librosa`, `llvmlite`, `msgpack`, `pooch`, `soundfile`, and `soxr`. These libraries are commonly used in data engineering, machine learning, and scientific computing tasks. `librosa` is a Python library for audio and music analysis, while `llvmlite` is a lightweight Python interface to the LLVM compiler infrastructure. `msgpack` is a binary serialization format like JSON, `pooch` is a package for managing external data files, `soundfile` is a library for reading and writing audio files, and `soxr` is a library for high-quality audio resampling. Each library has an empty list next to it for specifying additional configuration related to the library. This update partially resolves issue [#1931](#1931) by adding `librosa` to the known list, ensuring that these libraries will be properly recognized and utilized by the codebase.
* Added `linkify-it-py` to known list ([#2993](#2993)). In this release, we have added support for two new open-source packages, `linkify-it-py` and `uc-micro-py`, to enhance the software's functionality and compatibility. The addition of `linkify-it-py` and its constituent modules, as well as the incorporation of `uc-micro-py` with its modules and classes, aims to expand the software's capabilities. These changes are related to the resolution of issue [#1931](#1931), and they will enable the software to work seamlessly with these packages, thereby providing a better user experience.
* Added `lz4` to known list ([#2994](#2994)). In this release, we have added support for the LZ4 lossless data compression algorithm, which is known for its focus on compression and decompression speed. The implementation includes four variants: lz4, lz4.block, lz4.frame, and lz4.version, each providing different levels of compression and decompression speed and flexibility. This addition expands the range of supported compression algorithms, providing more options for users to choose from and partially addressing issue [#1931](#1931) related to supporting additional compression algorithms. This improvement will be beneficial to software engineers working with data compression in their projects.
* Fixed `SystemError: AST constructor recursion depth mismatch` failing the entire job ([#3000](#3000)). This PR introduces more deterministic, Go-style error handling for parsing Python code, so that a `SystemError: AST constructor recursion depth mismatch` no longer fails the entire job (bug [#2976](#2976)). It removes the `AstroidSyntaxError` import, adds an import for `SqlglotError`, and replaces the `SqlParseError` exception with `SqlglotError` in the `lint` method of the `SqlLinter` class. Additionally, the abstract classes `TablePyCollector` and `DfsaPyCollector`, along with their methods for collecting tables and direct file system accesses, have been removed. The `PythonSequentialLinter` class, which previously handled multiple responsibilities, has also been removed. The changes affect the `base.py`, `python_ast.py`, and `python_sequential_linter.py` modules.
* Skip applying permissions for workspace system groups to Unity Catalog resources ([#2997](#2997)). This commit introduces changes to the ACL-related code in the `databricks labs ucx create-catalog-schemas` command and the `migrate-table-*` workflow, skipping the application of permissions for workspace system groups in the Unity Catalog. These system groups, which include 'admins', do not exist at the account level. To ensure the correctness of these modifications, unit and integration tests have been added, including a test that checks the proper handling of user privileges in system groups during catalog schema creation. The `AccessControlResponse` object has been updated for the `admins` and `users` groups, granting them specific permissions for a workspace and warehouse object, respectively, enhancing the system's functionality in multi-user environments with system groups.
@nfx nfx mentioned this issue Oct 17, 2024
nfx added a commit that referenced this issue Nov 7, 2024
…loop

`default-format-changed-in-dbr8` and `sql-parse-error` are ignored for LSP plugin output. Bug was fixed in v0.46.0

- #3000
- #3027

See:
- #2976
nfx added a commit that referenced this issue Nov 8, 2024
…loop (#3225)

Bug was fixed in v0.46.0

- #3000
- #3027

See:
- #2976

`default-format-changed-in-dbr8` and `sql-parse-error` are ignored for
LSP plugin output.