Use cursor based lexer #6012

Merged
merged 2 commits into from
Jul 26, 2023

Member

@MichaReiser MichaReiser commented Jul 23, 2023

Summary

This pulls in the cursor-based lexer implementation, which significantly speeds up lexing. The RustPython-parser PR contains a more in-depth description of the change.
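
For readers unfamiliar with the term, here is a minimal sketch of what a cursor-based lexer can look like. It is not the code from the RustPython-parser PR: `Cursor`, `first`, `bump`, `eat_while`, `offset`, and `lex_identifier` are hypothetical names chosen for illustration. The idea is that the lexer walks a `Chars` iterator directly and derives byte offsets from the remaining input, rather than tracking positions character by character.

```rust
use std::str::Chars;

/// Hypothetical minimal cursor over the source text (illustration only).
struct Cursor<'a> {
    chars: Chars<'a>,
    source_len: usize,
}

impl<'a> Cursor<'a> {
    fn new(source: &'a str) -> Self {
        Self { chars: source.chars(), source_len: source.len() }
    }

    /// Peeks at the next character without consuming it.
    fn first(&self) -> Option<char> {
        self.chars.clone().next()
    }

    /// Consumes and returns the next character.
    fn bump(&mut self) -> Option<char> {
        self.chars.next()
    }

    /// Consumes characters for as long as `predicate` returns true.
    fn eat_while(&mut self, mut predicate: impl FnMut(char) -> bool) {
        while let Some(c) = self.first() {
            if !predicate(c) {
                break;
            }
            self.bump();
        }
    }

    /// Byte offset of the cursor, derived from the length of the unread input.
    fn offset(&self) -> usize {
        self.source_len - self.chars.as_str().len()
    }
}

/// Returns the byte range of an identifier at the start of `source`, if any.
fn lex_identifier(source: &str) -> Option<(usize, usize)> {
    let mut cursor = Cursor::new(source);
    let start = cursor.offset();
    match cursor.first() {
        Some(c) if c.is_alphabetic() || c == '_' => {
            cursor.bump();
        }
        _ => return None,
    }
    cursor.eat_while(|c| c.is_alphanumeric() || c == '_');
    Some((start, cursor.offset()))
}
```

With this sketch, `lex_identifier("foo_1 = 2")` yields the byte range `(0, 5)`, and an input starting with a digit yields `None`.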

Test Plan

cargo test, ecosystem check

@MichaReiser MichaReiser force-pushed the use-cursor-based-lexer branch 3 times, most recently from 1aaef3e to a8c1609 Compare July 23, 2023 10:53
@github-actions
Contributor

github-actions bot commented Jul 23, 2023

PR Check Results

Ecosystem

ℹ️ ecosystem check detected changes. (+12, -12, 0 error(s))

airflow (+12, -12)

- tests/api_connexion/endpoints/test_dag_endpoint.py:124:25: S108 Probable insecure usage of temporary file or directory: "/tmp/dag_"
+ tests/api_connexion/endpoints/test_dag_endpoint.py:124:27: S108 Probable insecure usage of temporary file or directory: "/tmp/dag_"
- tests/api_connexion/endpoints/test_import_error_endpoint.py:239:49: S108 Probable insecure usage of temporary file or directory: "/tmp/file_"
+ tests/api_connexion/endpoints/test_import_error_endpoint.py:239:51: S108 Probable insecure usage of temporary file or directory: "/tmp/file_"
- tests/api_connexion/endpoints/test_import_error_endpoint.py:241:48: S108 Probable insecure usage of temporary file or directory: "/tmp/file_"
+ tests/api_connexion/endpoints/test_import_error_endpoint.py:241:50: S108 Probable insecure usage of temporary file or directory: "/tmp/file_"
- tests/api_connexion/endpoints/test_import_error_endpoint.py:242:48: S108 Probable insecure usage of temporary file or directory: "/tmp/file_"
+ tests/api_connexion/endpoints/test_import_error_endpoint.py:242:50: S108 Probable insecure usage of temporary file or directory: "/tmp/file_"
- tests/api_connexion/endpoints/test_import_error_endpoint.py:244:56: S108 Probable insecure usage of temporary file or directory: "/tmp/file_"
+ tests/api_connexion/endpoints/test_import_error_endpoint.py:244:58: S108 Probable insecure usage of temporary file or directory: "/tmp/file_"
- tests/api_connexion/endpoints/test_import_error_endpoint.py:251:26: S108 Probable insecure usage of temporary file or directory: "/tmp/file_"
+ tests/api_connexion/endpoints/test_import_error_endpoint.py:251:28: S108 Probable insecure usage of temporary file or directory: "/tmp/file_"
- tests/api_connexion/endpoints/test_import_error_endpoint.py:269:26: S108 Probable insecure usage of temporary file or directory: "/tmp/file_"
+ tests/api_connexion/endpoints/test_import_error_endpoint.py:269:28: S108 Probable insecure usage of temporary file or directory: "/tmp/file_"
- tests/api_connexion/endpoints/test_import_error_endpoint.py:285:26: S108 Probable insecure usage of temporary file or directory: "/tmp/file_"
+ tests/api_connexion/endpoints/test_import_error_endpoint.py:285:28: S108 Probable insecure usage of temporary file or directory: "/tmp/file_"
- tests/system/providers/google/cloud/dataproc/example_dataproc_pyspark.py:64:23: S108 Probable insecure usage of temporary file or directory: "/tmp/"
+ tests/system/providers/google/cloud/dataproc/example_dataproc_pyspark.py:64:25: S108 Probable insecure usage of temporary file or directory: "/tmp/"
- tests/system/providers/google/cloud/dataproc/example_dataproc_sparkr.py:63:23: S108 Probable insecure usage of temporary file or directory: "/tmp/"
+ tests/system/providers/google/cloud/dataproc/example_dataproc_sparkr.py:63:25: S108 Probable insecure usage of temporary file or directory: "/tmp/"
- tests/system/providers/google/cloud/ml_engine/example_mlengine.py:61:15: S108 Probable insecure usage of temporary file or directory: "/tmp/"
+ tests/system/providers/google/cloud/ml_engine/example_mlengine.py:61:30: S108 Probable insecure usage of temporary file or directory: "/tmp/"
- tests/system/providers/google/cloud/ml_engine/example_mlengine_async.py:61:15: S108 Probable insecure usage of temporary file or directory: "/tmp/"
+ tests/system/providers/google/cloud/ml_engine/example_mlengine_async.py:61:30: S108 Probable insecure usage of temporary file or directory: "/tmp/"

Rules changed: 1

Rule    Changes    Additions    Removals
----    -------    ---------    --------
S108    24         12           12

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.09      9.3±0.02ms     4.4 MB/sec    1.00      8.5±0.04ms     4.8 MB/sec
formatter/numpy/ctypeslib.py               1.11   1866.5±2.29µs     8.9 MB/sec    1.00  1684.1±27.56µs     9.9 MB/sec
formatter/numpy/globals.py                 1.12    210.0±0.36µs    14.0 MB/sec    1.00    187.5±7.74µs    15.7 MB/sec
formatter/pydantic/types.py                1.12      4.0±0.01ms     6.3 MB/sec    1.00      3.6±0.06ms     7.1 MB/sec
linter/all-rules/large/dataset.py          1.08     12.5±0.02ms     3.3 MB/sec    1.00     11.6±0.05ms     3.5 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.07      3.2±0.00ms     5.2 MB/sec    1.00      3.0±0.03ms     5.6 MB/sec
linter/all-rules/numpy/globals.py          1.07    420.6±0.83µs     7.0 MB/sec    1.00    394.3±2.67µs     7.5 MB/sec
linter/all-rules/pydantic/types.py         1.09      5.8±0.01ms     4.4 MB/sec    1.00      5.3±0.02ms     4.8 MB/sec
linter/default-rules/large/dataset.py      1.14      6.5±0.01ms     6.2 MB/sec    1.00      5.7±0.02ms     7.1 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.16   1406.0±1.58µs    11.8 MB/sec    1.00   1212.7±5.57µs    13.7 MB/sec
linter/default-rules/numpy/globals.py      1.17    156.7±1.31µs    18.8 MB/sec    1.00    133.8±0.36µs    22.1 MB/sec
linter/default-rules/pydantic/types.py     1.17      3.0±0.01ms     8.6 MB/sec    1.00      2.5±0.01ms    10.1 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     11.2±0.18ms     3.6 MB/sec    1.07     12.0±0.18ms     3.4 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.2±0.03ms     7.6 MB/sec    1.01      2.2±0.04ms     7.6 MB/sec
formatter/numpy/globals.py                 1.01    240.0±5.41µs    12.3 MB/sec    1.00    236.5±8.27µs    12.5 MB/sec
formatter/pydantic/types.py                1.00      4.8±0.07ms     5.3 MB/sec    1.02      4.9±0.17ms     5.2 MB/sec
linter/all-rules/large/dataset.py          1.04     15.7±0.26ms     2.6 MB/sec    1.00     15.1±0.24ms     2.7 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.04      4.0±0.07ms     4.1 MB/sec    1.00      3.9±0.07ms     4.3 MB/sec
linter/all-rules/numpy/globals.py          1.07    480.9±9.25µs     6.1 MB/sec    1.00    451.1±7.26µs     6.5 MB/sec
linter/all-rules/pydantic/types.py         1.08      7.3±0.16ms     3.5 MB/sec    1.00      6.8±0.13ms     3.8 MB/sec
linter/default-rules/large/dataset.py      1.10      8.4±0.11ms     4.9 MB/sec    1.00      7.6±0.10ms     5.3 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.10  1694.3±23.93µs     9.8 MB/sec    1.00  1537.9±24.96µs    10.8 MB/sec
linter/default-rules/numpy/globals.py      1.15    189.3±3.74µs    15.6 MB/sec    1.00    165.2±3.18µs    17.9 MB/sec
linter/default-rules/pydantic/types.py     1.12      3.7±0.05ms     6.9 MB/sec    1.00      3.3±0.05ms     7.8 MB/sec
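
A note on reading the tables above, as a sketch under one assumption: the first number in each `main` / `pr` column appears to be that run's time divided by the faster of the two runs, so the faster side shows 1.00. The helper name `relative_to_fastest` is hypothetical and only reproduces that arithmetic.

```rust
/// Assumption: each ratio is the run's time divided by the faster run's time.
/// Returns `(main_ratio, pr_ratio)`.
fn relative_to_fastest(main: f64, pr: f64) -> (f64, f64) {
    let fastest = main.min(pr);
    (main / fastest, pr / fastest)
}
```

For the Linux row `formatter/large/dataset.py` (9.3 ms on main, 8.5 ms on the PR) this gives roughly 1.09 and 1.00, matching the table. For the same row on Windows (11.2 ms and 12.0 ms) it gives 1.00 and roughly 1.07. Because the printed timings are rounded, recomputing other rows from the displayed values may differ slightly from the printed ratios.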

@MichaReiser MichaReiser force-pushed the use-cursor-based-lexer branch 4 times, most recently from e1cddf1 to a46b9c0 Compare July 24, 2023 12:43
@MichaReiser MichaReiser added the performance Potential performance improvement label Jul 24, 2023
@MichaReiser MichaReiser force-pushed the use-cursor-based-lexer branch 4 times, most recently from defc8d0 to 78b0ee8 Compare July 26, 2023 06:33
@@ -1 +1 @@
- broken "§=($/=(")
+ broken "§=($/=()
Member Author

@MichaReiser MichaReiser Jul 26, 2023

I had to trigger a different lexer error here because the lexer no longer reports unbalanced parentheses as errors (it also doesn't complain about [{(]}) as unbalanced, even though the brackets are closed in the wrong order).
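
To illustrate the point about `[{(]})`, here is a hedged sketch, not the lexer's actual code. It assumes a lexer that only tracks how deeply brackets are nested, which is an assumption on my part. Such a counter cannot notice closers arriving in the wrong order; detecting that takes a stack of expected closers, which is why this kind of validation is usually left to a later stage. Both function names are hypothetical.

```rust
/// Depth-only tracking: counts open brackets and ignores their kind.
fn balanced_by_depth(source: &str) -> bool {
    let mut depth: u32 = 0;
    for c in source.chars() {
        match c {
            '(' | '[' | '{' => depth += 1,
            ')' | ']' | '}' => {
                if depth == 0 {
                    return false;
                }
                depth -= 1;
            }
            _ => {}
        }
    }
    depth == 0
}

/// Stack-based check: every closer must match the most recent opener.
fn balanced_by_stack(source: &str) -> bool {
    let mut expected_closers = Vec::new();
    for c in source.chars() {
        match c {
            '(' => expected_closers.push(')'),
            '[' => expected_closers.push(']'),
            '{' => expected_closers.push('}'),
            ')' | ']' | '}' => {
                if expected_closers.pop() != Some(c) {
                    return false;
                }
            }
            _ => {}
        }
    }
    expected_closers.is_empty()
}
```

The depth counter accepts `[{(]})` because three brackets open and three close, while the stack-based check rejects it at the first `]`.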

@MichaReiser MichaReiser marked this pull request as ready for review July 26, 2023 06:36
@MichaReiser MichaReiser requested a review from dhruvmanila as a code owner July 26, 2023 06:36
@MichaReiser
Member Author

The ecosystem changes are due to the now more accurate ranges of f-string parts.

@MichaReiser MichaReiser force-pushed the use-cursor-based-lexer branch from 78b0ee8 to e07af0e Compare July 26, 2023 08:49
Member

@dhruvmanila dhruvmanila left a comment

LGTM!

@konstin
Member

konstin commented Jul 26, 2023

The benchmarks look great!

@MichaReiser MichaReiser merged commit 16e1737 into main Jul 26, 2023
@MichaReiser MichaReiser deleted the use-cursor-based-lexer branch July 26, 2023 09:32
@konstin konstin mentioned this pull request Jul 26, 2023
3 participants