Update individual rules to take advantage of core rule processing changes #3041

barrywhart · 2022-04-07T21:56:15Z

- Update L009 and L050 to only be called once (parse tree root).
- Update rules 9, 10, 14, 30, 40, 50, 63 to run in the post linter phase.
- Update docs for rule authors to cover the new iteration options.

Brief summary of the change made

Fixes #3034, #3035, #3037

Are there any other side effects of this change that we should be aware of?

Pull Request checklist

Please confirm you have completed any of the necessary steps below.
Included test cases to demonstrate any code changes, which may be one or more of the following:
- .yml rule test cases in test/fixtures/rules/std_rule_cases.
- .sql/.yml parser test cases in test/fixtures/dialects (note YML files can be auto generated with tox -e generate-fixture-yml).
- Full autofix test cases in test/fixtures/linter/autofix.
- Other.
Added appropriate documentation for the change.
Created GitHub issues for any relevant followup/future enhancements if appropriate.

codecov · 2022-04-08T00:55:06Z

Codecov Report

Merging #3041 (e28b033) into main (bbb8be7) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main     #3041   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          164       164           
  Lines        12106     12150   +44     
=========================================
+ Hits         12106     12150   +44

Impacted Files	Coverage Δ
src/sqlfluff/rules/L004.py	`100.00% <ø> (ø)`
src/sqlfluff/core/parser/segments/base.py	`100.00% <100.00%> (ø)`
src/sqlfluff/core/rules/base.py	`100.00% <100.00%> (ø)`
src/sqlfluff/rules/L001.py	`100.00% <100.00%> (ø)`
src/sqlfluff/rules/L002.py	`100.00% <100.00%> (ø)`
src/sqlfluff/rules/L003.py	`100.00% <100.00%> (ø)`
src/sqlfluff/rules/L005.py	`100.00% <100.00%> (ø)`
src/sqlfluff/rules/L009.py	`100.00% <100.00%> (ø)`
src/sqlfluff/rules/L010.py	`100.00% <100.00%> (ø)`
src/sqlfluff/rules/L011.py	`100.00% <100.00%> (ø)`
... and 10 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
barrywhart · 2022-04-08T02:12:00Z

src/sqlfluff/core/linter/linter.py

@@ -506,6 +507,10 @@ def lint_fix_parsed(
            )

            for crawler in progress_bar_crawler:
+                # Performance: After first loop pass, skip rules that don't do fixes.
+                if fix and loop > 0 and not is_fix_compatible(crawler):
+                    continue


About 1/4 of the rules aren't fix compatible. But we were executing them on every linter pass. Wasteful!
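
To make the effect of this check concrete, here is a minimal sketch that is not SQLFluff's code. `ToyRule`, its `can_fix` flag, and `rules_run_per_pass` are hypothetical names, and this `is_fix_compatible` is a stand-in that only assumes a rule is fix compatible when it can emit fixes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyRule:
    """Hypothetical stand-in for a rule; not SQLFluff's BaseRule."""

    code: str
    can_fix: bool


def is_fix_compatible(rule: ToyRule) -> bool:
    # Assumption: a rule counts as fix compatible if it can emit fixes.
    return rule.can_fix


def rules_run_per_pass(rules: List[ToyRule], fix: bool, passes: int) -> List[List[str]]:
    """Return the rule codes executed on each linter loop pass."""
    executed = []
    for loop in range(passes):
        this_pass = []
        for rule in rules:
            # Same condition as the diff above: a lint-only rule cannot change
            # the file, so re-running it after the first pass is wasted work.
            if fix and loop > 0 and not is_fix_compatible(rule):
                continue
            this_pass.append(rule.code)
        executed.append(this_pass)
    return executed


rules = [ToyRule("A", True), ToyRule("B", False), ToyRule("C", True), ToyRule("D", True)]
schedule = rules_run_per_pass(rules, fix=True, passes=3)
```

In this toy setup rule `B` runs only on the first pass when fixing, while a lint-only run (`fix=False`) still executes every rule.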

barrywhart · 2022-04-08T02:13:29Z

src/sqlfluff/core/rules/base.py

@@ -416,6 +416,48 @@ def raw_segments(self):
        )


+class CrawlBehavior:


Extracted segment traversal code from BaseRule.crawl() into a separate class. This makes it easier to vary iteration strategies with less risk of breaking something else.
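
As a rough illustration of why a separate traversal class helps, the sketch below puts the "recurse or not" decision in one small object. `Node` and `ToyCrawlBehavior` are hypothetical; the real `CrawlBehavior` in this PR may be shaped quite differently.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical tree node standing in for a parsed segment."""

    name: str
    children: List["Node"] = field(default_factory=list)


class ToyCrawlBehavior:
    """Hypothetical traversal strategy, kept separate from any rule logic."""

    def __init__(self, recurse_into: bool = True) -> None:
        self.recurse_into = recurse_into

    def crawl(self, root: Node) -> Iterator[Node]:
        # Always visit the node we were given.
        yield root
        # Only descend when the rule asked for a full tree walk.
        if self.recurse_into:
            for child in root.children:
                yield from self.crawl(child)


tree = Node("file", [Node("statement", [Node("keyword"), Node("literal")])])
all_names = [n.name for n in ToyCrawlBehavior(recurse_into=True).crawl(tree)]
root_only = [n.name for n in ToyCrawlBehavior(recurse_into=False).crawl(tree)]
```

With `recurse_into=False` the rule sees only the parse tree root, which matches the "only be called once" behaviour described for L009 and L050.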

tunetheweb · 2022-04-08T09:57:34Z

Nice improvement on "Rules critical errors tests" already! Maybe this fixes #3034 sufficiently that we can close that?

barrywhart · 2022-04-08T10:21:57Z

@tunetheweb:

> Nice improvement on "Rules critical errors tests" already! Maybe this fixes #3034 sufficiently that we can close that?

Oh, nice! I hadn't noticed. I updated the PR description to say "fixes" that issue. 👍

barrywhart · 2022-04-08T12:37:30Z

src/sqlfluff/core/linter/linter.py

-                    fname=fname,
-                    fix=fix,
-                    templated_file=templated_file,
+        for phase in ["main", "post"]:


Use "Hide whitespace" to review the changes in this section, since it adds a new loop and an additional level of indentation.
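
A simplified sketch of the phase idea follows, based only on the commit messages in this PR ("post-processing rules only run once", "only main phase when linting"). `run_phases` and the rule-to-phase mapping are hypothetical, and the pass counts are assumptions rather than the linter's real loop limits.

```python
from typing import Dict, List


def run_phases(rule_phases: Dict[str, str], fix: bool, main_passes: int) -> List[str]:
    """Return "phase:pass:rule" log entries in execution order.

    rule_phases maps a rule code to the phase it declares ("main" or "post").
    """
    log = []
    # Assumption: the extra "post" phase only exists when fixing.
    phases = ["main", "post"] if fix else ["main"]
    for phase in phases:
        # Assumption: only the main phase of a fix run loops more than once.
        passes = main_passes if (fix and phase == "main") else 1
        for loop in range(passes):
            for code, rule_phase in rule_phases.items():
                # Assumption: when only linting, every rule runs in "main".
                effective_phase = rule_phase if fix else "main"
                if effective_phase != phase:
                    continue
                log.append(f"{phase}:{loop}:{code}")
    return log


rule_phases = {"L003": "main", "L009": "post"}
fix_log = run_phases(rule_phases, fix=True, main_passes=2)
lint_log = run_phases(rule_phases, fix=False, main_passes=2)
```

Under these assumptions the post-phase rule runs once, after the main-phase rules have finished looping.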

barrywhart · 2022-04-08T12:39:07Z

src/sqlfluff/core/rules/base.py

@@ -372,7 +391,7 @@ class FunctionalRuleContext:
    def __init__(self, context: RuleContext):
        self.context = context

-    @cached_property


Removed caching of most properties now that we are reusing RuleContext objects
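
The hazard can be shown with a small standalone example. `ReusedContext` is hypothetical, and the sketch uses the standard library's `functools.cached_property`, which may not be the decorator imported in `base.py`.

```python
from functools import cached_property


class ReusedContext:
    """Hypothetical context object that is mutated and reused between segments."""

    def __init__(self, segment: str) -> None:
        self.segment = segment

    @cached_property
    def cached_upper(self) -> str:
        # Computed on first access, then stored on the instance.
        return self.segment.upper()

    @property
    def fresh_upper(self) -> str:
        # Recomputed on every access, so it tracks the current segment.
        return self.segment.upper()


ctx = ReusedContext("select")
first_cached, first_fresh = ctx.cached_upper, ctx.fresh_upper
ctx.segment = "from"  # reuse the same object for the next segment
second_cached, second_fresh = ctx.cached_upper, ctx.fresh_upper
```

After the object is reused, the cached value still describes the old segment, while the plain property follows the new one.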

barrywhart · 2022-04-08T21:12:13Z

src/sqlfluff/rules/L003.py

+
+        line_no = memory.line_no
+        target_line_no = cached_line_count + 1
+        for idx, elem in enumerate(raw_stack[memory.start_process_raw_idx :]):


Some unrelated, specific tuning of L003 here. Previously, it was starting from the beginning of raw_stack every time, which caused O(n^2) performance. Now, it remembers the last line number and position in raw_stack, so it can resume scanning on the "current" line. This shows up as a lot fewer executions of these lines in the Python line profiler and a big reduction in "percent of overall function time".
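
The resume-from-last-position idea can be sketched outside L003 as follows. `LineTracker` is a hypothetical stand-in for the rule's memory object; it borrows the `line_no` and `start_process_raw_idx` names from the diff, and plain strings stand in for raw segments.

```python
from typing import List


def count_lines_from_start(raw_stack: List[str]) -> int:
    """Old shape: rescan the whole stack on every call (quadratic overall)."""
    return 1 + sum(1 for elem in raw_stack if elem == "\n")


class LineTracker:
    """Hypothetical memory object that resumes where the last scan stopped."""

    def __init__(self) -> None:
        self.line_no = 1
        self.start_process_raw_idx = 0

    def update(self, raw_stack: List[str]) -> int:
        # Only look at elements added since the previous call.
        for elem in raw_stack[self.start_process_raw_idx:]:
            if elem == "\n":
                self.line_no += 1
        self.start_process_raw_idx = len(raw_stack)
        return self.line_no


tokens = ["SELECT", " ", "a", "\n", "FROM", " ", "t", "\n", "WHERE", " ", "x"]
tracker = LineTracker()
results = []
for end in range(1, len(tokens) + 1):
    stack = tokens[:end]
    results.append((count_lines_from_start(stack), tracker.update(stack)))
```

Both approaches report the same line number at every step, but the tracker touches each element once instead of once per call.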

OTooleMichael

LGTM

OTooleMichael · 2022-04-09T15:52:18Z

src/sqlfluff/rules/L003.py

@@ -408,7 +414,7 @@ def _eval(self, context: RuleContext) -> Optional[LintResult]:
                # First non-whitespace element is our trigger
                memory.trigger = segment

-        is_last = self.is_final_segment(context)
+        is_last = context.segment is context.final_segment


OTooleMichael · 2022-04-09T15:54:00Z

src/sqlfluff/rules/L005.py

@@ -43,8 +44,8 @@ def _eval(self, context: RuleContext) -> Optional[LintResult]:
        We need at least one segment behind us for this to work.

        """
-        if len(context.raw_stack) >= 1:
-            cm1 = context.raw_stack[-1]
+        if context.raw_segment_pre is not None:


Switch to a guard here, right?

Suggested change

-        if context.raw_segment_pre is not None:
+        if context.raw_segment_pre is None:
+            return

OTooleMichael · 2022-04-09T16:02:11Z

I'm happy enough with everything going on here (especially having read your first split-out PR first). It seems pretty "straightforward". (I could see some places to go further with the code splitting / Rule Flags) :) but all the changes look good.

barrywhart · 2022-04-09T18:18:49Z

src/sqlfluff/rules/L005.py

-                anchor = cm1
-                return LintResult(anchor=anchor, fixes=[LintFix.delete(cm1)])
-        # Otherwise fine
+        """Commas should not have whitespace directly before them."""


The primary change to this file is switching from raw_stack to raw_segment_pre, which is faster. Also did some tidying, but no other meaningful changes. @OTooleMichael

tunetheweb · 2022-04-09T18:47:48Z

src/sqlfluff/core/rules/base.py

@@ -503,12 +520,12 @@ class BaseRule:
    # Lint loop / crawl behavior. When appropriate, rules can (and should)
    # override these values to make linting faster.
    recurse_into = True
-    needs_raw_stack = True
+    needs_raw_stack = False  # False is faster & most rules don't need it


Can we give more commentary as to why a rule would need or not need raw_stack? I presume it's if you are looking at a segment's parents to find if it's nested in a particular statement? Remind me again what the difference is between raw_stack and raw_segment_pre.

I'll add more comments. Briefly, raw_segment_pre is the one raw segment prior to the current one, while raw_stack is a list of every raw segment prior. The former is pretty lightweight to compute, while the latter is pretty heavyweight, as we're creating a new (and increasingly lengthy) tuple as we proceed through the file.

Oh I didn’t get that.

Is that just the parent Segment? Would that be a better name for it rather than raw_stack_prev?

No, it's not the parent. Raw segments are (IIRC) those at the leaf of the tree, i.e. no children. Generally, these are a single keyword, operator, literal, or a meta segment like indent or dedent, like the "skin" of the tree as opposed to the "bones", I guess?

previous_raw_segment seems better as a name, or raw_segment_previous, or at least prev.
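
The cost difference described in this thread can be sketched with plain strings standing in for raw segments. Both generator functions below are hypothetical and only mimic the two values a rule would receive.

```python
from typing import Iterator, List, Optional, Tuple


def walk_with_raw_stack(raws: List[str]) -> Iterator[Tuple[str, Tuple[str, ...]]]:
    """Yield each raw segment with a tuple of every raw segment before it."""
    raw_stack: Tuple[str, ...] = ()
    for raw in raws:
        yield raw, raw_stack
        # Builds a new, longer tuple at every step.
        raw_stack = raw_stack + (raw,)


def walk_with_raw_segment_pre(raws: List[str]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield each raw segment with only the one raw segment before it."""
    previous: Optional[str] = None
    for raw in raws:
        yield raw, previous
        previous = raw


raws = ["a", " ", ",", "b"]
stacks = list(walk_with_raw_stack(raws))
pres = list(walk_with_raw_segment_pre(raws))
```

A check like "is there whitespace directly before this comma?" only needs the second form: at the comma, the previous raw segment is `" "`, the same value as the last entry of the full stack.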

tunetheweb · 2022-04-09T18:54:07Z

src/sqlfluff/rules/L009.py

+def get_last_segment(segment: Segments) -> Tuple[List[BaseSegment], Segments]:
+    """Returns rightmost & lowest descendant and its "parent stack"."""
+    parent_stack: List[BaseSegment] = []
+    while True:
+        children = segment.children()
+        if children:
+            parent_stack.append(segment[0])
+            segment = children.last()
+        else:
+            return parent_stack, segment


Why doesn't it use the new final_segment function?

Because it also needs the parent stack, which is not provided by RuleContext.final_segment. I could've had RuleContext return both, but that'd be harder to document clearly, and AFAIK, this is the only rule that needs it.
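
For readers without the codebase at hand, the same walk can be tried on a toy tree. `ToyNode` and `get_last_node` are hypothetical and operate on plain objects, not on SQLFluff's `Segments` API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyNode:
    """Hypothetical tree node standing in for a segment."""

    name: str
    children: List["ToyNode"] = field(default_factory=list)


def get_last_node(root: ToyNode) -> Tuple[List[ToyNode], ToyNode]:
    """Return the rightmost, lowest descendant and the parents above it."""
    parent_stack: List[ToyNode] = []
    node = root
    while node.children:
        # Record the node we are leaving, then step into its last child.
        parent_stack.append(node)
        node = node.children[-1]
    return parent_stack, node


tree = ToyNode("file", [ToyNode("statement", [ToyNode("select"), ToyNode("newline")])])
parents, last = get_last_node(tree)
```

The returned parent stack runs from the root down to the direct parent of the last leaf.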

barrywhart · 2022-04-09T19:27:14Z

@tunetheweb: Ok, ready for final review. That is, I added the new docs to developingrules.rst. There's probably some duplication between that file and some of the code comments. Might be worth moving things around a bit? It always feels tricky to get this right.

tunetheweb

Code looks fine, but have some general comments for you to consider.

tunetheweb · 2022-04-11T13:26:25Z

docs/source/developingrules.rst

+``_works_on_unparsable``
+^^^^^^^^^^^^^^^^^^^^^^^^
+By default, `SQLFluff` calls ``_eval()`` for all segments, even "unparsable"
+segments, i.e. segments that didn't match the parsing rules in the dialect.
+This causes issues for some rules. If so, setting ``_works_on_unparsable``
+to ``False`` tells SQLFluff not to call ``_eval()`` for unparsable segments and
+their descendants.


Is this right? I thought default for this was false and you had to explicitly set it to true?

That's what I thought, too, but I checked. I think maybe I (we) were confusing this with the --fix-even-unparsable thing added recently. We've got so many settings. 🤯

IIUC, the setting documented here controls whether the rule is called for unparsable segments, and --fix-even-unparsable controls whether any fixes would be applied. (Note that you can lint without fixing!)
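
A small sketch of the documented behaviour, under the assumption that skipping works by pruning the whole unparsable subtree. `Seg` and `visited` are hypothetical and do not reflect how SQLFluff implements the check.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Seg:
    """Hypothetical segment with a flag marking it as unparsable."""

    name: str
    unparsable: bool = False
    children: List["Seg"] = field(default_factory=list)


def visited(root: Seg, works_on_unparsable: bool) -> List[str]:
    """Return the names of segments a rule would be evaluated on."""
    if root.unparsable and not works_on_unparsable:
        # Skip this segment and all of its descendants.
        return []
    names = [root.name]
    for child in root.children:
        names.extend(visited(child, works_on_unparsable))
    return names


tree = Seg(
    "file",
    children=[Seg("select"), Seg("junk", unparsable=True, children=[Seg("raw")])],
)
default_visit = visited(tree, works_on_unparsable=True)
opted_out_visit = visited(tree, works_on_unparsable=False)
```

This is independent of whether fixes are applied: the flag only decides which segments the rule is evaluated on.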

tunetheweb · 2022-04-11T13:50:20Z

src/sqlfluff/core/rules/base.py

@@ -503,12 +520,27 @@ class BaseRule:
    # Lint loop / crawl behavior. When appropriate, rules can (and should)
    # override these values to make linting faster.
    recurse_into = True
-    needs_raw_stack = True
+    # False is faster & most rules don't need it. Rules that use it are usually


It's not clear to me that the False refers to recurse_into or needs_raw_stack. I know it's the latter because of this PR but could be clearer for future.

Will update to mention the property name. 👍

tunetheweb · 2022-04-11T13:51:41Z

src/sqlfluff/rules/L003.py

@@ -408,7 +414,7 @@ def _eval(self, context: RuleContext) -> Optional[LintResult]:
                # First non-whitespace element is our trigger
                memory.trigger = segment

-        is_last = self.is_final_segment(context)
+        is_last = context.segment is context.final_segment


How can we avoid rule writers using the wrong call in future? These seem very similarly named and therefore not crazy to think they are interchangeable.

Refactor rule segment iteration for flexibility and speed

283649e

barrywhart marked this pull request as draft April 7, 2022 21:56

Barry Hart added 2 commits April 7, 2022 20:09

Fix a bug that was clearing memory when _eval() returned None

3aa5abd

Create CrawlBehavior class, fix broken test

ea80c56

Update rule L009 to set recurse_into = False

d5fe681

barrywhart requested review from tunetheweb and WittierDinosaur April 8, 2022 02:00

After first linter loop pass, skip rules that don't do fixes

1488c5b

barrywhart changed the title ~~Refactor rule segment iteration for flexibility and speed~~ Refactor core rule processing for flexibility and speed Apr 8, 2022

barrywhart commented Apr 8, 2022

Barry Hart and others added 7 commits April 8, 2022 06:43

Reduce number of RuleContext objects, compute siblings_pre/post on demand

de85ca7

Add "raw_segment_pre" as an alternative to "raw_stack"

d138701

Coverage

abcca8a

Rules declare whether they need context.raw_stack

78e42e8

Merge branch 'main' into bhart-issue_3037_rule_iteration_refactor

7e37250

Implement linter phases so post-processing rules only run once

b426845

Merge branch 'main' into bhart-issue_3037_rule_iteration_refactor

4056496

barrywhart commented Apr 8, 2022

Barry Hart added 6 commits April 8, 2022 08:47

Tidy, simplify

72e3bcd

Update L050 to use recurse_into = False

a723852

Tidying

5bc2b3d

Merge branch 'main' into bhart-issue_3037_rule_iteration_refactor

12141a9

Tune L003 a bit

2b11c08

Refactor L003 for performance, add RuleContext.final_segment property

3d11b6c

barrywhart commented Apr 8, 2022

Barry Hart and others added 7 commits April 9, 2022 09:11

Only "main" phase when linting (as opposed to fixing)

8fc64c7

Merge branch 'main' into bhart-issues_3035_3037_core_changes

2b2cc1f

Move test fixes from big PR

d636e10

Merge branch 'bhart-issues_3035_3037_core_changes' of https://github.com/barrywhart/sqlfluff into bhart-issues_3035_3037_core_changes

168022c

PR review

7c32cd4

Simplify how we detect first linter pass and track lint violations

56102c8

Coverage

dc37633

OTooleMichael approved these changes Apr 9, 2022

barrywhart and others added 7 commits April 9, 2022 13:46

Apply suggestions from code review

cedbbe1

Comment changes suggested by Barry Pollard Co-authored-by: Barry Pollard <barry_pollard@hotmail.com>

PR review

52eeebc

Merge branch 'main' into bhart-issues_3035_3037_core_changes

9b7deb9

Merge branch 'bhart-issues_3035_3037_core_changes' of https://github.com/barrywhart/sqlfluff into bhart-issues_3035_3037_core_changes

c4f5f8b

Merge branch 'bhart-issues_3035_3037_core_changes' into bhart-issue_3037_rule_iteration_refactor

23780ad

PR review

d93f4f2

Merge branch 'main' into bhart-issue_3037_rule_iteration_refactor

571b209

barrywhart commented Apr 9, 2022

Comments, tidying

93efb1e

tunetheweb reviewed Apr 9, 2022

Barry Hart added 2 commits April 9, 2022 15:05

Update rule development docs with some of the new (and old) options

5f11322

Add more comments about needs_raw_stack

a220053

barrywhart added 2 commits April 9, 2022 17:34

Merge branch 'main' into bhart-issue_3037_rule_iteration_refactor

579ac9b

Merge branch 'main' into bhart-issue_3037_rule_iteration_refactor

bfef962

tunetheweb approved these changes Apr 11, 2022

Barry Hart and others added 4 commits April 11, 2022 19:17

Merge branch 'main' into bhart-issue_3037_rule_iteration_refactor

66a0507

PR review

e0d3af6

Apply suggestions from code review

a51ff1b

Co-authored-by: Barry Pollard <barry_pollard@hotmail.com>

Merge branch 'bhart-issue_3037_rule_iteration_refactor' of https://github.com/barrywhart/sqlfluff into bhart-issue_3037_rule_iteration_refactor

e28b033

barrywhart merged commit c039be5 into sqlfluff:main Apr 12, 2022
