Update individual rules to take advantage of core rule processing changes #3041
Codecov Report
@@ Coverage Diff @@
## main #3041 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 164 164
Lines 12106 12150 +44
=========================================
+ Hits 12106 12150 +44
Continue to review full report at Codecov.
src/sqlfluff/core/linter/linter.py
@@ -506,6 +507,10 @@ def lint_fix_parsed(
     )

     for crawler in progress_bar_crawler:
         # Performance: After first loop pass, skip rules that don't do fixes.
         if fix and loop > 0 and not is_fix_compatible(crawler):
             continue
About 1/4 of the rules aren't fix compatible. But we were executing them on every linter pass. Wasteful!
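To illustrate the saving, here is a minimal sketch of the skip logic. `ToyRule` and the flag-based `is_fix_compatible` below are hypothetical stand-ins, not the SQLFluff classes or helper; only the `if fix and loop > 0 and not ...` condition mirrors the diff above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyRule:
    """Hypothetical stand-in for a lint rule; not a SQLFluff class."""

    code: str
    can_fix: bool
    runs: int = 0

    def crawl(self) -> None:
        self.runs += 1


def is_fix_compatible(rule: ToyRule) -> bool:
    # Assumption: the real helper inspects the rule; here we just read a flag.
    return rule.can_fix


def lint_fix_loop(rules: List[ToyRule], fix: bool, loop_limit: int) -> None:
    for loop in range(loop_limit):
        for rule in rules:
            # After the first pass, rules that cannot fix are skipped,
            # since they will never change the file in later passes.
            if fix and loop > 0 and not is_fix_compatible(rule):
                continue
            rule.crawl()


rules = [ToyRule("A", can_fix=True), ToyRule("B", can_fix=False)]
lint_fix_loop(rules, fix=True, loop_limit=3)
```

With three passes, the fix-capable rule runs three times and the other rule runs once.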
src/sqlfluff/core/rules/base.py
@@ -416,6 +416,48 @@ def raw_segments(self):
     )


+class CrawlBehavior:
Extracted segment traversal code from `BaseRule.crawl()` into a separate class. This makes it easier to vary iteration strategies with less risk of breaking something else.
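As a rough sketch of the idea (not the actual `CrawlBehavior` implementation), traversal can live in small strategy classes that a rule selects through a class attribute. Every name below (`Node`, `RecursiveCrawl`, `RootOnlyCrawl`, `ToyRule`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical tree node; stands in for a parsed segment."""

    name: str
    children: List["Node"] = field(default_factory=list)


class RecursiveCrawl:
    """Visit every node, depth first."""

    def crawl(self, root: Node) -> Iterator[Node]:
        yield root
        for child in root.children:
            yield from self.crawl(child)


class RootOnlyCrawl:
    """Visit only the root, for rules that do their own traversal."""

    def crawl(self, root: Node) -> Iterator[Node]:
        yield root


class ToyRule:
    # Each rule picks a traversal strategy instead of re-implementing one.
    crawl_behavior = RecursiveCrawl()

    def evaluate(self, root: Node) -> List[str]:
        return [node.name for node in self.crawl_behavior.crawl(root)]


class ToyRootRule(ToyRule):
    crawl_behavior = RootOnlyCrawl()


tree = Node("file", [Node("select", [Node("keyword")]), Node("newline")])
```

Swapping the strategy changes which nodes a rule sees without touching the rule's own evaluation code.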
Nice improvement on "Rules critical errors tests" already! Maybe this fixes #3034 sufficiently that we can close that?
Oh, nice! I hadn't noticed. I updated the PR description to say "fixes" that issue. 👍
src/sqlfluff/core/linter/linter.py
fname=fname,
fix=fix,
templated_file=templated_file,
for phase in ["main", "post"]:
Use "Hide whitespace" to review the changes in this section, since it adds a new loop and an additional level of indentation.
src/sqlfluff/core/rules/base.py
@@ -372,7 +391,7 @@ class FunctionalRuleContext:
     def __init__(self, context: RuleContext):
         self.context = context

     @cached_property
Removed caching of most properties now that we are reusing `RuleContext` objects.
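A small sketch of why caching and object reuse can conflict, using the standard library's `functools.cached_property` (the PR may use a different `cached_property` import). The `Context` class is hypothetical and much simpler than `RuleContext`.

```python
from functools import cached_property


class Context:
    """Hypothetical mutable context that is reused across segments."""

    def __init__(self, segment: str):
        self.segment = segment

    @cached_property
    def cached_upper(self) -> str:
        # Computed once, then stored on the instance.
        return self.segment.upper()

    @property
    def fresh_upper(self) -> str:
        # Recomputed on every access.
        return self.segment.upper()


ctx = Context("select")
first = (ctx.cached_upper, ctx.fresh_upper)
ctx.segment = "from"  # reuse the same object for the next segment
second = (ctx.cached_upper, ctx.fresh_upper)
```

After the object is reused, the cached value still reflects the earlier segment, while the plain property follows the new one.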
src/sqlfluff/rules/L003.py
line_no = memory.line_no
target_line_no = cached_line_count + 1
for idx, elem in enumerate(raw_stack[memory.start_process_raw_idx :]):
Some unrelated, specific tuning of L003 here. Previously, it was starting from the beginning of `raw_stack` every time, which caused O(n^2) performance. Now, it remembers the last line number and position in `raw_stack`, so it can resume scanning on the "current" line. This shows up as far fewer executions of these lines in the Python line profiler and a big reduction in "percent of overall function time".
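The resume-from-last-position idea can be sketched as follows. This is not the L003 code: `ScanMemory` and both functions are hypothetical, and plain strings stand in for raw segments.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScanMemory:
    """Hypothetical state carried between calls (not the L003 memory object)."""

    line_no: int = 1
    start_idx: int = 0


def line_number_naive(raw_stack: Sequence[str]) -> int:
    # Rescans everything on every call: O(n) per call, O(n^2) overall.
    return 1 + sum(1 for elem in raw_stack if elem == "\n")


def line_number_incremental(raw_stack: Sequence[str], memory: ScanMemory) -> int:
    # Only look at elements added since the previous call.
    for idx in range(memory.start_idx, len(raw_stack)):
        if raw_stack[idx] == "\n":
            memory.line_no += 1
    memory.start_idx = len(raw_stack)
    return memory.line_no


tokens = ["SELECT", "\n", "a", ",", "\n", "b", "\n", "FROM", "t"]
memory = ScanMemory()
results = []
for end in range(1, len(tokens) + 1):
    stack = tokens[:end]
    results.append((line_number_naive(stack), line_number_incremental(stack, memory)))
```

Both versions report the same line number at every step; the incremental one touches each element only once across the whole scan.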
…com/barrywhart/sqlfluff into bhart-issues_3035_3037_core_changes
LGTM
@@ -408,7 +414,7 @@ def _eval(self, context: RuleContext) -> Optional[LintResult]:
         # First non-whitespace element is our trigger
         memory.trigger = segment

-    is_last = self.is_final_segment(context)
+    is_last = context.segment is context.final_segment
Legend
src/sqlfluff/rules/L005.py
@@ -43,8 +44,8 @@ def _eval(self, context: RuleContext) -> Optional[LintResult]:
     We need at least one segment behind us for this to work.

     """
-    if len(context.raw_stack) >= 1:
-        cm1 = context.raw_stack[-1]
+    if context.raw_segment_pre is not None:
Switch to a guard here, right?
-    if context.raw_segment_pre is not None:
+    if context.raw_segment_pre is None:
+        return
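For readers unfamiliar with the term, a guard clause returns early on the uninteresting case so the main logic is not nested. A minimal sketch with hypothetical functions (strings stand in for segments, and the whitespace check is only illustrative):

```python
from typing import Optional


def eval_nested(previous: Optional[str]) -> Optional[str]:
    # Main logic nested inside the "is not None" check.
    if previous is not None:
        if previous.isspace():
            return "delete"
    return None


def eval_guarded(previous: Optional[str]) -> Optional[str]:
    # Guard clause: bail out first, then continue at the top indent level.
    if previous is None:
        return None
    if previous.isspace():
        return "delete"
    return None
```

Both functions behave identically; the guarded form just keeps the body one level shallower.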
I'm happy enough with everything going on here (especially having read your first split-out first). It seems pretty "straightforward". I could see some places to go further with the code splitting / rule flags :) but all the changes look good.
Comment changes suggested by Barry Pollard Co-authored-by: Barry Pollard <barry_pollard@hotmail.com>
…com/barrywhart/sqlfluff into bhart-issues_3035_3037_core_changes
…037_rule_iteration_refactor
anchor = cm1
return LintResult(anchor=anchor, fixes=[LintFix.delete(cm1)])
# Otherwise fine
"""Commas should not have whitespace directly before them."""
The primary change to this file is switching from `raw_stack` to `raw_segment_pre`, which is faster. Also did some tidying, but no other meaningful changes. @OTooleMichael
src/sqlfluff/core/rules/base.py
@@ -503,12 +520,12 @@ class BaseRule:
     # Lint loop / crawl behavior. When appropriate, rules can (and should)
     # override these values to make linting faster.
     recurse_into = True
-    needs_raw_stack = True
+    needs_raw_stack = False  # False is faster & most rules don't need it
Can we give more commentary as to why a rule would need or not need `raw_stack`? I presume it's if you are looking at a segment's parents to find if it's nested in a particular statement? Remind me again what the difference is between `raw_stack` and `raw_segment_pre`.
I'll add more comments. Briefly, `raw_segment_pre` is the one raw segment prior to the current one, while `raw_stack` is a list of every raw segment prior. The former is pretty lightweight to compute, while the latter is pretty heavyweight, as we're creating a new (and increasingly lengthy) tuple as we proceed through the file.
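The cost difference can be sketched like this. Both functions are hypothetical and use plain strings instead of segments; the first mimics rebuilding a growing tuple at each step, the second tracks only the single previous item.

```python
from typing import List, Optional, Tuple


def walk_with_raw_stack(raws: List[str]) -> int:
    """Rebuild a tuple of all raws seen so far; return total items copied."""
    copied = 0
    raw_stack: Tuple[str, ...] = ()
    for raw in raws:
        # A rule would read raw_stack here, before the current raw is added.
        raw_stack = raw_stack + (raw,)  # new tuple each step: O(len) copy
        copied += len(raw_stack)
    return copied


def walk_with_previous(raws: List[str]) -> List[Optional[str]]:
    """Track only the single previous raw; constant work per step."""
    seen: List[Optional[str]] = []
    previous: Optional[str] = None
    for raw in raws:
        seen.append(previous)  # what a rule would see as "the raw before me"
        previous = raw
    return seen


raws = ["SELECT", " ", "a", " ", ",", "b"]
```

For six raws the tuple approach copies 1 + 2 + ... + 6 = 21 items in total, which grows quadratically with file length, while the previous-item approach does a fixed amount of work per raw.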
Oh, I didn’t get that.
Is that just the parent segment? Would that be a better name for it rather than `raw_stack_prev`?
No, it's not the parent. Raw segments are (IIRC) those at the leaves of the tree, i.e. with no children. Generally, each is a single keyword, operator, literal, or a meta segment like indent or dedent. They're like the "skin" of the tree as opposed to the "bones", I guess?
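A toy illustration of "raw segments are the leaves", using a hypothetical `Node` class rather than SQLFluff's segment types:

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical tree node; a node with no children plays the raw segment."""

    name: str
    children: List["Node"] = field(default_factory=list)


def iter_raw(node: Node) -> Iterator[Node]:
    """Yield leaf nodes in document order."""
    if not node.children:
        yield node
        return
    for child in node.children:
        yield from iter_raw(child)


tree = Node(
    "select_statement",
    [
        Node("select_clause", [Node("SELECT"), Node(" "), Node("a")]),
        Node(" "),
        Node("from_clause", [Node("FROM"), Node(" "), Node("t")]),
    ],
)
leaves = [node.name for node in iter_raw(tree)]
```

The "previous raw" of `FROM` here is the whitespace leaf before it, which is a sibling of its parent, not the parent `from_clause` itself.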
`previous_raw_segment` seems better as a name, or `raw_segment_previous`. At least `prev`.
def get_last_segment(segment: Segments) -> Tuple[List[BaseSegment], Segments]:
    """Returns rightmost & lowest descendant and its "parent stack"."""
    parent_stack: List[BaseSegment] = []
    while True:
        children = segment.children()
        if children:
            parent_stack.append(segment[0])
            segment = children.last()
        else:
            return parent_stack, segment
Why doesn't it use the new `final_segment` function?
Because it also needs the parent stack, which is not provided by `RuleContext.final_segment`. I could've had `RuleContext` return both, but that'd be harder to document clearly, and AFAIK, this is the only rule that needs it.
@tunetheweb: Ok, ready for final review. That is, I added the new docs to |
Code looks fine, but have some general comments for you to consider.
``_works_on_unparsable``
^^^^^^^^^^^^^^^^^^^^^^^^
By default, `SQLFluff` calls ``_eval()`` for all segments, even "unparsable"
segments, i.e. segments that didn't match the parsing rules in the dialect.
This causes issues for some rules. If so, setting ``_works_on_unparsable``
to ``False`` tells SQLFluff not to call ``_eval()`` for unparsable segments and
their descendants.
Is this right? I thought the default for this was `false` and you had to explicitly set it to `true`?
That's what I thought, too, but I checked. I think maybe I (we) were confusing this with the `--fix-even-unparsable` thing added recently. We've got so many settings. 🤯
IIUC, the setting documented here controls whether the rule is called for unparsable segments, and `--fix-even-unparsable` controls whether any fixes would be applied. (Note that you can lint without fixing!)
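A hedged sketch of the "is the rule called at all" half of that distinction. `Node` and `segments_evaluated` are hypothetical; the `works_on_unparsable` parameter only models the documented behavior of the `_works_on_unparsable` attribute, and fix application is not modeled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical segment; `unparsable` marks a failed parse."""

    name: str
    unparsable: bool = False
    children: List["Node"] = field(default_factory=list)


def segments_evaluated(node: Node, works_on_unparsable: bool) -> List[str]:
    """Names of the nodes a rule would be called for."""
    if node.unparsable and not works_on_unparsable:
        return []  # skip this node and all of its descendants
    names = [node.name]
    for child in node.children:
        names.extend(segments_evaluated(child, works_on_unparsable))
    return names


tree = Node(
    "file",
    children=[
        Node("select_statement", children=[Node("SELECT")]),
        Node("unparsable", unparsable=True, children=[Node("???")]),
    ],
)
```

With the flag set to `False`, the unparsable node and its child are never visited; with `True`, every node is.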
src/sqlfluff/core/rules/base.py
@@ -503,12 +520,27 @@ class BaseRule:
     # Lint loop / crawl behavior. When appropriate, rules can (and should)
     # override these values to make linting faster.
     recurse_into = True
-    needs_raw_stack = True
+    # False is faster & most rules don't need it. Rules that use it are usually
It's not clear to me whether the `False` refers to `recurse_into` or `needs_raw_stack`. I know it's the latter because of this PR, but it could be clearer for the future.
Will update to mention the property name. 👍
@@ -408,7 +414,7 @@ def _eval(self, context: RuleContext) -> Optional[LintResult]:
         # First non-whitespace element is our trigger
         memory.trigger = segment

-    is_last = self.is_final_segment(context)
+    is_last = context.segment is context.final_segment
How can we avoid rule writers using the wrong call in future? These seem very similarly named and therefore not crazy to think they are interchangeable.
Co-authored-by: Barry Pollard <barry_pollard@hotmail.com>
…thub.com/barrywhart/sqlfluff into bhart-issue_3037_rule_iteration_refactor
`post` linter phase.
Brief summary of the change made
Fixes #3034, #3035, #3037
Are there any other side effects of this change that we should be aware of?
Pull Request checklist
Please confirm you have completed any of the necessary steps below.
- Included test cases to demonstrate any code changes, which may be one or more of the following:
  - `.yml` rule test cases in `test/fixtures/rules/std_rule_cases`.
  - `.sql`/`.yml` parser test cases in `test/fixtures/dialects` (note YML files can be auto generated with `tox -e generate-fixture-yml`).
  - `test/fixtures/linter/autofix`.
- Added appropriate documentation for the change.
- Created GitHub issues for any relevant followup/future enhancements if appropriate.