Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rough autofix for L028 #2757

Merged
merged 11 commits into from
Mar 4, 2022
Merged

Rough autofix for L028 #2757

merged 11 commits into from
Mar 4, 2022

Conversation

OTooleMichael
Copy link
Contributor

This adds fixing L028

However L028 is maybe flawed from the bottom up? The changes are subject to opinion and maybe not all are wanted/needed

  • remove aliases when we desire unqualified
  • add aliases when qualified
  • add fix directive fix_inconsistent_to setting for fix inconsistent to either unqualified or qualified when encountered
  • check the "STUCT" case when possible. a.struct_value FROM tbl_name must be a struct as a is not the table reference. If a is intended to be the table this code will not run. If we assume its valid code and "fix" it, it will not change its runnableness (valid code will remain valid [but now linted], in valid code will remain invalid).
  • fix_inconsistent_to also works on struct dialect
  • Added tests to cover alias case, and schema.table case and quoted case. Some of these cases are a bit weird in the past and also now. Namely SELECT schema.table.col1, table.col2 FROM schema.table is consistent and qualified.

Are there any other side effects of this change that we should be aware of?

Fixes catches and understands structs better but also differently.

Pull Request checklist

  • Please confirm you have completed any of the necessary steps below.

  • Included test cases to demonstrate any code changes, which may be one or more of the following:

    • [x].yml rule test cases in test/fixtures/rules/std_rule_cases.

Notes:

  • Typing is off references = list(sc.recursive_crawl("object_reference")) have methods like is_qualified but are typed as List[BaseSegment]. We can't run any sensible guard statement like assert isintanceof(ref, (A,B,C)) cause ObjectRefs are reimplmented in various dialects directly from BaseSegment
  • The Rule Inheritance feels like a bad idea Rule_L028(Rule_L020) - in this particular case it serve very little other than to make things hard to read and to make method/call signatures weird
  • Issue with creating more complex tree elements occurs again. Table Str_ref may be quoted or naked depending on the query and it is hard to construct the correct tree types. Without correct types how will future Rules fix things correctly
  • Inheritance essentially just does this block
if context.segment.is_type("select_statement"):
            select_info = get_select_statement_info(context.segment, context.dialect)
            if not select_info:
                return None

            # Work out if we have a parent select function
            parent_select = None
            for seg in reversed(context.parent_stack):
                if seg.is_type("select_statement"):
                    parent_select = seg
                    break

probably setting up an fn is more reusable. Also on a side note get_select_statement_info might be a good candidate for memo'ing

@codecov
Copy link

codecov bot commented Mar 2, 2022

Codecov Report

Merging #2757 (9b9565f) into main (78bb1bd) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##              main     #2757   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          163       163           
  Lines        12232     12264   +32     
=========================================
+ Hits         12232     12264   +32     
Impacted Files Coverage Δ
src/sqlfluff/rules/L028.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 78bb1bd...9b9565f. Read the comment docs.

Copy link
Member

@tunetheweb tunetheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not had too much of a look at the code, as I've a more fundamental question on the new attribute and whether it's even needed.

Love the fact you support STRUCTS now for this rule!

src/sqlfluff/core/rules/config_info.py Outdated Show resolved Hide resolved
src/sqlfluff/rules/L028.py Outdated Show resolved Hide resolved
test/fixtures/rules/std_rule_cases/L028.yml Outdated Show resolved Hide resolved
Comment on lines +47 to +34
fail_str: SELECT bar FROM my_tbl WHERE foo
fix_str: SELECT my_tbl.bar FROM my_tbl WHERE my_tbl.foo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh interesting that it affects WHERE clause too! Did it always do that? Do we need a simple "pass" test for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some clever stuff about managing self references in certain dialects.
Redshift can ref a created column.

That said

SELECT 
      a,
      b,
      a + b AS col_created_right_here,
      col_created_right_here + 1 AS sub_self_ref
  FROM tbl

Is also valid in Redshift, Snowflake and BQ
So.....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh that actually could be quite a large problem.

SELECT 
      tbl.a,
      tbl.b,
      tbl.a + tbl.b AS col_created_right_here,
      col_created_right_here + 1 AS sub_self_ref  -- noqa: disable=L028
  FROM tbl

This noqa: is very much intentional (I normally want qualified but this is actually a self ref and I know it), will it be correctly respected ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we recently discovered that noqa doesn't work in the YAML files.

My original comment was we have a fail and fix test case for WHERE reference, but no pass test case. Should we have one? In theory fail/fix should be sufficient but I always like to have a pass test case in these instances too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still wonder if we should have a simple pass_str test for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: noqa in YAML tests. This should now work correctly for the usual YAML tests (test/rules/yaml_test_cases_test.py::test__rule_test_case). Until recently, pass tests used a weird code path that skipped parts of the linter. As of a couple weeks ago, everything should work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ill add a test for this.

Copy link
Contributor Author

@OTooleMichael OTooleMichael Mar 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test added. for fail and pass w/ noqa.

Later I'd like to revisit this / make another rule for lateral column references - they cause huge problems.

Rule like lateral self references must begin with __self_

then there would be some interaction with this and other rules.

src/sqlfluff/rules/L028.py Outdated Show resolved Hide resolved
@tunetheweb
Copy link
Member

@OTooleMichael what's your thoughts on this change and v0.11.0? Think we're ready to go on that but happy to hold off if you think you'll get this finished out in next day or so?

@OTooleMichael
Copy link
Contributor Author

More

@OTooleMichael what's your thoughts on this change and v0.11.0? Think we're ready to go on that but happy to hold off if you think you'll get this finished out in next day or so?

I'll fix this in under 12 hours from now.
bigger deal is the bug fix to L049 which actually fixes to an unparsable output :) - It mangaled a tonne of my files

@tunetheweb
Copy link
Member

Cool. Will hold off for this then. And that other one!

@OTooleMichael
Copy link
Contributor Author

This should be ready

Copy link
Member

@tunetheweb tunetheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me now, but have some more comments for you.

@barrywhart could you give it a review too since you know the rules code better than me.

Comment on lines 23 to 24
structs which trigger false positives. It can be enabled with the
``force_enable = True`` flag.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you now support structs in this rule do we still need this note?
And the force_enable flag?

Also nit but if keeping it, can we update text to "This rule is disabled by default for BigQuery, Hive and Redshift due to their use of"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To what should we change the text?

I'd probably leave force enabled flag for now (issue for a bigger release) its a more serious change that could be breaking and probably outside the thought level I can give this before pushing out 0.11.1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

At the moment the text just mentions BigQuery, but it's three dialects this affects so think we should mention the other two: "This rule is disabled by default for BigQuery, Hive and Redshift due to their use of..."

Or alternatively should say "This rule is disabled by default for dialects like BigQuery due to their use of..." to avoid having to keep it in sync in future, but do think listing all three is more friendly to users.

src/sqlfluff/rules/L028.py Outdated Show resolved Hide resolved
src/sqlfluff/rules/L028.py Outdated Show resolved Hide resolved
Comment on lines +47 to +34
fail_str: SELECT bar FROM my_tbl WHERE foo
fix_str: SELECT my_tbl.bar FROM my_tbl WHERE my_tbl.foo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still wonder if we should have a simple pass_str test for this?

src/sqlfluff/rules/L028.py Show resolved Hide resolved
src/sqlfluff/rules/L028.py Outdated Show resolved Hide resolved
@OTooleMichael OTooleMichael requested a review from tunetheweb March 3, 2022 20:46
@tunetheweb tunetheweb merged commit 267b6bb into sqlfluff:main Mar 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants