Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More dialect checking, fixes, inheritance cleanup #2942

Merged
merged 19 commits into from
Mar 30, 2022

Conversation

barrywhart
Copy link
Member

@barrywhart barrywhart commented Mar 29, 2022

Brief summary of the change made

  • Add back some runtime checks for replace() that were deleted in prior PR
  • Add a check that segment type matches base dialect when replacing a segment
  • Add a check that if we're replacing a segment that defines both match_grammar and parse_grammar, the replacement must define both of them or neither of them
  • Manual cleanup of dialects to use inheritance where appropriate

Are there any other side effects of this change that we should be aware of?

Pull Request checklist

  • Please confirm you have completed any of the necessary steps below.

  • Included test cases to demonstrate any code changes, which may be one or more of the following:

    • .yml rule test cases in test/fixtures/rules/std_rule_cases.
    • .sql/.yml parser test cases in test/fixtures/dialects (note YML files can be auto generated with tox -e generate-fixture-yml).
    • Full autofix test cases in test/fixtures/linter/autofix.
    • Other.
  • Added appropriate documentation for the change.

  • Created GitHub issues for any relevant followup/future enhancements if appropriate.

@barrywhart barrywhart marked this pull request as draft March 29, 2022 10:25
@@ -152,7 +152,28 @@ def replace(self, **kwargs: DialectElementType):
for n in kwargs:
if n not in self._library: # pragma: no cover
raise ValueError(f"{n!r} is not already registered in {self!r}")
self._library[n] = kwargs[n]

# To replace a segment, the replacement must either be a
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was in the segment() decorator I removed in the previous PR. That code is still useful, so I moved it to the still-used replace() function.

@barrywhart barrywhart changed the title Add back runtime checks for replace() that were deleted in prior PR More dialect checking, fixes, inheritance cleanup Mar 29, 2022
Comment on lines 165 to 169
if self._library[n].type != cls.type:
raise ValueError( # pragma: no cover
f"Cannot replace {n!r} because 'type' property does not "
f"match: {cls.type} != {self._library[n].type}"
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines are actually new. I had wondered if the type property was always consistent when replacing a parent segment, and I found several cases where it wasn't. This was presumably a mistake/accident. Several, but not all, of these mismatches were in the Exasol dialect.

Note that when using segment inheritance (now the recommended usual practice when replacing segments), the type property can be inherited and thus mismatches avoided.

type = "select_statement"
match_grammar = ansi_dialect.get_segment(
"UnorderedSelectStatementSegment"
).match_grammar.copy()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With inheritance, no need to copy if not modifying.

match_grammar = ansi_dialect.get_segment(
"WildcardExpressionSegment"
).match_grammar.copy(
match_grammar = ansi.WildcardExpressionSegment.match_grammar.copy(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR eliminates dialects using get_segment() in favor of direct attribute access (cleaner).

@@ -1038,13 +1031,10 @@ class SelectClauseSegment(BaseSegment):
enforce_whitespace_preceding_terminator=True,
)

parse_grammar = Ref("SelectClauseSegmentGrammar")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now inherited from the base class.

OneOf("PRECEDING", "FOLLOWING"),
),
)
_frame_extent = ansi.FrameClauseSegment._frame_extent
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed this is identical to ANSI, so I removed the duplication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it not get it automatically then through inheritance? So is this line needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you answer this one @barrywhart ? Other than that good to go I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's similar to a question you asked previously. Once the class has been created, FrameClauseSegment._frame_extent exists. But not during the class definition. On line 2754, we need to use it 3 times, so it's shorter to assign it to a variable rather than repeating the full reference 3 times:

OneOf(ansi.FrameClauseSegment._frame_extent, Sequence("BETWEEN", ansi.FrameClauseSegment._frame_extent, "AND", ansi.FrameClauseSegment._frame_extent)),

I think I shared a Stack Overflow link earlier -- if you go back and read that, they explain how this works. It's definitely different than most other OO languages like C++ and Java. The behavior is arguably "simpler", but makes code inside a class more verbose sometimes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK gotcha.

@@ -1570,16 +1563,6 @@ class AlterWarehouseStatementSegment(BaseSegment):
)


class CommentClauseSegment(BaseSegment):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire segment definition is identical to the base ANSI dialect, so I removed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has a different comment to show it's not used for views or tables. But yeah agreed not needed.

@@ -981,7 +972,7 @@ class InsertStatementSegment(BaseSegment):
https://spark.apache.org/docs/latest/sql-ref-syntax-dml-insert-overwrite-table.html
"""

type = "insert_table_statement"
type = "insert_statement"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed a type mismatch with the ANSI segment type. Did not use inheritance because ANSI segment uses parse_grammar but this one doesn't.

@codecov
Copy link

codecov bot commented Mar 29, 2022

Codecov Report

Merging #2942 (0d2e531) into main (1845353) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main     #2942   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          164       164           
  Lines        11892     11843   -49     
=========================================
- Hits         11892     11843   -49     
Impacted Files Coverage Δ
src/sqlfluff/core/dialects/base.py 100.00% <100.00%> (ø)
src/sqlfluff/dialects/dialect_ansi.py 100.00% <100.00%> (ø)
src/sqlfluff/dialects/dialect_bigquery.py 100.00% <100.00%> (ø)
src/sqlfluff/dialects/dialect_exasol.py 100.00% <100.00%> (ø)
src/sqlfluff/dialects/dialect_hive.py 100.00% <100.00%> (ø)
src/sqlfluff/dialects/dialect_mysql.py 100.00% <100.00%> (ø)
src/sqlfluff/dialects/dialect_oracle.py 100.00% <100.00%> (ø)
src/sqlfluff/dialects/dialect_postgres.py 100.00% <100.00%> (ø)
src/sqlfluff/dialects/dialect_redshift.py 100.00% <100.00%> (ø)
src/sqlfluff/dialects/dialect_snowflake.py 100.00% <100.00%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1845353...0d2e531. Read the comment docs.

@barrywhart barrywhart marked this pull request as ready for review March 29, 2022 12:51
@barrywhart barrywhart requested a review from tunetheweb March 29, 2022 12:51
@barrywhart barrywhart marked this pull request as draft March 29, 2022 14:09
@barrywhart barrywhart marked this pull request as ready for review March 29, 2022 14:14
src/sqlfluff/dialects/dialect_snowflake.py Outdated Show resolved Hide resolved
OneOf("PRECEDING", "FOLLOWING"),
),
)
_frame_extent = ansi.FrameClauseSegment._frame_extent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it not get it automatically then through inheritance? So is this line needed?

src/sqlfluff/dialects/dialect_redshift.py Show resolved Hide resolved
src/sqlfluff/dialects/dialect_snowflake.py Show resolved Hide resolved
@@ -1570,16 +1563,6 @@ class AlterWarehouseStatementSegment(BaseSegment):
)


class CommentClauseSegment(BaseSegment):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has a different comment to show it's not used for views or tables. But yeah agreed not needed.

src/sqlfluff/dialects/dialect_tsql.py Outdated Show resolved Hide resolved
src/sqlfluff/dialects/dialect_tsql.py Show resolved Hide resolved
@barrywhart
Copy link
Member Author

barrywhart commented Mar 29, 2022

@tunetheweb: Ok, I added type hints throughout the ANSI dialect, which allowed me to remove almost every use of # type: ignore in the other dialects. The only remaining instances are a handful of cases like the one below, all involving situations where we override the base's match_grammar.terminator. I consider that code hacky, so it makes sense to me the need for # type: ignore there. At some point, we could look into providing a core function to override the terminator more cleanly, and eliminate the need for these remaining uses of # type: ignore.

match_grammar.terminator = match_grammar.terminator.copy(  # type: ignore

@barrywhart
Copy link
Member Author

@tunetheweb: I added the test we discussed. See test/core/linter_test.py::test_require_match_parse_grammar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants