
Spark3: Support for INSERT [TABLE] data manipulation statements #2290

Merged: 18 commits merged into sqlfluff:main on Jan 20, 2022

Conversation

@R7L208 (Contributor) commented Jan 11, 2022

Brief summary of the change made

Fixes #1932

Are there any other side effects of this change that we should be aware of?

N/A. Opening Draft PR for feedback

Pull Request checklist

  • Please confirm you have completed any of the necessary steps below.

  • Included test cases to demonstrate any code changes, which may be one or more of the following:

    • .yml rule test cases in test/fixtures/rules/std_rule_cases.
    • .sql/.yml parser test cases in test/fixtures/dialects (note YML files can be auto generated with tox -e generate-fixture-yml).
    • Full autofix test cases in test/fixtures/linter/autofix.
    • Other.
  • Added appropriate documentation for the change.

  • Created GitHub issues for any relevant followup/future enhancements if appropriate.

@R7L208 R7L208 marked this pull request as draft January 11, 2022 21:31
@barrywhart (Member) commented Jan 11, 2022

This error:

[screenshot of the failing CI check]

means that your dialect changes have affected what the resulting parse output looks like. This isn't necessarily a problem with your code.

Rather, this is a check to ensure we don't accidentally change the parse output. To get past this build error, run the following command locally:

python test/generate_parse_fixture_yml.py

This will modify some (possibly many) of the existing .yml files. You'll want to commit and push these changes. (I'd also suggest reviewing the changes yourself, to see whether the modifications make sense based on your dialect changes.)
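The golden-file idea behind this check can be sketched in a few lines. This is an illustrative assumption about the mechanism, not sqlfluff's actual implementation; `fixture_is_stale` and its `parse` callback are hypothetical names:

```python
# Toy sketch of a "golden file" parse-fixture check, in the spirit of the
# test/fixtures/dialects layout: each .sql fixture has a committed .yml
# holding the expected parse output, and CI fails when they drift apart.
# Hypothetical names; not sqlfluff's real implementation.
from pathlib import Path


def fixture_is_stale(sql_path: Path, parse) -> bool:
    """Return True if the committed .yml no longer matches the parser output."""
    yml_path = sql_path.with_suffix(".yml")
    expected = yml_path.read_text() if yml_path.exists() else None
    actual = parse(sql_path.read_text())
    return expected != actual
```

Regenerating the fixtures (the `python test/generate_parse_fixture_yml.py` step above) is then just rewriting each .yml from the current parser output so the comparison passes again.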

@R7L208 (Contributor, Author) commented Jan 11, 2022

There are a few issues I've found in this PR. I've opened issue #2298 for one and will open the other shortly.

The major problem I'm having is matching (in a non-brittle way) the commented-out test case in test/fixtures/dialects/spark3/insert_into.sql:

-- TODO
---- Insert Using a FROM Statement
--INSERT INTO students
--     FROM applicants SELECT name, address, id applicants WHERE qualified = true;
--     FROM applicants SELECT name, address, id WHERE qualified = true; 

Two things are happening here:

  • FROM and SELECT are not in the traditional order. I can't find a good way to match this with the existing classes in the ANSI dialect.
  • The table reference (applicants) after the column identifiers is valid syntax for Spark 🤷. I tested it and it ran OK. However, I don't believe the current implementation will match it, given the normal list of terminator values for a SELECT.
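As a toy sketch of the ordering problem (purely illustrative, not sqlfluff code): a matcher built around the ANSI SELECT-then-FROM ordering can never accept the inverted Spark form, so a separate rule is needed:

```python
# Toy illustration (not sqlfluff's actual parser) of why the Spark
# "FROM ... SELECT ..." form needs its own grammar rule: a matcher
# hard-wired to the ANSI SELECT-then-FROM ordering rejects it outright.

def matches_ansi_order(tokens):
    """ANSI-style SELECT statement: SELECT appears before FROM."""
    return ("SELECT" in tokens and "FROM" in tokens
            and tokens.index("SELECT") < tokens.index("FROM"))

def matches_spark_from_select(tokens):
    """Spark INSERT also allows FROM <table> SELECT <columns>."""
    return ("SELECT" in tokens and "FROM" in tokens
            and tokens.index("FROM") < tokens.index("SELECT"))

ansi_form = "INSERT INTO students SELECT name , address FROM applicants".split()
spark_form = "INSERT INTO students FROM applicants SELECT name , address".split()
```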

@barrywhart (Member):

You may not be able to reuse parts of the ANSI dialect. That's not necessarily bad, but it does mean the PR will be bigger.

I suggest that you initially focus on getting things to parse, even if it requires creating brand-new segments or copy/paste/modifying parts from the ANSI dialect. Once it's working, then you can look at possible ways to reduce the amount of new code (e.g. potentially by reusing more things from ANSI).
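The copy/paste/modify approach can be sketched with plain dicts (hypothetical names, not sqlfluff's real dialect API):

```python
# Hypothetical sketch (plain dicts, not sqlfluff's real API) of the
# "copy first, refactor later" approach: start from a copy of an
# ANSI-like grammar and override only the rules that differ.
ansi_grammar = {
    "InsertStatement": ("INSERT", "INTO", "TableRef", "SelectStatement"),
    "SelectStatement": ("SELECT", "Columns", "FROM", "TableRef"),
}

spark3_grammar = dict(ansi_grammar)        # inherit everything...
spark3_grammar["InsertStatement"] = (      # ...then override one rule
    "INSERT", ("INTO", "OVERWRITE"), "TableRef",
    ("SelectStatement", "FromSelectStatement"),
)
```

Once everything parses, shared pieces can be folded back into the base so only the genuinely Spark-specific rules stay copied.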

@codecov bot commented Jan 11, 2022

Codecov Report

Merging #2290 (62e53f6) into main (d80a0fc) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##              main     #2290   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          161       161           
  Lines        11486     11490    +4     
=========================================
+ Hits         11486     11490    +4     
Impacted Files Coverage Δ
src/sqlfluff/dialects/dialect_spark3.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

add overwrite test cases

rename file to better reflect usage
@R7L208 R7L208 changed the title [DRAFT] Spark3: Support for INSERT TABLE data manipulation statements Spark3: Support for INSERT TABLE data manipulation statements Jan 12, 2022
@R7L208 R7L208 changed the title Spark3: Support for INSERT TABLE data manipulation statements Spark3: Support for INSERT [TABLE] data manipulation statements Jan 12, 2022
@R7L208 R7L208 marked this pull request as ready for review January 12, 2022 12:15
(Several review comments on src/sqlfluff/dialects/dialect_spark3.py, since resolved.)
Comment on lines 39 to 44
INSERT INTO students
FROM applicants
SELECT
--name, address, id
applicants
WHERE qualified = TRUE;
Reviewer (Member):
Is this right? If selecting from applicants then how can you select applicants?

Suggested change. Original:

INSERT INTO students
FROM applicants
SELECT
--name, address, id
applicants
WHERE qualified = TRUE;

Suggested:

INSERT INTO students
FROM applicants
SELECT
name, address, id
WHERE qualified = TRUE;

@R7L208 (Contributor, Author):

It's odd syntax, but valid. There's additional info in my comment above.

Reviewer (Member):

What does that mean out of curiosity? Select all the columns out of that table?

@R7L208 (Contributor, Author):

I pulled this one from the issue, which in turn was pulled from the docs. It seemed odd, so I ran it, and it worked.

I ran it past a few other people on my team, and we think it's just a very poorly named column alias rather than a table reference.

I'm going to remove it from the code and test cases for now. If it turns out that it is a table reference, it won't be hard to add back in when/if someone opens an issue.

@R7L208 (Contributor, Author):

Also, the columns were commented out when they shouldn't have been. Fixed that as well 😁

Reviewer (Member):

Well, if they're not commented out, then I'm guessing it just gives the last column (id) the alias applicants. Yeah, super confusing. Glad we figured that out and removed it before merging!
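The alias reading can be demonstrated with Python's stdlib sqlite3, which, like Spark SQL, accepts column aliases without the AS keyword (sqlite is only standing in for Spark here):

```python
# Demonstrating implicit (AS-less) column aliases using stdlib sqlite3.
# sqlite is not Spark, but both accept "expr alias" without AS, which is
# why "id applicants" reads as the id column aliased to "applicants".
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.execute("SELECT 'a' name, 'b' address, 1 applicants")
column_names = [col[0] for col in cur.description]
# column_names == ['name', 'address', 'applicants']
```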

(Review comment on test/fixtures/dialects/spark3/insert_table.sql, since resolved.)
Comment on lines 826 to 827
# TODO is there a less brittle way to do this using an existing class(es)?
# TODO if not, how to best construct a new class for `SELECT` match?
Reviewer (Member):

Let's remove these, as I honestly can't think of a better way to do this, other than splitting up the Select segment hugely, which seems like overkill for this one use case.

Suggested change: delete both TODO lines.

@R7L208 (Contributor, Author):

👍

@R7L208 R7L208 requested a review from tunetheweb January 20, 2022 20:27
@tunetheweb tunetheweb merged commit ce02feb into sqlfluff:main Jan 20, 2022
@R7L208 R7L208 deleted the r7l208/spark3-dml-insert branch August 30, 2022 19:53
Successfully merging this pull request may close these issues.

Insert overwrite has some parsing failures in Spark3 dialect
4 participants