
Spark3: Support for INSERT [TABLE] data manipulation statements #2290

Merged: 18 commits merged into sqlfluff:main on Jan 20, 2022

Conversation

@R7L208 (Contributor) commented Jan 11, 2022

Brief summary of the change made

Fixes #1932

Are there any other side effects of this change that we should be aware of?

N/A. Opening Draft PR for feedback

Pull Request checklist

  • Please confirm you have completed any of the necessary steps below.

  • Included test cases to demonstrate any code changes, which may be one or more of the following:

    • .yml rule test cases in test/fixtures/rules/std_rule_cases.
    • .sql/.yml parser test cases in test/fixtures/dialects (note YML files can be auto generated with tox -e generate-fixture-yml).
    • Full autofix test cases in test/fixtures/linter/autofix.
    • Other.
  • Added appropriate documentation for the change.

  • Created GitHub issues for any relevant followup/future enhancements if appropriate.

@R7L208 R7L208 marked this pull request as draft January 11, 2022 21:31
@barrywhart (Member) commented Jan 11, 2022

This error:

[screenshot of the failing CI check]

means that your dialect changes have affected what the resulting parse output looks like. This isn't necessarily a problem with your code.

Rather, this is a check to ensure we don't accidentally change the parse output. To get past this build error, run the following command locally:

python test/generate_parse_fixture_yml.py

This will modify some (possibly many) of the existing .yml files. You'll want to commit and push these changes. (I'd also suggest reviewing the changes yourself, to see whether the modifications make sense based on your dialect changes.)
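The golden-file idea behind this check can be sketched in a few lines. This is an illustrative assumption about the mechanism, not sqlfluff's actual implementation; `fixture_is_stale` and its `parse` callback are hypothetical names:

```python
# Toy sketch of a "golden file" parse-fixture check, in the spirit of the
# test/fixtures/dialects layout: each .sql fixture has a committed .yml
# holding the expected parse output, and CI fails when they drift apart.
# Hypothetical names; not sqlfluff's real implementation.
from pathlib import Path


def fixture_is_stale(sql_path: Path, parse) -> bool:
    """Return True if the committed .yml no longer matches the parser output."""
    yml_path = sql_path.with_suffix(".yml")
    expected = yml_path.read_text() if yml_path.exists() else None
    actual = parse(sql_path.read_text())
    return expected != actual
```

Regenerating the fixtures (the `python test/generate_parse_fixture_yml.py` step above) is then just rewriting each .yml from the current parser output so the comparison passes again.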

@R7L208 (Contributor, Author) commented Jan 11, 2022

There are a few issues I've found in this PR. I've opened issue #2298 for one and will open the other shortly.

The major problem I'm having is matching (in a non-brittle way) the commented-out test case in test/fixtures/dialects/spark3/insert_into.sql:

-- TODO
---- Insert Using a FROM Statement
--INSERT INTO students
--     FROM applicants SELECT name, address, id applicants WHERE qualified = true;
--     FROM applicants SELECT name, address, id WHERE qualified = true; 

Two things are happening here:

  • FROM and SELECT are not in the traditional order. I can't find a good way to match this with the existing classes in the ANSI dialect.
  • The table reference (applicants) after the column identifiers is valid syntax for Spark 🤷. I tested it and it ran OK. However, I don't believe the current implementation will match it, given the normal list of terminator values for a SELECT.
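As a toy sketch of the ordering problem (purely illustrative, not sqlfluff code): a matcher built around the ANSI SELECT-then-FROM ordering can never accept the inverted Spark form, so a separate rule is needed:

```python
# Toy illustration (not sqlfluff's actual parser) of why the Spark
# "FROM ... SELECT ..." form needs its own grammar rule: a matcher
# hard-wired to the ANSI SELECT-then-FROM ordering rejects it outright.

def matches_ansi_order(tokens):
    """ANSI-style SELECT statement: SELECT appears before FROM."""
    return ("SELECT" in tokens and "FROM" in tokens
            and tokens.index("SELECT") < tokens.index("FROM"))

def matches_spark_from_select(tokens):
    """Spark INSERT also allows FROM <table> SELECT <columns>."""
    return ("SELECT" in tokens and "FROM" in tokens
            and tokens.index("FROM") < tokens.index("SELECT"))

ansi_form = "INSERT INTO students SELECT name , address FROM applicants".split()
spark_form = "INSERT INTO students FROM applicants SELECT name , address".split()
```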

@barrywhart (Member):

You may not be able to reuse parts of the ANSI dialect. That's not necessarily bad, but it does mean the PR will be bigger.

I suggest that you initially focus on getting things to parse, even if it requires creating brand-new segments or copy/paste/modifying parts from the ANSI dialect. Once it's working, then you can look at possible ways to reduce the amount of new code (e.g. potentially by reusing more things from ANSI).
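The copy/paste/modify approach can be sketched with plain dicts (hypothetical names, not sqlfluff's real dialect API):

```python
# Hypothetical sketch (plain dicts, not sqlfluff's real API) of the
# "copy first, refactor later" approach: start from a copy of an
# ANSI-like grammar and override only the rules that differ.
ansi_grammar = {
    "InsertStatement": ("INSERT", "INTO", "TableRef", "SelectStatement"),
    "SelectStatement": ("SELECT", "Columns", "FROM", "TableRef"),
}

spark3_grammar = dict(ansi_grammar)        # inherit everything...
spark3_grammar["InsertStatement"] = (      # ...then override one rule
    "INSERT", ("INTO", "OVERWRITE"), "TableRef",
    ("SelectStatement", "FromSelectStatement"),
)
```

Once everything parses, shared pieces can be folded back into the base so only the genuinely Spark-specific rules stay copied.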

@codecov bot commented Jan 11, 2022

Codecov Report

Merging #2290 (62e53f6) into main (d80a0fc) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##              main     #2290   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          161       161           
  Lines        11486     11490    +4     
=========================================
+ Hits         11486     11490    +4     
Impacted Files Coverage Δ
src/sqlfluff/dialects/dialect_spark3.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

add overwrite test cases

rename file to better reflect usage
@R7L208 R7L208 changed the title [DRAFT] Spark3: Support for INSERT TABLE data manipulation statements Spark3: Support for INSERT TABLE data manipulation statements Jan 12, 2022
@R7L208 R7L208 changed the title Spark3: Support for INSERT TABLE data manipulation statements Spark3: Support for INSERT [TABLE] data manipulation statements Jan 12, 2022
@R7L208 R7L208 marked this pull request as ready for review January 12, 2022 12:15
(Several review comments on src/sqlfluff/dialects/dialect_spark3.py, since resolved.)
Comment on lines 39 to 44
INSERT INTO students
FROM applicants
SELECT
--name, address, id
applicants
WHERE qualified = TRUE;
Reviewer (Member):
Is this right? If selecting from applicants then how can you select applicants?

Suggested change. Original:

INSERT INTO students
FROM applicants
SELECT
--name, address, id
applicants
WHERE qualified = TRUE;

Suggested:

INSERT INTO students
FROM applicants
SELECT
name, address, id
WHERE qualified = TRUE;

@R7L208 (Contributor, Author):

It's odd syntax, but valid. There's additional info in my comment above.

Reviewer (Member):

What does that mean out of curiosity? Select all the columns out of that table?

@R7L208 (Contributor, Author):

I pulled this one from the issue, which in turn was pulled from the docs. It seemed odd, so I ran it, and it worked.

I ran it past a few other people on my team, and we think it's just a very poorly named column alias rather than a table reference.

I'm going to remove it from the code and test cases for now. If it turns out that it is a table reference, it won't be hard to add back in when/if someone opens an issue.

@R7L208 (Contributor, Author):

Also, the columns were commented out when they shouldn't have been. Fixed that as well 😁

Reviewer (Member):

Well, if they're not commented out, then I'm guessing it just gives the last column (id) the alias applicants. Yeah, super confusing. Glad we figured that out and removed it before merging!
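The alias reading can be demonstrated with Python's stdlib sqlite3, which, like Spark SQL, accepts column aliases without the AS keyword (sqlite is only standing in for Spark here):

```python
# Demonstrating implicit (AS-less) column aliases using stdlib sqlite3.
# sqlite is not Spark, but both accept "expr alias" without AS, which is
# why "id applicants" reads as the id column aliased to "applicants".
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.execute("SELECT 'a' name, 'b' address, 1 applicants")
column_names = [col[0] for col in cur.description]
# column_names == ['name', 'address', 'applicants']
```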

(Review comment on test/fixtures/dialects/spark3/insert_table.sql, since resolved.)
Comment on lines 826 to 827
# TODO is there a less brittle way to do this using an existing class(es)?
# TODO if not, how to best construct a new class for `SELECT` match?
Reviewer (Member):

Let's remove these, as I honestly can't think of a better way to do this, other than splitting up the Select segment hugely, which seems like overkill for this one use case.

Suggested change: delete both TODO lines.

@R7L208 (Contributor, Author):

👍

@R7L208 R7L208 requested a review from tunetheweb January 20, 2022 20:27
@tunetheweb tunetheweb merged commit ce02feb into sqlfluff:main Jan 20, 2022
@R7L208 R7L208 deleted the r7l208/spark3-dml-insert branch August 30, 2022 19:53
Successfully merging this pull request may close these issues.

Insert overwrite has some parsing failures in Spark3 dialect
4 participants