Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support inserting into table having CHECK constraint in Delta Lake #15396

Merged
merged 1 commit into from
Feb 6, 2023

Conversation

ebyhr
Copy link
Member

@ebyhr ebyhr commented Dec 14, 2022

Description

Supported expressions are intentionally limited in this PR.
The 1st commit came from #14964. Please leave review comments for the 1st commit in #14964.

Release notes

(x) Release notes are required, with the following suggested text:

# Delta Lake
* Support inserting into tables having `CHECK` constraints. ({issue}`15396`)

Comment on lines 67 to 73
number
: MINUS? INTEGER_VALUE #integerLiteral
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about decimals and scientific notation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a TODO comment. Let me know if we should support them in v1.

@findepi
Copy link
Member

findepi commented Dec 16, 2022

What about LIKE and function calls, like substring?

@ebyhr ebyhr marked this pull request as ready for review December 19, 2022 01:42
@ebyhr
Copy link
Member Author

ebyhr commented Dec 19, 2022

What about LIKE and function calls, like substring?

@findepi Left a TODO comment in g4 file.

@ebyhr
Copy link
Member Author

ebyhr commented Dec 19, 2022

CI hit #13199

@findepi
Copy link
Member

findepi commented Dec 19, 2022

What about LIKE and function calls, like substring?

@findepi Left a TODO comment in g4 file.

Sounds good, unless mini-grammar is not the way to go at all.

Can we have alternative approach PR where we explore Coral-based translation and see how hard it would be to get it working?

@ebyhr
Copy link
Member Author

ebyhr commented Dec 19, 2022

@findepi Coral doesn't support parsing Spark SQL as far as I know. Are you suggesting to try Hive translation anyway?

@ebyhr ebyhr force-pushed the ebi/delta-spark-check-constraint-parser branch from b4fa35c to 31efb02 Compare December 19, 2022 21:53
@ebyhr
Copy link
Member Author

ebyhr commented Dec 19, 2022

Rebased on upstream to resolve conflicts.

@ebyhr ebyhr force-pushed the ebi/delta-spark-check-constraint-parser branch from 31efb02 to 596b07d Compare January 25, 2023 02:32
@ebyhr
Copy link
Member Author

ebyhr commented Jan 25, 2023

Rebased on upstream to include engine and SPI change.

@findepi
Copy link
Member

findepi commented Jan 25, 2023

@findepi Coral doesn't support parsing Spark SQL as far as I know.

let's add this as a commit message comment, covering alternatives we considered and rejected

Are you suggesting to try Hive translation anyway?

no

Rebased on upstream to include engine and SPI change.

feel free to squash the fixups

@ebyhr ebyhr force-pushed the ebi/delta-spark-check-constraint-parser branch from 596b07d to f792ef3 Compare January 25, 2023 22:31
@ebyhr
Copy link
Member Author

ebyhr commented Jan 25, 2023

@findepi Squashed commits and updated the commit message.

@ebyhr ebyhr self-assigned this Jan 26, 2023
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed Parsing (up to and excluding "SparkExpressionConverter.java").

I think it would be a good idea to have explicit parser tests for String literals (both positive and negative)

@ebyhr ebyhr force-pushed the ebi/delta-spark-check-constraint-parser branch from f792ef3 to caaa9a3 Compare January 30, 2023 05:34
@ebyhr
Copy link
Member Author

ebyhr commented Jan 30, 2023

Addressed comments.

CI hit #13199 and #14441

public void testUnsupportedStringLiteral()
{
assertParseFailure("r'raw literal'");
assertParseFailure("r\"'\\n' represents dnewline character.\"");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dnewline -> newline

what's unsupported here? the r prefix?
for assertParseFailure can you assert on the exact exception message being thrown?

return (SparkExpression) invokeParser(expressionPattern, SparkExpressionParser::standaloneExpression);
}
catch (Exception e) {
throw new ParsingException("Cannot parse Spark expression: " + expressionPattern);
Copy link
Member

@findepi findepi Feb 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw new ParsingException("Cannot parse Spark expression [%s]: %s".formatted(expressionPattern, firstNonNull(e.getMessage(), e)), e);

@@ -51,9 +52,15 @@ public static String toTrinoExpression(String sparkExpression)
}
}

private static SparkExpression createExpression(String expressionPattern)
@VisibleForTesting
static SparkExpression createExpression(String expressionPattern)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expressionPattern -> expression

(it's definitely not a pattern)

return new Formatter().process(expression, null);
}

public static class Formatter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private

@Override
protected String visitLogicalExpression(LogicalExpression node, Void context)
{
return "(%s %s %s)".formatted(process(node.getLeft(), context), node.getOperator().toString(), process(node.getRight(), context));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may be able to avoid some braces on the output by glueing together series of ANDs or ORs

@Override
        protected String visitLogicalExpression(LogicalExpression node, Void context)
        {
            return stream(flatten(node))
                    .map(expression -> process(expression, context))
                    .collect(Collectors.joining(node.getOperator().toString(), "(", ")"));
        }

        private Iterable<SparkExpression> flatten(LogicalExpression root)
        {
            return Traverser.<SparkExpression>forTree(node -> {
                        if (node instanceof LogicalExpression logicalExpression && logicalExpression.getOperator() == root.getOperator()) {
                            return ImmutableList.of(logicalExpression.getLeft(), logicalExpression.getRight());
                        }
                        return ImmutableList.of();
                    })
                    .depthFirstPreOrder(root);
        }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure it's worth it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It causes a stack overflow. Let me handle in follow-up.

@Override
protected String visitLongLiteral(LongLiteral node, Void context)
{
return String.valueOf(node.getValue());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd prefer Long.toString emphasizing the value is known to be long (and thus safe to output as trino exrepssion)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly, Boolean.toString above

return new SparkExpressionBuilder().visit(tree);
}
catch (StackOverflowError e) {
throw new IllegalArgumentException("expression pattern is too large (stack overflow while parsing)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rm "pattern"

import static org.assertj.core.api.AssertionsForClassTypes.assertThatThrownBy;
import static org.testng.Assert.assertEquals;

public class TestStringLiteral
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a parser test, not a StringLiteral test

TestSparkExpressions (or however the parsing class gets renamed)


import static io.trino.spi.StandardErrorCode.NOT_SUPPORTED;

public final class SparkExpressions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can call this class SparkExpressionParser?

the antlr generated class would want to be called eg SparkExpressionBaseParser
(just rename grammar file and update grammar SparkExpression; line)

assertThatThrownBy(() -> onDelta().executeQuery("INSERT INTO default." + tableName + " VALUES (" + invalidInput + ")"))
.hasMessageMatching("(?s).* CHECK constraint .* violated by row with values.*");
assertThatThrownBy(() -> onTrino().executeQuery("INSERT INTO delta.default." + tableName + " VALUES (" + invalidInput + ")"))
.hasMessageContaining("Check constraint violation");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the exact failure message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format is "Check constraint violation: " + constraint (constraint = Trino expression). It's tested in TestCheckConstraint.

Use ANTLR approach because LinkedIn Coral doesn't support
translating Spark SQL yet.

Additionally, extract relevant tests from TestDeltaLakeCheckConstraintsCompatibility.
@ebyhr ebyhr force-pushed the ebi/delta-spark-check-constraint-parser branch from caaa9a3 to 21734a0 Compare February 6, 2023 01:38
@ebyhr
Copy link
Member Author

ebyhr commented Feb 6, 2023

CI hit #14441

@ebyhr ebyhr merged commit 60a0ee9 into master Feb 6, 2023
@ebyhr ebyhr deleted the ebi/delta-spark-check-constraint-parser branch February 6, 2023 11:10
@ebyhr ebyhr mentioned this pull request Feb 6, 2023
@github-actions github-actions bot added this to the 407 milestone Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants