Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-35069][SQL] TRANSFORM forbidden DISTICNT and ALL, also make the error clear #32149

Closed

Conversation

AngersZhuuuu
Copy link
Contributor

@AngersZhuuuu AngersZhuuuu commented Apr 13, 2021

What changes were proposed in this pull request?

According to #29087 (comment), add UT in transform.sql

It seems that distinct is not recognized as a reserved word here

-- !query
explain extended SELECT TRANSFORM(distinct b, a, c)
                   USING 'cat' AS (a, b, c)
                 FROM script_trans
                 WHERE a <= 4
-- !query schema
struct<plan:string>
-- !query output
== Parsed Logical Plan ==
'ScriptTransformation [*], cat, [a#x, b#x, c#x], ScriptInputOutputSchema(List(),List(),None,None,List(),List(),None,None,false)
+- 'Project ['distinct AS b#x, 'a, 'c]
   +- 'Filter ('a <= 4)
      +- 'UnresolvedRelation [script_trans], [], false

== Analyzed Logical Plan ==
org.apache.spark.sql.AnalysisException: cannot resolve 'distinct' given input columns: [script_trans.a, script_trans.b, script_trans.c]; line 1 pos 34;
'ScriptTransformation [*], cat, [a#x, b#x, c#x], ScriptInputOutputSchema(List(),List(),None,None,List(),List(),None,None,false)
+- 'Project ['distinct AS b#x, a#x, c#x]
   +- Filter (a#x <= 4)
      +- SubqueryAlias script_trans
         +- View (`script_trans`, [a#x,b#x,c#x])
            +- Project [cast(a#x as int) AS a#x, cast(b#x as int) AS b#x, cast(c#x as int) AS c#x]
               +- Project [a#x, b#x, c#x]
                  +- SubqueryAlias script_trans
                     +- LocalRelation [a#x, b#x, c#x]

Hive's error
image

Why are the changes needed?

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added Ut

@AngersZhuuuu
Copy link
Contributor Author

FYI @cloud-fan

@SparkQA
Copy link

SparkQA commented Apr 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41869/

@SparkQA
Copy link

SparkQA commented Apr 13, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41869/



-- !query
SELECT TRANSFORM(distinct b, a, c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah the parser treats it as distinct as b.

Can we improve the parser to recognize distinct correctly and give a clear error message?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW why do we support TRANSFORM(a as b)? is it from Hive? Normal function parameters can't have alias.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW why do we support TRANSFORM(a as b)? is it from Hive? Normal function parameters can't have alias.

Hive not support this.

image

may be we should not use

namedExpressionSeq
    : namedExpression (',' namedExpression)*
    ;

(SELECT kind=TRANSFORM '(' namedExpressionSeq ')'
            | kind=MAP namedExpressionSeq
            | kind=REDUCE namedExpressionSeq)

maybe

expressionSeq 
    : expression(',' expression) 
    ;

(SELECT kind=TRANSFORM '(' expressionSeq ')'
            | kind=MAP expressionSeq
            | kind=REDUCE eExpressionSeq)

Copy link
Contributor

@cloud-fan cloud-fan Apr 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, let's fix it as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, let's fix it as well.

Should I fix both issue in this pr or in separated PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's have 2 PRs. forbid distinct and forbid alias are two things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's have 2 PRs. forbid distinct and forbid alias are two things.

I have make a new ticket for this one. and changed the title.

How about current

@SparkQA
Copy link

SparkQA commented Apr 13, 2021

Test build #137289 has finished for PR 32149 at commit 118f242.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions github-actions bot added the SQL label Apr 13, 2021
@AngersZhuuuu AngersZhuuuu changed the title [SPARK-28227][SQL][FOLLOWUP] Add test case about transform expression with distinct [SPARK-31936][SQL][FOLLOWUP] TRANSFORM forbidden DISTICNT and ALL, also make the error clear Apr 14, 2021
@AngersZhuuuu AngersZhuuuu changed the title [SPARK-31936][SQL][FOLLOWUP] TRANSFORM forbidden DISTICNT and ALL, also make the error clear [SPARK-35069][SQL] TRANSFORM forbidden DISTICNT and ALL, also make the error clear Apr 14, 2021
@SparkQA
Copy link

SparkQA commented Apr 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41919/

@SparkQA
Copy link

SparkQA commented Apr 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41919/

@SparkQA
Copy link

SparkQA commented Apr 14, 2021

Test build #137340 has finished for PR 32149 at commit 542aa46.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 4ca9958 Apr 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants