[SPARK-13535] [SQL] Fix Analysis Exceptions when Using Backticks in Transform Clause #11415

Closed
wants to merge 4 commits into from


gatorsmile
Member

What changes were proposed in this pull request?

FROM
(FROM test SELECT TRANSFORM(key, value) USING 'cat' AS (`thing1` int, thing2 string)) t
SELECT thing1 + 1

This query returns an analysis error, like:

Failed to analyze query: org.apache.spark.sql.AnalysisException: cannot resolve '`thing1`' given input columns: [`thing1`, thing2]; line 3 pos 7
'Project [unresolvedalias(('thing1 + 1), None)]
+- SubqueryAlias t
   +- ScriptTransformation [key#2,value#3], cat, [`thing1`#6,thing2#7], HiveScriptIOSchema(List(),List(),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),List((field.delim,   )),List((field.delim,   )),Some(org.apache.hadoop.hive.ql.exec.TextRecordReader),Some(org.apache.hadoop.hive.ql.exec.TextRecordWriter),false)
      +- SubqueryAlias test
         +- Project [_1#0 AS key#2,_2#1 AS value#3]
            +- LocalRelation [_1#0,_2#1], [[1,1],[2,2],[3,3],[4,4],[5,5]]

The backticks around thing1 should be removed before entering Parser/Analyzer. This PR fixes this issue.

How was this patch tested?

Added a test case and modified an existing test case

@gatorsmile
Member Author

@hvanhovell @viirya Could you review the changes? Thanks!

@rxin
Contributor

rxin commented Feb 28, 2016

I don't get it. Is the pull request about supporting backticks? Or about changing something to use backticks?

@gatorsmile
Member Author

Before this fix, if backticks are used in Script Transform outputs, the Analyzer is unable to process them correctly. The root cause is in HiveQl: when we parse the plan, we do not remove the backticks. (In CatalystQl, by contrast, we clean the backticks from all attribute/alias names if users use them.)
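
To make the cleaning step concrete, here is a minimal sketch in plain Scala (no Spark on the classpath). The object name `BacktickCleaning`, the helper `cleanIdentifier` and the regex are illustrative assumptions and are not taken from this patch; the sketch only shows the general idea of stripping one pair of enclosing backticks from an identifier.

```scala
object BacktickCleaning {
  // Assumed pattern: an identifier fully wrapped in a single pair of backticks.
  private val escapedIdentifier = "`([^`]+)`".r

  // Return the identifier without its enclosing backticks; leave plain names untouched.
  // Scala regex patterns must match the whole string, so partial matches fall through.
  def cleanIdentifier(ident: String): String = ident match {
    case escapedIdentifier(inner) => inner
    case plain                    => plain
  }

  def main(args: Array[String]): Unit = {
    println(cleanIdentifier("`thing1`")) // thing1
    println(cleanIdentifier("thing2"))   // thing2
  }
}
```

With a step like this applied to the output column names of the transform clause, both thing1 and thing2 would reach the Analyzer as plain names.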

In this example, before the fix, the backticks are not cleaned, so the name of the output attribute of ScriptTransformation is literally `thing1`, backticks included. When analyzing the Project node, the Analyzer looks for thing1 (without backticks) in the child's output, where the column is named `thing1` (with backticks) rather than thing1, and thus reports the following error:

cannot resolve '`thing1`' given input columns: [`thing1`, thing2]; line 3 pos 7
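
The lookup failure can be sketched in plain Scala as follows. This is a simplified model and not Spark's analyzer: `Attr`, `resolve` and `clean` are hypothetical names, and exact string equality stands in for the real resolution rules.

```scala
object ResolutionSketch {
  // Hypothetical stand-in for an output attribute; only the name matters here.
  final case class Attr(name: String)

  // Resolve a reference by exact name match against the child's output (a simplification).
  def resolve(ref: String, childOutput: Seq[Attr]): Option[Attr] =
    childOutput.find(_.name == ref)

  // Strip one pair of enclosing backticks, if present.
  def clean(name: String): String =
    if (name.length > 2 && name.startsWith("`") && name.endsWith("`"))
      name.substring(1, name.length - 1)
    else name

  def main(args: Array[String]): Unit = {
    // Output names as they arrive when the backticks are not cleaned.
    val uncleaned = Seq(Attr("`thing1`"), Attr("thing2"))
    // The same output after cleaning each name.
    val cleaned = uncleaned.map(a => Attr(clean(a.name)))

    println(resolve("thing1", uncleaned)) // None
    println(resolve("thing1", cleaned))   // Some(Attr(thing1))
  }
}
```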

The error message is pretty confusing. This message is issued from

a.failAnalysis(s"cannot resolve '${a.sql}' given input columns: [$from]")

The backticks around the first 'thing1' are added by the function call a.sql. The backticks around the second 'thing1' are part of the name. Thus, they look the same in the message, but they are different in our plan.
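
A small plain-Scala sketch of why the two names print identically. `Attr`, its `sql` method and `message` are hypothetical stand-ins; the only assumption carried over from the real code is that the unresolved reference is rendered with added backticks while the input columns are printed by their raw names.

```scala
object MessageSketch {
  final case class Attr(name: String) {
    // Assumed rendering: wrap the raw name in backticks, similar in spirit to a.sql.
    def sql: String = s"`${name}`"
  }

  // Build the message the same way as the failAnalysis call quoted above:
  // the unresolved reference is quoted, the input columns are shown as-is.
  def message(unresolved: Attr, input: Seq[Attr]): String = {
    val from = input.map(_.name).mkString(", ")
    s"cannot resolve '${unresolved.sql}' given input columns: [$from]"
  }

  def main(args: Array[String]): Unit = {
    val reference = Attr("thing1")                        // name without backticks
    val childOutput = Seq(Attr("`thing1`"), Attr("thing2")) // first name contains backticks
    println(message(reference, childOutput))
    // cannot resolve '`thing1`' given input columns: [`thing1`, thing2]
  }
}
```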

@gatorsmile
Member Author

This PR makes it work: HiveQl now cleans the backticks. After the fix, users can use backticks in Script Transformation. For example, the following query can be processed by Spark SQL.

FROM
(FROM test SELECT TRANSFORM(key, value) USING 'cat' AS (`thing1` int, thing2 string)) t
SELECT thing1 + 1

The issue was exposed when generating SQL from a plan containing ScriptTransformation. It blocks another PR, which has not been submitted yet.

@SparkQA

SparkQA commented Feb 28, 2016

Test build #52135 has finished for PR 11415 at commit 2357d90.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Feb 28, 2016

Can you update the title? The current title is fairly confusing, and I'm not sure if it accurately captures what you are fixing.

@gatorsmile
Member Author

Sure, let me do it.

@gatorsmile gatorsmile changed the title from "[SPARK-13535] [SQL] Use Backtick in Script Transform Outputs" to "[SPARK-13535] [SQL] Fix Analysis Exceptions when Using Backticks in Transform Clause" on Feb 28, 2016
@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Feb 28, 2016

Test build #52136 has finished for PR 11415 at commit 2357d90.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Feb 28, 2016

Test build #52138 has finished for PR 11415 at commit 2357d90.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@viirya
Member

viirya commented Feb 29, 2016

@gatorsmile the failed test is not related to this; I have submitted a PR to fix it. Until it is merged, the retests you ask for will fail very frequently.

@gatorsmile
Member Author

yeah, but this PR is so unlucky. 100% failure : )

@SparkQA

SparkQA commented Feb 29, 2016

Test build #52169 has finished for PR 11415 at commit c1f31d1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

@cloud-fan Could you take a look at this? Thanks!

@gatorsmile
Member Author

ping @cloud-fan : ) I think this is a simple fix. Could you review it? Another PR for converting Script Transform to SQL is waiting for this fix. Thank you!

@cloud-fan
Contributor

LGTM. cc @hvanhovell

@gatorsmile
Member Author

Thank you! @cloud-fan : )

@hvanhovell
Contributor

LGTM

@hvanhovell
Contributor

Merging this to master. Thanks!

@asfgit asfgit closed this in 9e01fe2 Mar 2, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
…ansform Clause

#### What changes were proposed in this pull request?
```SQL
FROM
(FROM test SELECT TRANSFORM(key, value) USING 'cat' AS (`thing1` int, thing2 string)) t
SELECT thing1 + 1
```
This query returns an analysis error, like:
```
Failed to analyze query: org.apache.spark.sql.AnalysisException: cannot resolve '`thing1`' given input columns: [`thing1`, thing2]; line 3 pos 7
'Project [unresolvedalias(('thing1 + 1), None)]
+- SubqueryAlias t
   +- ScriptTransformation [key#2,value#3], cat, [`thing1`#6,thing2#7], HiveScriptIOSchema(List(),List(),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),List((field.delim,	)),List((field.delim,	)),Some(org.apache.hadoop.hive.ql.exec.TextRecordReader),Some(org.apache.hadoop.hive.ql.exec.TextRecordWriter),false)
      +- SubqueryAlias test
         +- Project [_1#0 AS key#2,_2#1 AS value#3]
            +- LocalRelation [_1#0,_2#1], [[1,1],[2,2],[3,3],[4,4],[5,5]]
```

The backticks around \`thing1\` should be removed before entering Parser/Analyzer. This PR fixes this issue.

#### How was this patch tested?

Added a test case and modified an existing test case

Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#11415 from gatorsmile/scriptTransform.