[SPARK-3698][SQL] Correctly check case sensitivity in GetField #2543

Closed
wants to merge 2 commits into from

cloud-fan
Contributor

This PR is a follow-up to #2382.
It fixes a bug in resolving expressions such as a.b[0].c.d: #2382 only performs the case-sensitivity check when resolving Unresolved("a.b") to GetField(Attribute("a"), "b"), but not for c and d.
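
A minimal sketch of the behaviour described above, using a toy schema model rather than Catalyst's classes. Every name here (`NestedResolutionSketch`, `DType`, `Step`, `resolvePath`) is hypothetical and only illustrates the idea that the name-matching function has to be applied at every struct level of the path, including the levels that come after an array index.

```scala
object NestedResolutionSketch {
  // Hypothetical stand-in for a name-matching function: true when two names match.
  type Resolver = (String, String) => Boolean
  val caseSensitive: Resolver = (a, b) => a == b
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Toy schema model (an assumption for illustration, not Spark's DataType hierarchy).
  sealed trait DType
  case object IntT extends DType
  final case class ArrayT(element: DType) extends DType
  final case class StructT(fields: Seq[(String, DType)]) extends DType

  // One step of an access path: a field name or an array index.
  sealed trait Step
  final case class Field(name: String) extends Step
  final case class Index(i: Int) extends Step

  // Walk the path and apply the resolver at every struct level.
  // Returns None as soon as one step cannot be resolved.
  def resolvePath(root: DType, path: Seq[Step], resolver: Resolver): Option[DType] =
    path.foldLeft(Option(root)) {
      case (Some(StructT(fields)), Field(name)) =>
        fields.collectFirst { case (n, t) if resolver(n, name) => t }
      case (Some(ArrayT(elem)), Index(_)) => Some(elem)
      case _ => None
    }
}
```

With a schema shaped like `a.b[0].c.d`, a path that spells the last two fields as `C` and `D` resolves under `caseInsensitive` and fails under `caseSensitive`, which is the check that was previously skipped for `c` and `d`.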

@AmplabJenkins

Can one of the admins verify this patch?

- resolveNesting(nestedFields, a, resolver),
- nestedFields.last)() // Preserve the case of the user's field access.
- Some(aliased)
+ Some(Alias(nestedFields.foldLeft(a: Expression)(UnresolvedGetField), nestedFields.last)())
Contributor Author

For something like a.b[0].c.d, the original logic here only works for a and b, but not for c and d. So I simplified the logic here and let the ResolveGetField rule do its job.
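
The `foldLeft` in the simplified line can be sketched in isolation as follows. This is an illustration under stated assumptions: `Expr`, `Attr`, and this `UnresolvedGetField` are toy stand-ins defined here, not the Catalyst classes from the patch, and `nest` is a hypothetical helper name.

```scala
object UnresolvedChainSketch {
  // Toy expression tree; the names mirror the discussion but are not Catalyst's classes.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class UnresolvedGetField(child: Expr, fieldName: String) extends Expr

  // Wrap the base attribute in one unresolved field access per remaining name,
  // innermost first, so a later analysis rule can resolve each level separately.
  def nest(base: Expr, nestedFields: Seq[String]): Expr =
    nestedFields.foldLeft(base)(UnresolvedGetField(_, _))
}
```

For a base attribute `a` and the names `b`, `c`, this builds `UnresolvedGetField(UnresolvedGetField(Attr("a"), "b"), "c")`. No name matching happens at this point; it is deferred to whatever rule resolves each node.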

@liancheng
Contributor

Would you mind filing a JIRA ticket for this PR?

@SparkQA

SparkQA commented Sep 26, 2014

QA tests have started for PR 2543 at commit 5b0a2d0.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 26, 2014

QA tests have finished for PR 2543 at commit 5b0a2d0.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan changed the title from "[SQL] Correctly check case sensitivity in GetField" to "[SPARK-3698][SQL] Correctly check case sensitivity in GetField" on Sep 26, 2014
override def toString = s"$child.${field.name}"
}

object GetField {
Contributor

If possible, I think it might be clearer to keep the resolver logic in the Analyzer rule.

Contributor Author

I was going to put this logic into the Analyzer rule, but found that some tests depend on GetField(child, fieldName), so I had to keep that constructor of GetField. The two are so similar that I combined them. Maybe I should fix those tests instead?

@marmbrus
Contributor

Thanks for working on this! A few minor comments.

@cloud-fan
Contributor Author

Hi @marmbrus, I have updated my PR according to your comments. Would you mind reviewing it again?

@SparkQA

SparkQA commented Oct 1, 2014

QA tests have started for PR 2543 at commit c5b9106.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 1, 2014

QA tests have finished for PR 2543 at commit c5b9106.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • println(s"Failed to load main class $childMainClass.")
    • case class GetField(child: Expression, fieldName: String, findField: StructField => Boolean = null)

@@ -73,7 +73,9 @@ case class GetItem(child: Expression, ordinal: Expression) extends Expression {
/**
* Returns the value of fields in the Struct `child`.
*/
- case class GetField(child: Expression, fieldName: String) extends UnaryExpression {
+ case class GetField(child: Expression, fieldName: String, findField: StructField => Boolean = null)
Contributor

We don't need findField anymore, do we?

@cloud-fan
Contributor Author

@marmbrus As I commented before, I think we should handle GetItem and GetField differently even though they are very similar. For GetItem, we need to calculate the ordinal by evaluating an expression; for GetField, we can calculate the ordinal by searching for the correct StructField according to fieldName, which I think should be done in the analysis phase. It may be better to put a StructField into GetField rather than a simple fieldName.
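
A rough sketch of that split, assuming a struct is described only by its list of field names. `OrdinalSketch`, `resolveOrdinal`, and `getField` are hypothetical names used for illustration; this is not the code from this PR or from #3724.

```scala
object OrdinalSketch {
  // Hypothetical name-matching function: true when two names match.
  type Resolver = (String, String) => Boolean

  // Analysis-time step: find the position of the single field matching `name`.
  // Zero matches and multiple matches are both reported as errors.
  def resolveOrdinal(
      fieldNames: Seq[String],
      name: String,
      resolver: Resolver): Either[String, Int] =
    fieldNames.zipWithIndex.filter { case (n, _) => resolver(n, name) } match {
      case Seq((_, ordinal)) => Right(ordinal)
      case Seq()             => Left(s"No such struct field $name")
      case matches =>
        Left(s"Ambiguous reference to field $name: ${matches.map(_._1).mkString(", ")}")
    }

  // Execution-time step: no name lookup, only positional access into the row.
  def getField(row: Seq[Any], ordinal: Int): Any = row(ordinal)
}
```

The point of the design is that the name lookup, with its case-sensitivity rules and error cases, runs once per query during analysis, while per-row evaluation only indexes by position. An array index, by contrast, comes from an expression that has to be evaluated at run time.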

@cloud-fan
Contributor Author

Ping @marmbrus @liancheng: I have finished the code locally. If you vote for UnresolvedGetField, I can push the code immediately.

@cloud-fan
Contributor Author

Hi @marmbrus @liancheng, I think it's better to calculate the ordinal of GetField in the analysis phase, so I have updated the code to introduce UnresolvedGetField and fix the case-sensitivity-check bug.
Please let me know if you have any questions. Thanks!

@marmbrus
Contributor

Sorry for the delay merging this, but I have been concerned that we are adding unnecessary complexity to analysis by adding more types of expressions. I've built a simpler solution in #3724 based off the test case that you provided. If that looks reasonable to you I suggest we close this issue.

asfgit pushed a commit that referenced this pull request Dec 17, 2014
Based on #2543.

Author: Michael Armbrust <michael@databricks.com>

Closes #3724 from marmbrus/resolveGetField and squashes the following commits:

0a47aae [Michael Armbrust] Fix case insensitive resolution of GetField.
@asfgit asfgit closed this in ca12608 Dec 17, 2014