
[SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order to check properly the limit size #23217

Closed
wants to merge 3 commits

Conversation


@mgaido91 mgaido91 commented Dec 4, 2018

What changes were proposed in this pull request?

The PR starts from the comment in the main one and it aims to:

  • simplify the code for MapConcat;
  • check the limit size more precisely.

How was this patch tested?

existing tests


mgaido91 commented Dec 4, 2018

cc @cloud-fan

if (key == null) {
  throw new RuntimeException("Cannot use null as map key.")
}

val index = keyToIndex.getOrDefault(key, -1)
if (index == -1) {
  if (withSizeCheck && size >= ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {

@cloud-fan cloud-fan Dec 4, 2018


I think we should always check the size. Such a big map is very likely to cause problems.

Contributor Author

this flag is just for perf reasons, we can skip the check in some conditions and I didn't want to introduce perf overhead if not needed. If we remove the flag we would do the comparison for each item, also when it is not needed.

Contributor

hmmm, I'd like to avoid premature optimization. Actually how much perf this can save? This code block is already doing some heavy work.

Contributor Author

ok, let me remove it then, thanks.
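The change agreed above can be sketched as follows. This is a hedged illustration, not the actual Spark source: the names `keyToIndex` and `put` mirror the snippet quoted in this thread, the limit value `Integer.MAX_VALUE - 15` matches Spark's `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH`, and the `limit` constructor parameter is an addition of this sketch so the overflow path can be exercised without billions of inserts. The point is that the size check now runs unconditionally, with no `withSizeCheck` flag.

```scala
import scala.collection.mutable

object MapBuilderSketch {
  // Mirrors ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH in Spark.
  val MaxRoundedArrayLength: Int = Integer.MAX_VALUE - 15

  // Hypothetical builder; `limit` is configurable here only for testability.
  class Builder(limit: Int = MaxRoundedArrayLength) {
    private val keyToIndex = mutable.HashMap.empty[Any, Int]
    private val keys = mutable.ArrayBuffer.empty[Any]
    private val values = mutable.ArrayBuffer.empty[Any]

    def put(key: Any, value: Any): Unit = {
      if (key == null) {
        throw new RuntimeException("Cannot use null as map key.")
      }
      val index = keyToIndex.getOrElse(key, -1)
      if (index == -1) {
        // The size check runs unconditionally (no withSizeCheck flag),
        // and only when a genuinely new key is added.
        if (keys.size >= limit) {
          throw new RuntimeException(
            s"Unsuccessful attempt to build maps with ${keys.size + 1} elements " +
            s"due to exceeding the map size limit $limit.")
        }
        keyToIndex(key) = keys.size
        keys += key
        values += value
      } else {
        // Duplicate key: last write wins, no size change.
        values(index) = value
      }
    }

    def size: Int = keys.size
  }
}
```

Checking only on insertion of a new key (rather than on every `put`) keeps duplicate-key overwrites cheap while still bounding the final map size.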

if (key == null) {
  throw new RuntimeException("Cannot use null as map key.")
}

val index = keyToIndex.getOrDefault(key, -1)
if (index == -1) {
  if (withSizeCheck && size >= ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
    throw new RuntimeException(s"Unsuccessful attempt to concat maps with $size elements " +
Contributor

nit: concat -> build

@cloud-fan

thanks for the cleanup!

@SparkQA

SparkQA commented Dec 4, 2018

Test build #99664 has finished for PR 23217 at commit 54f0f31.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 4, 2018

Test build #99670 has finished for PR 23217 at commit 38f3bfa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


mgaido91 commented Dec 4, 2018

retest this please

@SparkQA

SparkQA commented Dec 4, 2018

Test build #99668 has finished for PR 23217 at commit 724db5c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 4, 2018

Test build #99680 has finished for PR 23217 at commit 38f3bfa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan

thanks, merging to master!

@asfgit asfgit closed this in 7143e9d Dec 5, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
[SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order to check properly the limit size

## What changes were proposed in this pull request?

The PR starts from the [comment](apache#23124 (comment)) in the main one and it aims to:
 - simplify the code for `MapConcat`;
 - check the limit size more precisely.

## How was this patch tested?

existing tests

Closes apache#23217 from mgaido91/SPARK-25829_followup.

Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>