Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26318][SQL] Deprecate Row.merge #23271

Closed
wants to merge 1 commit into from

Conversation

KyleLi1985
Copy link
Contributor

@KyleLi1985 KyleLi1985 commented Dec 10, 2018

What changes were proposed in this pull request?

Deprecate Row.merge

How was this patch tested?

N/A

@dongjoon-hyun
Copy link
Member

ok to test

n = n + 1
}
}
new GenericRow(container)
Copy link
Member

@dongjoon-hyun dongjoon-hyun Dec 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @KyleLi1985 . Yep. It's a TODO item. I'm wondering if you have a real use case for this improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely, It is important

@dongjoon-hyun
Copy link
Member

cc @rxin and @cloud-fan

@SparkQA
Copy link

SparkQA commented Dec 10, 2018

Test build #99901 has finished for PR 23271 at commit 93c4af4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
val container = Array.ofDim[Any](number)
var n = 0
for (i <- 0 until size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want to improve this more, you can try with a while loop here rather than a for loop, without creating the extra 0 until size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only primitively use size, subSize, and number information and control the container will improve the performance more.
up to
call 100000000 time Row.merge(row1) need 18064 millisecond
call 100000000 time Row.merge(rows:_*) need 25651 millisecond

@rxin
Copy link
Contributor

rxin commented Dec 10, 2018

Does anybody actually use this merge?

@cloud-fan
Copy link
Contributor

I have the same question. If it's not used by anyone, maybe we can deprecate it and remove it in the next release

@dongjoon-hyun
Copy link
Member

+1 for depreciation.

@HyukjinKwon
Copy link
Member

+1 for deprecation.

@viirya
Copy link
Member

viirya commented Dec 11, 2018

+1 for deprecation too.

@KyleLi1985
Copy link
Contributor Author

deprecate this function

@SparkQA
Copy link

SparkQA commented Dec 14, 2018

Test build #100155 has finished for PR 23271 at commit cc22a50.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@KyleLi1985
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Dec 14, 2018

Test build #100156 has finished for PR 23271 at commit c300f43.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

ssSize -= 1
container(number) = rows(sSize)(ssSize)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just leave the function as was. It will be removed in the next release anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, it is not used before and is deprecated now. We don't need to do this optimization.

@HyukjinKwon
Copy link
Member

Let's target to deprecate the API in this PR.

@viirya
Copy link
Member

viirya commented Dec 15, 2018

And don't forget to update the PR title and description too.

@KyleLi1985 KyleLi1985 changed the title [SPARK-26318][SQL] Enhance function merge performance in Row [SPARK-26318][SQL] Enhance Row deprecate function merge Dec 16, 2018
@KyleLi1985
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Dec 16, 2018

Test build #100205 has finished for PR 23271 at commit de49469.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -57,6 +57,8 @@ object Row {
/**
* Merge multiple rows into a single row, one after another.
*/

@deprecated("This method is deprecated and will be removed in future versions.", "2.5.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next release is 3.0.0

@cloud-fan
Copy link
Contributor

can we use "deprecate Row.merge" as title?

@KyleLi1985 KyleLi1985 changed the title [SPARK-26318][SQL] Enhance Row deprecate function merge [SPARK-26318][SQL] Deprecate Row.merge Dec 17, 2018
@KyleLi1985
Copy link
Contributor Author

Done

@KyleLi1985
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Dec 17, 2018

Test build #100221 has finished for PR 23271 at commit ed55406.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Dec 17, 2018

retest this please.

@viirya
Copy link
Member

viirya commented Dec 17, 2018

Please update the PR description before merging.

@SparkQA
Copy link

SparkQA commented Dec 17, 2018

Test build #100230 has finished for PR 23271 at commit ed55406.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Dec 17, 2018

retest this please.

@KyleLi1985
Copy link
Contributor Author

retest this please.

I check the test output, the fail is strange.
Is there some trick things I should care about?

@SparkQA
Copy link

SparkQA commented Dec 17, 2018

Test build #100238 has finished for PR 23271 at commit ed55406.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@@ -57,6 +57,8 @@ object Row {
/**
* Merge multiple rows into a single row, one after another.
*/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this newline

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@SparkQA
Copy link

SparkQA commented Dec 17, 2018

Test build #100241 has finished for PR 23271 at commit ed55406.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@felixcheung felixcheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need a note in the sql migration guide or something?

@@ -57,6 +57,8 @@ object Row {
/**
* Merge multiple rows into a single row, one after another.
*/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon
Copy link
Member

I think it's okay .. virtually no one uses it .. We could also leave a note when we actually remove it out. It's going to warn that it's deprecated during build or in IDE I guess anyway.

@SparkQA
Copy link

SparkQA commented Dec 19, 2018

Test build #100305 has finished for PR 23271 at commit ab19141.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Retest this please.

@SparkQA
Copy link

SparkQA commented Dec 19, 2018

Test build #100315 has finished for PR 23271 at commit ab19141.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

@asfgit asfgit closed this in 04d8e3a Dec 20, 2018
holdenk pushed a commit to holdenk/spark that referenced this pull request Jan 5, 2019
## What changes were proposed in this pull request?
Deprecate Row.merge

## How was this patch tested?
N/A

Closes apache#23271 from KyleLi1985/master.

Authored-by: 李亮 <liang.li.work@outlook.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?
Deprecate Row.merge

## How was this patch tested?
N/A

Closes apache#23271 from KyleLi1985/master.

Authored-by: 李亮 <liang.li.work@outlook.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants