-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPARK-7450 Use UNSAFE.getLong() to speed up BitSetMethods#anySet() #5897
Conversation
Can one of the admins verify this patch? |
Jenkins, add to whitelist and test this please. |
Merged build triggered. |
Merged build started. |
Test build #31800 has started for PR 5897 at commit |
Test build #31800 has finished for PR 5897 at commit
|
Merged build finished. Test FAILed. |
Test FAILed. |
@@ -71,7 +72,13 @@ public static boolean isSet(Object baseObject, long baseOffset, int index) { | |||
* Returns {@code true} if any bit is set. | |||
*/ | |||
public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInBytes) { | |||
for (int i = 0; i <= bitSetWidthInBytes; i++) { | |||
long widthInLong = bitSetWidthInBytes / SIZE_OF_LONG; | |||
for (long i = 0; i <= widthInLong; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually we should use int instead of long for i. the reason is that JIT would inject a safepoint for loops over longs which prevents loop unrolling and incurs an extra check for every iteration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed the above comment.
Merged build triggered. |
@@ -71,7 +72,13 @@ public static boolean isSet(Object baseObject, long baseOffset, int index) { | |||
* Returns {@code true} if any bit is set. | |||
*/ | |||
public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInBytes) { | |||
for (int i = 0; i <= bitSetWidthInBytes; i++) { | |||
long widthInLong = bitSetWidthInBytes / SIZE_OF_LONG; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
width needs to be an int too :)
Merged build started. |
Test build #31808 has started for PR 5897 at commit |
Test build #31808 has finished for PR 5897 at commit
|
Merged build finished. Test FAILed. |
Test FAILed. |
return true; | ||
} | ||
} | ||
for (int i = (int)(SIZE_OF_LONG * widthInLong); i < bitSetWidthInBytes; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this block now contains two for
loops. A bad merge conflict resolution, perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so.
The second loop is for the remaining bytes :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry for overlooking that. I guess I was thinking of optimizing for the case where the bitset width was a multiple of the word size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That case is covered by the first loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the 2nd loop? I think the assumption is that all bitsets are word-aligned. We should definitely document that though.
From failed test: Pretty sure the above is not caused by my change |
Merged build triggered. |
Merged build started. |
Test build #31809 has started for PR 5897 at commit |
Test build #31809 has finished for PR 5897 at commit
|
Merged build finished. Test FAILed. |
Test FAILed. |
sbt.ForkMain$ForkError: The code passed to failAfter did not complete within 60 seconds. I think the above was not related to my change. bq. This patch adds the following public classes (experimental) This change doesn't add any public class. |
Jenkins, retest this please. |
Test build #32026 has finished for PR 5897 at commit
|
Merged build finished. Test FAILed. |
Test FAILed. |
[error] /home/jenkins/workspace/SparkPullRequestBuilder@2/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionQuerySuite.scala:769: not found: type HiveCompatibilitySuite I don't think the above error was related to my change. [info] Test org.apache.spark.unsafe.bitset.BitSetSuite.basicOps started |
Test build #771 has started for PR 5897 at commit |
* @return whether any bit in the BitSet is set | ||
*/ | ||
public boolean anySet() { | ||
return BitSetMethods.anySet(baseObject, baseOffset, numWords*BitSetMethods.WORD_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in an earlier comment, we can assume that BitSetMethods.anySet
will be called with a size that's word aligned, so for consistency with nextSetBit
I think we should change anySet
to accept a size that's measured in numbers of words. This would let us avoid the mod calculations or multiplications by word size.
Merged build triggered. |
Merged build started. |
Test build #32037 has started for PR 5897 at commit |
Test build #771 has finished for PR 5897 at commit
|
Test build #32037 has finished for PR 5897 at commit
|
Merged build finished. Test PASSed. |
Test PASSed. |
@tedyu also please file a JIRA that tracks this improvement and put it in the title. See how other PRs are opened. |
SPARK-7450 has been filed |
This looks fine; I have two minor nits because I'm really pedantic, but I'll just deal with it on merge. |
@@ -28,7 +28,7 @@ | |||
*/ | |||
public final class BitSetMethods { | |||
|
|||
private static final long WORD_SIZE = 8; | |||
static final long WORD_SIZE = 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could remain private.
bq. I'm really pedantic I like that :-) |
Author: tedyu <yuzhihong@gmail.com> Closes #5897 from tedyu/master and squashes the following commits: 473bf9d [tedyu] Address Josh's review comments 1719c5b [tedyu] Correct upper bound in for loop b51dcaf [tedyu] Add unit test in BitSetSuite for BitSet#anySet() 83f9f87 [tedyu] Merge branch 'master' of github.com:apache/spark 817e3f9 [tedyu] Replace constant 8 with SIZE_OF_LONG 75a467b [tedyu] Correct offset for UNSAFE.getLong() 855374b [tedyu] Remove second loop since bitSetWidthInBytes is WORD aligned 093b7a4 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 63ee050 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 4ca0ef6 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 3e9b691 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() (cherry picked from commit 88063c6) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
Merged to master and branch-1.4 (1.4.0). Thanks! |
Author: tedyu <yuzhihong@gmail.com> Closes apache#5897 from tedyu/master and squashes the following commits: 473bf9d [tedyu] Address Josh's review comments 1719c5b [tedyu] Correct upper bound in for loop b51dcaf [tedyu] Add unit test in BitSetSuite for BitSet#anySet() 83f9f87 [tedyu] Merge branch 'master' of github.com:apache/spark 817e3f9 [tedyu] Replace constant 8 with SIZE_OF_LONG 75a467b [tedyu] Correct offset for UNSAFE.getLong() 855374b [tedyu] Remove second loop since bitSetWidthInBytes is WORD aligned 093b7a4 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 63ee050 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 4ca0ef6 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 3e9b691 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet()
Author: tedyu <yuzhihong@gmail.com> Closes apache#5897 from tedyu/master and squashes the following commits: 473bf9d [tedyu] Address Josh's review comments 1719c5b [tedyu] Correct upper bound in for loop b51dcaf [tedyu] Add unit test in BitSetSuite for BitSet#anySet() 83f9f87 [tedyu] Merge branch 'master' of github.com:apache/spark 817e3f9 [tedyu] Replace constant 8 with SIZE_OF_LONG 75a467b [tedyu] Correct offset for UNSAFE.getLong() 855374b [tedyu] Remove second loop since bitSetWidthInBytes is WORD aligned 093b7a4 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 63ee050 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 4ca0ef6 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 3e9b691 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet()
Author: tedyu <yuzhihong@gmail.com> Closes apache#5897 from tedyu/master and squashes the following commits: 473bf9d [tedyu] Address Josh's review comments 1719c5b [tedyu] Correct upper bound in for loop b51dcaf [tedyu] Add unit test in BitSetSuite for BitSet#anySet() 83f9f87 [tedyu] Merge branch 'master' of github.com:apache/spark 817e3f9 [tedyu] Replace constant 8 with SIZE_OF_LONG 75a467b [tedyu] Correct offset for UNSAFE.getLong() 855374b [tedyu] Remove second loop since bitSetWidthInBytes is WORD aligned 093b7a4 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 63ee050 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 4ca0ef6 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet() 3e9b691 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet()
@JoshRosen
Please take a look