[SPARK-24452][SQL][Core] Avoid possible overflow in int add or multiply #21481
Test build #91404 has finished for PR 21481 at commit
cc @cloud-fan
Hey @kiszk, thanks for tracking this down. This change looks good to me. I have a couple of questions, mostly aimed towards figuring out how we can categorically solve this problem:
I think we should definitely backport this fix, at least to 2.3 and possibly earlier.
+1 on @JoshRosen's ideas. I've already seen several overflow fixes from @kiszk; it would be good to have a tool to check for them, even if we need to run the tool manually. One idea may be to force the use of Java 8's safe math functions: https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html#addExact-int-int-
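For reference, a minimal sketch of the fail-fast alternative suggested above: `Math.addExact` and `Math.multiplyExact` (Java 8+) throw `ArithmeticException` on `int` overflow instead of silently wrapping around. The class and method names (`SafeMathDemo`, `recordLengthExact`, `overflows`) are hypothetical illustrations, not Spark code.

```java
// Hypothetical illustration, not Spark code: uses java.lang.Math's exact-arithmetic methods.
class SafeMathDemo {
    // Computes 2 * uaoSize + klen + vlen + 8 in int arithmetic, but throws
    // ArithmeticException instead of wrapping around if any step overflows.
    static int recordLengthExact(int uaoSize, int klen, int vlen) {
        int header = Math.multiplyExact(2, uaoSize);
        int withKey = Math.addExact(header, klen);
        int withValue = Math.addExact(withKey, vlen);
        return Math.addExact(withValue, 8);
    }

    // Returns true if recordLengthExact rejects the inputs as overflowing.
    static boolean overflows(int uaoSize, int klen, int vlen) {
        try {
            recordLengthExact(uaoSize, klen, vlen);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(recordLengthExact(8, 100, 200));     // prints 324
        System.out.println(overflows(8, Integer.MAX_VALUE, 1)); // prints true
    }
}
```

Unlike widening to `long`, this turns a silent wraparound into an exception, so it suits call sites where an oversized record should be rejected rather than stored.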
Good questions. For 2, I first found one of these issues when I looked at a file. Then, I ran … For 3, here is my thought. I will look at SpotBugs after my presentation at SAIS is finished :)
@JoshRosen @cloud-fan I have just applied …
Is FindBugs available for Scala code as well?
Since it analyzes Java bytecode, it works for Scala code, too.
Since I found a plug-in for Maven, I will also include a patch in this PR to add FindBugs/SpotBugs to the Maven build.
Let's merge this as-is and do the build improvements in a separate PR. That's important because we may want to backport the overflow fix to maintenance branches, and may want to do so independently of the build changes.
Thank you for your comment. I will create another PR for integrating FindBugs/SpotBugs into Maven.
@@ -703,7 +703,7 @@ public boolean append(Object kbase, long koff, int klen, Object vbase, long voff
 // must be stored in the same memory page.
 // (8 byte key length) (key) (value) (8 byte pointer to next value)
 int uaoSize = UnsafeAlignedOffset.getUaoSize();
- final long recordLength = (2 * uaoSize) + klen + vlen + 8;
+ final long recordLength = (2L * uaoSize) + (long)klen + (long)vlen + 8L;
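To see what the change in this hunk prevents, here is a standalone sketch that evaluates the old and new `recordLength` expressions side by side. The class and method names and the sample key/value lengths are hypothetical; `uaoSize` is passed in directly instead of calling `UnsafeAlignedOffset.getUaoSize()`.

```java
// Standalone sketch (hypothetical names): compares the pre- and post-patch
// recordLength expressions from the append() hunk.
class RecordLengthDemo {
    // Pre-patch form: every operand is int, so the sum is computed in 32-bit
    // arithmetic and can wrap around before being widened to long.
    static long oldRecordLength(int uaoSize, int klen, int vlen) {
        return (2 * uaoSize) + klen + vlen + 8;
    }

    // Post-patch form: 2L makes the first product a long, so each later
    // addition widens its int operand to long as well.
    static long newRecordLength(int uaoSize, int klen, int vlen) {
        return (2L * uaoSize) + (long) klen + (long) vlen + 8L;
    }

    public static void main(String[] args) {
        int uaoSize = 8;
        int klen = 1_100_000_000;
        int vlen = 1_100_000_000;
        System.out.println(oldRecordLength(uaoSize, klen, vlen)); // prints -2094967272
        System.out.println(newRecordLength(uaoSize, klen, vlen)); // prints 2200000024
    }
}
```

The negative old result is exactly the symptom described in this PR: a length that no longer fits in an `int` wraps around before it is stored in the `long` variable.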
Do we need to cast everything to `long`? I think `long + int + int + int` is safe.
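The reviewer's point relies on Java evaluating `+` left to right with binary numeric promotion: once the leftmost operand is a `long`, each following `int` operand is widened before it is added. The same rule means a trailing `long` literal does not help, because the `int` additions to its left have already run. A minimal sketch, with hypothetical class and method names:

```java
// Hypothetical demo: where the long operand sits in a left-associative sum.
class PromotionOrderDemo {
    // Parsed as (((8L + klen) + vlen) + 8): every addition is a long addition.
    static long longFirst(int klen, int vlen) {
        return 8L + klen + vlen + 8;
    }

    // Parsed as ((klen + vlen) + 8L): klen + vlen is an int addition that can
    // wrap before the result is widened for the final "+ 8L".
    static long longLast(int klen, int vlen) {
        return klen + vlen + 8L;
    }

    public static void main(String[] args) {
        int big = Integer.MAX_VALUE;
        System.out.println(longFirst(big, big)); // prints 4294967310
        System.out.println(longLast(big, big));  // prints 6
    }
}
```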
You are right. It was too conservative. `(2L * uaoSize) + klen + vlen + 8` is already compiled to long arithmetic (`I2L` widening followed by `LMUL` and `LADD`), as follows:
LDC 2
ILOAD 9
I2L
LMUL
ILOAD 4
I2L
LADD
ILOAD 8
I2L
LADD
LDC 8
LADD
LSTORE 10
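The bytecode above shows why the explicit `(long)` casts are redundant: the compiler already emits `I2L` before each `LADD`. A quick spot check that the simplified expression matches the fully cast one at the `int` extremes (class and method names are hypothetical; this samples a few values and is not a proof, the guarantee itself comes from the JLS numeric promotion rules):

```java
// Hypothetical spot check: the simplified and fully cast expressions agree.
class RedundantCastDemo {
    static long simplified(int uaoSize, int klen, int vlen) {
        return (2L * uaoSize) + klen + vlen + 8;
    }

    static long fullyCast(int uaoSize, int klen, int vlen) {
        return (2L * uaoSize) + (long) klen + (long) vlen + 8L;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};
        boolean allEqual = true;
        // Try every combination of the sample values for all three parameters.
        for (int u : samples) {
            for (int k : samples) {
                for (int v : samples) {
                    allEqual &= simplified(u, k, v) == fullyCast(u, k, v);
                }
            }
        }
        System.out.println(allEqual); // prints true
    }
}
```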
@@ -41,7 +41,7 @@
 @Override
 public UnsafeRow appendRow(Object kbase, long koff, int klen,
     Object vbase, long voff, int vlen) {
- final long recordLength = 8 + klen + vlen + 8;
+ final long recordLength = 8L + (long)klen + (long)vlen + 8L;
ditto
@kiszk when you ran FindBugs locally, did you find more overflow bugs that are not fixed in this PR? Let's put all the discovered overflow fixes in this PR and have another PR to integrate FindBugs with Maven. Thanks!
Test build #91717 has finished for PR 21481 at commit
@cloud-fan I addressed all of the possible integer overflows detected by SpotBugs.
Test build #91922 has finished for PR 21481 at commit
thanks, merging to master/2.3! |
This PR fixes possible overflow in int add or multiply. In particular, the overflows in multiplication were detected by [SpotBugs](https://spotbugs.github.io/).

The following assignments may overflow on the right-hand side. As a result, the stored value may be negative.

```
long = int * int
long = int + int
```

To avoid this problem, this PR casts the int operands to long on the right-hand side.

Existing UTs.

Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>

Closes #21481 from kiszk/SPARK-24452.

(cherry picked from commit 90da7dc)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
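For the `long = int * int` case, where the cast goes matters: it must apply to an operand, not to the product, because a cast binds tighter than `*`. A minimal sketch with hypothetical class, method, and value choices:

```java
// Hypothetical demo of the "long = int * int" pitfall and where the cast must go.
class MultiplyCastDemo {
    // The product is computed in int and wraps before assignment to long.
    static long uncast(int a, int b) {
        long result = a * b;
        return result;
    }

    // Casting the product is too late: the int multiply has already wrapped.
    static long castAfter(int a, int b) {
        return (long) (a * b);
    }

    // Parsed as ((long) a) * b, so the multiply is done in 64-bit arithmetic.
    static long castOperand(int a, int b) {
        return (long) a * b;
    }

    public static void main(String[] args) {
        int a = 50_000;
        int b = 50_000;
        System.out.println(uncast(a, b));      // prints -1794967296
        System.out.println(castAfter(a, b));   // prints -1794967296
        System.out.println(castOperand(a, b)); // prints 2500000000
    }
}
```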