-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPARK-4222 [CORE] use readFully in FixedLengthBinaryRecordReader #3093
SPARK-4222 [CORE] use readFully in FixedLengthBinaryRecordReader #3093
Conversation
@@ -115,7 +115,7 @@ private[spark] class FixedLengthBinaryRecordReader | |||
if (currentPosition < splitEnd) { | |||
// setup a buffer to store the record | |||
val buffer = recordValue.getBytes | |||
fileInputStream.read(buffer, 0, recordLength) | |||
fileInputStream.readFully(buffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, but this also doesn't check how many bytes were actually read?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, but readFully will either block until the full number of bytes is available, or throw an error:
http://docs.oracle.com/javase/6/docs/api/java/io/DataInput.html#readFully(byte[])
Can one of the admins verify this patch? |
FWIW, we got bit by this over in the Thunder project, where we'd been using this class before Jeremy contributed it into Spark. The circumstance where it came up was doing big reads from S3, where the underlying buffer size was something like 16kb? I forget exactly... anyway, when the records didn't align with this buffer, badness happened. Here's the equivalent change I made over there: The previous version on that patch shows my previous fix, which does check the number of bytes returned, before I realized that readFully() was a thing. Both that fix and the one here do resolve the original problem we encountered, though I'm having some trouble coming up with a clean unit test for it... |
Good catch, thanks! Can you check that this is the only version of read() in that code? |
BTW it would be good to open a JIRA issue for this on https://issues.apache.org/jira/browse/SPARK but unfortunately ASF JIRA seems to be down at the moment. |
Jenkins, this is ok to test |
Test build #22904 has started for PR 3093 at commit
|
Test build #22904 has finished for PR 3093 at commit
|
Test PASSed. |
Thanks @mateiz - Oops, I actually did open https://issues.apache.org/jira/browse/SPARK-4222 for this, then (inexplicably...) left it off the commit message. Not sure offhand of how best to address that - would be happy to close this PR and open a new one with a proper message, let me know if that's how you'd like to proceed. This was indeed the only use of read() in the FixedLengthBinary* files. |
@industrial-sloth just change the PR title to "SPARK-4222 [CORE] ..." since that title will be the commit message when merged. |
Thanks @srowen! Done. |
Cool, thanks. Will merge this soon. |
replaces the existing read() call with readFully().