[SPARK-13537][SQL] Fix readBytes in VectorizedPlainValuesReader #11418

Closed
wants to merge 2 commits into from

@viirya viirya commented Feb 28, 2016

JIRA: https://issues.apache.org/jira/browse/SPARK-13537

What changes were proposed in this pull request?

readBytes in VectorizedPlainValuesReader uses buffer[offset] to access bytes in buffer. This is incorrect because offset already has Platform.BYTE_ARRAY_OFFSET added to it during initialization, so it is not a valid array index. This patch fixes the access.
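
To illustrate the class of bug described above, here is a minimal, self-contained sketch. It does not use Spark's Platform class: BASE_OFFSET is a hypothetical stand-in for Platform.BYTE_ARRAY_OFFSET (the real value depends on the JVM), and OffsetSketch, readAsIndex, and readAdjusted are illustrative names, not the actual patch.

```java
class OffsetSketch {
    // Hypothetical stand-in for Platform.BYTE_ARRAY_OFFSET (assumed value, JVM-dependent in reality).
    static final int BASE_OFFSET = 16;

    // Buggy pattern: a base-adjusted offset is used directly as an array index.
    static byte readAsIndex(byte[] buffer, int offset) {
        return buffer[offset];
    }

    // One possible correction for plain indexing: remove the base before indexing.
    static byte readAdjusted(byte[] buffer, int offset) {
        return buffer[offset - BASE_OFFSET];
    }

    public static void main(String[] args) {
        byte[] buffer = {10, 20, 30, 40};
        // Offset of logical position 2, with the base already added at "initialization".
        int offset = BASE_OFFSET + 2;

        System.out.println(readAdjusted(buffer, offset)); // 30

        try {
            readAsIndex(buffer, offset);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The index is past the end of this short buffer.
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```

Under these assumptions, the buggy read throws only when the adjusted offset runs past the end of the array. With a longer buffer it would silently return a byte from the wrong position instead, which is consistent with the failure depending on the generated data.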

How was this patch tested?

ParquetHadoopFsRelationSuite sometimes fails because of this bug, depending on the randomly generated data. With this patch applied, the test passes.

I added a test to ParquetHadoopFsRelationSuite with data that fails without this patch.

The exception:

[info] ParquetHadoopFsRelationSuite:
[info] - test all data types - StringType (440 milliseconds)
[info] - test all data types - BinaryType (434 milliseconds)
[info] - test all data types - BooleanType (406 milliseconds)
20:59:38.618 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 2597.0 (TID 67966)
java.lang.ArrayIndexOutOfBoundsException: 46
at org.apache.spark.sql.execution.datasources.parquet.VectorizedPlainValuesReader.readBytes(VectorizedPlainValuesReader.java:88)

viirya commented Feb 28, 2016

cc @nongli @rxin

SparkQA commented Feb 28, 2016

Test build #52142 has finished for PR 11418 at commit 44f5c41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Feb 28, 2016

Test build #52143 has finished for PR 11418 at commit 1b09304.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

nongli commented Feb 29, 2016

LGTM

Thanks for fixing this. Just out of curiosity, how did you find this initially?

viirya commented Feb 29, 2016

I saw the failure in the Jenkins test report for #11415. I then reran the test locally to find the problematic data and debugged with it.

rxin commented Feb 29, 2016

Thanks - I've merged this in master.

@asfgit asfgit closed this in 6dfc4a7 Feb 29, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
JIRA: https://issues.apache.org/jira/browse/SPARK-13537

## What changes were proposed in this pull request?

readBytes in VectorizedPlainValuesReader uses buffer[offset] to access bytes in buffer. This is incorrect because offset already has Platform.BYTE_ARRAY_OFFSET added to it during initialization, so it is not a valid array index. This patch fixes the access.

## How was this patch tested?

`ParquetHadoopFsRelationSuite` sometimes [fails](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52136/consoleFull) because of this bug, depending on the randomly generated data. With this patch applied, the test passes.

I added a test to `ParquetHadoopFsRelationSuite` with data that fails without this patch.

The exception:

    [info] ParquetHadoopFsRelationSuite:
    [info] - test all data types - StringType (440 milliseconds)
    [info] - test all data types - BinaryType (434 milliseconds)
    [info] - test all data types - BooleanType (406 milliseconds)
    20:59:38.618 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 2597.0 (TID 67966)
    java.lang.ArrayIndexOutOfBoundsException: 46
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedPlainValuesReader.readBytes(VectorizedPlainValuesReader.java:88)

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#11418 from viirya/fix-readbytes.
@viirya viirya deleted the fix-readbytes branch December 27, 2023 18:33