[SPARK-4520] [SQL] This pr fixes the ArrayIndexOutOfBoundsException as raised in SPARK-4520. #4148

Closed
wants to merge 1 commit

dhruvasood

The exception is thrown only for Thrift-generated Parquet files. The array element schema name is assumed to be "array", as in ParquetAvro, but for Thrift-generated Parquet files it is array_name + "_tuple". As a result, the child of the array group type is not found, which leads to the exception when the Parquet rows are materialized.
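To make the naming mismatch concrete, here is a minimal sketch in Python. It is not Spark's code: `GroupField`, `find_element_avro_only` and `find_element` are hypothetical names, and the schema shapes in the comments are approximate. It only shows that a lookup keyed on the parquet-avro name "array" finds no child in a Thrift-written list, while a lookup that also accepts `<list name>_tuple` does.

```python
from dataclasses import dataclass, field
from typing import List, Optional

AVRO_ELEMENT_NAME = "array"       # element name used by parquet-avro
THRIFT_ELEMENT_SUFFIX = "_tuple"  # parquet-thrift uses <list name> + "_tuple"


@dataclass
class GroupField:
    """Hypothetical stand-in for a Parquet group type (not a real API)."""
    name: str
    children: List["GroupField"] = field(default_factory=list)


def find_element_avro_only(list_group: GroupField) -> Optional[GroupField]:
    """Pre-fix assumption: the element child is always named "array"."""
    for child in list_group.children:
        if child.name == AVRO_ELEMENT_NAME:
            return child
    return None  # Thrift-written lists land here: the child looks "missing"


def find_element(list_group: GroupField) -> Optional[GroupField]:
    """Accept both the parquet-avro and the parquet-thrift element names."""
    accepted = {AVRO_ELEMENT_NAME, list_group.name + THRIFT_ELEMENT_SUFFIX}
    for child in list_group.children:
        if child.name in accepted:
            return child
    return None


# Roughly, for a list column named "tags":
#   parquet-avro:   group tags (LIST) { repeated binary array; }
#   parquet-thrift: group tags (LIST) { repeated binary tags_tuple; }
avro_list = GroupField("tags", [GroupField("array")])
thrift_list = GroupField("tags", [GroupField("tags_tuple")])

print(find_element_avro_only(thrift_list))  # None
print(find_element(thrift_list).name)       # tags_tuple
print(find_element(avro_list).name)         # array
```

Matching against a small set of accepted names keeps the parquet-avro behaviour unchanged and adds the Thrift convention alongside it.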

@AmplabJenkins

Can one of the admins verify this patch?

@liancheng
Contributor

ok to test

@SparkQA

SparkQA commented Feb 5, 2015

Test build #26798 has started for PR 4148 at commit c5ccde8.

  • This patch merges cleanly.

@liancheng
Contributor

Double-checked parquet-avro, parquet-protobuf, and parquet-thrift. This solution LGTM as long as Jenkins agrees. Also, I think this is worth a JIRA ticket for Parquet.

@liancheng
Contributor

Would like to add that https://github.com/apache/incubator-parquet-format/pull/17 is trying to add specs for Parquet LIST and MAP annotations. The Thrift _tuple issue is also mentioned.
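As a rough illustration of what a LIST spec has to settle, here is a hedged Python sketch of a backward-compatibility check. It decides whether the repeated child of a LIST group is itself the element (legacy 2-level lists, including the parquet-avro "array" and parquet-thrift "_tuple" names) or a wrapper around a single element field (3-level lists). As far as I know, this resembles the heuristics later adopted in parquet-mr and Spark, but it is a paraphrase, not the spec text; `PType` and `repeated_field_is_element` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PType:
    """Hypothetical stand-in for a Parquet type; fields is None for primitives."""
    name: str
    fields: Optional[List["PType"]] = None


def repeated_field_is_element(list_group: PType) -> bool:
    """True: legacy 2-level list, the repeated field *is* the element.
    False: 3-level list, the repeated group wraps a single element field."""
    assert list_group.fields is not None and len(list_group.fields) == 1
    repeated = list_group.fields[0]
    if repeated.fields is None:                       # primitive repeated field
        return True
    if len(repeated.fields) > 1:                      # multi-field group (a struct)
        return True
    if repeated.name == "array":                      # parquet-avro legacy naming
        return True
    if repeated.name == list_group.name + "_tuple":   # parquet-thrift naming
        return True
    return False                                      # assume a wrapper layer


# group nums (LIST) { repeated int32 array; }
avro_ints = PType("nums", [PType("array")])
# group pts (LIST) { repeated group pts_tuple { required int32 x; } }
thrift_structs = PType("pts", [PType("pts_tuple", [PType("x")])])
# group tags (LIST) { repeated group list { optional binary element; } }
standard = PType("tags", [PType("list", [PType("element")])])

print(repeated_field_is_element(avro_ints))       # True
print(repeated_field_is_element(thrift_structs))  # True
print(repeated_field_is_element(standard))        # False
```

The `thrift_structs` case is the interesting one. Without the `_tuple` check, a list of one-field structs would be mistaken for a 3-level wrapper, and its lone field `x` would be read as the element.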

@SparkQA

SparkQA commented Feb 5, 2015

Test build #26798 has finished for PR 4148 at commit c5ccde8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26798/

asfgit pushed a commit that referenced this pull request Feb 5, 2015

Author: Sadhan Sood <sadhan@tellapart.com>

Closes #4148 from sadhan/SPARK-4520 and squashes the following commits:

c5ccde8 [Sadhan Sood] [SPARK-4520] [SQL] This pr fixes the ArrayIndexOutOfBoundsException as raised in SPARK-4520.

(cherry picked from commit dba98bf)
Signed-off-by: Cheng Lian <lian@databricks.com>
@asfgit asfgit closed this in dba98bf Feb 5, 2015
@liancheng
Contributor

Merged this to master and branch-1.3. Thanks!

PS: My first merge as a committer :)
