
[SPARK-5441][pyspark] Make SerDeUtil PairRDD to Python conversions more robust #4236

Closed


@ghost ghost commented Jan 28, 2015

SerDeUtil.pairRDDToPython and SerDeUtil.pythonToPairRDD now both support empty RDDs by checking the result of take(1) instead of calling first, which throws an exception on an empty RDD.
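For readers who want the idea in isolation, here is a minimal sketch of the pattern described above. It is not the SerDeUtil code from this patch (that code is Scala). TinyRDD and peek_first_pair are hypothetical names invented for illustration. TinyRDD only imitates the two RDD methods that matter here, so the sketch runs without Spark.

```python
class TinyRDD:
    """Hypothetical stand-in for an RDD (not part of Spark)."""

    def __init__(self, items):
        self._items = list(items)

    def take(self, n):
        # Like RDD.take: returns at most n elements, possibly none.
        return self._items[:n]

    def first(self):
        # Like RDD.first: fails when there is nothing to return.
        if not self._items:
            raise ValueError("RDD is empty")
        return self._items[0]


def peek_first_pair(rdd):
    """Return the first (key, value) pair, or None for an empty RDD."""
    head = rdd.take(1)  # an empty list signals an empty RDD
    if not head:
        return None
    return head[0]
```

With this shape, a caller that inspects the first pair to choose a conversion path can treat None as "nothing to inspect" and skip that step, rather than failing before any data is processed.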

@AmplabJenkins

Can one of the admins verify this patch?

@pwendell
Contributor

Hey thanks for this - mind adding a regression test that fails on the old code?

@ghost
Author

ghost commented Jan 28, 2015

I've added two regression tests which I made sure failed beforehand and succeed now.

@JoshRosen
Contributor

Jenkins, this is ok to test.

@SparkQA

SparkQA commented Jan 28, 2015

Test build #26243 has started for PR 4236 at commit a531c0c.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 28, 2015

Test build #26243 has finished for PR 4236 at commit a531c0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26243/

@JoshRosen
Contributor

Thanks for adding tests. This looks good to me, so I'm going to merge it into master (1.3.0) and mark it for later backport into branch-1.2. I'd commit it there now, but we're in the middle of the 1.2.1 voting period, so we've placed a hold on merging into that branch until the vote passes.


import java.io.{ByteArrayOutputStream, DataOutputStream}

import org.apache.spark.{SharedSparkContext, SparkContext}

Minor nit: we usually place the Spark imports in their own section, separate from third-party library imports like Scalatest. I'll just fix this myself on merge, but I thought I'd mention it for future patches.

@asfgit asfgit closed this in e023112 Jan 28, 2015
asfgit pushed a commit that referenced this pull request Feb 17, 2015
…re robust

SerDeUtil.pairRDDToPython and SerDeUtil.pythonToPairRDD now both support empty RDDs by checking the result of take(1) instead of calling first which throws an exception.

Author: Michael Nazario <mnazario@palantir.com>

Closes #4236 from mnazario/feature/empty-first and squashes the following commits:

a531c0c [Michael Nazario] Added regression tests for SPARK-5441
e3b2fb6 [Michael Nazario] Added acceptance of the empty case
@JoshRosen
Contributor

I've cherry-picked this to branch-1.2 (1.2.2).

markhamstra pushed a commit to markhamstra/spark that referenced this pull request Feb 24, 2015