[SPARK-5441][pyspark] Make SerDeUtil PairRDD to Python conversions more robust #4236
Can one of the admins verify this patch?
Hey, thanks for this! Mind adding a regression test that fails on the old code?
I've added two regression tests; I made sure they failed before the fix and pass now.
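The fails-before, passes-after check requested in this thread can be sketched as below. This is a minimal illustration under stated assumptions, not the tests added in this PR: `oldPeek` and `newPeek` are hypothetical stand-ins for the old and fixed code paths, a plain `Seq` stands in for an RDD, and bare `assert` replaces the ScalaTest suite the real tests use.

```scala
import scala.util.Try

object RegressionSketch {
  // Hypothetical stand-in for the old code path: head throws on empty input,
  // much like calling first() on an empty RDD.
  def oldPeek(data: Seq[Int]): Int = data.head

  // Hypothetical stand-in for the fixed code path: take(1) never throws,
  // so an empty input simply yields None.
  def newPeek(data: Seq[Int]): Option[Int] = data.take(1).headOption

  def main(args: Array[String]): Unit = {
    val empty = Seq.empty[Int]
    // The regression test must fail against the old behaviour...
    assert(Try(oldPeek(empty)).isFailure, "old code should throw on empty input")
    // ...and succeed against the fixed behaviour on the same input.
    assert(newPeek(empty).isEmpty, "new code should return no element")
    println("regression checks passed")
  }
}
```

Running the same input through both paths is what makes the test a genuine regression test: it documents the bug and guards against it returning.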
Jenkins, this is ok to test.
Test build #26243 has started for PR 4236 at commit
Test build #26243 has finished for PR 4236 at commit
Test PASSed.
Thanks for adding tests. This looks good to me, so I'm going to merge it into
import java.io.{ByteArrayOutputStream, DataOutputStream}

import org.apache.spark.{SharedSparkContext, SparkContext}
Minor nit: we usually place the Spark imports in their own section, separate from third-party library imports like ScalaTest. I'll just fix this myself on merge, but I thought I'd mention it for future patches.
…re robust

SerDeUtil.pairRDDToPython and SerDeUtil.pythonToPairRDD now both support empty RDDs by checking the result of take(1) instead of calling first, which throws an exception on an empty RDD.

Author: Michael Nazario <mnazario@palantir.com>

Closes #4236 from mnazario/feature/empty-first and squashes the following commits:

a531c0c [Michael Nazario] Added regression tests for SPARK-5441
e3b2fb6 [Michael Nazario] Added acceptance of the empty case
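The fix in the commit message comes down to peeking at one element with take(1) and treating an empty result as a valid case rather than an error. The sketch below illustrates that pattern only; it is not the Spark implementation. `convertPairs` and its string output are hypothetical, a plain `Seq[(Any, Any)]` stands in for the pair RDD, and the key-type check mimics code that inspects the first element to choose a conversion.

```scala
object TakeOneSketch {
  // Hypothetical converter: inspects at most one element (via take(1)) to pick
  // a formatting strategy, and returns an empty result for empty input
  // instead of throwing.
  def convertPairs(data: Seq[(Any, Any)]): Seq[String] =
    data.take(1).headOption match {
      case None =>
        Seq.empty // empty input: nothing to inspect, nothing to convert
      case Some((_: String, _)) =>
        data.map { case (k, v) => s"str:$k=$v" } // first key is a String
      case Some(_) =>
        data.map { case (k, v) => s"other:$k=$v" } // any other key type
    }

  def main(args: Array[String]): Unit = {
    println(convertPairs(Seq.empty))               // List()
    println(convertPairs(Seq(("a", 1), ("b", 2)))) // List(str:a=1, str:b=2)
  }
}
```

On a real RDD, take(1) returns an Array, so an empty RDD produces an empty Array where first() would throw; that is why the empty case can be handled with an ordinary branch.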
I've cherry-picked this to