Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-2951] [PySpark] support unpickle array.array for Python 2.6 #2365

Closed
wants to merge 4 commits into from

Conversation

davies
Copy link
Contributor

@davies davies commented Sep 11, 2014

Pyrolite can not unpickle array.array which pickled by Python 2.6, this patch fix it by extend Pyrolite.

There is a bug in Pyrolite when unpickle array of float/double, this patch workaround it by reverse the endianness for float/double. This workaround should be removed after Pyrolite have a new release to fix this issue.

I had send an PR to Pyrolite to fix it: irmen/Pyrolite#11

@SparkQA
Copy link

SparkQA commented Sep 11, 2014

QA tests have started for PR 2365 at commit 60e4e2f.

  • This patch merges cleanly.

@davies davies changed the title [SPARK-2951] support unpickle array.array for Python 2.6 [SPARK-2951] [PySpark] support unpickle array.array for Python 2.6 Sep 11, 2014
@SparkQA
Copy link

SparkQA commented Sep 11, 2014

QA tests have started for PR 2365 at commit 60e4e2f.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 12, 2014

QA tests have finished for PR 2365 at commit 60e4e2f.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ArrayConstructor extends net.razorvine.pickle.objects.ArrayConstructor

@SparkQA
Copy link

SparkQA commented Sep 12, 2014

QA tests have finished for PR 2365 at commit 60e4e2f.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ArrayConstructor extends net.razorvine.pickle.objects.ArrayConstructor

@JoshRosen
Copy link
Contributor

Maybe we should wait a couple of days to hear back from the Pyrolite folks and see if they will cut a new release.

} else if (args.length == 2 && args(1).isInstanceOf[String]) {
val typecode = args(0).asInstanceOf[String].charAt(0)
val data: String = args(1).asInstanceOf[String]
println(typecode, machineCodes(typecode), data.length, data.toList)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this debugging statement?

@SparkQA
Copy link

SparkQA commented Sep 13, 2014

QA tests have started for PR 2365 at commit f44f771.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 13, 2014

QA tests have finished for PR 2365 at commit f44f771.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ArrayConstructor extends net.razorvine.pickle.objects.ArrayConstructor
    • class Dummy(object):

@mattf
Copy link
Contributor

mattf commented Sep 13, 2014

Maybe we should wait a couple of days to hear back from the Pyrolite folks and see if they will cut a new release.

+1

unless there's agreed upon urgency, it'd be better to fix the dep instead of working around it

@JoshRosen
Copy link
Contributor

@mattf I just found out that this is blocking #2378, which is blocking other MLlib Python API patches, so I'm going to consider merging this now...

@mattf
Copy link
Contributor

mattf commented Sep 14, 2014

where's the message that was sent to the pyrolite folks?

it looks like SPARK-2378 s targetted for 1.2, so it has a bit of time

@mattf
Copy link
Contributor

mattf commented Sep 14, 2014

if you do end up merging, what do you think about logging an issue for fixing up the workaround once pyrolite is update?

@davies
Copy link
Contributor Author

davies commented Sep 15, 2014

I had created https://issues.apache.org/jira/browse/SPARK-3524 to track this.

@mattf
Copy link
Contributor

mattf commented Sep 15, 2014

thanks, +1, lgtm

@jkbradley
Copy link
Member

LGTM Ran relevant python unit tests with no problems.

@JoshRosen
Copy link
Contributor

I'm going to merge this now.

As a reference / side note, http://bugs.python.org/issue2389 provides some good context for why Python 2.6's array pickling exposes the machine's endianness; Python 2.7 uses a different pickling strategy that doesn't go through this code path.

@asfgit asfgit closed this in da33acb Sep 16, 2014
davies added a commit to davies/spark that referenced this pull request Dec 10, 2014
Pyrolite can not unpickle array.array which pickled by Python 2.6, this patch fix it by extend Pyrolite.

There is a bug in Pyrolite when unpickle array of float/double, this patch workaround it by reverse the endianness for float/double. This workaround should be removed after Pyrolite have a new release to fix this issue.

I had send an PR to Pyrolite to fix it:  irmen/Pyrolite#11

Author: Davies Liu <davies.liu@gmail.com>

Closes apache#2365 from davies/pickle and squashes the following commits:

f44f771 [Davies Liu] enable tests about array
3908f5c [Davies Liu] Merge branch 'master' into pickle
c77c87b [Davies Liu] cleanup debugging code
60e4e2f [Davies Liu] support unpickle array.array for Python 2.6

Conflicts:
	core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala
	python/pyspark/tests.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants