If you have a TDigest, for example:
from isarnproject.sketches.udaf.tdigest import *
from random import gauss
from pyspark.sql.types import *
# build a DataFrame of 1000 standard-normal samples in a single double column "x"
data = sc.parallelize([[gauss(0,1)] for x in xrange(1000)]).toDF(StructType([StructField("x", DoubleType())]))
# aggregate the column into a TDigest sketch and pull it back to the driver
agg = data.agg(tdigestDoubleUDAF("x"))
td = agg.first()[0]
and you then broadcast it to the executors:
sc.broadcast(td)
The following error appears:
Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/broadcast.py", line 83, in dump
pickle.dump(value, f, 2)
File "python/isarnproject/sketches/udt/tdigest.py", line 144, in __reduce__
AttributeError: 'int' object has no attribute 'self'
Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-631244049223455329.py", line 367, in <module>
raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
File "/tmp/zeppelin_pyspark-631244049223455329.py", line 360, in <module>
exec(code, _zcUserQueryNameSpace)
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/pyspark/context.py", line 802, in broadcast
return Broadcast(self, value, self._pickled_broadcast_vars)
File "/usr/lib/spark/python/pyspark/broadcast.py", line 74, in __init__
self._path = self.dump(value, f)
File "/usr/lib/spark/python/pyspark/broadcast.py", line 90, in dump
raise pickle.PicklingError(msg)
PicklingError: Could not serialize broadcast: AttributeError: 'int' object has no attribute 'self'
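Note that the broadcast machinery is only the messenger here: as the traceback shows, Broadcast.dump simply calls pickle.dump(value, f, 2), so the failure can be reproduced on the driver without any broadcast at all. A minimal check using only the standard library:
import pickle
pickle.dumps(td, 2)  # expected to raise the same AttributeError: 'int' object has no attribute 'self'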
This looks like it's due to a bug in the TDigest class's __reduce__ method, which pickle calls during serialization: a missing comma after maxDiscrete on this line makes the next constructor argument parse as an attribute lookup on an int, which produces the AttributeError above: https://github.com/isarn/isarn-sketches-spark/blob/develop/python/isarnproject/sketches/udt/tdigest.py#L144
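For reference, here is a minimal stand-alone sketch (illustrative names only, not the library's actual code) of how such a broken separator after maxDiscrete can yield exactly this AttributeError: when the comma is missing and a dot joins the two tuple items instead, Python parses them as a single attribute chain, and that expression is only evaluated once the object is pickled.
import pickle

class Sketch(object):
    # Stand-in for the real TDigest; the fields are illustrative only.
    def __init__(self, maxDiscrete=0, nclusters=5):
        self.maxDiscrete = maxDiscrete
        self.nclusters = nclusters
    def __reduce__(self):
        # Intended: (self.maxDiscrete, self.nclusters, )
        # BUG: the comma after maxDiscrete is missing, so the dot that follows
        # makes Python parse one expression: self.maxDiscrete.self.nclusters
        return (self.__class__, (self.maxDiscrete.
                                 self.nclusters, ))

pickle.dumps(Sketch(), 2)
# AttributeError: 'int' object has no attribute 'self'
# Restoring the comma, (self.maxDiscrete, self.nclusters, ), lets the object pickle cleanly.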