TDigest __repr__ not in line with constructor #22
Comments
I think this also causes an issue with the Spark broadcasting feature.
from random import gauss, randint
from isarnproject.sketches.spark.tdigest import *
data = spark.createDataFrame([[randint(1,10),gauss(0,1)] for x in range(1000)])
udf1 = tdigestIntUDF("_1", maxDiscrete = 25)
udf2 = tdigestDoubleUDF("_2", compression = 0.5)
agg = data.agg(udf1, udf2).first()
td = agg[0]
td_broadcast = spark.sparkContext.broadcast(td)
td_broadcast.value
Results in:
|
Interesting, I'm sure I can make it conform to a parsable constructor expression. |
I believe removing the
def __repr__(self):
    return "TDigest(%s, %s, %s, %s)" % \
        (repr(self.compression), repr(self.maxDiscrete), repr(self._cent), repr(self._mass))
|
Closing with #23 - thanks @JonathanTaws ! |
While testing with the new release, I found that it's still not working with the |
I added #25 and published it as 0.5.2, thanks! |
Thanks, all working properly now. |
I am trying to save the TDigest object (in Python) to a format that I can use to recreate it. In the past (version isarn-sketches-spark_2.11:0.3.1-sp2.2-py2.7), I was able to access the parameters below and save them, then recreate a TDigest by calling the constructor with those parameters.
https://github.com/isarn/isarn-sketches-spark/blob/v0.3.1/python/isarnproject/sketches/udt/tdigest.py#L115
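To illustrate the save-and-recreate workflow described above, here is a minimal sketch that stores constructor arguments as JSON and rebuilds the object from them. `DigestState`, its attribute names, and the `to_json`/`from_json` helpers are hypothetical stand-ins for illustration only; they are not part of isarn-sketches-spark, and the real TDigest attributes may differ between versions.

```python
import json


class DigestState:
    """Hypothetical stand-in for a TDigest-like object (not the real class)."""

    def __init__(self, compression, max_discrete, cent, mass):
        self.compression = compression
        self.max_discrete = max_discrete
        self.cent = list(cent)   # cluster centers
        self.mass = list(mass)   # cluster masses

    def to_json(self):
        # Save exactly the arguments that __init__ accepts, keyed by name.
        return json.dumps({
            "compression": self.compression,
            "max_discrete": self.max_discrete,
            "cent": self.cent,
            "mass": self.mass,
        })

    @classmethod
    def from_json(cls, text):
        # Rebuild by passing the saved values back to the constructor.
        return cls(**json.loads(text))


original = DigestState(0.5, 25, [1.0, 2.5], [3.0, 4.0])
restored = DigestState.from_json(original.to_json())
print(restored.cent, restored.mass)
```

Keying the saved values by parameter name means a later renaming of constructor parameters fails loudly with a TypeError instead of silently assigning values to the wrong positions.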
With the latest version, aside from the renaming of some of the parameters, the constructor for TDigest does not accept the same parameters:
isarn-sketches-spark/python/isarnproject/sketches/spark/tdigest.py
Line 226 in e7d3136
The __repr__ representation includes the nclusters parameter, which is not in the constructor signature (rightfully), meaning I can't use the __repr__ string to construct a new object (e.g. by using eval(repr(tdigest))) without some hacking around.
isarn-sketches-spark/python/isarnproject/sketches/spark/tdigest.py
Lines 239 to 241 in e7d3136
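The mismatch can be reproduced without Spark. The sketch below uses two hypothetical stand-in classes (`Sketch` and `MismatchedSketch`, neither of which is the library's TDigest) to show why a __repr__ carrying an extra derived value cannot be passed back through eval, while one that mirrors the constructor signature can. The `round_trips` helper is also an assumption made for this example.

```python
class Sketch:
    """Hypothetical stand-in; its __repr__ mirrors the constructor signature."""

    def __init__(self, compression, max_discrete, cent, mass):
        self.compression = compression
        self.max_discrete = max_discrete
        self.cent = list(cent)
        self.mass = list(mass)

    def __repr__(self):
        # One repr field per constructor argument, in the same order.
        return "Sketch(%r, %r, %r, %r)" % (
            self.compression, self.max_discrete, self.cent, self.mass)


class MismatchedSketch(Sketch):
    def __repr__(self):
        # Adds a derived cluster count that __init__ does not accept.
        return "MismatchedSketch(%r, %r, %r, %r, %r)" % (
            self.compression, self.max_discrete, len(self.cent),
            self.cent, self.mass)


def round_trips(obj):
    """Return True if eval(repr(obj)) rebuilds an object with the same repr."""
    # Expose only the object's own class to eval.
    namespace = {type(obj).__name__: type(obj)}
    try:
        return repr(eval(repr(obj), namespace)) == repr(obj)
    except TypeError:
        # Raised when repr carries more arguments than the constructor takes.
        return False


good = Sketch(0.5, 25, [1.0, 2.0], [3.0, 4.0])
bad = MismatchedSketch(0.5, 25, [1.0, 2.0], [3.0, 4.0])
print(round_trips(good))
print(round_trips(bad))
```

A check like `round_trips` could serve as a regression test for any class whose __repr__ is meant to be a parsable constructor expression.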