Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: cdfInverse does not accept a value of 1 as q #21

Closed
JonathanTaws opened this issue Jan 20, 2021 · 6 comments
Closed

Python: cdfInverse does not accept a value of 1 as q #21

JonathanTaws opened this issue Jan 20, 2021 · 6 comments

Comments

@JonathanTaws
Copy link
Contributor

The cdfInverse method is supposed to accept qq values in range [0,1] as explained in the doc (note that the doc mentions q, while qq is the actual parameter used):

def cdfInverse(self, qq):
"""
Given a value q on [0,1], return the value x such that CDF(x) = q.
Returns NaN for any q > 1 or < 0, or if this TDigest is empty.
"""

However, if I give the value 1, I get an IndexError: list index out of range. The error comes from here:

How to reproduce (using the example on the github page https://github.com/isarn/isarn-sketches-spark#sketch-a-numeric-column-python):

from random import gauss, randint
from isarnproject.sketches.spark.tdigest import *
data = spark.createDataFrame([[randint(1,10),gauss(0,1)] for x in range(1000)])
udf1 = tdigestIntUDF("_1", maxDiscrete = 25)
udf2 = tdigestDoubleUDF("_2", compression = 0.5)
agg = data.agg(udf1, udf2).first()

td = agg[0]
td.cdfInverse(1)

Result:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/tmp/spark-2bfe3869-9322-41af-b0e2-5a289d198492/userFiles-ebc89fbe-3b18-4851-9923-cc04eca13d6d/org.isarnproject_isarn-sketches-spark_2.12-0.5.0-sp3.0.jar/isarnproject/sketches/spark/tdigest.py", line 318, in cdfInverse
IndexError: list index out of range

This code runs isarn-sketches-spark_2.12:0.5.0-sp3.0. However, in version isarn-sketches-spark_2.11:0.3.1-sp2.2-py2.7, the cdfInverse used to accept a value of 1 and run with it.

@erikerlandson
Copy link
Member

yikes, I thought I tested that

@JonathanTaws
Copy link
Contributor Author

yikes, I thought I tested that

For some reason I tend to find the edge cases!

@erikerlandson
Copy link
Member

I'll try to make some time to fix this over the weekend, thanks for reporting!

erikerlandson added a commit to erikerlandson/isarn-sketches-spark that referenced this issue Jan 24, 2021
@JonathanTaws
Copy link
Contributor Author

JonathanTaws commented Jan 26, 2021 via email

@erikerlandson
Copy link
Member

erikerlandson commented Jan 26, 2021

Yes, I would have released it already but I lost my login to Sonatype and they're being weirdly unresponsive helping me restore it 😕

If I can't get it resolved soon, I may migrate the build to github or bintray

@erikerlandson
Copy link
Member

@JonathanTaws I got my sonatype creds sorted out again, new release is available:
"org.isarnproject" %% "isarn-sketches-spark" % "0.5.1-sp3.0"

JonathanTaws pushed a commit to JonathanTaws/isarn-sketches-spark that referenced this issue Jan 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants