fix: return entire array if no outliers are found #181
base: master
Am I misreading, or doesn't this return the entire array if everything is an outlier? If nothing is an outlier, then this function still returns the entire array. I think it's actually somewhat valid to return an invalid adjusted UR when everything is an outlier (because it's very unreliable if everything is an outlier), but I'm open to opinions.
I don't see how that's possible with how the interquartile range was implemented. Could you provide an example?
I mean, I'm just reading the code as written:

    arr_without_outliers = [x for x in arr if lower_limit < x < upper_limit]
    return arr if not arr_without_outliers else arr_without_outliers

This returns the original array.
Put another way: this function in master already does what this PR says, which is return the entire array when there are no outliers. So something is being lost in translation here.
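To make the behaviour being debated concrete, here is a minimal sketch of an IQR-based filter built around the two lines quoted above. The function name `filter_outliers_sketch`, the `bias` parameter, the 1.5 × IQR fences, and the use of `statistics.quantiles` are all assumptions for illustration; only the list comprehension and the `return` line come from the thread, and circleguard's real implementation may compute its limits differently.

```python
import statistics

def filter_outliers_sketch(arr, bias=1.5):
    # Hypothetical stand-in, NOT circleguard's actual code. The fences
    # (q1 - bias * iqr, q3 + bias * iqr) are an assumed, common choice.
    q1, _, q3 = statistics.quantiles(arr, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - bias * iqr
    upper_limit = q3 + bias * iqr

    # The two lines quoted in the thread: strict inequalities, then a
    # fallback to the original array when the filtered list is empty.
    arr_without_outliers = [x for x in arr if lower_limit < x < upper_limit]
    return arr if not arr_without_outliers else arr_without_outliers

# Nothing is an outlier: the filtered list equals the input.
print(filter_outliers_sketch([1, 2, 3, 4, 5]))
# One outlier: it is dropped.
print(filter_outliers_sketch([1, 2, 3, 4, 100]))
# Zero IQR: both limits collapse to 2, the strict comparison rejects every
# value, and the fallback returns the original array.
print(filter_outliers_sketch([2, 2, 2, 2]))
```

Under these assumptions, the fallback only triggers when every value is rejected, which matches the reading that the "entire array" case is the everything-is-an-outlier case.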
Skipping semantics, as it is in master, and if that's an intended outcome, then you can close the PR. Per personal preference, I need it to return the same value as
Do you have an example? I can't reproduce this. If this is the case, that's definitely a bug, like you mentioned.
It usually happens at very low cvUR. One example would be this replay: it has a lower limit of 2 and an upper limit of 2. So another option could be including the boundary with
That's a case where everything is an outlier, not nothing. FWIW, I get:

    r = cg.ReplayMap(1711983, 33839341)
    print(cg.ur(r)) # nan
    print(cg.ur(r, adjusted=True)) # error

I'm fine with including the endpoints for outlier filtering to avoid this in the case of 0 IQR. We also probably want to raise an error when calculating the UR of a replay with < 3 hits (I thought we already did, to be honest).
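The two suggestions in this comment, inclusive endpoints and an error for very short replays, could look roughly like the sketch below. Everything named here is hypothetical: `filter_outliers_inclusive`, `MIN_HITS`, the `bias` parameter, and the 1.5 × IQR fences are assumptions for illustration, and where such a length check would actually live in circleguard is not stated in the thread.

```python
import statistics

MIN_HITS = 3  # hypothetical threshold, taken from the "< 3 hits" remark

def filter_outliers_inclusive(arr, bias=1.5):
    # Hypothetical sketch, NOT circleguard's actual code.
    if len(arr) < MIN_HITS:
        raise ValueError(f"need at least {MIN_HITS} values, got {len(arr)}")

    q1, _, q3 = statistics.quantiles(arr, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - bias * iqr
    upper_limit = q3 + bias * iqr

    # Inclusive comparison: values sitting exactly on a limit are kept, so
    # a zero IQR (lower_limit == upper_limit) no longer rejects everything.
    return [x for x in arr if lower_limit <= x <= upper_limit]

# Zero IQR: limits are both 2, and every value is kept.
print(filter_outliers_inclusive([2, 2, 2, 2]))
# A genuine outlier is still removed.
print(filter_outliers_inclusive([1, 2, 3, 4, 100]))
```

With the inclusive comparison, the zero-IQR input is handled by the filter itself, so this sketch does not need the fall-back-to-original-array branch for that case.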
Huh, weird. I ran your script and could not reproduce the results on circleguard
Fix in the filter_outliers method. With the previous behavior, if no outliers were found, an empty array would have been returned. This results in the adjusted UR calculation being None in some replays (while the normal UR calculation would be fine). This conflicts with the usage and documentation of the method, which state that it returns the array with the outliers removed.
To fix this, we simply return the original array if no outliers were found.