-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Patch rank() uint64 behavior #14935
BUG: Patch rank() uint64 behavior #14935
Conversation
23844bb
to
3f432b7
Compare
@@ -104,6 +104,21 @@ cdef _take_2d_int64(ndarray[int64_t, ndim=2] values, | |||
result[i, j] = values[i, indexer[i, j]] | |||
return result | |||
|
|||
cdef _take_2d_uint64(ndarray[uint64_t, ndim=2] values, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why float64/int64/object are here either, they should also be in algos_take_helper.pxi.in
can you move there (could be in same PR or another ok)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Done.
@@ -286,6 +301,80 @@ def rank_1d_int64(object in_arr, ties_method='average', ascending=True, | |||
return ranks | |||
|
|||
|
|||
def rank_1d_uint64(object in_arr, ties_method='average', ascending=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same we need to create a algos_rank_helper.pxi.in and move all of the rank routines (and then add uint64). prob easiest to move, then add (in another PR), but up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Done.
|
||
Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice doc-strings!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 : can't really patch if you don't know what you're patching 😄
@@ -584,6 +609,8 @@ def rank(values, axis=0, method='average', na_option='keep', | |||
f, values = _get_data_algo(values, _rank2d_functions) | |||
ranks = f(values, axis=axis, ties_method=method, | |||
ascending=ascending, na_option=na_option, pct=pct) | |||
else: | |||
raise TypeError("Array with ndim > 2 are not supported.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is now.
Current coverage is 84.66% (diff: 100%)@@ master #14935 diff @@
==========================================
Files 144 144
Lines 51047 51056 +9
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43216 43225 +9
Misses 7831 7831
Partials 0 0
|
812b062
to
e6f42ed
Compare
def rank_1d_{{name}}(object in_arr, bint retry=1, ties_method='average', | ||
ascending=True, na_option='keep', pct=False): | ||
{{else}} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prob could just have the same signature for object (and just ignore that arg no?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, this is fine.
np.putmask(ranks, mask, np.nan) | ||
return ranks | ||
{{else}} | ||
# py2.5/win32 hack, can't pass i8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
are_diff(util.get_value_at(sorted_data, i + 1), val)): | ||
{{elif name == 'float64'}} | ||
if i == n - 1 or sorted_data[i + 1] != val: | ||
{{else}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be nogil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only for non-object
but agreed. Done.
e6f42ed
to
e76b9d4
Compare
@jreback : Addressed all comments, and both Appveyor and Travis are passing. Ready to merge if there are no other concerns. |
@@ -366,6 +366,9 @@ def factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None): | |||
if isinstance(values, Index): | |||
uniques = values._shallow_copy(uniques, name=None) | |||
elif isinstance(values, Series): | |||
# TODO: This constructor is bugged for uint's, especially | |||
# np.uint64 due to overflow. Test this for uint behavior | |||
# once constructor has been fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meaning where is the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok sure
if not ascending: | ||
_as = _as[:, ::-1] | ||
|
||
{{if name == 'generic'}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe change generic -> object so this is not necessary? (in a higher level definition)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah...that makes sense. I haven't been giving too much thought to changing things, but as Travis and Appveyor are happy right now, it couldn't hurt to try.
{{py: | ||
|
||
# name ctype pos_nan_value neg_nan_value | ||
dtypes = [('generic', 'object', 'Infinity()', 'NegInfinity()'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here I mean, I don't think we use generic generically (and instead use object) nomenclature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, done.
Adds uint64 ranking functions to algos.pyx to allow for proper ranking with uint64. Also introduces partial patch for factorize() by adding uint64 hashtables and vectors for usage. However, this patch is only partial because the larger bug of non-support for uint64 in Index has not been fixed. Also patches bug in UInt64HashTable that had an erroneous null condition that was caught during testing and was hence removed.
e76b9d4
to
2598cea
Compare
lgtm. ping on green. |
@jreback : Everything is green now. Ready to merge if there are no other concerns. |
thanks! |
Adds `uint64` ranking functions to `algos.pyx` to allow for proper ranking with `uint64`. Also introduces partial patch for `factorize()` by adding `uint64` hashtables and vectors for usage. However, this patch is only partial because the larger bug of non- support for `uint64` in `Index` has not been fixed (**UPDATE**: tackled in pandas-dev#14937): ~~~python >>> from pandas import Index, np >>> Index(np.array([2**63], dtype=np.uint64)) Int64Index([-9223372036854775808], dtype='int64') ~~~ Also patches a bug in `UInt64HashTable` from pandas-dev#14915 that had an erroneous null condition that was caught during testing and was hence removed. Author: gfyoung <gfyoung17@gmail.com> Closes pandas-dev#14935 from gfyoung/core-algorithms-uint64-two and squashes the following commits: 2598cea [gfyoung] BUG: Patch rank() uint64 behavior
1) Introduces and propagates `UInt64Index`, an index specifically for `uint`. xref #14935 2) <strike> Patches bug from #14916 that makes `maybe_convert_objects` robust against the known `numpy` bug that `uint64` cannot be compared to `int64`. This bug was caught during testing of `UInt64Index`. </strike> **UPDATE**: Patched in #14951 Author: gfyoung <gfyoung17@gmail.com> Closes #14937 from gfyoung/create-uint64-index and squashes the following commits: 8ab6fbd [gfyoung] ENH: Create and propagate UInt64Index
1) Introduces and propagates `UInt64Index`, an index specifically for `uint`. xref pandas-dev#14935 2) <strike> Patches bug from pandas-dev#14916 that makes `maybe_convert_objects` robust against the known `numpy` bug that `uint64` cannot be compared to `int64`. This bug was caught during testing of `UInt64Index`. </strike> **UPDATE**: Patched in pandas-dev#14951 Author: gfyoung <gfyoung17@gmail.com> Closes pandas-dev#14937 from gfyoung/create-uint64-index and squashes the following commits: 8ab6fbd [gfyoung] ENH: Create and propagate UInt64Index
Adds
uint64
ranking functions toalgos.pyx
to allow for proper ranking withuint64
.Also introduces partial patch for
factorize()
by addinguint64
hashtables and vectors forusage. However, this patch is only partial because the larger bug of non-support for
uint64
in
Index
has not been fixed (UPDATE: tackled in #14937):Also patches a bug in
UInt64HashTable
from #14915 that had an erroneous null condition that was caught during testing and was hence removed.Note there is overlap with #14934 with the implementation of
is_signed_integer_dtype
andis_unsigned_integer_dtype
. That PR should be merged before this one.