BUG: Patch rank() uint64 behavior #14935

gfyoung · 2016-12-21T04:25:45Z

Adds uint64 ranking functions to algos.pyx to allow for proper ranking with uint64.

Also introduces a partial patch for factorize() by adding uint64 hashtables and vectors. However, this patch is only
partial because the larger bug, namely that Index does not support uint64, has not been fixed
(UPDATE: tackled in #14937):

>>> from pandas import Index, np
>>> Index(np.array([2**63], dtype=np.uint64))
Int64Index([-9223372036854775808], dtype='int64')
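The negative value comes from reading the 64-bit pattern of 2**63 as a signed integer. A minimal NumPy sketch (plain NumPy, not pandas code) shows the wraparound and how it also reverses the sort order, the kind of misordering that would corrupt ranks if uint64 data were processed as int64:

```python
import numpy as np

# 2**63 fits in uint64 but has no int64 equivalent.
big = np.array([2**63], dtype=np.uint64)

# Reinterpreting the same 8 bytes as int64 wraps to the most negative int64,
# the same value that appears in the Int64Index output.
wrapped = big.view(np.int64)
print(wrapped)  # [-9223372036854775808]

# The wraparound also flips relative order between small and large values.
vals = np.array([1, 2**63], dtype=np.uint64)
print(np.argsort(vals))                 # [0 1]  (correct order)
print(np.argsort(vals.view(np.int64)))  # [1 0]  (order reversed)
```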

Also patches a bug in UInt64HashTable (introduced in #14915): an erroneous null condition, caught during testing, has been removed.
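For context, a uint64 array cannot represent NaN, so factorizing one needs no null handling. As a sketch under that assumption, a plain Python dict can stand in for UInt64HashTable; `factorize_uint64` is a hypothetical name, not a pandas function:

```python
import numpy as np

def factorize_uint64(values):
    """Hypothetical dict-based stand-in for a UInt64HashTable factorize.

    uint64 arrays cannot hold NaN, so every value receives a label and
    no null check is performed.
    """
    table = {}                                  # value -> label, first-seen order
    labels = np.empty(len(values), dtype=np.intp)
    uniques = []
    for i, val in enumerate(values):
        key = int(val)                          # Python int keeps the full uint64 range
        if key not in table:
            table[key] = len(uniques)
            uniques.append(key)
        labels[i] = table[key]
    return labels, np.array(uniques, dtype=np.uint64)

labels, uniques = factorize_uint64(np.array([2**63, 1, 2**63], dtype=np.uint64))
print(labels)            # [0 1 0]
print(uniques.tolist())  # [9223372036854775808, 1]
```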

Note that this PR overlaps with #14934 in the implementation of is_signed_integer_dtype and is_unsigned_integer_dtype. That PR should be merged before this one.
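The two dtype checks named above can be approximated with NumPy's abstract scalar hierarchy. The sketch below uses hypothetical `_sketch` names and only handles NumPy arrays and dtype-like inputs; it is not pandas' implementation:

```python
import numpy as np

def _as_dtype(arr_or_dtype):
    # Arrays carry their dtype; anything else is treated as dtype-like
    # (e.g. np.uint64, "int64", np.dtype("uint8")).
    if isinstance(arr_or_dtype, np.ndarray):
        return arr_or_dtype.dtype
    return np.dtype(arr_or_dtype)

def is_signed_integer_dtype_sketch(arr_or_dtype):
    return np.issubdtype(_as_dtype(arr_or_dtype), np.signedinteger)

def is_unsigned_integer_dtype_sketch(arr_or_dtype):
    return np.issubdtype(_as_dtype(arr_or_dtype), np.unsignedinteger)

print(is_unsigned_integer_dtype_sketch(np.uint64))  # True
print(is_signed_integer_dtype_sketch(np.uint64))    # False
```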

jreback · 2016-12-21T11:37:55Z

pandas/algos.pyx

@@ -104,6 +104,21 @@ cdef _take_2d_int64(ndarray[int64_t, ndim=2] values,
            result[i, j] = values[i, indexer[i, j]]
    return result

+cdef _take_2d_uint64(ndarray[uint64_t, ndim=2] values,


not sure why float64/int64/object are here either; they should also be in algos_take_helper.pxi.in

can you move them there? (could be in the same PR or another, ok)

Makes sense. Done.
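For reference, the `_take_2d_*` loop in the diff computes `result[i, j] = values[i, indexer[i, j]]`. The same row-wise gather can be sketched in NumPy with `np.take_along_axis` (NumPy 1.15 and later); the helper name is hypothetical:

```python
import numpy as np

def take_2d_axis1_sketch(values, indexer):
    """Vectorized equivalent of result[i, j] = values[i, indexer[i, j]]."""
    return np.take_along_axis(values, indexer, axis=1)

values = np.array([[10, 20, 30], [2**63, 5, 7]], dtype=np.uint64)
indexer = np.array([[2, 0, 1], [1, 2, 0]])
result = take_2d_axis1_sketch(values, indexer)
print(result.tolist())  # [[30, 10, 20], [5, 7, 9223372036854775808]]
```

The output keeps the uint64 dtype of `values`, which is the property a dtype-specific take routine needs to preserve.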

jreback · 2016-12-21T11:38:39Z

pandas/algos.pyx

@@ -286,6 +301,80 @@ def rank_1d_int64(object in_arr, ties_method='average', ascending=True,
        return ranks


+def rank_1d_uint64(object in_arr, ties_method='average', ascending=True,


same here: we need to create an algos_rank_helper.pxi.in and move all of the rank routines there (and then add uint64). probably easiest to move first, then add (in another PR), but up to you.

Indeed. Done.
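A `.pxi.in` file is a template that is expanded into one Cython function per dtype at build time (pandas renders these with Tempita). The sketch below swaps in the standard library's `string.Template` to show the idea; the dtype table and the truncated function skeleton are illustrative assumptions, not the actual template contents:

```python
from string import Template

# Hypothetical (name, ctype) table, modeled on the template's dtypes list.
DTYPES = [("object", "object"), ("float64", "float64_t"),
          ("int64", "int64_t"), ("uint64", "uint64_t")]

# One function skeleton; $name and $ctype are substituted per dtype.
RANK_1D = Template(
    "def rank_1d_$name(object in_arr, ties_method='average',\n"
    "                  ascending=True, na_option='keep', pct=False):\n"
    "    cdef ndarray[$ctype] sorted_data\n"
    "    ...\n"
)

def render(dtypes):
    """Emit one generated function per dtype, as a .pxi.in expansion would."""
    return "\n".join(RANK_1D.substitute(name=n, ctype=c) for n, c in dtypes)

source = render(DTYPES)
print(source.count("def rank_1d_"))  # 4
```

Adding uint64 support then amounts to adding one row to the table rather than copying a whole function.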

jreback · 2016-12-21T11:39:29Z

pandas/core/algorithms.py


+    Parameters


nice doc-strings!

👍 : can't really patch if you don't know what you're patching 😄

jreback · 2016-12-21T11:39:41Z

pandas/core/algorithms.py

@@ -584,6 +609,8 @@ def rank(values, axis=0, method='average', na_option='keep',
        f, values = _get_data_algo(values, _rank2d_functions)
        ranks = f(values, axis=axis, ties_method=method,
                  ascending=ascending, na_option=na_option, pct=pct)
+    else:
+        raise TypeError("Array with ndim > 2 are not supported.")


test for this?

There is now.
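The dimension check amounts to dispatching 1-D and 2-D input and rejecting everything else. A minimal sketch with a hypothetical name, where the 1-D and 2-D ranking functions are passed in as parameters:

```python
import numpy as np

def rank_dispatch_sketch(values, rank_1d, rank_2d):
    """Hypothetical outline of the ndim check: 1-D and 2-D inputs are
    dispatched to the matching routine, anything else raises TypeError."""
    values = np.asarray(values)
    if values.ndim == 1:
        return rank_1d(values)
    elif values.ndim == 2:
        return rank_2d(values)
    else:
        raise TypeError("Arrays with ndim > 2 are not supported.")

# A test for the new branch only needs a 3-D input.
try:
    rank_dispatch_sketch(np.zeros((2, 2, 2)), rank_1d=len, rank_2d=len)
except TypeError as exc:
    print(exc)  # Arrays with ndim > 2 are not supported.
```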

codecov-io · 2016-12-21T12:31:54Z

Current coverage is 84.66% (diff: 100%)

Merging #14935 into master will increase coverage by <.01%

@@             master     #14935   diff @@
==========================================
  Files           144        144          
  Lines         51047      51056     +9   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43216      43225     +9   
  Misses         7831       7831          
  Partials          0          0

Powered by Codecov. Last update d7e8f31...2598cea

gfyoung · 2016-12-23T17:53:31Z

@jreback : With #14934 merged, this is ready to be merged now.

jreback · 2016-12-23T17:57:28Z

pandas/src/algos_rank_helper.pxi.in

+def rank_1d_{{name}}(object in_arr, bint retry=1, ties_method='average',
+                     ascending=True, na_option='keep', pct=False):
+{{else}}
+


could probably just use the same signature for object (and just ignore that arg, no?)

nvm, this is fine.

jreback · 2016-12-23T17:58:23Z

pandas/src/algos_rank_helper.pxi.in

+        np.putmask(ranks, mask, np.nan)
+        return ranks
+    {{else}}
+    # py2.5/win32 hack, can't pass i8


remove this comment

jreback · 2016-12-23T17:58:58Z

pandas/src/algos_rank_helper.pxi.in

+            are_diff(util.get_value_at(sorted_data, i + 1), val)):
+        {{elif name == 'float64'}}
+        if i == n - 1 or sorted_data[i + 1] != val:
+        {{else}}


this could be nogil

Only for non-object but agreed. Done.
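The condition `i == n - 1 or sorted_data[i + 1] != val` closes a run of equal values in the sorted data. A pure-Python sketch of the 'average' tie method built around that condition (hypothetical name, no NaN handling, 1-based ranks):

```python
import numpy as np

def rank_1d_average_sketch(values):
    """Average ranks: walk the sorted data and, when the next value differs
    (or the end is reached), give every member of the run of ties the mean
    of the positions it occupies."""
    values = np.asarray(values)
    n = len(values)
    order = np.argsort(values, kind="mergesort")  # stable sort
    sorted_data = values[order]
    ranks = np.empty(n, dtype=np.float64)
    sum_ranks = 0
    dups = 0
    for i in range(n):
        val = sorted_data[i]
        sum_ranks += i + 1
        dups += 1
        if i == n - 1 or sorted_data[i + 1] != val:
            # assign the average rank to the run of ties ending at i
            for j in range(i - dups + 1, i + 1):
                ranks[order[j]] = sum_ranks / dups
            sum_ranks = 0
            dups = 0
    return ranks

print(rank_1d_average_sketch(np.array([2**63, 1, 1], dtype=np.uint64)).tolist())
# [3.0, 1.5, 1.5]
```

Because the comparison runs on the uint64 values themselves, 2**63 correctly ranks last.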

gfyoung · 2016-12-24T10:23:35Z

@jreback : Addressed all comments, and both Appveyor and Travis are passing. Ready to merge if there are no other concerns.

jreback · 2016-12-24T14:55:08Z

pandas/core/algorithms.py

@@ -366,6 +366,9 @@ def factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None):
    if isinstance(values, Index):
        uniques = values._shallow_copy(uniques, name=None)
    elif isinstance(values, Series):
+        # TODO: This constructor is bugged for uint's, especially
+        # np.uint64 due to overflow. Test this for uint behavior
+        # once constructor has been fixed.


meaning, where is the issue for this?

@jreback : I'm handling that TODO right now in #14937. I'm literally tackling all of these problems simultaneously.

jreback · 2016-12-24T14:57:04Z

pandas/src/algos_rank_helper.pxi.in

+    if not ascending:
+        _as = _as[:, ::-1]
+
+    {{if name == 'generic'}}


maybe change generic -> object (in the higher-level dtype definition) so this special case is not necessary?

Yeah... that makes sense. I hadn't given much thought to renaming things, but since Travis and Appveyor are happy right now, it couldn't hurt to try.
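Reversing the argsort result along each row, as `_as = _as[:, ::-1]` does, turns an ascending order into a descending one. A simplified sketch of row-wise (axis=1) ranking that assumes no ties, so the tie method does not matter; the function name is hypothetical:

```python
import numpy as np

def rank_2d_rows_sketch(values, ascending=True):
    """Rank each row of a 2-D array (assumes no ties within a row)."""
    _as = np.argsort(values, axis=1, kind="mergesort")
    if not ascending:
        _as = _as[:, ::-1]  # descending: read each row's order backwards
    ranks = np.empty(values.shape, dtype=np.float64)
    rows = np.arange(values.shape[0])[:, None]
    # the element at sorted position k in row i gets rank k + 1
    ranks[rows, _as] = np.arange(1, values.shape[1] + 1)
    return ranks

vals = np.array([[30, 10, 20], [1, 2**63, 5]], dtype=np.uint64)
print(rank_2d_rows_sketch(vals).tolist())  # [[3.0, 1.0, 2.0], [1.0, 3.0, 2.0]]
```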

jreback · 2016-12-24T14:57:57Z

pandas/src/algos_rank_helper.pxi.in

+{{py:
+
+# name ctype pos_nan_value neg_nan_value
+dtypes = [('generic', 'object', 'Infinity()', 'NegInfinity()'),


here is what I mean: I don't think we use the "generic" nomenclature anywhere (we use "object" instead)
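The pos_nan_value/neg_nan_value columns supply sentinels (`Infinity()`/`NegInfinity()` for object) that push missing values to the end of the sort; with `na_option='keep'`, the NaN positions are afterwards reset via `np.putmask(ranks, mask, np.nan)`. A float64-only sketch of that flow, assuming no ties among non-missing values and using `np.inf`/`-np.inf` as the sentinels (a uint64 array has no NaN, so its mask would always be empty):

```python
import numpy as np

def rank_keep_nan_sketch(values, ascending=True):
    """Rank floats with na_option='keep' (assumes no ties): NaNs are replaced
    by a sentinel that sorts last, everything is ranked, then NaNs are restored."""
    values = np.asarray(values, dtype=np.float64)
    mask = np.isnan(values)
    # +inf sorts last when ascending; -inf sorts last once negated for descending
    nan_value = np.inf if ascending else -np.inf
    filled = np.where(mask, nan_value, values)
    order = np.argsort(filled if ascending else -filled, kind="mergesort")
    ranks = np.empty(len(values), dtype=np.float64)
    ranks[order] = np.arange(1, len(values) + 1)
    np.putmask(ranks, mask, np.nan)  # missing inputs keep a NaN rank
    return ranks

print(rank_keep_nan_sketch([3.0, np.nan, 1.0]))  # [ 2. nan  1.]
```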


jreback · 2016-12-24T20:43:36Z

lgtm. ping on green.

gfyoung · 2016-12-24T21:24:51Z

@jreback : Everything is green now. Ready to merge if there are no other concerns.

jreback · 2016-12-24T22:04:48Z

thanks!

Adds `uint64` ranking functions to `algos.pyx` to allow for proper ranking with `uint64`.

Also introduces partial patch for `factorize()` by adding `uint64` hashtables and vectors for usage. However, this patch is only partial because the larger bug of non-support for `uint64` in `Index` has not been fixed (**UPDATE**: tackled in pandas-dev#14937):

~~~python
>>> from pandas import Index, np
>>> Index(np.array([2**63], dtype=np.uint64))
Int64Index([-9223372036854775808], dtype='int64')
~~~

Also patches a bug in `UInt64HashTable` from pandas-dev#14915 that had an erroneous null condition that was caught during testing and was hence removed.

Author: gfyoung <gfyoung17@gmail.com>

Closes pandas-dev#14935 from gfyoung/core-algorithms-uint64-two and squashes the following commits:

2598cea [gfyoung] BUG: Patch rank() uint64 behavior

1) Introduces and propagates `UInt64Index`, an index specifically for `uint`. xref #14935
2) ~~Patches bug from #14916 that makes `maybe_convert_objects` robust against the known `numpy` bug that `uint64` cannot be compared to `int64`. This bug was caught during testing of `UInt64Index`.~~ **UPDATE**: Patched in #14951

Author: gfyoung <gfyoung17@gmail.com>

Closes #14937 from gfyoung/create-uint64-index and squashes the following commits:

8ab6fbd [gfyoung] ENH: Create and propagate UInt64Index


gfyoung mentioned this pull request Dec 21, 2016 in ENH: Create and propagate UInt64Index #14937 (Closed)

gfyoung force-pushed the core-algorithms-uint64-two branch from 23844bb to 3f432b7 on December 21, 2016 09:13

jreback added the Dtype Conversions (Unexpected or buggy dtype conversions) and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels Dec 21, 2016

jreback reviewed Dec 21, 2016


gfyoung force-pushed the core-algorithms-uint64-two branch 2 times, most recently from 812b062 to e6f42ed on December 23, 2016 14:11

jreback requested changes Dec 23, 2016


gfyoung force-pushed the core-algorithms-uint64-two branch from e6f42ed to e76b9d4 on December 24, 2016 08:35

jreback reviewed Dec 24, 2016


gfyoung force-pushed the core-algorithms-uint64-two branch from e76b9d4 to 2598cea on December 24, 2016 20:14

jreback added this to the 0.20.0 milestone Dec 24, 2016

jreback approved these changes Dec 24, 2016


jreback closed this in b1dc9e4 Dec 24, 2016

gfyoung deleted the core-algorithms-uint64-two branch December 24, 2016 22:05
