Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Patch rank() uint64 behavior #14935

Closed

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Dec 21, 2016

Adds uint64 ranking functions to algos.pyx to allow for proper ranking with uint64.

Also introduces partial patch for factorize() by adding uint64 hashtables and vectors for
usage. However, this patch is only partial because the larger bug of non-support for uint64
in Index has not been fixed (UPDATE: tackled in #14937):

>>> from pandas import Index, np
>>> Index(np.array([2**63], dtype=np.uint64))
Int64Index([-9223372036854775808], dtype='int64')

Also patches a bug in UInt64HashTable from #14915 that had an erroneous null condition that was caught during testing and was hence removed.

Note there is overlap with #14934 with the implementation of is_signed_integer_dtype and is_unsigned_integer_dtype. That PR should be merged before this one.

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations labels Dec 21, 2016
@@ -104,6 +104,21 @@ cdef _take_2d_int64(ndarray[int64_t, ndim=2] values,
result[i, j] = values[i, indexer[i, j]]
return result

cdef _take_2d_uint64(ndarray[uint64_t, ndim=2] values,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why float64/int64/object are here either, they should also be in algos_take_helper.pxi.in

can you move there (could be in same PR or another ok)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Done.

@@ -286,6 +301,80 @@ def rank_1d_int64(object in_arr, ties_method='average', ascending=True,
return ranks


def rank_1d_uint64(object in_arr, ties_method='average', ascending=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same we need to create a algos_rank_helper.pxi.in and move all of the rank routines (and then add uint64). prob easiest to move, then add (in another PR), but up to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Done.


Parameters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice doc-strings!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 : can't really patch if you don't know what you're patching 😄

@@ -584,6 +609,8 @@ def rank(values, axis=0, method='average', na_option='keep',
f, values = _get_data_algo(values, _rank2d_functions)
ranks = f(values, axis=axis, ties_method=method,
ascending=ascending, na_option=na_option, pct=pct)
else:
raise TypeError("Array with ndim > 2 are not supported.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is now.

@codecov-io
Copy link

codecov-io commented Dec 21, 2016

Current coverage is 84.66% (diff: 100%)

Merging #14935 into master will increase coverage by <.01%

@@             master     #14935   diff @@
==========================================
  Files           144        144          
  Lines         51047      51056     +9   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43216      43225     +9   
  Misses         7831       7831          
  Partials          0          0          

Powered by Codecov. Last update d7e8f31...2598cea

@gfyoung gfyoung force-pushed the core-algorithms-uint64-two branch 2 times, most recently from 812b062 to e6f42ed Compare December 23, 2016 14:11
@gfyoung
Copy link
Member Author

gfyoung commented Dec 23, 2016

@jreback : With #14934 merged, this is ready to be merged now.

def rank_1d_{{name}}(object in_arr, bint retry=1, ties_method='average',
ascending=True, na_option='keep', pct=False):
{{else}}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob could just have the same signature for object (and just ignore that arg no?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, this is fine.

np.putmask(ranks, mask, np.nan)
return ranks
{{else}}
# py2.5/win32 hack, can't pass i8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

are_diff(util.get_value_at(sorted_data, i + 1), val)):
{{elif name == 'float64'}}
if i == n - 1 or sorted_data[i + 1] != val:
{{else}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be nogil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only for non-object but agreed. Done.

@gfyoung
Copy link
Member Author

gfyoung commented Dec 24, 2016

@jreback : Addressed all comments, and both Appveyor and Travis are passing. Ready to merge if there are no other concerns.

@@ -366,6 +366,9 @@ def factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None):
if isinstance(values, Index):
uniques = values._shallow_copy(uniques, name=None)
elif isinstance(values, Series):
# TODO: This constructor is bugged for uint's, especially
# np.uint64 due to overflow. Test this for uint behavior
# once constructor has been fixed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meaning where is the issue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback : I'm handling that TODO right now in #14937. I'm literally tackling all of these problems simultaneously.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok sure

if not ascending:
_as = _as[:, ::-1]

{{if name == 'generic'}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe change generic -> object so this is not necessary? (in a higher level definition)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...that makes sense. I haven't been giving too much thought to changing things, but as Travis and Appveyor are happy right now, it couldn't hurt to try.

{{py:

# name ctype pos_nan_value neg_nan_value
dtypes = [('generic', 'object', 'Infinity()', 'NegInfinity()'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here I mean, I don't think we use generic generically (and instead use object) nomenclature

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, done.

Adds uint64 ranking functions to algos.pyx
to allow for proper ranking with uint64.

Also introduces partial patch for factorize()
by adding uint64 hashtables and vectors for
usage. However, this patch is only partial
because the larger bug of non-support for uint64
in Index has not been fixed.

Also patches bug in UInt64HashTable that had an
erroneous null condition that was caught during
testing and was hence removed.
@jreback jreback added this to the 0.20.0 milestone Dec 24, 2016
@jreback
Copy link
Contributor

jreback commented Dec 24, 2016

lgtm. ping on green.

@gfyoung
Copy link
Member Author

gfyoung commented Dec 24, 2016

@jreback : Everything is green now. Ready to merge if there are no other concerns.

@jreback jreback closed this in b1dc9e4 Dec 24, 2016
@jreback
Copy link
Contributor

jreback commented Dec 24, 2016

thanks!

@gfyoung gfyoung deleted the core-algorithms-uint64-two branch December 24, 2016 22:05
ShaharBental pushed a commit to ShaharBental/pandas that referenced this pull request Dec 26, 2016
Adds `uint64` ranking functions to `algos.pyx` to allow for proper
ranking with `uint64`.    Also introduces partial patch for
`factorize()` by adding `uint64` hashtables and vectors for  usage.
However, this patch is only partial because the larger bug of non-
support for `uint64`  in `Index` has not been fixed (**UPDATE**:
tackled in pandas-dev#14937):  ~~~python  >>> from pandas import Index, np  >>>
Index(np.array([2**63], dtype=np.uint64))
Int64Index([-9223372036854775808], dtype='int64')  ~~~    Also patches
a bug in `UInt64HashTable` from pandas-dev#14915 that had an erroneous null
condition that was caught during testing and was hence removed.

Author: gfyoung <gfyoung17@gmail.com>

Closes pandas-dev#14935 from gfyoung/core-algorithms-uint64-two and squashes the following commits:

2598cea [gfyoung] BUG: Patch rank() uint64 behavior
jreback pushed a commit that referenced this pull request Jan 17, 2017
1) Introduces and propagates `UInt64Index`, an index specifically for
`uint`.  xref #14935    2) <strike> Patches bug from #14916 that makes
`maybe_convert_objects` robust against the known `numpy` bug that
`uint64` cannot be compared to `int64`.  This bug was caught during
testing of `UInt64Index`. </strike>    **UPDATE**: Patched in #14951

Author: gfyoung <gfyoung17@gmail.com>

Closes #14937 from gfyoung/create-uint64-index and squashes the following commits:

8ab6fbd [gfyoung] ENH: Create and propagate UInt64Index
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
1) Introduces and propagates `UInt64Index`, an index specifically for
`uint`.  xref pandas-dev#14935    2) <strike> Patches bug from pandas-dev#14916 that makes
`maybe_convert_objects` robust against the known `numpy` bug that
`uint64` cannot be compared to `int64`.  This bug was caught during
testing of `UInt64Index`. </strike>    **UPDATE**: Patched in pandas-dev#14951

Author: gfyoung <gfyoung17@gmail.com>

Closes pandas-dev#14937 from gfyoung/create-uint64-index and squashes the following commits:

8ab6fbd [gfyoung] ENH: Create and propagate UInt64Index
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants