Series no longer returns float64 #510

Closed
craustin opened this issue Dec 20, 2011 · 5 comments

@craustin

Is this desired?

import numpy as np
from pandas import Series
s2 = Series({'A': np.float64(5.0), 'B': np.float64(0.0)})
print type(s2['A'])

<type 'float'>

In 0.4.0, this returned numpy.float64, which seems like the more expected behavior.

@wesm
Member

wesm commented Dec 20, 2011

Well, in theory it shouldn't matter. Under the hood you have a C double, and the question is which kind of box it was put in. As part of some recent performance work I needed a fast generic __getitem__ for ndarrays, which can be found here:

https://github.com/wesm/pandas/blob/master/pandas/src/util.pxd#L12

cdef inline object get_value_at(ndarray arr, object loc):
    cdef:
        Py_ssize_t i, sz
        void* data_ptr
    # Accept integer-valued floats (e.g. 2.0) as positions
    if is_float_object(loc):
        casted = int(loc)
        if casted == loc:
            loc = casted
    i = <Py_ssize_t> loc
    sz = cnp.PyArray_SIZE(arr)

    # Wrap negative positions around, as in Python indexing
    if i < 0:
        i += sz
    elif i >= sz:
        raise IndexError('index out of bounds')
    # Box the element at position i as a Python object
    data_ptr = cnp.PyArray_GETPTR1(arr, i)
    return cnp.PyArray_GETITEM(arr, data_ptr)

It turned out (and I noticed this when I was doing it) that PyArray_GETITEM wants to box float64 objects as Python floats. Since float64 inherits from float:


In [9]: np.float64.mro()
Out[9]:
[numpy.float64,
 numpy.floating,
 numpy.inexact,
 numpy.number,
 numpy.generic,
 float,
 object]

I decided I was willing to live with this for the speed gains from the above function. Was it a source of bugs? Just curious.
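
The "which kind of box" distinction can be seen from plain Python as well. The following is a minimal sketch, not the code path pandas uses: it assumes a small hypothetical array `arr` and compares ordinary indexing with `ndarray.item`, which returns a standard Python scalar.

```python
import numpy as np

# Hypothetical two-element array, mirroring the values in the report.
arr = np.array([5.0, 0.0])

boxed_numpy = arr[0]        # indexing boxes the C double as numpy.float64
boxed_python = arr.item(0)  # ndarray.item boxes it as a built-in float

print(type(boxed_numpy))
print(type(boxed_python))

# Both boxes hold the same C double, and numpy.float64 subclasses float,
# so equality and isinstance checks do not tell them apart.
print(boxed_numpy == boxed_python)
print(isinstance(boxed_numpy, float))
```

Code that needs to distinguish the two has to compare exact types, for example with `type(x) is float`.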

@craustin
Author

Gotcha. It causes only a few issues for us: when x is zero, 1. / x evaluates to inf if x is a float64 but raises a ZeroDivisionError if x is a float. We can work around it in the instances where we expect this to happen.
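
The difference in division behavior can be reproduced with a short sketch. The helper name `reciprocal` is hypothetical, and `np.errstate` is used here only to silence the divide-by-zero warning that NumPy emits under its default error settings.

```python
import numpy as np

def reciprocal(x):
    """Return 1.0 / x; the outcome for x == 0 depends on the type of x."""
    return 1.0 / x

# numpy.float64 zero: the division is handled by NumPy and yields inf.
with np.errstate(divide="ignore"):
    numpy_result = reciprocal(np.float64(0.0))

# Built-in float zero: Python raises ZeroDivisionError instead.
try:
    reciprocal(0.0)
    python_raised = False
except ZeroDivisionError:
    python_raised = True

print(numpy_result, python_raised)
```

One possible workaround, under the same assumptions, is to wrap the value in `np.float64(...)` before dividing wherever the inf result is wanted.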

@wesm
Member

wesm commented Dec 21, 2011

Craig, I went back and looked at this, and I figured out the right way to use the NumPy C API. The current git master returns float64 as before, and the performance is about the same (within, say, 50 nanoseconds), which is perfectly acceptable.

@wesm closed this as completed Dec 21, 2011

@craustin
Author

Great, Wes, thanks.

@wesm
Member

wesm commented Dec 21, 2011

As an aside, another reason to use float64 is that it avoids entering Python's scalar value memory allocation nightmare (where internal "free lists" can end up consuming a lot of memory). This is particularly problematic when reading lots of data from a database, since the DB drivers convert first to Python float/int objects, which are then converted to NumPy arrays. Not sure how much you have looked into this.
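
As a rough illustration of why individually boxed Python scalars are costly, the sketch below compares the size of one Python float object with the storage a float64 array uses per value. It does not reproduce the free-list behavior itself, which depends on the interpreter version; the list `values` is a hypothetical stand-in for rows returned by a database driver.

```python
import sys
import numpy as np

n = 1000
# Hypothetical stand-in for values a DB driver hands back as Python floats.
values = [float(i) + 0.5 for i in range(n)]

# Converting to an array stores the raw C doubles contiguously.
arr = np.array(values, dtype=np.float64)

per_object_bytes = sys.getsizeof(values[0])  # one boxed Python float
array_bytes_per_value = arr.itemsize         # one unboxed C double

print(per_object_bytes, array_bytes_per_value, arr.nbytes)
```

Elements read back from `arr` by indexing are boxed as numpy.float64 on demand rather than kept alive as separate Python float objects.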

dan-nadler pushed a commit to dan-nadler/pandas that referenced this issue Sep 23, 2019