Series no longer returns float64 #510

craustin · 2011-12-20T16:47:22Z

Is this desired?

import numpy as np
from pandas import Series
s2 = Series({'A': np.float64(5.0), 'B': np.float64(0.0)})
print type(s2['A'])

<type 'float'>

In 0.4.0, this returned numpy.float64. That seems more expected.

wesm · 2011-12-20T20:03:58Z

Well, in theory it shouldn't matter. Under the hood you have a C double and the question is "which kind of box was it put in"? With some of the performance work recently I needed a fast generic __getitem__ for ndarrays, which can be found here:

https://github.com/wesm/pandas/blob/master/pandas/src/util.pxd#L12

cdef inline object get_value_at(ndarray arr, object loc):
    cdef:
        Py_ssize_t i, sz
        void* data_ptr
    if is_float_object(loc):
        casted = int(loc)
        if casted == loc:
            loc = casted
    i = <Py_ssize_t> loc
    sz = cnp.PyArray_SIZE(arr)

    if i < 0:
        i += sz
    elif i >= sz:
        raise IndexError('index out of bounds')
    data_ptr = cnp.PyArray_GETPTR1(arr, i)
    return cnp.PyArray_GETITEM(arr, data_ptr)

It turned out (and I noticed this when I was doing it) that PyArray_GETITEM wants to box float64 objects as Python floats. Since float64 inherits from float:

In [9]: np.float64.mro()
Out[9]:
[numpy.float64,
 numpy.floating,
 numpy.inexact,
 numpy.number,
 numpy.generic,
 float,
 object]

I decided I was willing to live with this for the speed gains from the above function. Was it a source of bugs? Just curious.
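For readers following along, the "which kind of box" distinction can be reproduced from plain Python. This is only a sketch in Python 3 syntax (the snippet at the top of the thread is Python 2), and it assumes that ndarray.item is a fair stand-in for the Python-float boxing described above; it is not the Cython code path from the thread.

```python
import numpy as np

arr = np.array([5.0, 0.0], dtype=np.float64)

# Regular indexing boxes the C double as a NumPy scalar.
boxed_numpy = arr[0]

# ndarray.item boxes the same C double as a built-in Python float.
# (Used here as an illustration only, not the C-level call in the thread.)
boxed_python = arr.item(0)

print(type(boxed_numpy).__name__)
print(type(boxed_python).__name__)

# float64 subclasses float, so isinstance checks pass for both boxes
# and the two values compare equal.
print(isinstance(boxed_numpy, float))
print(boxed_numpy == boxed_python)
```

Because the values compare equal and both pass isinstance(..., float), the difference only shows up in code that checks the exact type or relies on NumPy scalar arithmetic semantics.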

craustin · 2011-12-20T20:07:39Z

Gotcha. It causes only a few issues for us: when x is zero, 1. / x evaluates to inf if x is a float64 but raises a ZeroDivisionError if x is a Python float. We can work around it in the places where we expect this to happen.
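A minimal sketch of that difference, in Python 3 syntax. The variable names are illustrative, and np.errstate is used only to silence the RuntimeWarning that NumPy emits for the division:

```python
import numpy as np

numpy_zero = np.float64(0.0)
python_zero = 0.0

# With a NumPy scalar, division by zero follows IEEE 754 and yields inf.
# NumPy would normally emit a RuntimeWarning here, so it is suppressed.
with np.errstate(divide="ignore"):
    numpy_result = 1.0 / numpy_zero

# With a built-in float, the same expression raises instead.
try:
    python_result = 1.0 / python_zero
except ZeroDivisionError:
    python_result = None  # marker meaning "the division raised"

print(numpy_result)
print(python_result)
```

Code that relies on inf propagating through a calculation therefore breaks if the scalar silently becomes a Python float.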

wesm · 2011-12-21T03:57:15Z

Craig, I went back and looked at this and figured out the right way to use the NumPy C API. The current git master returns float64 as before, and the performance is about the same (within, say, 50 nanoseconds), which is perfectly acceptable.

craustin · 2011-12-21T15:23:22Z

Great Wes, thanks.

wesm · 2011-12-21T15:46:57Z

As an aside, another reason to use float64 is that it avoids entering Python's scalar value memory allocation nightmare (where internal "free lists" can end up consuming a lot of memory). This is particularly problematic when reading lots of data from a database, since the DB drivers convert first to Python float/int, which are then converted to NumPy arrays. Not sure how much you have looked into this.

wesm added a commit that referenced this issue Dec 21, 2011

BUG: return NumPy scalars from Series, same speed as before, GH #510

96375da

wesm closed this as completed Dec 21, 2011

craustin mentioned this issue Dec 21, 2011

Certain DataFrame cannot be pickled to string #511

Closed

dan-nadler pushed a commit to dan-nadler/pandas that referenced this issue Sep 23, 2019

Add multi columns support (pandas-dev#510)

94a90b8
