Series no longer returns float64 #510
Comments
Well, in theory it shouldn't matter. Under the hood you have a C double and the question is "which kind of box was it put in"? With some of the performance work recently I needed a fast generic

https://github.com/wesm/pandas/blob/master/pandas/src/util.pxd#L12

    cdef inline object get_value_at(ndarray arr, object loc):
        cdef:
            Py_ssize_t i, sz
            void* data_ptr
        if is_float_object(loc):
            casted = int(loc)
            if casted == loc:
                loc = casted
        i = <Py_ssize_t> loc
        sz = cnp.PyArray_SIZE(arr)
        if i < 0:
            i += sz
        elif i >= sz:
            raise IndexError('index out of bounds')
        data_ptr = cnp.PyArray_GETPTR1(arr, i)
        return cnp.PyArray_GETITEM(arr, data_ptr)

It turned out (and I noticed this when I was doing it) that

I decided I was willing to live with this for the speed gains from the above function. Was it a source of bugs? just curious
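For readers who don't use Cython, here is a rough pure-Python sketch of what the helper above appears to do. The name `get_value_at_sketch` is hypothetical, and `ndarray.item` stands in for `PyArray_GETITEM` on the assumption that both hand back a plain Python scalar. It is meant only to illustrate the "which box" question, not to reproduce the pandas implementation.

```python
import operator

import numpy as np


def get_value_at_sketch(arr, loc):
    """Hypothetical pure-Python analogue of the Cython helper."""
    # Accept float locations such as 2.0 when they are whole numbers.
    if isinstance(loc, float):
        casted = int(loc)
        if casted == loc:
            loc = casted
    # Stand-in for the <Py_ssize_t> cast; rejects non-integral values.
    i = operator.index(loc)
    sz = arr.size
    if i < 0:
        i += sz  # wrap a negative index once
    elif i >= sz:
        raise IndexError('index out of bounds')
    # ndarray.item returns a Python scalar (float), not a NumPy scalar.
    return arr.item(i)


arr = np.array([1.5, 2.5, 3.5])
boxed_as_python = get_value_at_sketch(arr, 1)  # Python float
boxed_as_numpy = arr[1]                        # numpy.float64
```

Both results hold the same C double. Only the wrapper type differs, and that difference is what the issue is about.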
Gotcha. It causes only a few issues for us, because 1. / x evaluates to inf if x is a float64 but raises a ZeroDivisionError if x is a Python float. We can work around it in the instances where we expect this to happen.
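The difference described above can be reproduced with a small sketch. The `reciprocal` helper is hypothetical, and `np.errstate` is used only to silence the divide-by-zero warning NumPy would otherwise emit.

```python
import numpy as np


def reciprocal(x):
    # Same expression as in the comment above: 1. / x
    return 1. / x


# A Python float denominator raises.
try:
    reciprocal(0.0)
    python_float_raised = False
except ZeroDivisionError:
    python_float_raised = True

# A numpy.float64 denominator follows IEEE 754 and yields inf.
with np.errstate(divide="ignore"):
    float64_result = reciprocal(np.float64(0.0))
```

Code written against float64 semantics can therefore start raising when the scalar type silently becomes a Python float.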
Craig, I went back and looked at this and figured out the right way to use the NumPy C API. The current git master returns float64 as before, and the performance is about the same (within, say, 50 nanoseconds), which is perfectly acceptable.
Great, Wes, thanks.
As an aside, another reason to use float64 is that it avoids entering Python's scalar value memory allocation nightmare (where internal "free lists" can end up consuming a lot of memory). This is particularly problematic when reading lots of data from a database, since the DB drivers first convert values to Python float/int objects, which are then converted to NumPy arrays. Not sure how much you have looked into this.
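As a rough illustration of the cost of boxing each value as a Python object, the sketch below compares a list of Python floats with a packed float64 array. It does not demonstrate free-list behaviour itself. The sizes reported by `sys.getsizeof` are specific to the interpreter (CPython is assumed here), and the variable names are made up for the example.

```python
import sys

import numpy as np

n = 100_000

# Boxed: one Python float object per value, plus the list of pointers.
boxed = [float(i) for i in range(n)]
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(v) for v in boxed)

# Packed: one contiguous buffer of 8-byte C doubles.
packed = np.array(boxed, dtype=np.float64)
packed_bytes = packed.nbytes

print("boxed: ", boxed_bytes, "bytes")
print("packed:", packed_bytes, "bytes")
```

The packed array needs exactly 8 bytes per value. The boxed representation pays for an object header on every float, in addition to the pointer stored in the list.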
Is this desired?

    import numpy as np
    from pandas import Series
    s2 = Series({'A': np.float64(5.0), 'B': np.float64(0.0)})
    print type(s2['A'])

    <type 'float'>

In 0.4.0, this returned numpy.float64. That seems more expected.