GroupBy on DatetimeIndex with float32 values VERY slow #2772
Comments
dtype support is not fully there yet, see #2708; support for algos (e.g. pad, backfill, take) only exists for float64, int64, int32, object, bool, and datetime64[ns]. The other types still have to be defined (which the PR will do, for the most part).
Can you post code to generate your frame?
Okay great, I will look into #2708. Code to generate `a`:

```python
In [29]: N = 1000

In [30]: some_nums = lambda n: np.random.uniform(0, 2, size=n)

In [31]: a = pd.DataFrame({'a': some_nums(N), 'b': some_nums(N)}).astype(np.float32)

In [32]: %timeit a.groupby(level=0).last()
1 loops, best of 3: 201 ms per loop

In [33]: a = a.astype(np.float64)

In [34]: %timeit a.groupby(level=0).last()
1000 loops, best of 3: 244 us per loop
```
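The IPython session above can be reproduced as a standalone script; this is a minimal sketch that times the same `groupby(...).last()` call on float32 and float64 copies of the frame (`N` and the frame layout follow the snippet above; absolute timings will differ by machine and pandas version).

```python
import timeit

import numpy as np
import pandas as pd

# Build the same frame as in the comment above, once per dtype.
N = 1000
some_nums = lambda n: np.random.uniform(0, 2, size=n)
a32 = pd.DataFrame({'a': some_nums(N), 'b': some_nums(N)}).astype(np.float32)
a64 = a32.astype(np.float64)

# Time the aggregation on each dtype; on affected pandas versions the
# float32 path falls back to a much slower non-cython code path.
t32 = timeit.timeit(lambda: a32.groupby(level=0).last(), number=10)
t64 = timeit.timeit(lambda: a64.groupby(level=0).last(), number=10)
print(f"float32: {t32:.4f}s  float64: {t64:.4f}s")
```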
In the dtypes branch (after groupby was cythonized for float32).
I am reopening because a vbench needs to be written for this.
This is done already (in the dtypes branch :); we have group_first_float32 and group_last_float32 (added some for reindexing as well).
sweet, thanks!
closed via #2708 (vbench exists) |
I have a DataFrame with a DatetimeIndex and two float32 columns.
Either way, the result of the groupby is all float64s. I would like to preserve float32 dtypes if possible.
Other operations (resample, shift) are also very slow on float32 data, but I'm pretty sure that's related.
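Until the upcasting is fixed, one workaround sketch is to cast the aggregation result back to the frame's original dtypes afterwards (the index values and date grouping here are made up for illustration; the cast is a no-op on pandas versions that already preserve float32 through groupby).

```python
import numpy as np
import pandas as pd

# A small frame with a DatetimeIndex and two float32 columns,
# mirroring the setup described in the issue.
idx = pd.date_range('2013-01-01', periods=6, freq='h')
df = pd.DataFrame({'a': np.arange(6), 'b': np.arange(6)},
                  index=idx).astype(np.float32)

# Group by calendar date; on affected versions this upcasts to float64.
result = df.groupby(df.index.date).last()

# Restore the original per-column dtypes after aggregating.
result = result.astype(dict(df.dtypes))
print(result.dtypes)
```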