-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: Move boxing logic to BlockManager #12752
Conversation
@@ -1528,7 +1552,9 @@ def _try_coerce_result(self, result): | |||
result = result.astype('m8[ns]') | |||
result[mask] = tslib.iNaT | |||
elif isinstance(result, np.integer): | |||
result = lib.Timedelta(result) | |||
result = self._box_func(result) | |||
elif convert_float and isinstance(result, np.float): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed to coerce .quantile
result. Any problem if we coerce float
to datetime/timedelta
always?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I understand why you need the convert_float
kw? (I see you are using it below)
return self._na_value | ||
|
||
if com.is_list_like(qs): | ||
values = [_quantile(values, x * 100, **kwargs) for x in qs] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this fixes: #12093 I think? (eg. now we are doing quantileing per block and not per-column)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately no. rolling_quantile
uses separate cython logic which raises.
https://github.com/pydata/pandas/blob/master/pandas/algos.pyx#L1762
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, meant #11623
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Currently NOT because DataFrame.quantile
use Series.quantile
(rather than added Block
logic). Will take a look.
@sinhrks this is awesome!!!!! 2 small comments. |
@@ -131,6 +133,8 @@ def get_values(self, dtype=None): | |||
return an internal format, currently just the ndarray | |||
this is often overriden to handle to_dense like operations | |||
""" | |||
if dtype == object: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
com.is_object_dtype(dtype)
and return self.values.astype(object)
cce686f
to
0936441
Compare
for (_, vals) in data.iteritems()] | ||
# unable to use DataFrame.apply, becasuse data may be empty | ||
result = dict(_quantile(s) for (_, s) in data.iteritems()) | ||
result = self._constructor(result, columns=data.columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, this is still going to have the perf issues. can you do this as block based? (or save for diff PR)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that will fix the perf issue (as you already handle the 2-d blocks ok in internals/quantile
e8714b8
to
f299fe4
Compare
# ToDo: Temp logic to avoid GH 12619 and GH 12772 | ||
# which affects to DatetimeBlockTZ_try_coerce_result for np.ndarray | ||
if isinstance(result, np.ndarray) and values.ndim > 0: | ||
result = self._holder(result, tz='UTC') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok for now. yeah trying to avoid check like this! thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but actually could/should this logic actually be in _try_coerce_result
for DatetimeTZBlock
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally yes, but to avoid any side effect ATM.
029b698
to
b01608d
Compare
jreback@b7ec843 going to merge this, then can come back and fix the frame to use blocks. This is a bit of a special internal method as it is a reduction which we normally don't pass thru to the block manager (so using this |
CLN: clean up quantile & move to BlockManager closes pandas-dev#12741 closes pandas-dev#12772 closes pandas-dev#12469 closes pandas-dev#12752
thanks @sinhrks I may do a PR to make the reductions be a little nicer. |
map
related boxingquantile
git diff upstream/master | flake8 --diff
Series.asobject
to return boxedobject
valuescommon/i8_boxer()
,Series._maybe_box()
(only used by.quantile()
)If base direction looks OK, going to add some tests based on the spec.