-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: aggregation.transform #36478
CLN: aggregation.transform #36478
Conversation
|
||
# combine results | ||
if len(results) == 0: | ||
raise ValueError("Transform function failed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this tested? is this new?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you should just let concat work and bubble up if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not new, this is hit in tests.frame.apply.test_frame_transform.test_transform_bad_dtype
. Removing these two lines results in the ValueError "No objects to concatenate" rather than "Transform function failed". I'm okay with either, slight preference for the current behavior (which is the same as this PR). Let me know if you prefer "No objects to concatenate" and can change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm yeah I think we should change this, maybe to something a bit more userfriendly like
the input transform did not contain any transform functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you thinking of the case where an aggregator is used? If we detect the function not transforming, we'll raise here:
try:
results[name] = transform(colg, how, 0, *args, **kwargs)
except Exception as e:
if str(e) == "Function did not transform":
raise e
before we get to this point. The raising of the code highlighted above should only occur if all the key/value pairs of the dictionary entirely failed - e.g. trying to take a product of strings or lambda x: raise ValueError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, no this is the case of an empty list or dict right? (of functions that are supplied for aggregation), IOW its user visible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see! I did not realize this line would be the one that raised for an empty list or dict. I've added that check before getting to this section of the code with the message "no results" (this is the message from 1.1 - but can change), along with tests for this.
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
@jreback Tests added and pass; responses to your questions are above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments, ping on green.
|
||
# combine results | ||
if len(results) == 0: | ||
raise ValueError("Transform function failed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm yeah I think we should change this, maybe to something a bit more userfriendly like
the input transform did not contain any transform functions
@@ -56,15 +62,16 @@ def test_transform_list(axis, float_frame, ops, names): | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
|
|||
def test_transform_dict(axis, float_frame): | |||
@pytest.mark.parametrize("argtype", [dict, Series]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use box rather than argtype
@@ -45,12 +51,13 @@ def test_transform_list(string_series, ops, names): | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
|
|||
def test_transform_dict(string_series): | |||
@pytest.mark.parametrize("argtype", [dict, Series]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
…ansform_cleanup � Conflicts: � pandas/core/aggregation.py
…ansform_cleanup � Conflicts: � doc/source/whatsnew/v1.2.0.rst
pandas/core/aggregation.py
Outdated
# Check for missing columns on a frame | ||
cols = sorted(set(func.keys()) - set(obj.columns)) | ||
if len(cols) > 0: | ||
raise SpecificationError(f"Column(s) {cols} do not exist") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better rename cols
to missing_cols
. Even better if extract function _check_missing_columns(obj, func)
.
pandas/core/aggregation.py
Outdated
if len(func) == 0: | ||
raise ValueError("no results") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that it matters much, but probably it will be better to move this check on top of the function, before any iteration over func
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, thanks.
pandas/core/aggregation.py
Outdated
except Exception as err: | ||
if str(err) == "Function did not transform" or str(err) == "no results": | ||
raise err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be reasonable to create a named exception rather than testing the error message match?
""" | ||
from pandas.core.reshape.concat import concat | ||
|
||
if obj.ndim != 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe extract method _is_series(obj)
. I noticed that this dimensions check is run in another function, but there is a local variable is_series
.
Thanks for reviewing @ivanovmg. I don't necessarily disagree with your suggestions, but I think they are outside the scope of this PR. |
…ansform_cleanup � Conflicts: � pandas/core/aggregation.py
@jreback - This is ready for another review. Failure is unrelated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also pls rebase
pandas/core/aggregation.py
Outdated
""" | ||
Compute transform in the case of a dict-like func | ||
""" | ||
from pandas.core.reshape.concat import concat | ||
|
||
if len(func) == 0: | ||
raise ValueError("no results") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this raised to? is it user facing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - this is raised directly to user. The error type and message agrees with 1.1.x, however it wasn't tested until this PR. Can change/improve if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok i think this needs to be more explict on what is happening, e.g. an empty function specification was provided. otherwise lgtm.
thanks @rhshadrach very nice |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Followup to #35964