CLN: aggregation.transform #36478

Merged
merged 14 commits into from
Oct 6, 2020

Conversation

rhshadrach
Member

  • closes CLN: Followup to 35964 #36330
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Followup to #35964

  • Moved whatsnew to reshaping
  • Use is_list_like/is_dict_like in aggregate.transform
  • Broke out dict-like and str/callable computations to own functions to split up the transform function
  • Added/refined typing

@rhshadrach rhshadrach added Apply Apply, Aggregate, Transform, Map Clean labels Sep 19, 2020
pandas/core/aggregation.py

# combine results
if len(results) == 0:
    raise ValueError("Transform function failed")
Contributor

is this tested? is this new?

Contributor

I think you should just let concat work and let the error bubble up if needed.

Member Author

@rhshadrach rhshadrach Sep 20, 2020

Not new; this is hit in tests.frame.apply.test_frame_transform.test_transform_bad_dtype. Removing these two lines results in the ValueError "No objects to concatenate" rather than "Transform function failed". I'm okay with either, with a slight preference for the current behavior (which is the same as in this PR). Let me know if you prefer "No objects to concatenate" and I can change it.

Contributor

Hmm, yeah, I think we should change this, maybe to something a bit more user-friendly, like "the input transform did not contain any transform functions".

Member Author

Are you thinking of the case where an aggregator is used? If we detect the function not transforming, we'll raise here:

try:
    results[name] = transform(colg, how, 0, *args, **kwargs)
except Exception as e:
    if str(e) == "Function did not transform":
        raise e

before we get to this point. The code highlighted above should only raise if every key/value pair of the dictionary failed entirely, e.g. when trying to take a product of strings or when the supplied function itself raises a ValueError.
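
The control flow described in this reply can be sketched without pandas. This is a minimal stand-in and not the code in pandas/core/aggregation.py: columns are plain lists, and apply_one and transform_dict_like are hypothetical names chosen for illustration. Only the two error messages come from the discussion above.

```python
def apply_one(values, func):
    """Apply func to one column; reject results that are not the same length."""
    result = func(values)
    if not isinstance(result, list) or len(result) != len(values):
        raise ValueError("Function did not transform")
    return result


def transform_dict_like(columns, funcs):
    """Apply funcs[name] to columns[name], skipping columns that fail."""
    results = {}
    for name, func in funcs.items():
        try:
            results[name] = apply_one(columns[name], func)
        except Exception as err:
            # An aggregator is a user error, so surface it immediately.
            if str(err) == "Function did not transform":
                raise err
            # Any other failure silently drops this column.
    # Reached only when every key/value pair failed.
    if len(results) == 0:
        raise ValueError("Transform function failed")
    return results
```

With columns = {"a": [1, 2, 3], "b": ["x", "y", "z"]} and a function that negates each element, passing the function for both keys returns only column "a", because negating a string fails and that column is dropped. Passing it for "b" alone raises "Transform function failed", while passing the built-in sum for "a" raises "Function did not transform".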

Contributor

OK, no, this is the case of an empty list or dict, right? (of functions that are supplied for aggregation). IOW, it's user-visible.

Member Author

Ah, I see! I did not realize this line would be the one that raised for an empty list or dict. I've added that check before getting to this section of the code with the message "no results" (this is the message from 1.1, but I can change it), along with tests for this.

@rhshadrach
Member Author

/azp run

@azure-pipelines
Contributor

Azure Pipelines successfully started running 1 pipeline(s).

@rhshadrach
Member Author

@jreback Tests added and pass; responses to your questions are above.

Contributor

@jreback jreback left a comment

minor comments, ping on green.


@@ -56,15 +62,16 @@ def test_transform_list(axis, float_frame, ops, names):
    tm.assert_frame_equal(result, expected)


def test_transform_dict(axis, float_frame):
@pytest.mark.parametrize("argtype", [dict, Series])
Contributor

use box rather than argtype

@@ -45,12 +51,13 @@ def test_transform_list(string_series, ops, names):
    tm.assert_frame_equal(result, expected)


def test_transform_dict(string_series):
@pytest.mark.parametrize("argtype", [dict, Series])
Contributor

same

@jreback jreback added this to the 1.2 milestone Sep 22, 2020
Comment on lines 474 to 477
# Check for missing columns on a frame
cols = sorted(set(func.keys()) - set(obj.columns))
if len(cols) > 0:
    raise SpecificationError(f"Column(s) {cols} do not exist")
Member

Better to rename cols to missing_cols. Even better, extract a function _check_missing_columns(obj, func).

Comment on lines 484 to 485
if len(func) == 0:
    raise ValueError("no results")
Member

Not that it matters much, but it would probably be better to move this check to the top of the function, before any iteration over func.

Member Author

Agreed, thanks.

Comment on lines 492 to 494
except Exception as err:
    if str(err) == "Function did not transform" or str(err) == "no results":
        raise err
Member

Would it be reasonable to create a named exception rather than testing the error message match?
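
A sketch of what the named-exception approach might look like, under stated assumptions: NotATransformError, apply_column and try_column are hypothetical names that do not exist in pandas, and columns are plain lists rather than Series. It covers only the "Function did not transform" case.

```python
class NotATransformError(ValueError):
    """Hypothetical: func did not return a result of the same length."""


def apply_column(values, func):
    """Apply func to one column and verify that it behaved as a transform."""
    result = func(values)
    if not isinstance(result, list) or len(result) != len(values):
        raise NotATransformError("Function did not transform")
    return result


def try_column(values, func):
    """Return the transformed column, or None if func failed on it."""
    try:
        return apply_column(values, func)
    except NotATransformError:
        # Matched by type, so rewording the message cannot break this branch.
        raise
    except Exception:
        return None
```

Subclassing ValueError means callers that already catch ValueError would keep working, while the internal check no longer depends on the exact message text.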

"""
from pandas.core.reshape.concat import concat

if obj.ndim != 1:
Member

Maybe extract a method _is_series(obj). I noticed that this dimension check is also run in another function, where there is a local variable is_series.

@rhshadrach
Member Author

Thanks for reviewing @ivanovmg. I don't necessarily disagree with your suggestions, but I think they are outside the scope of this PR.

@rhshadrach
Member Author

@jreback The diff on this is getting to be too complex (and honestly started that way). I've opened #36618 as a precursor.

…ansform_cleanup

Conflicts:
	pandas/core/aggregation.py
@rhshadrach
Member Author

@jreback - This is ready for another review. Failure is unrelated.

Contributor

@jreback jreback left a comment

also pls rebase

"""
Compute transform in the case of a dict-like func
"""
from pandas.core.reshape.concat import concat

if len(func) == 0:
    raise ValueError("no results")
Contributor

where is this raised to? is it user facing?

Member Author

@rhshadrach rhshadrach Oct 4, 2020

Yes, this is raised directly to the user. The error type and message agree with 1.1.x; however, it wasn't tested until this PR. I can change or improve it if desired.

Contributor

OK, I think this needs to be more explicit about what is happening, e.g. "an empty function specification was provided". Otherwise LGTM.

@jreback jreback merged commit ebd9906 into pandas-dev:master Oct 6, 2020
@jreback
Contributor

jreback commented Oct 6, 2020

thanks @rhshadrach very nice

@rhshadrach rhshadrach deleted the transform_cleanup branch October 11, 2020 13:22
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Labels
Apply Apply, Aggregate, Transform, Map Clean
Development

Successfully merging this pull request may close these issues.

CLN: Followup to 35964
3 participants