feat(api): add agg alias for aggregate #4765

Merged (1 commit) on Nov 3, 2022

cpcloud (Member) commented Nov 3, 2022

Add an agg alias for aggregate. I am really tired of typing out the entire word.
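
A minimal sketch of what a method alias of this kind can look like in Python. The `Table` class and its toy `aggregate` body are hypothetical stand-ins for illustration, not the code from this PR; only the names `agg` and `aggregate` come from the description above.

```python
class Table:
    """Hypothetical stand-in for a table expression (not the ibis class)."""

    def __init__(self, rows):
        self.rows = rows

    def aggregate(self, **metrics):
        """Reduce all rows to a single mapping of named metrics."""
        return {name: func(self.rows) for name, func in metrics.items()}

    # Alias: bind a second name to the same function object.
    agg = aggregate


t = Table([1, 2, 3])
result_long = t.aggregate(total=sum)   # {'total': 6}
result_short = t.agg(total=sum)        # {'total': 6}
```

Because both names refer to one function, the alias cannot drift out of sync with the original.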

cpcloud added this to the 4.0.0 milestone Nov 3, 2022
cpcloud added the "ux" (User experience related issues) and "feature" (Features or general enhancements) labels Nov 3, 2022

jcrist (Member) left a comment

👍

cpcloud enabled auto-merge (rebase) November 3, 2022 19:57

github-actions bot (Contributor) commented Nov 3, 2022

Test Results

6 files, 6 suites, 6m 39s
3 258 tests: 3 232 passed, 26 skipped, 0 failed
19 494 runs: 19 338 passed, 156 skipped, 0 failed

Results for commit b26b139.

cpcloud merged commit 907583f into ibis-project:master Nov 3, 2022

jcmkk3 commented Nov 3, 2022

Looks like you already merged this, and I understand that agg has precedent in the pydata ecosystem, but I think that my favorite verb for this operation is rollup, as used by arquero. I can no longer find the reference, but I'm sure that I remember Jeffrey Heer saying that the verbs came out of academic user studies done for Data Wrangler (which became Trifacta).

Beyond rollup evoking a better visual image of how the operation changes the data shape, it also happens to have a really nice visual alignment with some of the other common verbs.

t.rollup
t.select
t.mutate
t.filter

It is a slippery slope boxing yourself into trying to keep names the same length, of course.

Another shortened verb that has some precedent is sel for select, as used in xarray.

I mostly just geek out about this stuff. Not proposing that you change your mind on any of these names.

ogrisel (Contributor) commented Nov 4, 2022

Personally I prefer explicit full English words in APIs.

I also like the symmetry between aggregate and mutate.

Also, I tend to dislike aliases because they split the community in terms of programming style, and the extra alternatives are confusing for newcomers.

cpcloud (Member, Author) commented Nov 4, 2022

> Looks like you already merged this, and I understand that agg has precedent in the pydata ecosystem, but I think that my favorite verb for this operation is rollup, as used by arquero. I can no longer find the reference, but I'm sure that I remember Jeffrey Heer saying that the verbs came out of academic user studies done for Data Wrangler (which became Trifacta).

The only issue with rollup is that it has a very specific meaning to the business intelligence community: it's a special case of GROUP BY GROUPING SETS that is extremely common in BI workflows.

We have (the rather ancient!) #550 open for this functionality :)
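
To make the BI meaning concrete: in SQL, `ROLLUP(a, b, c)` is shorthand for `GROUPING SETS ((a, b, c), (a, b), (a), ())`, i.e. one grouping per prefix of the column list, down to the grand total. The sketch below only enumerates those sets; the function name and the column names are hypothetical and are not part of ibis.

```python
def rollup_grouping_sets(*columns):
    """Return the grouping sets that ROLLUP(*columns) expands to.

    Each set is a prefix of the column list, from the full list down to
    the empty tuple, which stands for the grand total.
    """
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


# Hypothetical column names, for illustration only.
sets = rollup_grouping_sets("year", "month", "day")
# [('year', 'month', 'day'), ('year', 'month'), ('year',), ()]
```

A plain aggregation produces only the first of these sets, which is why reusing the name rollup for it could mislead users coming from BI tools.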

> Beyond rollup evoking a better visual image of how the operation changes the data shape, it also happens to have a really nice visual alignment with some of the other common verbs.
>
> t.rollup
> t.select
> t.mutate
> t.filter
>
> It is a slippery slope boxing yourself into trying to keep names the same length, of course.

I definitely appreciate this, but, yeah it seems like it'll be a hard corner to get out of if we go down this route.

> Another shortened verb that has some precedent is sel for select, as used in xarray.
>
> I mostly just geek out about this stuff. Not proposing that you change your mind on any of these names.

Not opposed to sel, but we should probably deprecate projection/to_projection before introducing another alias.

cpcloud (Member, Author) commented Nov 4, 2022

> Personally I prefer explicit full English words in APIs.

I sympathize with this for sure: when writing pipelines, being explicit makes things easier to understand. There's probably an analogy here (with a similar tension between iteration speed and convenience) to the arguments used in CI and production shell scripts.

> I also like the symmetry between aggregate and mutate.

There's always the possibility of mut I guess :)

> Also, I tend to dislike aliases because they split the community in terms of programming style, and the extra alternatives are confusing for newcomers.

Yeah, I understand this too. The way we've currently been thinking about this is to pick a primary API and remove all references to the secondary API in documentation and tests but not actually remove the secondary API to avoid breaking existing code that uses it. Optionally we can add a deprecation warning if we actually do intend to remove it at some point.
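
A rough sketch of that approach, assuming a toy `Table` class: the primary method stays as-is, while the secondary name keeps working but emits a `DeprecationWarning`. The class, the method bodies, and the choice of `projection` as the secondary name are illustrative assumptions, not the actual ibis implementation.

```python
import warnings


class Table:
    """Hypothetical stand-in for a table expression (not the ibis class)."""

    def select(self, *columns):
        """Primary API: the one shown in documentation and tests."""
        return list(columns)

    def projection(self, *columns):
        """Secondary API: kept so existing code does not break."""
        warnings.warn(
            "`projection` is deprecated; use `select` instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self.select(*columns)


# Capture the warning so the behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = Table().projection("a", "b")
```

Leaving out the `warnings.warn` call gives the silent variant described above, where the secondary name simply disappears from documentation and tests.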

cpcloud deleted the add-agg branch November 4, 2022 11:56

ogrisel (Contributor) commented Nov 4, 2022

> There's always the possibility of mut I guess :)

Please no!
