Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for .groupby aggregation patterns #24

Open
deppen8 opened this issue Feb 26, 2019 · 4 comments · May be fixed by #54
Open

Check for .groupby aggregation patterns #24

deppen8 opened this issue Feb 26, 2019 · 4 comments · May be fixed by #54
Labels
new check New check for the linter

Comments

@deppen8
Copy link
Owner

deppen8 commented Feb 26, 2019

Not even sure if this can be implemented, but maybe with a clever regular expression, it could. See the flashcard for some more details.

@deppen8 deppen8 added the new check New check for the linter label Feb 26, 2019
@simchuck
Copy link
Collaborator

simchuck commented Mar 7, 2019

Both of the anti-patterns have df.groupby('group_col')['agg_col'] ... , which we should be able to identify with the following check:

if isinstance(node.func.value, ast.Subscript) and 
              node.func.value.value.func.attr == 'group_by'

(AST chains can get crazy)

Do you see any cases where this pattern wouldn't work (i.e., other groupby anti-patterns that don't use slicing? or examples where slicing the groupby object would be acceptable?)

@simchuck
Copy link
Collaborator

simchuck commented Mar 8, 2019

Turns out the AST syntax is actually more complex when the .agg() method is chained. But the fundamental pattern should still be the same -- we want to check for groupby() methods that also use slicing.

Unless there are cases where that syntax might be appropriate.

@deppen8
Copy link
Owner Author

deppen8 commented Mar 8, 2019

Unless there are cases where that syntax might be appropriate.

This is my fear. I am having second thoughts about this because groupby is EXTREMELY common in pandas code and I am afraid there are probably lots of cases I am not considering.

I am going to suggest that we should hold off on implementing this until we have a better survey of groupby in the wild. I will open an issue where we can collect examples.

@simchuck
Copy link
Collaborator

simchuck commented Mar 8, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new check New check for the linter
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants