Examples: Example with pandas for split apply combine #753
Thanks for the example. Looks good. Mostly stylistic feedback and questions :)
Oh, and can you add the image too, that way it can also be displayed in the README please :)
Otherwise I got these pandas warnings:
hamilton/examples/pandas/split-apply-combine/my_functions.py:51: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df["Tax Rate"] = output["Tax Rate"]
/hamilton/examples/pandas/split-apply-combine/my_functions.py:69: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df["Tax Credit"] = output["Tax Credit"]
/hamilton/examples/pandas/split-apply-combine/my_functions.py:51: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df["Tax Rate"] = output["Tax Rate"]
I suggest doing pd.concat([df, output], axis=1) instead.
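A minimal sketch of that suggestion, not the PR's actual code: the data, the filter, and the 0.15 rate are made up for illustration; only the names df and output and the "Tax Rate" column come from the warnings above. It shows building a new frame with pd.concat rather than assigning a column onto a slice.

```python
import pandas as pd

# Hypothetical data; the real example's inputs are not shown in this thread.
full = pd.DataFrame(
    {"Name": ["A", "B", "C"], "Income": [50_000, 120_000, 80_000]}
)

# A filtered slice of the original frame. Assigning a new column directly on
# it (df["Tax Rate"] = ...) is the pattern that can raise SettingWithCopyWarning.
df = full[full["Income"] < 100_000]

# Hypothetical computed column, indexed like df so the rows line up.
output = pd.DataFrame({"Tax Rate": [0.15, 0.15]}, index=df.index)

# Build a new DataFrame instead of mutating the slice.
combined = pd.concat([df, output], axis=1)
print(combined)
```

Because pd.concat returns a new object, the original frame is left untouched and nothing is written to the slice.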
Co-authored-by: Stefan Krawczyk <stefan@dagworks.io>
I also got them! It was my plan to fix it today!
Fixed!
Beautiful. I think this looks good to me. CC @elijahbenizzy ?
We'll squash merge this one so will take the PR description largely.
Otherwise not sure what's up with circleci. It should have run -- only thing I'd need to check is the pre-commit check.
@nhuray okay pre-commit failed locally for me:
So if you run that locally, it should fix things or flag what needs fixing. Thanks!
@skrawcz I removed the I think the PR is ready to be merged now! Thanks for the review and suggestions.
Hi everyone,
This is my first Pull Request to contribute to the project 🎉
As a beginner, I had trouble working out from the documentation how to implement a simple split / apply / combine on a pandas DataFrame.
So after discussing it on Slack with @skrawcz and @elijahbenizzy, I decided to investigate how things work by creating a simple code example.
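For readers new to the pattern, here is a rough plain-pandas sketch of split / apply / combine. It does not use Hamilton and is not the code added in this PR; the data, the income threshold, the rates, and the helper apply_tax_rate are all hypothetical.

```python
import pandas as pd


def apply_tax_rate(group: pd.DataFrame, rate: float) -> pd.DataFrame:
    """Apply step (hypothetical helper): return a new frame with a 'Tax Rate' column."""
    # assign() returns a copy, so the input slice is never modified in place.
    return group.assign(**{"Tax Rate": rate})


# Hypothetical input data.
df = pd.DataFrame(
    {"Name": ["A", "B", "C"], "Income": [50_000, 120_000, 80_000]}
)

# Split: partition the rows on a condition.
low = df[df["Income"] < 100_000]
high = df[df["Income"] >= 100_000]

# Apply: run a different transformation on each partition.
low = apply_tax_rate(low, 0.15)
high = apply_tax_rate(high, 0.30)

# Combine: stitch the partitions back together in the original row order.
result = pd.concat([low, high]).sort_index()
print(result)
```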
Changes
The changes are located in examples/pandas/split-apply-combine (can be renamed).
These are the files I added:
- README.md: explains the example
- my_functions.py: module where all the transformation functions are implemented
- my_wrapper.py: module where I define a class with static methods wrapping the Hamilton driver
- my_script.py: module that defines the input for the example and runs the wrapper to execute the dataflow
Screenshots
Here are the screenshots of the console when we run the example:
And this is the DAG generated:
How I tested this
I ran it locally in debug mode to check that the expected output is rendered and the DAG is generated.