Add pandas comparison #2378
Thank you for this PR
Thank you for spotting them.
Sure. Also, maybe add information on how to create this example mini-table for
I would try to finalize this PR, and then ask on #data in Slack if people have suggestions.
Thanks. While you're at it, can you fix two issues with the previous PR? These are 1) add a link to this page in the sidebar, and 2) check that tables are rendered correctly in the generated HTML docs (by running
Both are already fixed on master. We only need to update the title to include pandas.
Pandas
- Add new section for indexing
- Split into common vs. group/agg
- Add join operations
Dplyr and Stata
- Fixed typos and reformatted table
@bkamins Thanks for the feedback. I have revamped it further with a whole new subsection for indexing and joins.
Co-authored-by: Bogumił Kamiński <bkamins@sgh.waw.pl>
The latest commit includes
Summary of last commit:
P.S. I haven't been focusing on the dplyr/Stata sections. Once the pandas part is good, we can sync those up. I don't have access to Stata, though, so I will need some help then.
…Frames.jl into tk/pandas-comparison
I will squash the PR to a single commit when merging.
Please let me know when you are done with simplifying this PR and then I will do a review.
Yes, I'm done now.
Looks good except for some minor comments that I think should not be problematic.
@KrainskiL - please run these examples and comment if you see something that confuses you as a user.
I've got 3 comments:
Of course this is due to additional
@KrainskiL Thanks for the suggestions.
Is it really? In my use cases, I never select a single column and still want a data frame as a result.
I think we can skip comments 2 and 3 by @KrainskiL, but comment 1 should be fixed, as it is indeed confusing (either by moving
Thank you for working on it. Let us get the ball rolling and merge this PR. If we find anything to be fixed, let us just open a PR (and also - as discussed - please open atomic PRs for the "debatable" things).
@bkamins I have second thoughts about the first two forms, which mutate the existing data frame. Pandas returns a copy and so it works better with
You mean that
Well, pandas does make a copy. Same behavior for
Yes - and I have checked that it is a copy indeed (so there is no column aliasing - as what you show does not check for this). Can you then please - as usual 😄 - add a fix to the examples in a separate PR? |
This PR adds a new section to compare with Python pandas.
A couple of things:
Disclaimer: I'm not really proficient with pandas so if anyone can suggest better ways to write the sample code, please feel free to suggest below.