allow no rowkey in unstack #2995
For duplicates, I think the dplyr default of making lists (rather than throwing an error) is a bit questionable. I like the idea of having to explicitly provide a method for handling that case. Ideally, however, it could also accept an aggregation function. By that logic, |
Agreed. A policy in DataFrames.jl is to never produce a warning (like
API design is orthogonal to performance. Let us make a list of things we want. I understand it is:
The question is now about the API. We could either:
The problem we have is that we cannot remove
In general, I think adding these features gets us very close to resolving the long-standing #1181. CC @nalimilan
Actually maybe |
Apologies, I wasn't aware of the existing allowduplicates behavior. Given that, I think option 1 makes sense.
Note there's the question of how to handle non-duplicates. I think these should be |
Agreed. In DataFrames.jl, unless there is a strong reason, we try to produce type-stable columns.
@nalimilan - any opinion on this PR?
Sorry for the delay!
I'd say that `allowduplicates=identity` and so on would be OK. That would be more convenient than having to do `allowduplicates=true, duplicate_fun=identity`.
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Thank you!
Fixes #2994
It turns out that when I rewrote the `unstack` implementation to improve its performance, we got handling of the no-row-key case for free. So I have added it.
The question is whether we want to add another option for `allowduplicates`, like `allowduplicates=:vector`, in which case we would store a list of duplicate values (à la `dplyr` in the #2994 example). This would be a new feature (and would probably go into the 1.4 release, as what I make here is a patch).