allow scalar broadcasting into an empty data frame #1890
I have added new broadcasting rules for an empty data frame.
As far as I can tell, everything looks great. Thanks for taking the time to add those changes! I think this makes DataFrames more flexible.
@nalimilan + @quinnj: you might find the last commit interesting, see e693124. I also checked that … The problem is present in Julia 1.0 only. Do you have any idea what could cause it? CC @vtjnash: it seems to me like a bug in Julia 1.0.
Just as a reference, this is an MWE of what was not working under Julia 1.0:
Fixes #1893.
@nalimilan This should be good for a review. I have gotten rid of "speculative code" (the possible bug in Base was worked around with cleaner code). These are the only problems reported with 0.19 that needed fixing, and they are:
I'm not sure that's a great idea. That would introduce an inconsistency between 0-row data frames depending on whether they have 0 columns or more. I think it's fine to allow some flexibility for 0-column data frames in cases that would otherwise throw errors, but introducing inconsistencies is problematic. And indeed it makes the description of the rules more complex.
This sounds OK as long as 0-column and multi-column data frames behave consistently.
+1
@nalimilan - your comments are in line with my initial thoughts. Let me give an extended explanation of the reason for the change; I would appreciate your feedback. The problem is that: 1) users want what I have implemented here; 2) currently in …
and … Also, in the past, …
AFAICT that's a very different thing, since it doesn't conflict with the general behavior when there are columns: we just allow something that would otherwise fail (because the length of the new vector isn't …).
Yes, that's a bit annoying. But …
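A minimal sketch of the 0-column exception discussed above, modelled in plain Python rather than Julia. `ToyFrame` and `set_column` are hypothetical names for illustration, not the DataFrames.jl API: assigning a whole vector to a frame that has no columns is accepted at any length (and fixes the row count), while a frame that already has columns rejects a vector whose length differs from the row count.

```python
class ToyFrame:
    """Hypothetical, minimal column store used only to illustrate the rule;
    it is not the DataFrames.jl implementation."""

    def __init__(self):
        self.columns = {}  # column name -> list of values

    def nrow(self):
        # A frame with no columns is treated as having 0 rows.
        if not self.columns:
            return 0
        return len(next(iter(self.columns.values())))

    def set_column(self, name, values):
        values = list(values)
        # With no columns yet, any length is accepted and fixes the row count;
        # otherwise the length must equal the current row count.
        if self.columns and len(values) != self.nrow():
            raise ValueError(
                f"new column has length {len(values)}, expected {self.nrow()}"
            )
        self.columns[name] = values


df = ToyFrame()
df.set_column("a", [1, 2, 3])   # allowed: the frame had no columns
df.set_column("b", [4, 5, 6])   # allowed: length matches nrow() == 3
try:
    df.set_column("c", [7, 8])  # rejected: length 2 != 3
except ValueError as err:
    print(err)  # new column has length 2, expected 3
```

The point of the sketch is that the exception only relaxes a case that would otherwise raise an error; it never changes the result of an assignment that already succeeds when columns are present.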
I agree (and that is why I initially wanted to create …). @itsdfish + @grahamgill: can you please comment on how much you want a 0-column data frame to be special, and why? We have a tension here between consistency (which was our original design intention) and usability (which is what you wanted).
@bkamins, from time to time I find myself adding scalars to empty DataFrames when restructuring data. Wrapping scalars with …
@itsdfish, thank you for the feedback. I think we can wait one or two days for other feedback and then decide. If there are no new voices, I will implement the following rule:
@nalimilan I guess this is the invariant you wanted, right? This will mean that we cannot broadcast vectors of positive length into … For the …
@nalimilan - I have slept on this and I am still unsure what is best 😢 (I include both …)
I'd suggest … I'd also be OK with throwing errors when attempting to use …
@nalimilan Thank you for the prompt response. You are a robot 😄. Here are my comments on your suggestions (retaining your proposal, but showing the problematic cases):
… must fail (it is a …)
… should also fail. So essentially we have a choice:
The last condition (length and dimensionality) is relevant, as this is what Base currently allows you to do:
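For the one-dimensional case, the Base behaviour mentioned above follows the general broadcasting rule: along each dimension, the source length must either equal the destination length or be 1, and a length-1 source can be expanded to any destination length, including 0. The Python function below is a simplified sketch of that rule for in-place assignment into a vector. `broadcast_length` is a hypothetical name, and checks involving more than one dimension are deliberately omitted.

```python
def broadcast_length(target_len, source_len):
    """Length produced when broadcasting a 1-D source of `source_len`
    elements into a destination of `target_len` elements (a sketch of the
    general rule, not Julia's implementation)."""
    if source_len == target_len:
        return target_len
    if source_len == 1:
        # A singleton is expanded to any length, including 0.
        return target_len
    raise ValueError(
        f"cannot broadcast length {source_len} into length {target_len}"
    )


print(broadcast_length(0, 1))  # 0: a length-1 source fits a 0-length destination
print(broadcast_length(3, 1))  # 3
try:
    broadcast_length(0, 2)     # a length-2 source cannot be fitted into length 0
except ValueError as err:
    print(err)  # cannot broadcast length 2 into length 0
```

This is why a rule that mirrors Base accepts a length-1 vector for a 0-row target but rejects longer vectors.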
@itsdfish + @grahamgill: I am leaning towards the second option (it is consistent with Base). You should get everything you initially wanted except … For …
(also @nalimilan, please confirm that this is exactly what we want 😄, as this is tricky, because it will lead to …)
@nalimilan - just to expand on my last comment.
@bkamins, I'll defer to your recommendation. That looks good.
If a new column is created by broadcasting a scalar into a data frame with existing columns, then the new column necessarily has as many rows as the others. If that can also work when there are 0 rows, that's what interests me, because it eliminates a lot of edge-case checking. If the data frame has both 0 columns and 0 rows, then I'd prefer consistency, with the resulting data frame also having 0 rows.
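The preference stated here can be sketched in Python with a plain dict of equal-length lists standing in for a data frame: a scalar broadcast into a new column takes the current row count, which is 0 both for a 0-row frame with columns and for a frame with no columns at all. `broadcast_scalar_column` is a hypothetical helper for illustration only, not part of DataFrames.jl.

```python
def broadcast_scalar_column(columns, name, value):
    """Hypothetical helper: add column `name` filled with the scalar `value`.
    `columns` maps column names to equal-length lists and is modified in place."""
    # The new column takes the current row count; a frame with no
    # columns has 0 rows, so the new column is empty in that case.
    nrow = len(next(iter(columns.values()))) if columns else 0
    columns[name] = [value] * nrow
    return columns


print(broadcast_scalar_column({"x": [1, 2]}, "y", 0))  # {'x': [1, 2], 'y': [0, 0]}
print(broadcast_scalar_column({"x": []}, "y", 0))      # {'x': [], 'y': []}
print(broadcast_scalar_column({}, "y", 0))             # {'y': []}
```

With this rule, code that adds a constant column never needs a special case for empty input: the 0-column and the multi-column 0-row frames give the same 0-row result.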
Thanks for the summary. I prefer the second option. I didn't realise I was stirring the pot so much with my initial question in #1889
Thank you both for the feedback. Really appreciated. @nalimilan - so we go your way. I think broadcasting to …
Sounds good!
I was talking with @mbauman about this and what I will implement is:
Thanks to all who contributed to the discussion.
Broadcasting rules: Round 4 (and, I hope, the last round). What we do:
(and, as a side effect, we need less code to express this). @nalimilan - this should be good for a final review.
Co-Authored-By: Milan Bouchet-Valat <nalimilan@club.fr>
CI passes (we only have the usual Coverage decrease 😞).
Just a small suggestion.
Co-Authored-By: Milan Bouchet-Valat <nalimilan@club.fr>
Thank you. I am going to merge it today and make a release (unless you think we should wait, e.g. for #1887).
Fine with me.
Fix #1889
@grahamgill - in the tests you can see what will be allowed and what will still be disallowed.