-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEPR: Warn about Series.to_csv signature alignment #21868
Conversation
4dc9a6f
to
8dda35d
Compare
Codecov Report
@@ Coverage Diff @@
## master #21868 +/- ##
==========================================
+ Coverage 91.91% 91.91% +<.01%
==========================================
Files 164 164
Lines 49992 50008 +16
==========================================
+ Hits 45951 45967 +16
Misses 4041 4041
Continue to review full report at Codecov.
|
My understanding from #19745 was that the differences were limited ( Couldn't we change the first argument to Similarly, couldn't we set the default value of I know we would still not catch the ordering change, but
Apart from the comment above, my main problem with this PR is that it seems to enforce (at least if one wants to suppress the warning) passing |
pandas/core/series.py
Outdated
encoding=encoding, compression=compression, | ||
date_format=date_format, decimal=decimal) | ||
|
||
if path is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no valid call that fails this test, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean by this question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toobaz As far as I can tell, the default value of path
is None
, so if path
is not explicitly assigned another value, the branch is not executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking this over, wouldn't it be better to be even more conservative here? One of my gripes with the inconsistency between Series.to_csv
and DataFrame.to_csv
is that I am not always completely aware whether a given call will return a Series
or a DataFrame
. Thus, I may inadvertently think that I am working with a DataFrame
and pass the path_or_buf
keyword argument, when I am in fact working with a Series
. In that case, the warning would not trigger, but I would get unexpected behaviour due to the difference in default value of header
.
@toobaz : The original PR IMO is only a band-aid to the larger issue that If your main issue with this PR is that people need to pass in a keyword arg for A single keyword inconvenience is worth it for all of the keyword arguments and awkwardness that we will resolve with the fact that the functions are not synchronized, both for end users and us as maintainers. |
pandas/core/series.py
Outdated
|
||
# Result is only a string if no path provided, otherwise None. | ||
result = df.to_csv(**to_csv_kwargs) | ||
|
||
if path is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If path
is not None
, None
is implicitly returned. But in that case, the value of result
is also None
. So there is no reason to test the value of path
, it suffices to return result
in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I must be missing something obvious, but can you provide a valid example of calling this method with path=None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should think
import pandas as pd
pd.DataFrame({"a": [1, 2]}).to_csv()
would be a valid call, and would print the contents to screen. Maybe I am wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is indeed a perfectly valid way to use path=None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was indeed missing something obvious ;-)
Regarding silencing the warning, perhaps one could add to the warning message that it can be silenced by explicitly casting the |
We might as well state that we will completely remove |
Again, having a seldomly used function is not a good argument, to me at least, to make it more annoying to use. |
Removing
Note that the third option is simply the user implementing the fix by hand, which is why it is future-proof and does not emit warnings. I think those all seem like reasonable options for a short deprecation period. |
So let me meet you guys half-way:
How does that sound? |
I think that's reasonable... I just think that moving |
@toobaz : I'm not sure I fully understand you here. What do you mean by "moving path as last argument" ? It sounds like you're just repeating what I did currently in this PR? |
My point is: if a given thing works now and will work in the future in the same way, no deprecation period should force users to do things otherwise. In this case, the "thing" is "passing a filename as first positional argument", which (correct me if I'm wrong) works exactly as it should, and will. Contrast this with the warning for |
My idea is that def to_csv(self, path_or_buf=None,
[rest of signature],
path=None)
if path is not None:
# emit warning about change of argument name
assert(path_or_buf is None), "You cannot pass both 'path' and 'path_or_buf'."
path_or_buf = path
[...] where This will be complemented with the warning for if isinstance(sep, bool):
# emit warning about the change of arguments ordering
raise ValueError This way, we inform user of the old |
Hmm...I see...IMO that's bending over backwards a little too much duplicating (and bloating the signature) like that. I would leave it at warning regarding the |
Just to be sure we are on the same page: it would obviously be a temporary measure, |
OK. Would your comment change if I proposed the following rather than the above?
Concerning the possibility that |
@toobaz : Ah...that I think we can do. Nice! |
I meant the second except |
I'm not sure I understand. No behaviour which passes more than one positional argument ought to be legitimate both now and in the future (since df.to_csv("test.csv", "y", header=False) would not be the same before as after, and therefore should at least emit a warning.
I'm not sure I understand how you would easily go about emitting a warning by checking the type of the second positional argument. How do you know which arguments were passed as positional? But if there is an easy way, I am very interested in seeing it! |
Passing positional arguments itself is. So we will not forbid it (if we can). Or in other terms: if people want to use positional arguments now and in the future, making them change their calls once is acceptable; twice is not.
As I explained above, apart from the first two arguments, which work the same in the old and new signature ( |
I see, so you would simply check the type of Since the decorator seems out of the question, I would support your solution. It has the drawback that it, instead of providing the user with a warning ahead of time, will directly break most old code relying on positional arguments. It also will not catch edge cases with poor code, like df.to_csv("test.csv", "y", header=False) |
How to detect the old signature and what to do when we detect it are two different problems. Once more: we can temporarily support the old signature and the new, it's just a matter of reordering (conditional on dtype of |
Again, edge cases: df.to_csv("test.csv", "y", header=False) Since python is weakly and dynamically typed, you cannot reliably infer which signature the user had in mind by looking at the type of the second positional argument — one-character strings are legitimate values for both As far as I can tell, you have to choose between "more than one positional argument will cause a non-disableable warning during deprecation period" and "some edge cases may not trigger warnings". |
In your reply, you are again mixing detection and consequences, let's try to keep the two separated ;-) Anyway, about detection: I know we would not catch But again, I really don't think we care. And again, this is a matter of how we catch the wrong ordering... ... however we do it, we can reorder if we want. |
Not really, my point was precisely that one cannot reliably detect which signature the user has in mind, and therefore one should choose a consequence which takes this lack of reliability into account. That is, detection and consequences are related. For clarity, I even broke the reply up into two paragraphs, one for detection and one for consequences.
I don't follow — wouldn't those two tests produce the same exact outcome in this case? I.e. classify the call as following the new signature?
This is a fair point, and why I said I would be in favor of you solution.
No, you cannot reorder correctly if you cannot catch correctly. That's why I would prefer a catch-all warning or a decorator-based warning, letting the user fix the problem on their end. |
You're right. It seems to me like the only thing missing is a warning to those users who pass more than a single positional argument and pass a value to df.to_csv("test.csv", False, header=False) should emit a warning in my opinion, whereas calls such as df.to_csv("test.csv", index=False, header=False) should not. This is easy to do with a decorator, but I cannot come up with any other simple solution. |
What I was mentioning would be to change the signature to Assuming we don't want to do this, then I think we agree on not considering |
... which is an argument for reordering (or for the implicit signature). The change that really can't be breaking is the one after the deprecation period. |
@toobaz : I'm not sure what you mean. The one that can't be breaking is the deprecation. The one after the deprecation period is breaking. That's why we warn users about it first. That's why I'm not in favor of breaking changes the signature at this current time. As for |
I mean that the ideal deprecation cycle will let you change your code at any time during the deprecation period, with warnings urging you to do so. As I said, we shouldn't feel obliged to support positional indexing at all, it's a choice. But supporting both styles is technically feasible, with absolute certainty, and is the only way to guarantee a smooth deprecation cycle to whoever uses positional indexing.
If your main argument is "I'm writing this PR, my tastes matter", then... OK, I'm out of arguments. |
@toobaz : Actually, I'm just expressing my personal opinion, but as I said earlier, this conversation deadlocked previously #19745 (comment). If you think that doing argument introspection is feasible (and not overly messy), then go for it. In fact, you can modify #19745 if you want to illustrate your point. Let's make something concrete out of this conversation before it drags out forever in theory world 🙂 |
We're pushing to merge #21896 over this one. |
Warns about aligning
Series.to_csv
's signature with that ofDataFrame.to_csv
's.In anticipation, we have moved
DataFrame.to_csv
togeneric.py
so that we can later delete theSeries.to_csv
implementation, and allow it to adoptDataFrame
'sto_csv
due to inheritance.Closes #19745.
cc @dahlbaek