Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Deprecate Series.to_csv signature #21896

Merged
merged 1 commit into from
Aug 13, 2018

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Jul 13, 2018

Just a proof of concept for discussion (misses docs, whatsnew, new tests). Based on #21868

@dahlbaek @gfyoung

@pep8speaks
Copy link

pep8speaks commented Jul 13, 2018

Hello @toobaz! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 02, 2018 at 20:52 Hours UTC

@gfyoung gfyoung added IO CSV read_csv, to_csv Compat pandas objects compatability with Numpy or Python functions labels Jul 13, 2018
@dahlbaek
Copy link
Contributor

This is the solution that feels the most "just get it done" to me, while not breaking any "correctly written" code (as far as I can tell...?).

European data

"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a few lines from the pull request of @gfyoung are missing here, something like

    from pandas.core.frame import DataFrame

    df = self if isinstance(self, DataFrame) else DataFrame(self)

If I am not mistaken, the purpose of those lines is to prepare for this method to be called by Series.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(added)

tupleize_cols = False

from pandas.io.formats.csvs import CSVFormatter
formatter = CSVFormatter(self, path_or_buf,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you agree with to comment above, self should be replaced by df here.

@toobaz
Copy link
Member Author

toobaz commented Jul 13, 2018

while not breaking any "correctly written"

Do you have in mind "uncorrectly written" code it could break?

(EDIT: OK: if you mean "code which does not raise errors", I understand)

@dahlbaek
Copy link
Contributor

dahlbaek commented Jul 13, 2018

Just the example of passing "y" instead of True as a positional argument to index.

(EDIT: Precisely that)

@codecov
Copy link

codecov bot commented Jul 14, 2018

Codecov Report

Merging #21896 into master will decrease coverage by <.01%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21896      +/-   ##
==========================================
- Coverage   92.06%   92.05%   -0.01%     
==========================================
  Files         169      169              
  Lines       50694    50709      +15     
==========================================
+ Hits        46671    46681      +10     
- Misses       4023     4028       +5
Flag Coverage Δ
#multiple 90.46% <83.33%> (-0.01%) ⬇️
#single 42.31% <33.33%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/core/frame.py 97.25% <ø> (-0.02%) ⬇️
pandas/core/generic.py 96.49% <100%> (+0.01%) ⬆️
pandas/core/series.py 93.72% <75%> (-0.4%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7725fa0...e69c5ca. Read the comment docs.

@toobaz
Copy link
Member Author

toobaz commented Jul 14, 2018

Just the example of passing "y" instead of True as a positional argument to index.

Right, the previous version was wrong and incomplete. Should be fixed now, thanks.

In [2]: s = pd.Series(list('abcd'))

In [3]: s.to_csv('/tmp/test1.csv', 'y')
/home/nobackup/repo/pandas/pandas/core/series.py:3812: UserWarning: Argument 'header' has changed default value to True: please pass an explicit value to suppress this warning
  warnings.warn("Argument 'header' has changed default value to "

In [4]: cat /tmp/test1.csv
y0
0ya
1yb
2yc
3yd

In [5]: s.to_csv('/tmp/test1.csv', True)
/usr/bin/ipython3:1: FutureWarning: The signature of `Series.to_csv` was  aligned to that of `DataFrame.to_csv`. Note that the order of arguments changed, and the new one has 'sep' in first place, for which "True" is not a valid value. The old order will cease to be supported in a future version. Please refer to the documentation for `DataFrame.to_csv` when updating your function calls.
  #! /bin/sh
/home/nobackup/repo/pandas/pandas/core/series.py:3812: UserWarning: Argument 'header' has changed default value to True: please pass an explicit value to suppress this warning
  warnings.warn("Argument 'header' has changed default value to "

In [6]: cat /tmp/test1.csv
,0
0,a
1,b
2,c
3,d

@jreback
Copy link
Contributor

jreback commented Jul 14, 2018

is this replacing #21868 . - haven't been following 2 closely @toobaz @gfyoung

@gfyoung
Copy link
Member

gfyoung commented Jul 14, 2018

@jreback : No. Actually would like your or @jorisvandenbossche input, as the conversations have produced three possible fixes:

#19745 (@dahlbaek)
#21868 (@gfyoung)
#21896 (@toobaz)

Just need to resolve this before the conversations drag out forever 🙂

@jreback
Copy link
Contributor

jreback commented Jul 16, 2018

#21896 and #21868 (this one) look very similar at first glance, what is the substantive diff?

I prefer either of these over #19745 which I find much harder to read.

@gfyoung
Copy link
Member

gfyoung commented Jul 16, 2018

@jreback : @toobaz does major argument introspection by changing the signature to Series(*args, **kwargs). I (#21868) kept things a little simpler by adding only **kwargs to the current signature and doing argument introspection on **kwargs.

@toobaz
Copy link
Member Author

toobaz commented Jul 16, 2018

@toobaz does major argument introspection

Yes; one additional important change is that this PR tries to support both the old and the new positional signature (where "tries" means that it will never know for sure if some user had the idea to pass index="y" in place of index=True).

@gfyoung
Copy link
Member

gfyoung commented Jul 19, 2018

@jreback @jorisvandenbossche : Any further thoughts on #21896 and #21868? I think @toobaz, @dahlbaek , and I have exhausted our own thoughts 😄 , so fresh eyes would be good.

@gfyoung
Copy link
Member

gfyoung commented Jul 23, 2018

@jreback @jorisvandenbossche : Another friendly ping on this.

@jreback
Copy link
Contributor

jreback commented Jul 25, 2018

I like this version. I think bending over to give a helpful error message is pretty useful to the user. This is not completely crazy and doesn't add a huge amount of code. I think a couple of TODO's were left in the code.

@toobaz toobaz force-pushed the to_csv_unify_proof branch 2 times, most recently from a17429b to 7c5e18d Compare July 25, 2018 15:22
@gfyoung gfyoung added this to the 0.24.0 milestone Jul 25, 2018
@gfyoung
Copy link
Member

gfyoung commented Jul 25, 2018

@toobaz : Don't forget whatsnew entry as well.

@toobaz
Copy link
Member Author

toobaz commented Jul 25, 2018

@toobaz : Don't forget whatsnew entry as well.

Tests are missing too, but I tought that maybe the simplest thing would be for you to take my code in your PR. Otherwise I can do the opposite (not sure when I'll find the time though).

@gfyoung
Copy link
Member

gfyoung commented Jul 25, 2018

@toobaz : I can push to those changes to you branch. You have the changes, but just need the finishing touches, which I can add here.

@toobaz
Copy link
Member Author

toobaz commented Jul 25, 2018

@toobaz : I can push to your branch if that works.

Sure, feel free (I saw now only that you closed the other one)

@gfyoung gfyoung changed the title Proof of concept for #19715 based on #21868 DEPR: Deprecate Series.to_csv signature Jul 25, 2018
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only looking at this PR (I didn't follow all the other discussions), this looks good to me


# This entire method is going to be removed, so the shared docstring
# mechanism is overkill
to_csv.__doc__ = generic.NDFrame.to_csv.__doc__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not safe for builds without docstrings I think?
I also don't understand why using the shared docstring mechanism is overkill here? (it can simply be used to add a docstring from somewhere else, you can use without substituting values or so) Using that will simply be a one-line as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not safe for builds without docstrings I think?

Didn't know about those (link?) In general if func has no docstring, func.__doc__ is just None... but again, I might be missing something.

Using that will simply be a one-line as well.

Wouldn't that mean changing pandas/core/generic.py to insert the docstring inside _shared_docs (and undo after deprecation)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, forget about the comment about builds without docstring, I always forget the exact semantics: they don't remove the attribute, just set it to None. So it is things like to_csv.__doc__ = generic.NDFrame.to_csv.__doc__ + "appending" that then does not work.

Anyhow, you can do a

@Appender(generic.NDFrame.to_csv.__doc__)

which will do the same but with our machinery (and then you don't need the comment, 2 lines saved :-))

if kwargs.get("header", None) is None:
warnings.warn("The signature of `Series.to_csv` was aligned "
"to that of `DataFrame.to_csv`, and argument "
"'header' will change its default value to "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"change its default value to True" -> "change its default value from False to True" ?

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test all cases of the deprecation warnings? (maybe I missed some, but I don't see any tests for the positional args case)

if arg == "path":
kwargs = dict(path=path, header=False)
elif arg == "header":
kwargs = dict(path_or_buf=path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this case raise the warning? (as path_or_buf is the future keyword?)

Copy link
Member

@gfyoung gfyoung Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because header will default to True in the future (instead of False as it was the case in Series.to_csv). Thus, users who don't specify the header argument will be "surprised" when that change in the default results in different output.

Thus, we are warning when header is not passed in explicitly to let them know that the default is changing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yes, missed that one

compressed = os.path.getsize(filename)
getattr(obj, method)(filename, compression=None)
uncompressed = os.path.getsize(filename)
assert uncompressed > compressed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would those cases raise any warnings?
(and same for the others below)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer as above.

Copy link
Member

@jorisvandenbossche jorisvandenbossche Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz can you add a comment explaining that? EDIT: I see now you already did :-)

@@ -383,6 +383,7 @@ Deprecations
- :meth:`MultiIndex.to_hierarchical` is deprecated and will be removed in a future version (:issue:`21613`)
- :meth:`Series.ptp` is deprecated. Use ``numpy.ptp`` instead (:issue:`21614`)
- :meth:`Series.compress` is deprecated. Use ``Series[condition]`` instead (:issue:`18262`)
- The signature in :meth:`Series.to_csv` has been deprecated. Please follow the signature in :meth:`DataFrame.to_csv` instead (:issue:`19715`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would give a bit more details here (or at least mention the most important ones: change in name of first argument + second positional argument is different, amongst other order changes)

if path is None:
return result
def to_csv(self, *args, **kwargs):
warning_klass = FutureWarning
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gfyoung I don't understand the need for this variable (and the following)

@Appender(generic.NDFrame.to_csv.__doc__)
def to_csv(self, *args, **kwargs):
warning_klass = FutureWarning
stack_level = 2
Copy link
Member Author

@toobaz toobaz Jul 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gfyoung I don't understand why we need those variables - they don't save code (actually the opposite) nor increase readability.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I think you can just change this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use those variables in multiple places in this method to issue warnings about keyword arguments and the general deprecations. If you want to change this to copy and paste it everywhere, go ahead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to change this to copy and paste it everywhere, go ahead.

It's actually how it was, but OK, I will :-)

@toobaz
Copy link
Member Author

toobaz commented Aug 2, 2018

Rebased (on #22011 ) and added missing .. versionchanged:: 0.24.0 for header argument.

Lint complains with pandas/core/strings.py:296:80: E501 line too long (87 > 79 characters) but that's not a file I edited... it was unrelated

Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche jorisvandenbossche merged commit eb0ac54 into pandas-dev:master Aug 13, 2018
@jorisvandenbossche
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conform Series.to_csv to DataFrame.to_csv
6 participants