MAINT: Deprecate encoding from stata reader/writer #21400

bashtage · 2018-06-09T10:22:35Z

Deprecate the encoding parameter from all Stata reading and writing
methods and classes. The encoding depends only on the file format and
cannot be changed by users.

closes BUG: read_stata always uses 'utf8' #21244
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
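
A minimal sketch of the behaviour this description implies, using only the standard library. `read_dta_sketch` and `encoding_for_format` are hypothetical names, not pandas functions, and the version-to-encoding rule (Latin-1 before dta format 118, UTF-8 from 118 on) is an assumption made for illustration, not something stated in this PR.

```python
import warnings

# Sentinel so "argument not passed" can be told apart from an explicit None.
_NOT_PASSED = object()


def encoding_for_format(format_version):
    """Assumed rule: dta formats before 118 are Latin-1, 118 and later UTF-8."""
    return "latin-1" if format_version < 118 else "utf-8"


def read_dta_sketch(raw, format_version, encoding=_NOT_PASSED):
    """Hypothetical reader: decode bytes using the encoding of the format.

    Passing ``encoding`` only triggers a FutureWarning; its value is ignored.
    """
    if encoding is not _NOT_PASSED:
        warnings.warn(
            "the 'encoding' keyword is deprecated and has no effect",
            FutureWarning,
            stacklevel=2,
        )
    return raw.decode(encoding_for_format(format_version))
```

The sentinel default is one possible design choice: it lets the function warn even when a caller passes `encoding=None` explicitly.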

bashtage · 2018-06-09T10:22:44Z

For 0.24

codecov · 2018-06-09T11:21:09Z

Codecov Report

Merging #21400 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21400      +/-   ##
==========================================
+ Coverage   91.89%   91.89%   +<.01%     
==========================================
  Files         153      153              
  Lines       49596    49597       +1     
==========================================
+ Hits        45576    45577       +1     
  Misses       4020     4020

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.29% <100%> (ø)` | ⬆️ |
| #single | `41.86% <100%> (ø)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/frame.py | `97.23% <100%> (ø)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 415012f...4c02ceb.

gfyoung · 2018-06-09T17:47:38Z

@bashtage : How come your PR is closing #21244, which is already closed?

gfyoung · 2018-06-09T17:48:11Z

doc/source/whatsnew/v0.24.0.txt

@@ -45,7 +45,7 @@ Other API Changes
 Deprecations
 ~~~~~~~~~~~~

-
+- :meth:`DataFrame.to_stata`, :meth:`read_stata`, :class:`StataReader` and :class:`StataWriter` have deprecated has deprecated ``encoding``.  The   encoding of a Stata dta file is determined by the file type and cannot be changed (:issue:`21244`).


encoding --> the encoding argument

"The   encoding" --> "The encoding" (extra space).

gfyoung · 2018-06-09T17:50:35Z

pandas/tests/io/test_stata.py

        result = encoded.kreis1849[0]

        expected = raw.kreis1849[0]
        assert result == expected
        assert isinstance(result, compat.string_types)

        with tm.ensure_clean() as path:
-            encoded.to_stata(path, encoding='latin-1',
-                             write_index=False, version=version)
+            encoded.to_stata(path, write_index=False, version=version)


Can we make sure that the DeprecationWarning is issued when we pass in the encoding argument?

bashtage · 2018-06-09T18:25:15Z

It was the relevant issue. I had to defer the deprecation past the 0.23.1 point release.


bashtage · 2018-06-09T20:33:06Z

I added a test and fixed the whatsnew

gfyoung · 2018-06-09T22:33:21Z

pandas/tests/io/test_stata.py

+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            encoded = read_stata(self.dta_encoding, encoding='latin-1')
+            assert len(w) == 1


(repost since GitHub hid it due to recent push): Hmmm...I was thinking that you use tm.assert_produces_warning or pytest.warns so that we explicitly ensure that the warning is produced.

Yes, I think this can be:

    with tm.assert_produces_warning(FutureWarning):
        encoded = read_stata(self.dta_encoding, encoding='latin-1')

(and the same for the other occurrence)
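
As a rough illustration of why an explicit check is preferable to `assert len(w) == 1`, the sketch below asserts on the warning category rather than only counting warnings. It uses only the standard library. `expect_warning` and `read_with_deprecated_kwarg` are hypothetical stand-ins written for this example; they are not `tm.assert_produces_warning`, `pytest.warns`, or `read_stata`.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def expect_warning(category):
    """Hypothetical helper: fail unless the block emits ``category``."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that were already shown once.
        warnings.simplefilter("always")
        yield caught
    matching = [w for w in caught if issubclass(w.category, category)]
    if not matching:
        raise AssertionError(f"no {category.__name__} was emitted")


def read_with_deprecated_kwarg(encoding=None):
    """Hypothetical stand-in for a reader with a deprecated keyword."""
    if encoding is not None:
        warnings.warn("'encoding' is deprecated", FutureWarning, stacklevel=2)
    return "data"


# Passes: the deprecated keyword is supplied, so a FutureWarning is emitted.
with expect_warning(FutureWarning):
    read_with_deprecated_kwarg(encoding="latin-1")
```

Counting warnings alone would also pass if some unrelated warning happened to be raised, which is the gap the category check closes.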

gfyoung

@bashtage : For some reason, I missed the notification that you had patched the warnings setup in the tests. LGTM!

cc @jorisvandenbossche

jreback · 2018-06-12T00:05:51Z

thanks @bashtage

hmgaudecker · 2018-09-20T15:13:37Z

I cannot check right now, but I do not think that the sentence " The encoding depends only on the file format and cannot be changed by users." is quite true. If I remember correctly, pre-118 Stata files used, in an undocumented fashion, the native encoding, Latin-1 on Windows and MacRoman on OS X, I don't remember for Linux. Officially, special characters were not supported.

I think there is a case to be made for the encoding keyword, but with old versions dying out I am not sure whether it is worth the effort.
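
To make the concern concrete, here is a small standard-library sketch of how the same text ends up as different bytes under Latin-1 and MacRoman, so that decoding with the wrong one garbles non-ASCII characters. The sample string and the helper `decode_with` are made up for illustration, and the platform-to-encoding pairing simply follows the recollection above rather than any documented Stata behaviour.

```python
def decode_with(raw, encoding):
    """Decode bytes, substituting U+FFFD for any byte the codec cannot map."""
    return raw.decode(encoding, errors="replace")


text = "Müller"

# Assumed native encodings, per the comment above (not documented by Stata).
written_on_windows = text.encode("latin-1")
written_on_mac = text.encode("mac_roman")

# The two byte strings differ, so one fixed decoder cannot suit both files.
same_bytes = written_on_windows == written_on_mac

# Decoding MacRoman bytes as Latin-1 does not give back the original text.
round_trip_ok = decode_with(written_on_mac, "mac_roman") == text
cross_decode_ok = decode_with(written_on_mac, "latin-1") == text
```

Here `same_bytes` and `cross_decode_ok` come out false while `round_trip_ok` is true, which is the situation an `encoding` keyword would let a user work around.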

gfyoung added the "Deprecate" (Functionality to remove in pandas) and "IO Stata" (read_stata, to_stata) labels Jun 9, 2018

gfyoung reviewed Jun 9, 2018

bashtage force-pushed the deprecate-stata-encoding branch 2 times, most recently from dc00bd8 to 1e39bb2 Compare June 9, 2018 20:31

bashtage force-pushed the deprecate-stata-encoding branch from 1e39bb2 to f37e279 Compare June 9, 2018 22:26

gfyoung reviewed Jun 9, 2018

MAINT: Deprecate encoding from stata reader/writer

4c02ceb

Deprecate the encoding parameter from all Stata reading and writing methods and classes. The encoding depends only on the file format and cannot be changed by users.

bashtage force-pushed the deprecate-stata-encoding branch from f37e279 to 4c02ceb Compare June 10, 2018 09:56

gfyoung approved these changes Jun 11, 2018

jreback added this to the 0.24.0 milestone Jun 12, 2018

jreback merged commit 66d9b15 into pandas-dev:master Jun 12, 2018

jsexauer mentioned this pull request Jun 12, 2018

DEPR: Clean up list of deprecations from prior versions #6581

Closed

1 task

jreback mentioned this pull request Jun 12, 2018

BUG: Fix handling of encoding for the StataReader #21244 #21246

Closed

4 tasks

bashtage deleted the deprecate-stata-encoding branch September 20, 2018 15:49

TomAugspurger mentioned this pull request Nov 16, 2018

UnicodeDecodeError with Latin-1 characters in Stata files #23736

Closed

jreback mentioned this pull request Nov 20, 2019

DEPR: deprecations log for removed issues #13777

Closed
