-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: Deprecate encoding from stata reader/writer #21400
Conversation
For 0.24 |
Codecov Report
@@ Coverage Diff @@
## master #21400 +/- ##
==========================================
+ Coverage 91.89% 91.89% +<.01%
==========================================
Files 153 153
Lines 49596 49597 +1
==========================================
+ Hits 45576 45577 +1
Misses 4020 4020
Continue to review full report at Codecov.
|
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -45,7 +45,7 @@ Other API Changes | |||
Deprecations | |||
~~~~~~~~~~~~ | |||
|
|||
- | |||
- :meth:`DataFrame.to_stata`, :meth:`read_stata`, :class:`StataReader` and :class:`StataWriter` have deprecated has deprecated ``encoding``. The encoding of a Stata dta file is determined by the file type and cannot be changed (:issue:`21244`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
encoding
--> theencoding
argument- "The encoding" --> "The encoding" (extra space).
pandas/tests/io/test_stata.py
Outdated
result = encoded.kreis1849[0] | ||
|
||
expected = raw.kreis1849[0] | ||
assert result == expected | ||
assert isinstance(result, compat.string_types) | ||
|
||
with tm.ensure_clean() as path: | ||
encoded.to_stata(path, encoding='latin-1', | ||
write_index=False, version=version) | ||
encoded.to_stata(path, write_index=False, version=version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make sure that the DeprecationWarning
is issued when we pass in the encoding
argument?
It was the relevant issue. I had to defer the depreciation past the point
release 23.1
…On Sat, Jun 9, 2018, 18:48 gfyoung ***@***.***> wrote:
@bashtage <https://github.com/bashtage> : How come your PR is closing
#21244 <#21244>, which is
already closed?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#21400 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AFU5Ra9K3dl6eDqZu9n-2iAPKa8D6axCks5t7ApTgaJpZM4UhRhX>
.
|
dc00bd8
to
1e39bb2
Compare
I added a test and fixed the whatsnew |
1e39bb2
to
f37e279
Compare
pandas/tests/io/test_stata.py
Outdated
with warnings.catch_warnings(record=True) as w: | ||
warnings.simplefilter("always") | ||
encoded = read_stata(self.dta_encoding, encoding='latin-1') | ||
assert len(w) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(repost since GitHub hid it due to recent push): Hmmm...I was thinking that you use tm.assert_produces_warning or pytest.warns so that we explicitly ensure that the warning is produced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this can be:
with tm.assert_produces_warning(FutureWarning):
encoded = read_stata(self.dta_encoding, encoding='latin-1')
(and the same for the other occurrence)
Deprecate the encoding parameter from all Stata reading and writing methods and classes. The encoding depends only on the file format and cannot be changed by users.
f37e279
to
4c02ceb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bashtage : For some reason, I missed the notification that you had patched the warnings setup in the tests. LGTM!
thanks @bashtage |
Deprecate the encoding parameter from all Stata reading and writing methods and classes. The encoding depends only on the file format and cannot be changed by users.
I cannot check right now, but I do not think that the sentence " The encoding depends only on the file format and cannot be changed by users." is quite true. If I remember correctly, pre-118 Stata files used, in an undocumented fashion, the native encoding, Latin-1 on Windows and MacRoman on OS X, I don't remember for Linux. Officially, special characters were not supported. I think there is a case to be made for the encoding keyword, but with old versions dying out I am not sure whether it is worth the effort. |
Deprecate the encoding parameter from all Stata reading and writing methods and classes. The encoding depends only on the file format and cannot be changed by users.
Deprecate the encoding parameter from all Stata reading and writing
methods and classes. The encoding depends only on the file format and
cannot be changed by users.
read_stata
always uses 'utf8' #21244git diff upstream/master -u -- "*.py" | flake8 --diff