Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Allow IOErrors when attempting to retrieve default client encoding. #21531

Merged
merged 23 commits into from
Sep 19, 2018
Merged

BUG: Allow IOErrors when attempting to retrieve default client encoding. #21531

merged 23 commits into from
Sep 19, 2018

Conversation

JayOfferdahl
Copy link
Contributor

@JayOfferdahl JayOfferdahl commented Jun 18, 2018

When using mod_wsgi, access to sys.stdout is restricted by default. To handle this case, catch a base exception instead of a more specific AttributeError during this process to include the thrown IOError by mod_wsgi.

…ing.

  instead of a more specific AttributeError to include the thrown IOError.
@JayOfferdahl JayOfferdahl changed the title BUG: Except IOErrors when attempting to retrieve default client encoding. BUG: Allow IOErrors when attempting to retrieve default client encoding. Jun 18, 2018
@WillAyd
Copy link
Member

WillAyd commented Jun 18, 2018

  • Is this referencing a particular issue?
  • Is there any other option besides catching base Exception (tend to avoid this if we can)
  • Can you add tests?

@JayOfferdahl
Copy link
Contributor Author

  • Is this referencing a particular issue?
    • I haven't raised an issue for this as it just came up this morning.
  • Is there any other option besides catching base Exception (tend to avoid this if we can)
    • We could instead widen the exception clause to just AttributeError and IOError.
  • Can you add tests?
    • Will do.

@WillAyd
Copy link
Member

WillAyd commented Jun 18, 2018

OK thanks - if you could open an issue and reference it then that would be preferable (typically PRs are only allowed without an issue if they are DOC updates).

You'll also want a whatsnew note to describe the bug and reference the issue. Next minor release is 0.23.2 at this point

@codecov
Copy link

codecov bot commented Jun 18, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@4e0b636). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #21531   +/-   ##
=========================================
  Coverage          ?   92.18%           
=========================================
  Files             ?      169           
  Lines             ?    50739           
  Branches          ?        0           
=========================================
  Hits              ?    46774           
  Misses            ?     3965           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.59% <100%> (?)
#single 42.34% <0%> (?)
Impacted Files Coverage Δ
pandas/io/formats/console.py 75.75% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e0b636...81547da. Read the comment docs.

@@ -21,7 +21,7 @@ def detect_console_encoding():
encoding = None
try:
encoding = sys.stdout.encoding or sys.stdin.encoding
except AttributeError:
except Exception:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we just except both on AttributeError and IOError ? I generally try to avoid using the broad Exception unless we have good reason to.

@gfyoung gfyoung added Bug IO Data IO issues that don't fit into a more specific label labels Jun 18, 2018
JayOfferdahl and others added 3 commits June 19, 2018 22:19
)

* When using mod_wsgi, access to sys.stdout is restricted by default. To handle
  this case, catch IOErrors alongside the already handled AttributeErrors.
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok, getting to pass travis might be a bit tricky.

assert detect_console_encoding() == 'foo'


def test_detect_console_encoding_stdin(monkeypatch):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some comments on what is being tested here.

@@ -65,7 +65,7 @@ Bug Fixes
**I/O**

- Bug in :func:`read_csv` that caused it to incorrectly raise an error when ``nrows=0``, ``low_memory=True``, and ``index_col`` was not ``None`` (:issue:`21141`)
-
- Bug in :meth:`detect_client_encoding` where potential IOError goes unhandled when importing in a mod_wsgi process due to restricted access to stdout. (:issue:`21552`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double backticks on IOError

@pep8speaks
Copy link

pep8speaks commented Jun 20, 2018

Hello @JayOfferdahl! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 23, 2018 at 03:18 Hours UTC

@JayOfferdahl
Copy link
Contributor Author

Thanks for the feedback, made some changes to tests to cover more cases and provide some insight as to what they're doing. Is there anything special I need to do to get this working with travis?

@jreback
Copy link
Contributor

jreback commented Jun 21, 2018

looks like it might be tricky to monkeypatch how you are doing it: https://travis-ci.org/pandas-dev/pandas/jobs/394566147

does this test run locally?

@JayOfferdahl
Copy link
Contributor Author

Interesting...it runs locally just fine...I'll probably need to patch stdout/stdin like the other tests and provide the property on that patched object.

@JayOfferdahl
Copy link
Contributor Author

I've updated the test to mock out stdout/stdin instead of trying to change their encoding attribute. I do the same in other tests so I'm assuming this will work fine with travis.

Apologies for the slow turn around on this one!

super(MockEncoding, self).__init__()
self.encoding = encoding

monkeypatch.setattr('sys.{}'.format(empty), MockEncoding(''))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance you saw the context manager for monkeypatch? Seems to be preferred for patching standard library items (may require a version bump for pytest dependency).

@JayOfferdahl
Copy link
Contributor Author

@WillAyd I've taken a look at the context manager and have updated these tests. Looks like I missed the 0.23.2 release...my own fault, of course. Should I put my note in the 0.24 whatsnew document? Not sure how you all handle bugs in non-minor release versions.

Side note: I did have to upgrade pytest to 3.6.3, I'm almost certain that this will break travis since it won't have that version (maybe I'm wrong) -- where does this get specified?

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JayOfferdahl no worries on timing - we tentatively have a 0.23.3 bug fix release that you can move the whatsnew entry to. Looks like you also have some LINTing errors to address

@jreback do you have any opposition to increasing the min version of pytest to 3.6? It's a relatively new release but the context manager for monkey patching I think is a useful and clean feature

['stdout', 'stdin']
])
def test_detect_console_encoding_from_stdout_stdin(monkeypatch, empty, filled):
# Ensures that when sys.stdout.encoding or sys.stdin.encoding is used when
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can add a reference to the issue for this and the other new tests, i.e. # GH 21552

lambda: 'ascii'
])
def test_detect_console_encoding_fallback_to_locale(monkeypatch, stdEncoding):
monkeypatch.setattr('locale.getpreferredencoding', lambda: 'foo')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can put this in the context manager, no?

@jreback
Copy link
Contributor

jreback commented Jul 13, 2018

no problem bumping pytest version (separate PR)

@JayOfferdahl
Copy link
Contributor Author

Inline with the class name gave the same error...ended up putting this in the docstring. This should still be just as easy to discover, I assume.


class MockEncoding(object):
"""
TODO(py27): replace with mock
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see the LINT errors before but maybe you got it mixed up with the one I've commented on below? I still think inline with the class block opening would be preferable. Certainly a nit, but I haven't seen TODOs embedded in docstrings like this before

@property
def encoding(self):
return raiseOrReturn(self.val)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current LINT error is because you are missing a blank line here

def encoding(self):
return raiseOrReturn(self.val)

def raiseOrReturn(val):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer PEP8 naming, so let's call this raise_or_return

assert detect_console_encoding() == filled


@pytest.mark.parametrize('stdEncoding', [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 so std_encoding or simply encoding

* It might be apparent what I styleguide I use!!
@JayOfferdahl
Copy link
Contributor Author

Changes made. I was using the regular pylint package and not flake8, which explains the difference. Apologies, and thanks for spending so much time on this, I really appreciate it.

return raise_or_return(self.val)


def raise_or_return(val):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be moved as an instance method of MockEncoding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also using it down below to test when defaulting to system default -- it covers the same goal of throwing an exception like above here. I could make it a static method of MockEncoding, but I thought using that object for both cases wouldn't make sense since the encoding property is only being consumed from accesses by stdout or stdin. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd did you see this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this a class method just so all of the relevant constructs stay together

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 minor comments

return raise_or_return(self.val)


def raise_or_return(val):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this a class method just so all of the relevant constructs stay together


@pytest.mark.parametrize('std,locale', [
[MockEncoding('ascii'), lambda: 'ascii'],
[MockEncoding('ascii'), lambda: raise_or_return(Exception)],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of mixing usage of raise_or_return here for each parameter just do it in the body of the function; should work for all parameters since in case of a string it returns the string anyway

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm so long as it goes green. @jreback

@WillAyd
Copy link
Member

WillAyd commented Aug 13, 2018

@JayOfferdahl can you rebase?

@JayOfferdahl
Copy link
Contributor Author

Of course, is the general work flow to just squash all of my changes down into one commit and rebase on top of master or should I merge master into this?

@WillAyd
Copy link
Member

WillAyd commented Aug 16, 2018

Merge master in

@@ -673,7 +673,7 @@ I/O

- :func:`read_html()` no longer ignores all-whitespace ``<tr>`` within ``<thead>`` when considering the ``skiprows`` and ``header`` arguments. Previously, users had to decrease their ``header`` and ``skiprows`` values on such tables to work around the issue. (:issue:`21641`)
- :func:`read_excel()` will correctly show the deprecation warning for previously deprecated ``sheetname`` (:issue:`17994`)
-
- Bug in :meth:`detect_client_encoding` where potential ``IOError`` goes unhandled when importing in a mod_wsgi process due to restricted access to stdout. (:issue:`21552`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move to 0.23.5? @jreback

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a regression? If not, I would probably stay with 0.24.0 unless there is a compelling reason to backport-patch this bug.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm @jreback

@JayOfferdahl
Copy link
Contributor Author

Any update here?

Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that this fell through the cracks! This looks good to merge from me as well.

cc @jreback

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have other console tests that can go in here?

@jreback jreback added this to the 0.24.0 milestone Sep 8, 2018
@jreback
Copy link
Contributor

jreback commented Sep 18, 2018

this is fine. can you rebase, and ping on green (@WillAyd / @gfyoung ok to merge then)

@WillAyd WillAyd merged commit bf29988 into pandas-dev:master Sep 19, 2018
@WillAyd
Copy link
Member

WillAyd commented Sep 19, 2018

Thanks @JayOfferdahl !

@JayOfferdahl
Copy link
Contributor Author

Thank you, and thanks for the help throughout the process!

@JayOfferdahl JayOfferdahl deleted the get-client-encoding-on-mod-wsgi branch September 19, 2018 20:18
aeltanawy pushed a commit to aeltanawy/pandas that referenced this pull request Sep 20, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Data IO issues that don't fit into a more specific label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: mod_wsgi restricting access to stdout throws unhandled IOError.
5 participants