Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unicode message in exceptions #2348

Conversation

daspecster
Copy link
Contributor

@daspecster daspecster commented Sep 19, 2016

See #2346

@dhermes looking at the magic methods and playing with __bytes__ and __unicode__, the solutions I had felt really hacky. i.e. checking for six.PY3 in __str__ and then just calling self.__bytes__() instead.

I'm not sure if it's good form to use __future__ here but it seems to be a valid way to solve the issue.

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Sep 19, 2016
@dhermes
Copy link
Contributor

dhermes commented Sep 19, 2016

@daspecster Ping me when Travis passing? (I'm working on release notes right now so a bit scatterbrained.)

@tseaver
Copy link
Contributor

tseaver commented Sep 19, 2016

I'm not in favor of the from __future__ import unicode_literals wallpaper. Let's do the check for unicode ourselves.

@daspecster
Copy link
Contributor Author

@tseaver ok, can do.

@dhermes
Copy link
Contributor

dhermes commented Sep 19, 2016

@daspecster Agree with @tseaver. That won't help detect bytes passed in, just enforce it on whatever we type.

@daspecster
Copy link
Contributor Author

Here's a cleaner way to handle it, however when running the follow test with python 2.7, you get an exception thrown by the print function.

exception_test.py

from google.cloud import exceptions
try:
    a = u'That\u2019s all we know'
    print(a)
    raise exceptions.NotFound(a)
except exceptions.GoogleCloudError as e:
    print(e)

Output:

╰─$ python test.py
That’s all we know
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print(e)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 8: ordinal not in range(128)

If that is not something we're concerned with then we are good to go.
Otherwise I can detect the python version(if six.PY2) and then convert accordingly.

exception = self._callFUT(response, content)
self.assertTrue(isinstance(exception, NotFound))

self.assertEqual(exception.message, u'That\u2019s all we know.')

This comment was marked as spam.

@dhermes
Copy link
Contributor

dhermes commented Sep 19, 2016

@daspecster You shouldn't have to branch for six.PY2. Hopefully there is a way to hook into __str__ for our exception classes. (DERP) Putting the following in foo.py

from __future__ import print_function


class NotFound(Exception):

    def __init__(self, message):
        super(NotFound, self).__init__(message)
        self.message = message
        self._errors = ()

    def __str__(self):
        return '%d %s' % (404, self.message)


VAL = u'it\u2019s all good'
print(VAL)

print('=' * 40)

EXC = NotFound(VAL)
print(EXC)

we see the failure

$ python2.7 foo.py
it’s all good
========================================
Traceback (most recent call last):
  File "foo.py", line 27, in <module>
    print(EXC)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)
$ python3.4 foo.py
it’s all good
========================================
404 it’s all good
$ python3.5 foo.py
it’s all good
========================================
404 it’s all good

NOTE This min-repro was too min.

Putting the following in foo.py

from __future__ import print_function
print(u'it\u2019s all good')

it works as expected:

$ python2.7 foo.py
it’s all good
$ python3.4 foo.py
it’s all good
$ python3.5 foo.py
it’s all good

@dhermes
Copy link
Contributor

dhermes commented Sep 19, 2016

OK, here is the problem: In Python 2, the result of __str__ can't be of type unicode if it can't be cast back to str/bytes using the ascii encoding. So we could use _helpers._to_bytes

@daspecster
Copy link
Contributor Author

daspecster commented Sep 19, 2016

I was thinking about _to_bytes but it just seems like there should be a clean way to do this. _to_bytes > unicode_literals for sure though.

Should we used _to_bytes in __bytes__ and then just call __bytes__ from __str__?
Seems..not right?

Since the distinction between string and unicode has been done away with in Python 3, __unicode__ is gone and __bytes__ (which behaves similarly to __str__ and __unicode__ in 2.7) exists for a new built-in for constructing byte arrays.

From the bottom of that link you posted before.

@daspecster
Copy link
Contributor Author

@dhermes, so is this just a REPL/print issue then?

That is the solution for #2346 is just to from __future__ import print_function?

@dhermes
Copy link
Contributor

dhermes commented Sep 20, 2016

No. The issue is __str__. It needs a return value that is bytes / str or unicode that can be converted directly to ASCII (that's why you see the error, they use the default encoding which is ASCII)

response = _Response(404)
content = u'{"error": {"message": "That\u2019s all we know."}}'
exception = self._callFUT(response, content)
self.assertTrue(isinstance(exception, NotFound))

This comment was marked as spam.

@daspecster
Copy link
Contributor Author

daspecster commented Sep 20, 2016

I'm apparently missing something here.

I have the code you demonstrated above, but I'm still getting the error.

from __future__ import print_function


class NotFound(Exception):

    def __init__(self, message):
        super(NotFound, self).__init__(message)
        self.message = message
        self._errors = ()

    def __str__(self):
        return '%d %s' % (404, self.message)

VAL = u'it\u2019s all good'
print(VAL)

print('=' * 40)

EXC = NotFound(VAL)
print(EXC)
╰─$ python2.7 test.py                                                                                         1 ↵
it’s all good
========================================
Traceback (most recent call last):
  File "test.py", line 20, in <module>
    print(EXC)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

@dhermes
Copy link
Contributor

dhermes commented Sep 20, 2016

@daspecster The example I posted doesn't fix it, I just posted it because my original snippet was busted.

The issue is that print(...) tries to cast to str in Python, which is essentially amounts to calling encode() on the unicode object.

Python 2.7.12 (default, Jul 17 2016, 01:21:14) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class A(object):
...     def __str__(self):
...         return u'\u2019'
... 
>>> a = A()
>>> a.__str__()
u'\u2019'
>>> str(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)
>>> a.__str__().encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)
>>> unicode(a)
u'\u2019'

@@ -41,7 +41,7 @@ def __init__(self, message, errors=()):
self._errors = errors

def __str__(self):
return '%d %s' % (self.code, self.message)
return u'%d %s' % (self.code, self.message)

This comment was marked as spam.

This comment was marked as spam.

def test_hit_w_content_as_unicode(self):
from google.cloud.exceptions import NotFound
response = _Response(404)
content = u'{"error": {"message": "%s" }}' % self._EXPECTED_MESSAGE

This comment was marked as spam.

@@ -48,6 +48,8 @@ def test_ctor_explicit(self):

class Test_make_exception(unittest.TestCase):

_EXPECTED_MESSAGE = u'That\u2019s not found.'

This comment was marked as spam.

@@ -41,7 +43,10 @@ def __init__(self, message, errors=()):
self._errors = errors

def __str__(self):
return '%d %s' % (self.code, self.message)
result = u'%d %s' % (self.code, self.message)
if six.PY2: # pragma: NO COVER Python2

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

return '%d %s' % (self.code, self.message)
result = u'%d %s' % (self.code, self.message)
if six.PY2:
result = _to_bytes(result)

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

@daspecster
Copy link
Contributor Author

@dhermes PTAL when you have a chance.

return '%d %s' % (self.code, self.message)
result = u'%d %s' % (self.code, self.message)
if six.PY2:
result = _to_bytes(result)

This comment was marked as spam.

return '%d %s' % (self.code, self.message)
result = u'%d %s' % (self.code, self.message)
if six.PY2:
result = result.encode('ascii', 'ignore')

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

@@ -21,6 +21,7 @@
import json
import six


This comment was marked as spam.

if six.PY3: # pragma: NO COVER
_ERROR_MESSAGE = u'That\u2019s not found.'
elif six.PY2: # pragma: NO COVER
_ERROR_MESSAGE = 'Thats not found.'

This comment was marked as spam.

@daspecster
Copy link
Contributor Author

daspecster commented Sep 21, 2016

I had to remove this test for _to_bytes. Actually I can just remove the assertRaises.

    def test_unicode_non_ascii(self):
        value = u'\u2013'  # Long hyphen
        encoded_value = b'\xe2\x80\x93'
        self.assertRaises(UnicodeEncodeError, self._callFUT, value)
        self.assertEqual(self._callFUT(value, encoding='utf-8'),
                         encoded_value)

Also, to get tests for py27 and py35 to pass and cover, I had to check for two message strings.

@dhermes
Copy link
Contributor

dhermes commented Sep 21, 2016

I had to remove a test for_to_bytes to get this to pass.

Huh? Why should that be necessary?

@@ -455,7 +455,7 @@ def _datetime_to_rfc3339(value, ignore_zone=True):
return value.strftime(_RFC3339_MICROS)


def _to_bytes(value, encoding='ascii'):
def _to_bytes(value, encoding='utf-8'):

This comment was marked as spam.

@@ -736,7 +736,6 @@ def test_with_unicode(self):
def test_unicode_non_ascii(self):
value = u'\u2013' # Long hyphen
encoded_value = b'\xe2\x80\x93'
self.assertRaises(UnicodeEncodeError, self._callFUT, value)

This comment was marked as spam.

_ERROR_MESSAGE = u'That\'s not found.'

with _Monkey(six, PY2=False):
with _Monkey(six, PY3=True):

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

from google.cloud.exceptions import NotFound
_ERROR_MESSAGE = u'That\u2019s not found.'
_EXPECTED = ('404 That\xe2\x80\x99s not found.',
'404 %s' % (_ERROR_MESSAGE,))

This comment was marked as spam.

This comment was marked as spam.

@daspecster daspecster force-pushed the fix-google-cloud-error-exception-message branch 3 times, most recently from 72b29d6 to 8767a4c Compare September 21, 2016 17:28
import six
from unit_tests._testing import _Monkey
from google.cloud.exceptions import NotFound
_ERROR_MESSAGE = u'That\'s not found.'

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

content = u'{"error": {"message": "%s" }}' % (_ERROR_MESSAGE,)

exception = self._callFUT(response, content)
self.assertIn(str(exception), _EXPECTED)

This comment was marked as spam.

@tseaver
Copy link
Contributor

tseaver commented Sep 21, 2016

@daspecster Travis build cancelled. Is there another push coming?

@daspecster daspecster force-pushed the fix-google-cloud-error-exception-message branch from 8767a4c to 9240117 Compare September 21, 2016 18:35
@daspecster
Copy link
Contributor Author

@tseaver yup! Sorry, I was trying to get my commits GPG verified.

@daspecster
Copy link
Contributor Author

@dhermes, @tseaver PTAL. I'm ready to move on from this one lol.

Copy link
Contributor

@dhermes dhermes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@daspecster daspecster merged commit e9660a8 into googleapis:master Sep 22, 2016
@daspecster daspecster deleted the fix-google-cloud-error-exception-message branch September 22, 2016 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: core cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants