Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unicode decode error in logging handlers #5258

Merged
merged 5 commits into from
Sep 7, 2016
Merged

Fix unicode decode error in logging handlers #5258

merged 5 commits into from
Sep 7, 2016

Conversation

Gobberwart
Copy link
Contributor

@Gobberwart Gobberwart commented Sep 7, 2016

Short Description:

If formatted_msg is a string (eg. "Forts cached."), bot crashes with error:

[2016-09-07 10:55:48] [MainThread] [sentry.errors.uncaught] [ERROR] [u"UnicodeDecodeError: 'utf8' codec can't decode byte 0x82 in position 32: invalid start byte", u'  File ".\\pokecli.py", line 803, in <module>', u'  File ".\\pokecli.py", line 161, in main', u'  File "C:\\Users\\gobbe\\AppData\\Local\\GitHub\\Gobberwart\\PokemonGo-Bot\\pokemongo_bot\\__init__.py", line 684, in tick', u'  File "C:\\Users\\gobbe\\AppData\\Local\\GitHub\\Gobberwart\\PokemonGo-Bot\\pokemongo_bot\\cell_workers\\move_to_fort.py", line 84, in work', u'  File "C:\\Users\\gobbe\\AppData\\Local\\GitHub\\Gobberwart\\PokemonGo-Bot\\pokemongo_bot\\base_task.py", line 43, in emit_event', u'  File "C:\\Users\\gobbe\\AppData\\Local\\GitHub\\Gobberwart\\PokemonGo-Bot\\pokemongo_bot\\event_manager.py", line 76, in emit', u'  File "C:\\Users\\gobbe\\AppData\\Local\\GitHub\\Gobberwart\\PokemonGo-Bot\\pokemongo_bot\\event_handlers\\colored_logging_handler.py", line 140, in handle_event', u'  File "C:\\Python27\\lib\\encodings\\utf_8.py", line 16, in decode']

Modified logging handlers to use string if encode fails.

I've fixed the error in "forts cached" event, but left this in to gracefully handle any future similar errors.

@mention-bot
Copy link

@Gobberwart, thanks for your PR! By analyzing the annotation information on this pull request, we identified @anakin5, @extink and @eevee-github to be potential reviewers

@julienlavergne
Copy link
Contributor

You are missing from __future__ import unicode_literals in one of the files emitting the log. That is the real fix to apply. If we silently ignore the error, it is just kidding it under the carpet but you will still not be able to emit non ASCII characters.

@solderzzc
Copy link
Contributor

I guess this line was removed in 1-2 days:
from __future__ import unicode_literals

@Gobberwart
Copy link
Contributor Author

In this case, the actual error was an extra comma, eg.

                    bot.event_manager.emit(
                        'cached_fort',
                        sender=bot,
                        level='debug',
                        formatted='Forts cached.',
                    )

Fix has been applied to that as well.

@Gobberwart
Copy link
Contributor Author

I'm actually very happy to remove the changes to the logging handlers, and just leave the fix for this specific error in place. Let me know.

@solderzzc
Copy link
Contributor

I don't know UTF8 at all.... It's up to you guys.

@Gobberwart
Copy link
Contributor Author

@anakin5 your thoughts?

@Gobberwart
Copy link
Contributor Author

@anakin5 is correct. Left the fix in for cached_fort event, removed the carpet-sweeping stuff.

@Gobberwart Gobberwart merged commit 55c145b into PokemonGoF:dev Sep 7, 2016
@solderzzc
Copy link
Contributor

Is there any issue with CI build?

@julienlavergne
Copy link
Contributor

For the explanation.
str are encoded strings, in either ascii, or utf-8 or whatever
unicode are array of bytes

As a convenience, unicode can be initialised with a similar syntax as strings: x= u"abc".

In applications that need to handle unicode, it is generally better to have all your strings represented as unicode so that functions like len() are returning the correct result.
Encoded strings are only useful for storing in text file, displaying on stdout or sending the indo somewhere.
Each time you want to "display" your string or you try to mix unicode and str, there is conversion happening that depends on your default encoding.
For example u"%s" % "abc" and "%s" % u"abc" will implicitly call conversion functions.

The problem is that in python 2.x, default representation of strings is str ascii (it is unicode in python 3).
So "%s" % u"abc" will fail in python 2.x if the unicode contains characters that cannot be encoded in ascii.
It is important to add from __future__ import unicode_literals in our case when we want to format/display stuff that is coming from the API since the API gives us unicode. With this import, writting "abc" is the same as writting u"abc"

Second problem is that, when the python logging module fail to convert a unicode, it falls back to utf8 conversion, which will then fail to be displayed if the encoding of your stdout is ascii.

So well, there is no choice: use python 3 or make sure we use unicode to represent our strings.

@Gobberwart
Copy link
Contributor Author

Good info to know. In this case though it was just an extra comma :)

@Gobberwart Gobberwart deleted the dev_unicodeerror branch September 7, 2016 10:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants