Salt fails with UnicodeDecodeError on non-ascii characters on Windows (code pages 850 or 437). #19166
Comments
@markuskramerIgitt, thanks for opening a new issue. It seems that this has been fixed already on the 2014.7 branch. A new function called salt.utils.sdecode was created for this purpose and has been applied to the line causing the problem. |
You could try updating your salt minion, but we'll be releasing 2014.7.1 within the next few weeks. |
Hello @jfindlay The function
Why? When you issue Apart from that
Please forget Instead use
The function
Why? Use the right encoding. Networking configuration software should not "try a list of string encodings". On Windows code page 850 or 437 I seriously doubt that someone tested this fix on Windows. |
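The objection to trying a list of encodings can be illustrated with a small sketch. The helper `guess_decode` below is hypothetical and is not the actual `salt.utils.sdecode` implementation; it only shows the general "first encoding that does not raise wins" pattern, written for Python 3 bytes. The candidate list (`utf-8`, then `latin-1`) is an assumption chosen for illustration.

```python
def guess_decode(data, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: return (text, encoding) for the first codec that does not raise."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding matched")


# Bytes as a console using code page 850 would emit them (ä is byte 0x84).
raw = "Datenträger".encode("cp850")

# Guessing "succeeds" with latin-1, but 0x84 becomes a control character, not ä.
guessed_text, guessed_enc = guess_decode(raw)

# Decoding with the encoding the bytes were actually produced in is correct.
correct_text = raw.decode("cp850")
```

Because `latin-1` accepts every byte value, the guess never fails at that point and quietly returns the wrong text, which is the kind of silent corruption the comment above warns about.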
@markuskramerIgitt, you bring up some good points. Please don't think that your criticism is unwelcome; we are very glad to incorporate changes into salt such as those you've described. I am not directly involved with the encoding efforts, perhaps @thatch45 or @s0undt3ch can respond to the issues you've raised. If you think that you've got a better way to write sdecode, then please feel free to submit a pull request making the changes necessary. If you do, please reference this issue in the pull request. Autotesting windows with jenkins is one of our priorities, once we get jenkins stabilized. :-) |
@markuskramerIgitt We have wanted to test Windows as well since the very first day we set up Jenkins. However, to this day we haven't had the bandwidth to add support for it: we run destructive tests, so we need to be able to recreate the machine over and over again (before salt is installed). As for "we don't know where a string is coming", that comment refers to "where" as in: from the CLI? From a direct method call to a Python library? ... not which minion ... As for
I agree with you that this solution is not yet the perfect one, especially on Windows, but it's a better solution than no solution. It's at least a start. |
@s0undt3ch, thank you for the command, it supports my suggestion! Here, W is a Windows Minion:
This is correct, which means that It also demonstrates that
This is a step forward. I kindly suggest that the encoding of the Minion can be queried without If Why does
Let us get an overview of the "origin of a string" problem:
Markus |
I don't think this bug applies to just windows. A file.managed with binary data can trigger it on linux as well. Rather than trying to guess the encoding, using repr() may assist by providing an escaped string if no encoding is found. |
Hello Joe, I disagree: Salt should never guess the encoding. Can you think of a situation where Salt cannot know the encoding? Because the encoding can always be found, repr() should never be applied to human-readable text. As a bonus, the decoder could detect binary data and escape it with repr(). |
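A minimal sketch of that suggestion, assuming the caller already knows the encoding: decode with it, and escape with repr() only when the bytes do not decode. The function name `decode_or_escape` is hypothetical and the code targets Python 3 bytes.

```python
def decode_or_escape(data, encoding):
    """Hypothetical helper: decode with the known encoding, else return an escaped repr()."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        # Not text in the stated encoding; show it escaped instead of guessing.
        return repr(data)


# Human-readable text in a known encoding is decoded normally.
text = decode_or_escape(b"Datentr\x84ger", "cp850")

# Bytes that are not valid in the stated encoding come back escaped.
escaped = decode_or_escape(b"\xff\xfe\x00", "utf-8")
```

One caveat: single-byte code pages such as cp437 and cp850 assign a character to every byte value, so a failed decode cannot serve as the binary-data check for them; a separate heuristic would be needed there.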
A step forward has been made in a247dc0 and 92dfa30. These changes have been merged into 2015.2, and we will start investigating whether this detection is trustworthy (since it's done, or at least attempted, before the process has been daemonized). Once we know we can trust that information, we can start taking care of the Unicode problems that we've been having. @markuskramerIgitt that command probably works on the Windows minion because it's probably not a detached daemon, i.e., no forking has occurred. Just for information's sake, is that minion using threads or multiprocessing? |
Hello @s0undt3ch, Best regards, Three notes: From your comment in a247dc0 I start to get an idea of the size of the problem: (" In 92dfa30 you have put the encoding in the grains. Do you know this encoding guessing library: https://pypi.python.org/pypi/chardet ? |
Besides requiring an additional external library (in case we used it), Chardet is most reliable when you feed it enough data. Salt's data is usually a line or two, not enough... Consider this:
In [3]: chardet.detect('Pão')
Out[3]: {'confidence': 0.73, 'encoding': 'windows-1252'}
In [4]: chardet.detect('Alimentção')
Out[4]: {'confidence': 0.7525, 'encoding': 'utf-8'}
In [5]: chardet.detect('último')
Out[5]: {'confidence': 0.7539685890654046, 'encoding': 'ISO-8859-2'}
In [6]: chardet.detect('prático')
Out[6]: {'confidence': 0.99, 'encoding': 'TIS-620'}
In [7]: chardet.detect('pão alimentção último prático')
Out[7]: {'confidence': 0.9690625, 'encoding': 'utf-8'}
In [8]: chardet.detect(u'pão alimentção último prático'.encode('iso-8859-15'))
Out[8]: {'confidence': 0.99, 'encoding': 'windows-1251'}
Those are all Portuguese words with accented characters. About threading/multi-processing... Linux defaults to multi-processing, and what does Windows default to, @UtahDave? |
Hello Pedro, But jokes aside: thank you for letting me know the limitations of Have you tried calling the Windows command chcp (change codepage)? Bye,
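As a rough sketch of the chcp idea: the command prints a short line containing the active code page number, which can be mapped to a Python codec name. The helper `codec_from_chcp` is hypothetical, and the two sample output strings are assumptions about what English and German Windows consoles print; only the number is parsed, so the surrounding wording does not matter.

```python
import re


def codec_from_chcp(output):
    """Hypothetical helper: turn chcp output into a Python codec name like 'cp850'."""
    match = re.search(r"(\d+)", output)
    if match is None:
        return None
    return "cp" + match.group(1)


# Assumed sample outputs; on a real minion this text would come from running chcp.
english_codec = codec_from_chcp("Active code page: 437")
german_codec = codec_from_chcp("Aktive Codepage: 850.")

# Byte 0x84 decodes to the same character under either code page.
decoded = b"\x84".decode(german_codec)
```

Whether the value reported by chcp is reliable inside a service or detached process is an open question that this sketch does not address.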
The traceback was fixed a while back in 2014.7.0, but the characters still aren't being printed properly. I have opened a new issue to track this: #24344 |
Hello
I use a Windows 7 Enterprise 64 bit minion with a German locale.
Two commands (of many) fail: cmd.run dir and cmd.run whoami
To reproduce, you need a German locale or a file with a non-ascii character (like ä).
From reading the stacktrace:
0x84 (hex) is 132 (decimal).
Character 132 in code page 437 (Windows) is ä, which you find...
at position 8 of the output of dir: "Datenträger in Laufwerk c:" (= "Volume in drive c:").
at position 10 of the output of cmd.run whoami: "nt-autorität\system" (= "nt-authority\system").
Code point 132 in Unicode is the same as in ISO Latin-1, where it is a non-printable control character rather than a letter, and a lone byte 0x84 is not valid UTF-8.
A utf-8 decoder fails with UnicodeDecodeError on character 132.
An example is raw_input().decode('utf-8'), found at http://stackoverflow.com/questions/22772888/german-umlauts-read-in-with-raw-input-in-python-2-7
I assume utf_8_decode(), used in salt/output/nested.py, behaves the same.
The above stackoverflow link contains a possible solution.
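The failure described above can be reproduced without a German Windows machine. The sketch below is written for Python 3 bytes (the original report concerns Python 2) and builds the whoami output by hand, with ä stored as byte 0x84 as the OEM code pages do; the variable names are illustrative only.

```python
# whoami output as raw console bytes: ä is the single byte 0x84.
raw = b"nt-autorit\x84t\\system"

# A UTF-8 decoder rejects the byte, since 0x84 cannot start a UTF-8 sequence.
error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc

# Decoding with either OEM code page named in the issue title recovers the text.
text437 = raw.decode("cp437")
text850 = raw.decode("cp850")
```

Here `error.start` is 10, counting from zero, which matches the position quoted for whoami above.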
The stacktraces: