Do not print dictionaries in test output #1

keszybz · 2016-02-29T13:55:17Z

Order of keys in the dictionary is not reliable. The order is
pseudo-random under Python 3.

Tests still pass under Python 2.

Order of keys in the dictionary is not reliable. The order is pseudo-random under Python 3. Tests still pass under Python 2.

keszybz · 2016-02-29T13:59:16Z

I was working on a package for Fedora (https://bugzilla.redhat.com/show_bug.cgi?id=1310145), and wanted the tests to pass under both Python 2 and 3.

This patch removes a broken assumption from the tests. Unfortunately for the tests to pass under Python 3, much more extensive changes are required. I made such changes in neurord/igor@ba8f001ed9. But those changes work only for Python 3, and obviously break things for Python 2, so I'm not submitting them in this pull request.

wking · 2016-02-29T17:52:59Z

On Mon, Feb 29, 2016 at 05:59:17AM -0800, Zbigniew Jędrzejewski-Szmek wrote:

This patch removes a broken assumption from the tests.

I think dictionary keys are sorted by key for pprint functions since
Python 2.5 [1,2], and the output you're changing is generated by
pprint functions. The problem is that in Python 3, they sort bytes
before Unicode:

$ python3 -c "import pprint; pprint.pprint({u'a': 1, b'b': 2})"
{b'b': 2, 'a': 1}

But in Python 2 bytes and Unicode are interleaved:

$ python2 -c "import pprint; pprint.pprint({u'a': 1, b'b': 2})"
{u'a': 1, 'b': 2}

Running the tests under Python 3, most of the changes I see are things
like ‘''’ → ‘b''’ and ‘0x607L’ → ‘0x607’. That's going to be hard to
address with the doctest approach, but I'm happy to land patches that
transition doctests to unittests.

keszybz · 2016-03-01T20:43:02Z

You're right, the issue is with bytes vs. str as you say.

I'm wondering if the right solution might be to decode the keys to str under Python 3. This would be much more convenient for the user, compared to having to use bytes as the key.

wking · 2016-03-01T21:39:20Z

On Tue, Mar 01, 2016 at 12:43:02PM -0800, Zbigniew Jędrzejewski-Szmek wrote:

I'm wondering if the right solution might be to decode the keys to
str under Python 3.

I haven't looked at the Igor specs in a while, so I'm not sure what
encoding you'd use to do that. If you can point me at docs showing
keys are ASCII (or UTF-8, or whatever) or how to figure out what
encoding they are, then I'll happily merge a patch adding Unicode
decoding.

However, if we can decode to Unicode, we'd get u'…' prefixes in Python
2, so the doctests would still fail. I don't think the doctests
passing is worth not decoding in Python 2 (if there is a
well-defined encoding for the keys).

keszybz · 2016-03-01T21:58:41Z

I don't think they'd need to be decoded in Python 2... I have never seen any keys other than ASCII. So they could be decoded in Python 3, even as ASCII, and it would be enough. I'll look at the specs if there's anything about the encoding.

keszybz · 2016-03-07T19:05:06Z

TN003 does not say anything about encodings, and the code is encoding unaware afaict.
But http://www.wavemetrics.net/doc/igorman/II-09%20Data%20Import%20Export.pdf says that:

The LoadWave operation can load data from UTF-16 (two-byte Unicode) text files. It does not recognize non-ASCII characters, but does ignore the byte-order mark at the start of the file (BOM) and null bytes contained in UTF-16 text files. Consequently it can load data from UTF-16 files containing just numeric data and ASCII text

This implies that igor binary files must be ASCII-only. So using str for the keys under both Python2 and Python3 makes the most sense. Under Python3 they should be decoded using "ascii" coded. If at any point files with a different encoding show up it will be possible to allow this encoding without breaking backward compatibility.

wking · 2016-03-07T19:24:57Z

On Mon, Mar 07, 2016 at 11:05:07AM -0800, Zbigniew Jędrzejewski-Szmek wrote:

This implies that igor binary files must be ASCII-only. So using
str for the keys under both Python2 and Python3 makes the most
sense. Under Python3 they should be decoded using "ascii" coded.

Thanks for looking up the docs. And I agree that “ASCII-only” seems
reasonable for our decoding.

I'm less convinced that using the byte type (‘str’) in Python 2 and
the Unicode type (‘str’) in Python 3 is the best approach (I'd rather
use Unicode types in both Pythons), but it would make the doctests
work, so I guess it's an acceptable evil ;).

I haven't actually tested all of these, but I doubt I did anything so magical that support has been dropped in the meantime ;). It would be nice to drop the doctests [1], but until then testing Python 3 is going to be difficult. [1]: #1 (comment)

Do not print dictionaries in test output

0ea0895

Order of keys in the dictionary is not reliable. The order is pseudo-random under Python 3. Tests still pass under Python 2.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do not print dictionaries in test output #1

Do not print dictionaries in test output #1

keszybz commented Feb 29, 2016

keszybz commented Feb 29, 2016

wking commented Feb 29, 2016

keszybz commented Mar 1, 2016

wking commented Mar 1, 2016

keszybz commented Mar 1, 2016

keszybz commented Mar 7, 2016

wking commented Mar 7, 2016

Do not print dictionaries in test output #1

Are you sure you want to change the base?

Do not print dictionaries in test output #1

Conversation

keszybz commented Feb 29, 2016

keszybz commented Feb 29, 2016

wking commented Feb 29, 2016

keszybz commented Mar 1, 2016

wking commented Mar 1, 2016

keszybz commented Mar 1, 2016

keszybz commented Mar 7, 2016

wking commented Mar 7, 2016