Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not print dictionaries in test output #1

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

keszybz
Copy link

@keszybz keszybz commented Feb 29, 2016

Order of keys in the dictionary is not reliable. The order is
pseudo-random under Python 3.

Tests still pass under Python 2.

Order of keys in the dictionary is not reliable. The order is
pseudo-random under Python 3.

Tests still pass under Python 2.
@keszybz
Copy link
Author

keszybz commented Feb 29, 2016

I was working on a package for Fedora (https://bugzilla.redhat.com/show_bug.cgi?id=1310145), and wanted the tests to pass under both Python 2 and 3.

This patch removes a broken assumption from the tests. Unfortunately for the tests to pass under Python 3, much more extensive changes are required. I made such changes in neurord/igor@ba8f001ed9. But those changes work only for Python 3, and obviously break things for Python 2, so I'm not submitting them in this pull request.

@wking
Copy link
Owner

wking commented Feb 29, 2016

On Mon, Feb 29, 2016 at 05:59:17AM -0800, Zbigniew Jędrzejewski-Szmek wrote:

This patch removes a broken assumption from the tests.

I think dictionary keys are sorted by key for pprint functions since
Python 2.5 [1,2], and the output you're changing is generated by
pprint functions. The problem is that in Python 3, they sort bytes
before Unicode:

$ python3 -c "import pprint; pprint.pprint({u'a': 1, b'b': 2})"
{b'b': 2, 'a': 1}

But in Python 2 bytes and Unicode are interleaved:

$ python2 -c "import pprint; pprint.pprint({u'a': 1, b'b': 2})"
{u'a': 1, 'b': 2}

Running the tests under Python 3, most of the changes I see are things
like ‘''’ → ‘b''’ and ‘0x607L’ → ‘0x607’. That's going to be hard to
address with the doctest approach, but I'm happy to land patches that
transition doctests to unittests.

@keszybz
Copy link
Author

keszybz commented Mar 1, 2016

You're right, the issue is with bytes vs. str as you say.

I'm wondering if the right solution might be to decode the keys to str under Python 3. This would be much more convenient for the user, compared to having to use bytes as the key.

@wking
Copy link
Owner

wking commented Mar 1, 2016

On Tue, Mar 01, 2016 at 12:43:02PM -0800, Zbigniew Jędrzejewski-Szmek wrote:

I'm wondering if the right solution might be to decode the keys to
str under Python 3.

I haven't looked at the Igor specs in a while, so I'm not sure what
encoding you'd use to do that. If you can point me at docs showing
keys are ASCII (or UTF-8, or whatever) or how to figure out what
encoding they are, then I'll happily merge a patch adding Unicode
decoding.

However, if we can decode to Unicode, we'd get u'…' prefixes in Python
2, so the doctests would still fail. I don't think the doctests
passing is worth not decoding in Python 2 (if there is a
well-defined encoding for the keys).

@keszybz
Copy link
Author

keszybz commented Mar 1, 2016

I don't think they'd need to be decoded in Python 2... I have never seen any keys other than ASCII. So they could be decoded in Python 3, even as ASCII, and it would be enough. I'll look at the specs if there's anything about the encoding.

@keszybz
Copy link
Author

keszybz commented Mar 7, 2016

TN003 does not say anything about encodings, and the code is encoding unaware afaict.
But http://www.wavemetrics.net/doc/igorman/II-09%20Data%20Import%20Export.pdf says that:

The LoadWave operation can load data from UTF-16 (two-byte Unicode) text files. It does not recognize non-ASCII characters, but does ignore the byte-order mark at the start of the file (BOM) and null bytes contained in UTF-16 text files. Consequently it can load data from UTF-16 files containing just numeric data and ASCII text

This implies that igor binary files must be ASCII-only. So using str for the keys under both Python2 and Python3 makes the most sense. Under Python3 they should be decoded using "ascii" coded. If at any point files with a different encoding show up it will be possible to allow this encoding without breaking backward compatibility.

@wking
Copy link
Owner

wking commented Mar 7, 2016

On Mon, Mar 07, 2016 at 11:05:07AM -0800, Zbigniew Jędrzejewski-Szmek wrote:

This implies that igor binary files must be ASCII-only. So using
str for the keys under both Python2 and Python3 makes the most
sense. Under Python3 they should be decoded using "ascii" coded.

Thanks for looking up the docs. And I agree that “ASCII-only” seems
reasonable for our decoding.

I'm less convinced that using the byte type (‘str’) in Python 2 and
the Unicode type (‘str’) in Python 3 is the best approach (I'd rather
use Unicode types in both Pythons), but it would make the doctests
work, so I guess it's an acceptable evil ;).

wking added a commit that referenced this pull request Aug 2, 2016
I haven't actually tested all of these, but I doubt I did anything so
magical that support has been dropped in the meantime ;).  It would be
nice to drop the doctests [1], but until then testing Python 3 is
going to be difficult.

[1]: #1 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants