Py3k consistent unicode handling #1782

QuLogic · 2015-09-11T04:23:05Z

Convert all strings to Unicode as soon as possible. I had to add in some decode/encode without a specified encoding because I am not familiar with all the formats. This is probably consistent with Python 2, but it's not good practice. If you are more familiar with the correct encoding for a specific format, please let me know so I can make it explicit.
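A minimal sketch of "convert to Unicode as early as possible, with an explicit encoding". The helper name `to_text` and the default `'ascii'` encoding are assumptions for illustration only. The right encoding is format-specific, which is the open question above.

```python
def to_text(value, encoding='ascii'):
    """Return `value` as text, decoding bytes with an explicit encoding.

    Hypothetical helper: the default encoding is an assumption, not
    something taken from the Iris file-format code.
    """
    if isinstance(value, bytes):
        # Decode at the input boundary so the rest of the code only sees text.
        # Naming the encoding makes bad input fail loudly instead of guessing.
        return value.decode(encoding)
    return value


# Example: a byte string as it might be read from a binary file.
field = to_text(b'air_temperature')
```

On Python 2, `bytes` is the native `str`, so the same helper also turns native strings into `unicode` there.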

rhattersley · 2015-09-11T08:44:21Z

lib/iris/fileformats/_pyke_rules/fc_rules_cf.krb

@@ -913,7 +913,7 @@ fc_extras
        # Set the cube global attributes. 
        for attr_name, attr_value in six.iteritems(cf_var.cf_group.global_attributes):
            try:
-                if isinstance(attr_value, unicode):
+                if isinstance(attr_value, six.text_type):


This code path is for converting unicode to str to avoid print-outs using the u prefix syntax where possible. That problem doesn't apply in Python 3 so I'd be tempted to use if six.PY2 and isinstance(attr_value, unicode): to make that clear.

Hrm, good point. I sort of ignored what was in the if, but it makes sense to be explicit for when Python2 finally dies in a million years.

This will be consistent across versions, but doesn't change much else.

Hopefully we're explicit enough that this check is irrelevant.

In Python 3, str has an __iter__ method, so checking for __iter__ alone treats a single string as a sequence and wrongly iterates over its characters.
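A small Python 3 sketch of the pitfall and one possible fix. Both function names are hypothetical and this is not Iris's actual `_as_list_of_coords`. The idea is to test for strings explicitly before testing iterability.

```python
from collections.abc import Iterable


def naive_as_list(names_or_coords):
    # Python 2-era check: str had no __iter__ there, but it does on Python 3.
    if not hasattr(names_or_coords, '__iter__'):
        return [names_or_coords]
    return list(names_or_coords)


def as_list(names_or_coords):
    # Treat strings (and bytes) as single items before testing iterability.
    if isinstance(names_or_coords, (str, bytes)) or not isinstance(
            names_or_coords, Iterable):
        return [names_or_coords]
    return list(names_or_coords)


print(naive_as_list('height'))      # ['h', 'e', 'i', 'g', 'h', 't']
print(as_list('height'))            # ['height']
print(as_list(['height', 'time']))  # ['height', 'time']
```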

This may not be the best way to fix the test failure. String handling may have to be made more explicit.

DPeterK · 2015-09-18T10:59:32Z

lib/iris/cube.py

@@ -2302,7 +2306,8 @@ def _as_list_of_coords(self, names_or_coords):
        Convert a name, coord, or list of names/coords to a list of coords.
        """
        # If not iterable, convert to list of a single item


This comment is a little misleading with py3k strings having gained an __iter__ method...

pp-mo · 2015-11-03T21:08:14Z

"Update:"
I've now checked out all the commits here independently, of which some are just fine and others raise potential problems with user-code backwards compatibility (mostly highlighted by @rhattersley above, but also maybe a few more).

I discussed this offline with @rhattersley, and we concluded that it's best to tread softly on user-code compatibility, so we propose:

- for 1.9: add version-specific code where needed to preserve backwards compatibility in Python 2 (i.e. no behavioural changes)
- then for 2.0: remove this extra complexity wherever possible, develop clean common code (i.e. essentially what we already have here), and document the Python 2 behaviour changes in the release notes.

Note that we don't suggest adding any "deprecation" tests or notes. A major release means we can change things, as long as the changes are minor behavioural details and not the API.

So, what I think we should do with this PR:

First cherry-pick the already-ok changes and put them up for merge
-- but n.b. that does not deliver something that will test under Python 3.
Then reconsider and fix all the rest. Combine with other pending changes currently in QuLogic:py3k, to fix all the tests. Then raise a secondary PR that resolves all that and implements full tests in Python 2+3.

I'm intending to make a start on organising that tomorrow (sorry for a slow day today!).

pp-mo · 2015-11-05T12:44:02Z

I hope I'm finally getting close to delivering something on this.
Firstly, I have scanned all the individual commits in this PR, and categorised them into those I think are good as-is, and those which need work to preserve existing Python 2 behaviour.
Which I think goes like this:

9619cdf ok  py3k: Decode subprocess output in license header check.
45328f6 ok  py3k: Always use characters for string types.
732f6bf ok  py3k: Handle Unicode in Groupby.
875178a ok  py3k: Handle Unicode in ProtoCube merging.
f9cc0f9     py3k: Use Unicode in coord collapse.
9717793     py3k: Correct Unicode usage in merge test.
33f2e50     py3k: Set strings to Unicode in NAME loader.
3ffd5de ok  py3k: Correctly handle Unicode coords in plots.
a02e863     py3k: Handle strings from netCDF consistently.
6682bce     py3k: Change coord categories to Unicode.
b759876     py3k: Correctly handle string input to slicers.
244ced6 ok  py3k: Skip encoding test on Python 3.
35d3547 ok  py3k: Decode grib_load output.
199bec1     py3k: Decode strings from NIMROD files.
f52aa8d     py3k: Change GRIB originating centres to Unicode.
37dc9f2     py3k: Change merge tests to Unicode.
8980645 ok  py3k: Use six to get the Unicode type.

pp-mo · 2015-11-05T12:54:07Z

Future strategy for fixing this up :

(1) cherry-pick all the "ok" commits from the above onto a new branch + make a PR
(2) produce a branch containing all the non-"ok" commits from the above, with "fixes" applied to each to preserve Python 2 behaviour
(3) add a couple of extra changes seen in https://github.com/QuLogic/iris/tree/py3k (which also contains grib fixes that we are not going to include for now - see Python3 grib fixes #1774)
- commit " Enable Python 3 on Travis, maybe. Not sure if this will work."
- commit "py3k: Handle new naming of extension libraries."
(4) add whatever else is needed to get all Python 3 tests to pass

pp-mo · 2015-11-05T12:57:03Z

(1) cherry-pick all the "ok" commits from the above onto a new branch + make a PR

See : #1825

If no-one objects, I may merge this anyway (as I've only grouped specific parts of what @QuLogic did)

QuLogic · 2015-11-05T21:55:05Z

I've been thinking about this a little bit more, and I remembered something that might cause a little trouble if you go native str on both 2 and 3. On Python 3, the CDL/CML (don't remember which) that's produced will say unicode (even though there's no such thing in Py3) but on Python 2 it will say string. There will be some trouble with the tests in that regard.

pp-mo · 2015-11-06T12:43:24Z

@QuLogic ... if you go native str on both 2 and 3 ...

Thanks for looking.
From this, I'm not sure whether you were looking at the original version of #1825 here?
I included there a commit that I had actually meant to exclude. Then I re-published it, quite rapidly I thought but maybe not quick enough ?!

Anyway, that same proposal is now here : #1827

I thought that was working out ok, in that CDL files will now say "string" for 'ordinary' output in both cases - i.e. bytestrings in Python2 and unicode in Python3. It will however say "unicode" in Python 2 (only), and "bytes" in Python 3, if those less usual types are found.
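A sketch of the labelling rule described above, keyed on numpy dtype kind codes ('S' for byte strings, 'U' for unicode). The function name and the kind-based approach are assumptions; this is not the #1827 implementation. It is written without importing numpy, so it takes the kind character (e.g. `coord.dtype.kind`) directly.

```python
import sys

PY2 = sys.version_info[0] == 2
# numpy dtype kind of the native str type: 'S' (bytes) on Python 2, 'U' on Python 3.
NATIVE_STR_KIND = 'S' if PY2 else 'U'


def string_dtype_label(kind):
    """Hypothetical mapping from a numpy dtype kind ('S' or 'U') to an output label."""
    if kind == NATIVE_STR_KIND:
        return 'string'   # ordinary native strings on either Python version
    if kind == 'U':
        return 'unicode'  # only reachable on Python 2
    if kind == 'S':
        return 'bytes'    # only reachable on Python 3
    raise ValueError('not a string dtype kind: {!r}'.format(kind))
```

Keying on "native or not" rather than on a fixed kind is what keeps the ordinary case labelled "string" on both versions.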

pp-mo · 2015-11-06T12:47:48Z

produce a branch containing all the non-"ok" commits

I've decided now it is simpler for everyone if I post separate PRs for different areas.
That way others can chip in more easily, if wanted ( @rhattersley ? 😉 )

So, I'll put up PRs for the various aspects I'm addressing as I go.
Then afterwards I'll make a master list of all your unmerged commits (= non-"ok" as listed above) and what I'm proposing for each.

pp-mo · 2015-11-06T13:05:02Z

I'll put up PRs for the various aspects I'm addressing as I go.

In case you missed it, here's the first ...

replacement PR : #1827
Changes xml() output for coords with string datatype.
Addresses these original @QuLogic commits ...

37dc9f2     py3k: Change merge tests to Unicode.
6682bce     py3k: Change coord categories to Unicode.
9717793     py3k: Correct Unicode usage in merge test.

pp-mo · 2015-11-06T15:18:23Z

another "replacement" PR : #1828
Fixes detection of single/multiple args to coord/coords() in Python 3.
Reimplements the original @QuLogic commit :
b759876 py3k: Correctly handle string input to slicers.

pp-mo · 2015-11-06T15:40:32Z

another "replacement" PR : #1830
Fixes decoding of netcdf string data in Python 3.
Reimplements the original @QuLogic commit :
a02e863 py3k: Handle strings from netCDF consistently.

pp-mo · 2015-11-06T17:30:24Z

another "replacement" PR : #1832
Fixes collapse of string coordinates in Python 3.
Reimplements the original @QuLogic commit :
f9cc0f9 py3k: Use Unicode in coord collapse.

QuLogic · 2015-11-06T21:26:30Z

say "string" for 'ordinary' output in both cases - i.e. bytestrings in Python2 and unicode in Python3.

OK, perhaps I am misremembering. I thought it would say unicode on Python 3 as well, but if it works, then I guess everything's good.

QuLogic · 2015-11-06T21:30:06Z

Ah, I hadn't read #1825/#1827 yet, but now I see why it works properly.

pp-mo · 2015-11-07T14:28:44Z

I missed one (#1829 = nimrod loading).
I have also made an additional PR which disables all grib tests for Python 3 : #1833

I think I have now addressed all the 'non-"ok"' commits on this original PR.
When all those are merged with #1833, all tests again pass under Python 3.

Here's a summary of what happened to all the original 'non-"ok"' commits :

sha	descr	result
`6682bce`	py3k: Change coord categories to Unicode.	#1827
`37dc9f2`	py3k: Change merge tests to Unicode.	( also #1827 )
`9717793`	py3k: Correct Unicode usage in merge test.	( also #1827 )
`f52aa8d`	py3k: Change GRIB originating centres to Unicode.	no longer needed
`199bec1`	py3k: Decode strings from NIMROD files.	#1829
`b759876`	py3k: Correctly handle string input to slicers.	#1828
`a02e863`	py3k: Handle strings from netCDF consistently.	#1830
`33f2e50`	py3k: Set strings to Unicode in NAME loader.	no longer needed -- see following comment
`f9cc0f9`	py3k: Use Unicode in coord collapse.	#1832

pp-mo · 2015-11-07T14:35:08Z

Regarding 33f2e50 "py3k: Set strings to Unicode in NAME loader.":

On revisiting this code, I found that the file-format is entirely text-based, so there is really no need to open the file in binary mode.
So, I am now proposing to leave this code completely unchanged.
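A small illustration of why text mode is the simpler choice for a text-based format. On Python 3, binary mode yields `bytes` that would then need decoding, while text mode yields `str` directly. The file content and the `'ascii'` encoding are assumptions for the example, not details of the NAME format.

```python
import os
import tempfile

# Write a small text file standing in for a text-based format (content is made up).
with tempfile.NamedTemporaryFile('w', encoding='ascii', newline='',
                                 suffix='.txt', delete=False) as tmp:
    tmp.write('header line\nsecond line\n')
    path = tmp.name

try:
    with open(path, 'rb') as f:
        binary_line = f.readline()  # bytes on Python 3: needs an explicit decode
    with open(path, 'r', encoding='ascii') as f:
        text_line = f.readline()    # str on Python 3: ready to use
finally:
    os.remove(path)
```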

Given the xml() fixes from #1827, the test iris.tests.test_name then passes.

pp-mo · 2015-11-13T15:23:36Z

Nearly there (i.e. back where @QuLogic was at the start of this!)
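On his original branch https://github.com/QuLogic/iris/tree/py3k there are also 2 more commits to consider:

- " Enable Python 3 on Travis, maybe. Not sure if this will work."
- "py3k: Handle new naming of extension libraries."

The first is obvious.
The second is about...?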

pp-mo · 2015-11-13T15:29:15Z

Re: previous comment on the older commit "py3k: Handle new naming of extension libraries."
I still can't work out what this was for.
@QuLogic can you explain what prompted this change and why/if we need it?

rhattersley · 2015-11-13T16:55:36Z

@QuLogic can you explain what prompted this change and why/if we need it?

Is it to do with generating docs for iris.fileformats.pp_packing?

QuLogic · 2015-11-14T00:55:22Z

That's correct; sphinx requires the module name. Trimming the extension is fine on Python 2, but on Python 3, the name of a C extension is <module>.cpython-XYm.so. If you just trim the extension there, you get <module>.cpython-XYm as the module name which sphinx will of course fail to import.
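A sketch of the naming problem and one way around it. The filename `pp_packing.cpython-34m.so` is an illustrative Python 3.4 instance of the `<module>.cpython-XYm.so` pattern, and both helper names are hypothetical; this is not the actual doc-generation code. `importlib.machinery.EXTENSION_SUFFIXES` only lists suffixes for the running interpreter, so splitting at the first dot is used here instead.

```python
import os


def naive_module_name(filename):
    # Strips only the final extension, leaving the ABI tag behind on Python 3.
    return os.path.splitext(os.path.basename(filename))[0]


def module_name(filename):
    # A module name cannot contain '.', so everything after the first dot
    # (ABI tag plus extension) can be dropped.
    return os.path.basename(filename).split('.', 1)[0]


print(naive_module_name('pp_packing.cpython-34m.so'))  # pp_packing.cpython-34m
print(module_name('pp_packing.cpython-34m.so'))        # pp_packing
print(module_name('pp_packing.so'))                    # pp_packing
```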

QuLogic · 2015-11-18T02:04:20Z

Everything necessary from here has been split off into separate PRs, so I am going to close this one.

QuLogic mentioned this pull request Sep 11, 2015

Python 3 support #1658

Closed

14 tasks

rhattersley reviewed Sep 11, 2015

QuLogic added the python3 label Sep 12, 2015

QuLogic modified the milestone: v1.9 Sep 12, 2015

QuLogic added 16 commits September 16, 2015 00:25

py3k: Use six to get the Unicode type.

8980645

py3k: Change merge tests to Unicode.

37dc9f2

This will be consistent across versions, but doesn't change much else.

py3k: Change GRIB originating centres to Unicode.

f52aa8d

py3k: Decode strings from NIMROD files.

199bec1

py3k: Decode grib_load output.

35d3547

py3k: Skip encoding test on Python 3.

244ced6

Hopefully we're explicit enough that this check is irrelevant.

py3k: Correctly handle string input to slicers.

b759876

In Python 3, str has the __iter__ method, so just checking for it causes some invalid iteration through the string characters.

py3k: Change coord categories to Unicode.

6682bce

py3k: Handle strings from netCDF consistently.

a02e863

py3k: Correctly handle Unicode coords in plots.

3ffd5de

py3k: Set strings to Unicode in NAME loader.

33f2e50

py3k: Correct Unicode usage in merge test.

9717793

py3k: Use Unicode in coord collapse.

f9cc0f9

py3k: Handle Unicode in ProtoCube merging.

875178a

py3k: Handle Unicode in Groupby.

732f6bf

py3k: Always use characters for string types.

45328f6

This may not be the best way to fix the test failure. String handling may have to be made more explicit.

QuLogic force-pushed the py3k-unicode branch from 9f2742e to 45328f6 September 16, 2015 05:37

py3k: Decode subprocess output in license header check.

9619cdf

DPeterK reviewed Sep 18, 2015

pp-mo mentioned this pull request Nov 5, 2015

Pyk3 unicode partial #1825

Merged

This was referenced Nov 6, 2015

Py3k unicode coords pp-mo/iris#17

Closed

Py3k unicode coords #1827

Merged

pp-mo mentioned this pull request Nov 6, 2015

Py3k unicode slicers #1828

Merged

This was referenced Nov 6, 2015

Py3k unicode nimrod #1829

Merged

Py3k unicode cf #1830

Merged

pp-mo mentioned this pull request Nov 6, 2015

Py3k unicode collapse #1832

Merged

pp-mo mentioned this pull request Nov 7, 2015

Skip all grib tests for Python 3. #1833

Closed

QuLogic closed this Nov 18, 2015

QuLogic deleted the py3k-unicode branch November 23, 2015 22:21
