Py3k consistent unicode handling #1782
@@ -913,7 +913,7 @@ fc_extras
 # Set the cube global attributes.
 for attr_name, attr_value in six.iteritems(cf_var.cf_group.global_attributes):
     try:
-        if isinstance(attr_value, unicode):
+        if isinstance(attr_value, six.text_type):
This code path converts unicode to str to avoid print-outs using the u'' prefix syntax where possible. That problem doesn't apply in Python 3, so I'd be tempted to use if six.PY2 and isinstance(attr_value, unicode): to make that clear.
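A minimal sketch of the suggested guard, assuming the conversion only matters on Python 2. The function name normalise_attribute is hypothetical, and sys.version_info stands in for six.PY2 so the sketch runs without six installed:

```python
import sys

# Stand-in for six.PY2 (assumption: avoids a dependency on six in this sketch).
PY2 = sys.version_info[0] == 2


def normalise_attribute(attr_value):
    """Return attr_value, converting unicode to str on Python 2 where possible."""
    # Short-circuit: on Python 3 the isinstance() check, and the name
    # `unicode` (which only exists on Python 2), are never evaluated.
    if PY2 and isinstance(attr_value, unicode):  # noqa: F821
        try:
            # ASCII-only text becomes a plain str, so it prints without u''.
            return str(attr_value)
        except UnicodeEncodeError:
            # Non-ASCII text cannot become a Python 2 str; keep it as unicode.
            return attr_value
    return attr_value
```

On Python 3 the function simply returns its argument, which makes it obvious that the branch can be deleted once Python 2 support is dropped.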
Hrm, good point. I sort of ignored what was inside the if, but it makes sense to be explicit for when Python 2 finally dies in a million years.
This will be consistent across versions, but doesn't change much else.
Hopefully we're explicit enough that this check is irrelevant.
In Python 3, str has an __iter__ method, so checking only for __iter__ treats a string as a sequence and wrongly iterates over its individual characters.
This may not be the best way to fix the test failure. String handling may have to be made more explicit.
(Branch updated from 9f2742e to 45328f6.)
@@ -2302,7 +2306,8 @@ def _as_list_of_coords(self, names_or_coords):
     Convert a name, coord, or list of names/coords to a list of coords.
     """
     # If not iterable, convert to list of a single item
This comment is a little misleading now that py3k strings have gained an __iter__ method...
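The pitfall can be avoided by testing for strings before testing for __iter__. This is a sketch, not the code from the PR: as_list is a hypothetical stand-in for _as_list_of_coords, and it assumes Python 3, where str is the only text type:

```python
def as_list(names_or_coords):
    """Wrap a single name or object in a list; pass other iterables through."""
    # On Python 3, str defines __iter__, so a bare hasattr() check would
    # split "air_temperature" into individual characters. Check str first.
    if isinstance(names_or_coords, str) or not hasattr(names_or_coords, "__iter__"):
        return [names_or_coords]
    return list(names_or_coords)
```

Usage: as_list("air_temperature") gives a one-element list, while as_list(["a", "b"]) passes the list through unchanged.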
Update: I discussed this offline with @rhattersley, and we concluded that it's best to tread softly on user code compatibility, so we propose:
Note that we don't suggest adding any "deprecation" tests or notes. A major release means we can change things, as long as they are minor behavioural details and not the API. So, what I think we should do with this PR:
I'm intending to make a start on organising that tomorrow (sorry for a slow day today!).
I hope I'm finally getting close to delivering something on this.
Future strategy for fixing this up:
I've been thinking about this a little bit more, and I remembered something that might cause a little trouble if you go native
Thanks for looking. Anyway, that same proposal is now here: #1827. I thought that was working out OK, in that CDL files will now say "string" for 'ordinary' output in both cases, i.e. bytestrings in Python 2 and unicode in Python 3. However, they will say "unicode" in Python 2 (only) and "bytes" in Python 3 if those less usual types are found.
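The naming rule described above can be sketched as a small lookup. This illustrates the behaviour only and is not the code in #1827; cdl_string_type is a hypothetical name:

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3: str is the 'ordinary' string type; bytes is the unusual one.
    _UNUSUAL_STRING_TYPE, _UNUSUAL_NAME = bytes, "bytes"
else:
    # Python 2: str (a bytestring) is ordinary; unicode is the unusual one.
    _UNUSUAL_STRING_TYPE, _UNUSUAL_NAME = unicode, "unicode"  # noqa: F821


def cdl_string_type(value):
    """Return the type name a CDL dump would show for a string value."""
    if isinstance(value, str):
        # The native str type on either version is reported as "string".
        return "string"
    if isinstance(value, _UNUSUAL_STRING_TYPE):
        return _UNUSUAL_NAME
    raise TypeError("not a string type: {!r}".format(type(value)))
```

Keying on the native str type keeps the common case identical across versions and only distinguishes the less usual type.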
I've decided now it is simpler for everyone if I post separate PRs for different areas. So, I'll put up PRs for the various aspects I'm addressing as I go.
In case you missed it, here's the first ... replacement PR: #1827
OK, perhaps I am misremembering. I thought it would say unicode on Python 3 as well, but if it works, then I guess everything's good.
I missed one (#1829 = nimrod loading). I think I have now addressed all the 'non-"ok"' commits on this original PR. Here's a summary of what happened to all the original 'non-"ok"' commits:
Regarding … On revisiting this code, I found that the file format is entirely text-based, so there is really no need to open the file in binary mode. Given the xml() fixes from #1827, the test
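For a purely text-based format, text mode hands back str on Python 3, so no manual decoding is needed. A sketch, assuming an ASCII-compatible encoding (the format's actual encoding isn't stated here); read_text_lines is a hypothetical helper:

```python
import io


def read_text_lines(path, encoding="ascii"):
    """Read a text-based file in text mode and return its lines as str."""
    # Text mode ("r") decodes for us; binary mode ("rb") would yield bytes
    # on Python 3 and force explicit .decode() calls everywhere.
    # io.open is the built-in open on Python 3, and on Python 2 it also
    # accepts an encoding argument.
    with io.open(path, "r", encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]
```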
Nearly there (i.e. back where @QuLogic was at the start of this!) First is obvious.
Re: previous comment on the older commit "py3k: Handle new naming of extension libraries."
Is it to do with generating docs for
That's correct; Sphinx requires the module name. Trimming the extension is fine on Python 2, but on Python 3, the name of a C extension is
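On Python 3, PEP 3149 allows extension filenames to carry an ABI tag (for example a suffix of the form .cpython-36m-x86_64-linux-gnu.so on Linux), so trimming only the final extension does not leave a bare module name. A sketch that strips any suffix the running interpreter recognises, using importlib.machinery.EXTENSION_SUFFIXES (Python 3 only); module_name_from_filename is hypothetical:

```python
import importlib.machinery


def module_name_from_filename(filename):
    """Strip any extension-module suffix recognised by the running interpreter."""
    # EXTENSION_SUFFIXES lists the suffixes this interpreter will import,
    # such as an ABI-tagged '.cpython-...so', '.abi3.so' and '.so' on Linux.
    # Try the longest first so a plain '.so' can't match before the tagged one.
    for suffix in sorted(importlib.machinery.EXTENSION_SUFFIXES, key=len, reverse=True):
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    return filename
```

Filenames without a recognised extension suffix are returned unchanged.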
Everything necessary from here has been split off into separate PRs, so I am going to close this one. |
Convert all strings to Unicode as soon as possible. I had to add some decode/encode calls without a specified encoding because I am not familiar with all the formats. Without an argument these use UTF-8 on Python 3 (Python 2's default is ASCII), so this is probably consistent with Python 2 for ASCII data, but it's not good practice. If you are more familiar with the correct encoding for a specific format, please let me know so I can make it explicit.
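One way to make the encoding explicit is a small helper applied where data enters the program, decoding once so the rest of the code only sees text. A sketch for Python 3; to_text is hypothetical, and the UTF-8 default is an assumption that each format would need to confirm:

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes with an explicit encoding."""
    if isinstance(value, bytes):
        # errors="strict" (the default) raises on undecodable data
        # instead of silently guessing.
        return value.decode(encoding)
    return value
```

A loader that knows its format is Latin-1, say, would call to_text(raw, "latin-1"), which documents the choice at the call site.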