Non-ascii data in string tags causes a UnicodeDecodeError on Python 3 #16
Comments
Hi @jmuhlich, feel free to open a PR to fix this. How would you implement the flexible approach?
A flexible approach might be to allow a user to set an encoding per |
How about setting an attribute e.g. |
I think @thejohnhoffer and I have come up with a reasonable backwards-compatible solution in #17 . The only API change is a new optional |
Although TIFF string tags are only supposed to contain 7-bit ASCII characters, many tools write values in UTF-8 or other encodings that aren't 7-bit clean. OME-TIFF/BioFormats is one such tool, where the XML stored in the ImageDescription tag is explicitly encoded as UTF-8. On Python 3, `pytiff.Tiff._read_ascii` raises a UnicodeDecodeError upon reading such values. On Python 2, where the treatment of string encoding/decoding isn't as rigorous, the problem is effectively ignored.

Would you accept a patch to fix this? I think the simplest approach is to always decode strings as UTF-8, perhaps only under Python 3. This actually mirrors the way `_set_tag` already performs UTF-8 string encoding, only under Python 3. I would also be willing to implement a more flexible approach with a user-controlled encoding if you think that's a better option.
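To make the failure concrete, here is a minimal sketch in plain Python 3 that does not touch pytiff at all. The helper `decode_tag_value` and its `encoding` parameter are hypothetical names chosen for illustration, not pytiff's API; the sample bytes only stand in for what an OME-TIFF writer might store in ImageDescription.

```python
# Hypothetical helper (not part of pytiff): decode the raw bytes of a TIFF
# string tag using a caller-chosen encoding, defaulting to UTF-8.
def decode_tag_value(raw: bytes, encoding: str = "utf-8") -> str:
    # TIFF ASCII values are NUL-terminated, so drop trailing NUL bytes first.
    return raw.rstrip(b"\x00").decode(encoding)


# Stand-in for an ImageDescription value containing a non-ASCII character
# (U+00B5, the micro sign), encoded as UTF-8 and NUL-terminated.
raw = '<Pixels PhysicalSizeXUnit="\u00b5m"/>'.encode("utf-8") + b"\x00"

# Strict ASCII decoding fails on such data under Python 3.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding as UTF-8 recovers the original text.
text = decode_tag_value(raw)
```

With an `encoding` argument like this, a caller who knows a file was written in another encoding could pass, say, `encoding="latin-1"`, while everyone else gets the UTF-8 default.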