Fix tables StringColumn unicode #143
Co-Authored-By: Simon Li <orpheus+devel@gmail.com>
HDF5 doesn't properly support Unicode. numexpr doesn't support it at all on Python 3.
The last commit fails on Python 2.7 with the inexplicable error:
https://travis-ci.org/ome/omero-py/jobs/620734836
There is also an
(Title is probably also a bit off now, eh? 🙂)
It might be possible to simplify this by moving the conversion into
See last commit on #145 re the last comment
- integration tests failed today with the `ValidationException` message. After adjusting the column length as per ome/openmicroscopy@ebe2f98, a re-execution of the tests passed.
- tested a minimal annotation workflow using a dataset with one image, the following annotation CSV file and the `omero-metadata` plugin (with a fix to be opened separately):

      Image Name,Term
      test.fake,მიკროსკოპის პონი

  A table was successfully created with the string containing Unicode characters as expected.
Overall, the changes ensure string arrays are systematically encoded before creating a StringColumn. The additional unit tests covering the `ValidationException` thrown on incorrect string length as well as the broken `getWhereList` query are very valuable.
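For readers following along, the "encode first, then validate the length" idea can be sketched roughly as below. This is only an illustration in plain Python 3, not the code from this PR: the helper name `encode_column_values` is hypothetical, and a plain `ValueError` stands in for the server-side `ValidationException`.

```python
def encode_column_values(values, size):
    """Encode text values to UTF-8 and check that each fits in `size` bytes.

    Hypothetical helper for illustration; `size` is assumed to be the
    declared width of the string column, measured in bytes.
    """
    encoded = []
    for value in values:
        # Text is encoded to UTF-8; values that are already bytes pass through.
        raw = value.encode("utf-8") if isinstance(value, str) else bytes(value)
        if len(raw) > size:
            # Stand-in for the ValidationException mentioned in the review.
            raise ValueError(
                "value needs %d bytes but the column size is %d" % (len(raw), size)
            )
        encoded.append(raw)
    return encoded
```

The check deliberately happens after encoding, so the comparison is made against the byte length that would actually be stored rather than the number of characters.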
As proposed this morning, merging this and releasing as a quick 5.6.dev9 pre-release of omero-py which can be consumed by OMERO 5.6.0-m4. Given it includes a proposed breaking change similar to the return of bytes in the FileAnnotation API, the cleanup PR can be reviewed composedly as a follow-up.
Replacement for #133
See comment in hdfstorageV2.HdfStorage.append (9e1d8e0) for the full explanation
Probably needs a doc update to say string column sizes must be given as the number of bytes, not the number of Unicode characters.
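
To make the bytes-versus-characters point concrete, here is a small sketch in plain Python 3. It assumes values are stored as UTF-8, and `utf8_size` is a hypothetical helper name, not part of omero-py.

```python
def utf8_size(values):
    """Return the largest UTF-8 encoded length, in bytes, among `values`.

    Hypothetical helper: the result is what a string column size would
    need to be, under the assumption that values are stored as UTF-8.
    """
    return max((len(v.encode("utf-8")) for v in values), default=0)


term = "მიკროსკოპის პონი"
# The character count and the encoded byte count differ for non-ASCII text.
print("characters:", len(term))
print("bytes needed:", utf8_size([term]))
```

For pure ASCII values the two numbers are the same, which is why sizing a column by character count only fails once non-ASCII text appears.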