Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tables StringColumn unicode #143

Merged
merged 8 commits into from
Dec 5, 2019
Merged

Fix tables StringColumn unicode #143

merged 8 commits into from
Dec 5, 2019

Conversation

manics
Copy link
Member

@manics manics commented Dec 4, 2019

Replacement for #133

See comment in hdfstorageV2.HdfStorage.append (9e1d8e0) for the full explanation

Probably needs a doc update to say string column sizes must be given as the number of bytes not the number of unicode characters.

@manics
Copy link
Member Author

manics commented Dec 4, 2019

The last commit fails on Python 2.7 with the inexplicable error:

    def arrays(self):
        """
        Check for strings longer than the initialised column width
        """
        for v in self.values:
>           if sys.version_info >= (3, 0, 0):
E           AttributeError: 'module' object has no attribute 'version_info'

https://travis-ci.org/ome/omero-py/jobs/620734836
Python 3 is fine, and the code works fine in the rest of omero-py ¯\_(ツ)_/¯

@joshmoore
Copy link
Member

There is also an omero.sys module which is likely what Py2 is picking up. There are existing import gymnastics to handle that.

@joshmoore
Copy link
Member

(Title is probably also a bit off now, eh? 🙂)

@manics
Copy link
Member Author

manics commented Dec 5, 2019

It might be possible to simplify this by moving the conversion into StringColumnI.arrays() so bytes are always returned. This assumes arrays() isn't used anywhere else.

@manics manics changed the title Unit test tables unicode Fix tables StringColumn unicode Dec 5, 2019
@manics
Copy link
Member Author

manics commented Dec 5, 2019

See last commit on #145 re the last comment

Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • integration tests failed today with the ValidationException message. After adjusting the column length as per
    ome/openmicroscopy@ebe2f98, a re-execution of the tests passed.

  • tested a minimal annotation workflow using a dataset with one image, the following annotation CSV file and the omero-metadata plugin (with a fix to be opened separately)

    Image Name,Term
    test.fake,მიკროსკოპის პონი
    

    A table was successfully created with the string containing Unicode characters as expected.

Overall, the changes ensure string arrays are systematically encoded before creating a StringColumn. The additional unit tests covering the ValidationException thrown on incorrect string length as well as the broken getWhereList query are very valuable.

As proposed this morning, merging this and releasing as a quick 5.6.dev9 pre-release of omero-py which can be consumed by OMERO 5.6.0-m4. Given it includes a proposed breaking change similar to the return of bytes in the FileAnnotation API, the cleanup PR can be reviewed composedly as a follow-up.

@sbesson sbesson merged commit c9ae856 into ome:master Dec 5, 2019
@manics manics deleted the pr133 branch December 5, 2019 11:27
@joshmoore joshmoore added this to the 5.6.0 milestone Mar 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants