Add unit test covering Unicode #133

Closed
wants to merge 2 commits into from

sbesson
Member

@sbesson sbesson commented Nov 28, 2019

See also ome/openmicroscopy#6189

b45e95c exposes a Python 3.6 regression when adding a StringColumn containing Unicode. The same scenario passes without issue on Python 2.

Review comments on test/unit/tablestest/test_hdfstorage.py (2 threads, outdated and resolved)
Co-Authored-By: Simon Li <orpheus+devel@gmail.com>
@sbesson
Member Author

sbesson commented Dec 2, 2019

The numpy.dtype note about using strings in Python 3 is probably relevant to the root of this problem. Unfortunately, local attempts to migrate StringColumn.dtypes() from S to U have been unsuccessful.
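For readers less familiar with the bytes/text split behind that note, here is a minimal standard-library sketch of why a byte-oriented (S-style) field and a Python 3 str do not line up automatically. The sample value is made up for illustration, and this is not the code path used by StringColumn.

```python
# Hypothetical cell value containing non-ASCII characters.
text = "Café 5 µm"

# A Python 3 str is Unicode text; storing it in a bytes field
# requires an explicit encoding step.
utf8_bytes = text.encode("utf-8")

# The character count and the encoded byte count differ, which matters
# when a fixed-width bytes field is sized from the text length.
n_chars = len(text)
n_bytes = len(utf8_bytes)

# An implicit ASCII conversion cannot represent these characters.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(n_chars, n_bytes, ascii_ok)
```

On Python 2 the native str type was already a byte string, which is one plausible reason the same scenario behaves differently there.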

An earlier demo on IDR, upgraded to an experimental Python 3 environment, seems to suggest that reading a StringColumn created on Python 2 with Unicode characters is unaffected:

[Screenshot: Screen Shot 2019-12-02 at 16 19 22]

I do not expect to have the capacity to provide a fix for this regression for OMERO 5.6.0. There is a question of whether this should be marked as a blocker for GA; it is certainly one for the upgrade of IDR to Python 3, as it breaks the annotation workflows if CSV files contain Unicode characters.

As immediate next steps, proposing to:

Alternate thoughts or suggestions welcome /cc @joshmoore @jburel @manics

@joshmoore
Member

Something I haven't really considered yet: would a UnicodeColumn be of use?

@manics
Member

manics commented Dec 3, 2019

What would be the difference between a StringColumn with unicode and a UnicodeColumn?

@joshmoore
Member

It would be a location that could have different read/write logic, if that would help.
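As a rough illustration of what "different read/write logic" could mean, the sketch below encodes text to UTF-8 on write and decodes it on read. The helper names `encode_cells` and `decode_cells` and the `width` parameter are hypothetical and are not part of any existing OMERO API; this is only one possible shape for such a column.

```python
def encode_cells(values, width):
    """Encode str cells to UTF-8 bytes for a fixed-width bytes field.

    Hypothetical helper: `width` is the field size in bytes, not characters.
    """
    encoded = []
    for value in values:
        raw = value.encode("utf-8")
        if len(raw) > width:
            # Refuse rather than truncate, since cutting a multi-byte
            # sequence in the middle would corrupt the stored text.
            raise ValueError(
                "value needs %d bytes, field width is %d" % (len(raw), width)
            )
        encoded.append(raw)
    return encoded


def decode_cells(raw_values):
    """Decode UTF-8 bytes read back from storage into str cells."""
    return [raw.decode("utf-8") for raw in raw_values]


stored = encode_cells(["Café", "µm"], width=8)
restored = decode_cells(stored)
```

Keeping the encode and decode steps in one place is the main design point here: callers would only ever see text, and the byte representation would stay an internal detail of the column.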

@sbesson
Member Author

sbesson commented Dec 4, 2019

Superseded by #143

@sbesson sbesson closed this Dec 4, 2019
@sbesson sbesson deleted the StringColumn_unicode branch June 29, 2020 21:58
3 participants