Add unit test covering Unicode #133

sbesson · 2019-11-28T10:18:36Z

b45e95c exposes a Python 3.6 regression when adding a StringColumn containing Unicode. The same scenario passes without issue on Python 2.

test/unit/tablestest/test_hdfstorage.py

Co-Authored-By: Simon Li <orpheus+devel@gmail.com>

sbesson · 2019-12-02T16:24:00Z

The numpy.dtype note about using strings in Python 3 is probably relevant to the root of this problem. Unfortunately, local attempts to migrate StringColumn.dtypes() from S to U have been unsuccessful.
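For readers unfamiliar with the S and U type codes, the difference can be illustrated with a minimal NumPy sketch. It is independent of the OMERO code base: the sample value and variable names are made up for illustration, and it only shows general NumPy behaviour on Python 3, not the actual failure in StringColumn.

```python
import numpy as np

# Illustrative sample value: 4 characters, 5 bytes once encoded as UTF-8.
text = "caf\u00e9"

# A 'U' (Unicode) dtype stores Python 3 str values directly.
as_unicode = np.array([text], dtype="U4")

# An 'S' (bytes) dtype refuses non-ASCII str input on Python 3,
# because NumPy encodes str to bytes with the ASCII codec.
try:
    np.array([text], dtype="S5")
    rejected = False
except UnicodeEncodeError:
    rejected = True

# The same 'S' dtype accepts the value once it is explicitly encoded.
as_bytes = np.array([text.encode("utf-8")], dtype="S5")

# Decoding the byte array gives back the original text.
round_trip = np.char.decode(as_bytes, "utf-8")
```

Note that the S width counts bytes while the U width counts characters, so a column width chosen for ASCII data may be too small for the encoded form of the same text.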

An earlier demo of IDR upgraded to an experimental Python 3 environment seems to suggest that reading a StringColumn created on Python 2 with Unicode characters is unaffected:

I do not expect to have the capacity to provide a fix for this regression in OMERO 5.6.0. There is a question of whether this should be marked as a blocker for GA; it is certainly one for the upgrade of IDR to Python 3, as it breaks the annotation workflows if CSV files contain Unicode characters.

As immediate next steps, proposing to:

- potentially extract the Unicode bit of this test and mark it as xfail
- apply the same approach to the integration test in Update testAllColumnsSameTable to test Unicode with StringColumn openmicroscopy#6189
- get this merged and capture the regression as an issue to be reviewed for OMERO 5.6.0
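A rough sketch of what extracting the Unicode case and marking it as xfail could look like with pytest follows. The `store_string` helper is a hypothetical stand-in that only mimics an ASCII-only write path; it is not the code in test_hdfstorage.py, and the test names are assumptions.

```python
import pytest


def store_string(value):
    """Hypothetical stand-in for the code under test, NOT the OMERO API.

    It only accepts ASCII, to mimic a write path that breaks on Unicode.
    """
    return value.encode("ascii")


def test_ascii_string():
    # The ASCII case keeps running as a regular test.
    assert store_string("foo") == b"foo"


@pytest.mark.xfail(reason="Unicode regression on Python 3", strict=False)
def test_unicode_string():
    # Known to fail for now; pytest reports it as XFAIL instead of a failure.
    assert store_string("caf\u00e9") == "caf\u00e9".encode("utf-8")
```

With `strict=False`, the Unicode test would show up as XPASS rather than fail the run once a fix lands, which makes it easy to spot when the marker can be dropped.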

Alternate thoughts or suggestions welcome /cc @joshmoore @jburel @manics

joshmoore · 2019-12-03T08:02:48Z

Something I haven't really considered yet: would a UnicodeColumn be of use?

manics · 2019-12-03T09:41:22Z

What would be the difference between a StringColumn with unicode and a UnicodeColumn?

joshmoore · 2019-12-03T20:23:24Z

It would be a location that could have different read/write logic, if that would help.
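To make the idea concrete, here is a purely hypothetical sketch of a column type with its own read/write logic. The class name, its methods, and the choice of UTF-8 are all assumptions made for illustration; nothing here reflects an existing or planned OMERO class.

```python
class Utf8ColumnSketch:
    """Hypothetical column wrapper, not part of OMERO."""

    def __init__(self, name, size):
        self.name = name
        self.size = size  # maximum width in bytes, not characters
        self._stored = []

    def write(self, values):
        # Write path: encode each str to UTF-8 bytes before storing it.
        for value in values:
            encoded = value.encode("utf-8")
            if len(encoded) > self.size:
                raise ValueError(
                    "%r needs %d bytes, column holds %d"
                    % (value, len(encoded), self.size)
                )
            self._stored.append(encoded)

    def read(self):
        # Read path: decode the stored bytes back to str.
        return [raw.decode("utf-8") for raw in self._stored]
```

The point of the sketch is only that encoding and decoding would live in one place, separate from the existing StringColumn behaviour.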

sbesson · 2019-12-04T19:57:36Z

Superseded by #143

Add unit test covering Unicode

b45e95c

manics reviewed Nov 28, 2019

Avoid using strongly homoglyphic Unicode characters

7020c13

Co-Authored-By: Simon Li <orpheus+devel@gmail.com>

sbesson force-pushed the StringColumn_unicode branch from 57374ae to 7020c13 on December 2, 2019 08:56

sbesson mentioned this pull request Dec 2, 2019

Update testAllColumnsSameTable to test Unicode with StringColumn ome/openmicroscopy#6189

Merged

manics mentioned this pull request Dec 4, 2019

Fix tables StringColumn unicode #143

Merged

sbesson closed this Dec 4, 2019

sbesson deleted the StringColumn_unicode branch June 29, 2020 21:58
