Fix HdfStorage.read() implementation to work with all column types #288

sbesson · 2021-05-27T13:59:05Z

Reported by @jburel: the tables.read() method internally uses col.values (via _getrows). For any column type which overrides the default methods of AbstractColumn, such as MaskColumnI, this will fail.

fe8b863 tried to address the problem by using the getsize() API, which should be independent of the column implementation details. However, testMaskColumn still fails because this logic does not populate the .bytes attribute.

6c27eae is a more robust fix for the underlying issue: since the column API already has a read method, read() now delegates to it, in the same way readCoordinates does.
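For illustration only, here is a minimal sketch of the difference between the two approaches. The classes `ToyColumn` and `ToyMaskColumn` and the functions `read_via_values` and `read_via_column_api` are hypothetical stand-ins written for this note; they are not the omero-py classes, and the signatures are simplified assumptions.

```python
class ToyColumn:
    """Hypothetical simple column: one value per row."""

    def __init__(self, name, data):
        self.name = name
        self._data = list(data)  # plays the role of the backing table
        self.values = None

    def read(self, start, stop):
        # Default behaviour: populate .values from the backing store.
        self.values = self._data[start:stop]


class ToyMaskColumn(ToyColumn):
    """Hypothetical column that overrides read() to fill an extra attribute."""

    def __init__(self, name, data, masks):
        super().__init__(name, data)
        self._masks = list(masks)
        self.bytes = None

    def read(self, start, stop):
        super().read(start, stop)
        self.bytes = self._masks[start:stop]


def read_via_values(columns, start, stop):
    # Storage-level shortcut: sets .values directly and therefore skips
    # whatever the column subclass does in its own read().
    for col in columns:
        col.values = col._data[start:stop]
    return columns


def read_via_column_api(columns, start, stop):
    # Delegation: every column populates itself, overrides included.
    for col in columns:
        col.read(start, stop)
    return columns


ids = ToyColumn("id", [1, 2, 3, 4])
mask = ToyMaskColumn("mask", [10, 20, 30, 40], [b"a", b"b", b"c", b"d"])

read_via_values([ids, mask], 1, 3)
bytes_after_shortcut = mask.bytes  # still None: the override never ran

read_via_column_api([ids, mask], 1, 3)
bytes_after_delegation = mask.bytes  # populated by ToyMaskColumn.read()
```

The point of the sketch is only that the storage layer no longer needs to know which attributes a given column type populates.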

All the other commits expand the unit tests to cover both readCoordinates and read scenarios for the different column types (long, string, mask), as well as a test for various arguments of table.read.
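As a rough sketch of what covering "various arguments" can mean for the returned row numbers: the helper `expected_row_numbers` below is hypothetical and assumes that `start`/`stop` follow Python slice semantics (a `None` start means the first row, a `None` stop means the end of the table). That assumption is mine and is not taken from the omero-py source.

```python
def expected_row_numbers(n_rows, start=None, stop=None):
    """Row indices a read of [start, stop) should report.

    Assumption: None bounds behave like omitted slice bounds.
    """
    return list(range(n_rows)[start:stop])


# A small table of cases, in the spirit of a parametrised unit test.
cases = [
    ((5, 0, 5), [0, 1, 2, 3, 4]),
    ((5, 1, 3), [1, 2]),
    ((5, 2, None), [2, 3, 4]),
    ((5, None, None), [0, 1, 2, 3, 4]),
    ((5, None, 2), [0, 1]),
]

results = [expected_row_numbers(*args) == want for args, want in cases]
```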

Proposing for a patch release of omero-py

snoopycrimecop · 2021-05-28T00:43:33Z

Conflicting PR. Removed from build OMERO-python-superbuild-push#707. See the console output for more details.
Possible conflicts:

PR Add logic to search OMERO.tables on non Pythonic named columns #287 sbesson 'Add logic to search OMERO.tables on non Pythonic named columns'
- test/unit/tablestest/test_hdfstorage.py

~~--conflicts~~ Conflict resolved in build OMERO-python-superbuild-push#708. See the console output for more details.

src/omero/hdfstorageV2.py

joshmoore

One minor item I noticed which we might want to just consider later, otherwise 👍

joshmoore · 2021-05-31T19:05:52Z

src/omero/hdfstorageV2.py


-    def _getrows(self, start, stop):
-        return self.__mea.read(start, stop)


The only thing I'd have to check is if there are any situations where the single read is significantly faster, though I assume it's a time v. space issue.

I certainly see how the performance can vary depending on the ratio percentage_of_read_columns/percentage_of_read_rows.

I would assume readCoordinates will suffer from the same tradeoffs, as it is also looping over the individual columns to extract records and combine them into a grid.
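To make the tradeoff concrete, a toy cost model may help. Everything here is an assumption for discussion: `single_read` and `per_column_read` are hypothetical functions over an in-memory list of tuples, and the "cost" only counts read calls and values materialised. It says nothing about how PyTables or HDF5 actually behave on disk.

```python
def single_read(rows, col_indices, start, stop):
    """One read call: materialise whole rows, then pick the wanted columns."""
    block = rows[start:stop]
    grid = [[row[i] for row in block] for i in col_indices]
    cost = {"calls": 1, "values": sum(len(row) for row in block)}
    return grid, cost


def per_column_read(rows, col_indices, start, stop):
    """One read call per column: materialise only the wanted values."""
    grid = []
    values = 0
    for i in col_indices:
        column = [row[i] for row in rows[start:stop]]
        grid.append(column)
        values += len(column)
    cost = {"calls": len(col_indices), "values": values}
    return grid, cost


# Ten rows, four columns; read rows 2..5 of columns 0 and 2.
rows = [(i, i * 2, i * 3, str(i)) for i in range(10)]
grid_a, cost_a = single_read(rows, [0, 2], 2, 6)
grid_b, cost_b = per_column_read(rows, [0, 2], 2, 6)
```

Both functions return the same grid. In this model the single read makes fewer calls but holds more values at once, while the per-column loop does the opposite, which is one way to read the "time v. space" remark above.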

Any objections to capturing this as an issue and merging?

sbesson added 8 commits May 27, 2021 07:18

Add additional asserts for readCoordinate tests

b967569

Do not mark testMaskColumn as broken

6ed7978

Call table.read on MaskColumn

9b25c69

table.read(): use getsize() API rather than values

fe8b863

Fix bytes assert

179e736

Add tests for table.read() output

938086e

Update hdfv2storage.read to use column.read() API

6c27eae

Add unit test covering different read arguments

146d65d

sbesson changed the title ~~Fix tables.read() API~~ Fix hdfstorageV2.read() API May 27, 2021

sbesson mentioned this pull request May 28, 2021

Add logic to search OMERO.tables on non Pythonic named columns #287

Merged

joshmoore reviewed May 28, 2021


sbesson added 3 commits May 28, 2021 14:26

Expand unit test to covering table.read(start, stop) with None values

b3199f9

Fix row numbers returned by HDFStorage.read()

25cb496

Also test rowNumbers output

e37d12f

sbesson force-pushed the tables_read branch from 96725ae to 00cc59e Compare May 28, 2021 19:40

sbesson added 2 commits May 28, 2021 20:43

Fix rowNumbers when stop=None and update tests accordingly

2147ffc

Remove obsolete internal API

9811135

sbesson force-pushed the tables_read branch from 00cc59e to 9811135 Compare May 28, 2021 19:43

sbesson changed the title ~~Fix hdfstorageV2.read() API~~ Fix HdfStorage.read() implementation to work with all column types May 28, 2021

joshmoore approved these changes May 31, 2021


sbesson merged commit ff1c303 into ome:master Jun 21, 2021

sbesson mentioned this pull request Jun 21, 2021

Investigate HDFStorage.read() performance #291

Open

sbesson deleted the tables_read branch June 21, 2021 12:26
