Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix HdfStorage.read() implementation to work with all column types #288

Merged
merged 13 commits into from
Jun 21, 2021

Conversation

sbesson
Copy link
Member

@sbesson sbesson commented May 27, 2021

Reported by @jburel, the tables.read() method internally uses col.values (via _getrows). For any column type with overrides the default methods of AbstractColumn like MaskColumnI, this will fail.

fe8b863 tried to address the problem by using the getsize() API which should be independent of the column implementation details. However, testMaskColumn is still failing as the logic does not populate the .bytes attributes.

6c27eae is a more robust fix to the underlying issue which uses the fact that the column API effectively has a read method and effectively delegates to this API like readCoordinates.

All the other commits expand the unit tests to cover both readCoordinates and read scenarios for the different column types (long, string, mask) as well as a test for various arguments of table.read

Proposing for a patch release of omero-py

@sbesson sbesson changed the title Fix tables.read() API Fix hdfstorageV2.read() API May 27, 2021
@snoopycrimecop
Copy link
Member

snoopycrimecop commented May 28, 2021

Conflicting PR. Removed from build OMERO-python-superbuild-push#707. See the console output for more details.
Possible conflicts:

--conflicts Conflict resolved in build OMERO-python-superbuild-push#708. See the console output for more details.

@sbesson sbesson changed the title Fix hdfstorageV2.read() API Fix HdfStorage.read() implementation to work with all column types May 28, 2021
Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor item I noticed which we might want to just consider later, otherwise 👍


def _getrows(self, start, stop):
return self.__mea.read(start, stop)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I'd have to check is if there are any situations where the single read is significantly faster, though I assume it's a time v. space issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I certainly see how the performance can vary depending of the ratio percentage_of_read_columns/percentage_of_read_rows.

I would assume readCoordinates will suffer from the same tradeoffs as it it also looping over the individual columns to extract records and combine them into a grid.

Copy link
Member Author

@sbesson sbesson Jun 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Objections to capture this as an issue and merging?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WFM

@sbesson sbesson merged commit ff1c303 into ome:master Jun 21, 2021
@sbesson sbesson deleted the tables_read branch June 21, 2021 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants