Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logic to search OMERO.tables on non Pythonic named columns #287

Merged
merged 3 commits into from
Feb 7, 2022

Conversation

sbesson
Copy link
Member

@sbesson sbesson commented May 18, 2021

See ome/omero-metadata#57 for the reference issue.

OMERO.tables HDF storage uses PyTables querying capabilities in particular the tables.get_where_list API. This API takes as a primary input a condition which syntax is defined as in https://www.pytables.org/usersguide/condition_syntax.html.

Columns named in a Python compliant way (i.e. matching ^[a-zA-Z_][a-zA-Z0-9_]*$), the column name can directly be used in the condition e.g. foo>1. Column names that do not match this patterns e.g. including spaces or brackets are supported but they need to be retrieved using getattr and they cannot directly used as an input condition. As it stands, this prevents the API from querying all the row based on a condition including such a column.

350dbc1 demonstrates how to use condvars and variable substitution to allow searching these non Pythonic named columns. ae5ca40 proposes to expand the semantics of the variables argument of tables.get_where_list to support these queries remotely. Any string/string key value pair is expanded as a string/column key/pair and passed to the underlying get_where_list API allowing variable substitution.

Using the OMERO API, this should allow the following types of queries

table.getWhereList("(count > 10) & (x =='pass')", variables={'x': rstring('Quality Control')}, start=0, stop=rowCount, step=0)

String columns can be stored using non-pythonic names (spaces, brackets) but
their column index cannot simply be used for of a search condition. Instead,
a variable can be substituted and mapped using the condvars dictionary.
@sbesson sbesson changed the title Add unit test for filtering OMERO.tables using non-natural named columns Add logic to search OMERO.tables on non Pythonic named columns May 27, 2021
@jburel
Copy link
Member

jburel commented Jun 28, 2021

table.getWhereList("(count > 10) & (x =='pass')", variables={'x': 'Quality Control'}, start=0, stop=rowCount, step=0)

This is not what the test does rstring("Long column") cf. ome/openmicroscopy#6283
I think it is preferable not to ask user of the API to have to use rstring

@sbesson
Copy link
Member Author

sbesson commented Jun 28, 2021

PR description updated although the primary issue is rather at the level of the omero-blitz component as the Table.getWhereList API expects variables to be of type RTypeDict.

I would also be of the opinion of lowering the exposition of these types for consumers of the API if possible. What are the viable options?

@joshmoore
Copy link
Member

What are the viable options?

A gateway method would be able to add a call to omero.rtypes.wrap, but we don't currently have a general interceptor for direct calls to Prx instances.

@joshmoore
Copy link
Member

Thoughts on getting this out with #292?

@sbesson
Copy link
Member Author

sbesson commented Jul 12, 2021

Answering #287 (comment) specifically, from a code perspective, I think the change here should be fairly harmless and releasable immediately. The PyTables API expectation is that the condvars variable will contain a dictionary pointing to Column instances. I have not been able to think of a scenario where the special handling of string values implemented here would break existing behaviors.

As discussed earlier this morning, I do not know of any immediate need for this fix so it probably makes sense to allow more time for the review of this PR alongside the examples and some technical documentation explaining its impact for API consumers.

@imagesc-bot
Copy link

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/omero-tables-questions/58161/4

Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, lgtm. I haven't been able to come with a scenario where I think existing code would be unhappy (and I also don't think there's too much of that)

@sbesson sbesson added this to the 5.11.0 milestone Feb 6, 2022
@sbesson sbesson merged commit fff75cc into ome:master Feb 7, 2022
@sbesson sbesson deleted the tables_column_space branch June 7, 2022 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants