[python/r] DataFrame: value filter on enum/dict column generates internal error when sought value not in enumeration #1988
Comments
Also see TileDB-Inc/TileDB-Py#1880 which is a related ease-of-use issue for our use case. For many of our dataframe columns, where we want to use enums, it would be far easier to use if the value filter equality ops ( |
Needs triaging for R as well
[sc-38450]
@eddelbuettel this needs triaging for R as well please
* typeguard nit missed in #1960
* factor common fixtures into conftest.py
* factor test_update_dataframes fixture
* `verify_obs_var` helper, more `test_update_dataframes` factoring
* test_experiment_query.py: verify #1988
* `s/h5ad_file/h5ad_path/g`, factor `HERE`s

Co-authored-by: Ryan Williams <ryan.williams@tiledb.com>
@ryan-williams there is independent verification in R. I'll do that. This PR is for Python and that's fine.
I am blocked on the R side. Questions in Slack.
See also #2311 for tracking toward 1.9
@mojaveazure has set me up! :)
I have an empty dataframe containing dictionary/enum attributes. When a value filter / query condition is applied to it, it triggers an internal Arrow error. It should return an empty result. All works fine for non-dictionary attributes, so it appears that value filters do not always work correctly with dict/enum attributes.
Note that the `tiledb` package also has questionable behavior here: it raises an exception if the value filter tests for a value not in the enumeration. So the Arrow error is likely unique to the libtiledbsoma codepath, but both behaviors make the combination of filters and enums problematic.

What I think should happen: the value filter should have identical behavior (i.e., results) for a column of type "T" and a column of type "enum-of-T", where T is string, int, etc. (e.g., a query against a "dict of strings" column should behave the same as a query against a string column).
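To make the expected semantics concrete, here is a minimal pure-Python sketch of an equality filter over a plain column and over a dictionary-encoded column. It is only a toy model of the behavior requested in this issue, not the libtiledbsoma or `tiledb` implementation; `DictColumn`, `filter_eq_plain`, `filter_eq_dict`, and the sample values are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DictColumn:
    """Toy dictionary-encoded (enum) column: integer codes into a value list."""

    values: List[str]  # the enumeration
    codes: List[int]   # one code per row

    @classmethod
    def encode(cls, rows: Sequence[str]) -> "DictColumn":
        values = list(dict.fromkeys(rows))  # distinct values, first-seen order
        index = {v: i for i, v in enumerate(values)}
        return cls(values=values, codes=[index[r] for r in rows])


def filter_eq_plain(rows: Sequence[str], sought: str) -> List[int]:
    """Row indices where a plain string column equals `sought`."""
    return [i for i, r in enumerate(rows) if r == sought]


def filter_eq_dict(col: DictColumn, sought: str) -> List[int]:
    """Row indices where a dictionary-encoded column equals `sought`.

    A value missing from the enumeration matches no rows; it is not an error.
    """
    if sought not in col.values:
        return []
    code = col.values.index(sought)
    return [i for i, c in enumerate(col.codes) if c == code]


# Hypothetical sample data: the two column types should give the same answer,
# including for a value ("neuron") that is absent from the enumeration.
rows = ["B cell", "T cell", "B cell"]
col = DictColumn.encode(rows)
for sought in ("B cell", "T cell", "neuron"):
    assert filter_eq_dict(col, sought) == filter_eq_plain(rows, sought)
```

The point of the sketch is the early return: a sought value that is not in the enumeration yields an empty result rather than an exception, which is what a filter on the equivalent non-enum column would produce.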
<late edit>
The empty dataframe is unrelated. It fails in exactly the same way for non-empty arrays. I'll add an example of that below.
</late edit>
The schema (abbreviated for ease of reading):
Reading the entire thing works correctly (output abbreviated):
Read with a value filter on a string attribute works fine (output abbreviated):
Reading with a value filter on a dict column fails with an internal Arrow error check:
Using the latest `tiledb` has a different (and also arguably incorrect) behavior:

Package version info:
I can make the problematic empty dataframe available if helpful.
The empty/non-empty state of the array is unrelated. Here is an example on a non-empty dataframe with the same schema, failing in the same way: