Writing sparse arrays with variable length attributes bug #494
Hi @lunaroverlord, apologies for the delayed reply. A workaround for now to prevent the NumPy array from automatically coalescing into a multi-dimensional array is to append an extra dummy element, then slice the last element out when writing to the TileDB array:
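A minimal sketch of that workaround, assuming the target is a 1-D sparse array that already exists with a variable-length attribute named `a` (the URI and names here are illustrative):

```python
import numpy as np
import tiledb

uri = "my_sparse_array"  # hypothetical: created elsewhere with a var-length attribute "a"

# Equal-length subarrays that NumPy would otherwise coalesce into a 2-D array.
data = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]

# Appending a dummy element (here None) breaks the common shape, so NumPy
# keeps a 1-D object array instead of coalescing.
vals = np.array(data + [None], dtype=object)

coords = np.arange(len(data), dtype=np.uint64)
with tiledb.open(uri, "w") as A:
    # Slice the dummy element back out at write time.
    A[coords] = {"a": vals[:-1]}
```

An equivalent NumPy idiom is to allocate `np.empty(len(data), dtype=object)` and assign the list into it, which also avoids the coalescing.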
We are going to see if we can add better support for this in the future so that we don't have to use this workaround. Please let us know if you have any questions or comments.
Encountering this bug now in 2024. Do you have a sense of whether this will be fixed soon?
This has not been a high priority to look at, as there's a workaround as commented above. However, we can bump the priority given that a few users have now run into the problem.
Friendly bump +1. Also experiencing this bug @nguyenv
Re-opening, although I can't give a timeline for an alternative solution. AFAICT there's no way to handle this through numpy (b/c of the "coalescing"), so we'll probably need to provide some other input mechanism.
Trying to write some multi-attribute data to TileDB for TensorFlow model training. The model input/output contains a combination of variable-size sequential data and fixed-size image data. Currently the only approach that works is to store every modality in a separate TileDB array because of the coalescing issue, which makes creating a TensorflowTileDBDataset slow. Do you have any other suggestions? I cannot force my data to be of type object, as it is not under my control.
Are you able to use this workaround?
@ihnorton I am not, as I do not have control over the dataset generation process, and my dataset is too large to pre-process, since it also includes image data, which is likewise homogeneous.
Consider this:
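A hypothetical minimal reproduction of the failure described below (the array URI, dimension, and attribute names are invented for illustration):

```python
import numpy as np
import tiledb

# 1-D sparse array with a single variable-length attribute.
dom = tiledb.Domain(tiledb.Dim(name="d", domain=(0, 9), tile=10, dtype=np.uint64))
schema = tiledb.ArraySchema(
    domain=dom,
    sparse=True,
    attrs=[tiledb.Attr(name="a", dtype=np.float64, var=True)],
)
tiledb.Array.create("vla_repro", schema)

# Subarrays of equal length: NumPy coalesces `vals` into shape (2, 2).
vals = np.array([np.array([1.0, 2.0]), np.array([3.0, 4.0])], dtype=object)

with tiledb.open("vla_repro", "w") as A:
    A[np.array([0, 1], dtype=np.uint64)] = {"a": vals}  # raises an exception
```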
Only happens when the attribute dimensions in `vals` form a block shape. There's no issue with either of the following:

I think it's because numpy coalesces object types containing homogeneous subarrays. The exception is raised because TileDB relies on `attr_val.size` checks in libtiledb.pyx#L5241. Is there a workaround or an alternative way of constructing the object?
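A short sketch of the coalescing behavior in question (values illustrative):

```python
import numpy as np

# Ragged subarrays survive as a 1-D object array of two cells.
ragged = np.array([np.array([1.0, 2.0]), np.array([3.0, 4.0, 5.0])], dtype=object)
print(ragged.shape, ragged.size)  # (2,) 2

# Equal-shape ("block shape") subarrays are coalesced into a 2-D array, so
# .size reports 4 scalars rather than 2 variable-length cells, which is
# presumably what trips the attr_val.size check mentioned above.
block = np.array([np.array([1.0, 2.0]), np.array([3.0, 4.0])], dtype=object)
print(block.shape, block.size)  # (2, 2) 4
```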