Normalization (and automated conversion R/Python) of data types ? #110

Open
Artur-man opened this issue Sep 10, 2024 · 0 comments
Artur-man (Contributor) commented Sep 10, 2024

What would be the ideal input for the dtype normalization below? Say the input arrays are composed of characters: should we pass "S8" or "U8", or is it possible to pass "character" and let Dtype automatically figure out the correct NumPy data type?

From pizzarr/R/normalize.R, lines 106 to 130 at commit f84355d:

#' @keywords internal
normalize_dtype <- function(dtype, object_codec = NA) {
  # Reference: https://github.com/zarr-developers/zarr-python/blob/5dd4a0e6cdc04c6413e14f57f61d389972ea937c/zarr/util.py#L152
  if(is_na(dtype)) {
    # np.dtype(None) returns 'float64'
    if(!is_na(object_codec)) {
      stop("expected object_codec to be NA due to NA dtype")
    }
    return(Dtype$new("<f8"))
  }
  # Construct Dtype instance.
  # convenience API for object arrays
  if("Dtype" %in% class(dtype)) {
    return(dtype)
  }
  if(is.character(dtype)) {
    # Filter list was NA but there could be non-NA object_codec parameter.
    return(Dtype$new(dtype, object_codec = object_codec))
  }
  stop("dtype must be NA, string/character vector, or Dtype instance")
}
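For reference, NumPy's "S<n>" is a fixed-width byte string of n bytes, while "U<n>" is a fixed-width Unicode string of n characters stored as UCS-4 (4 bytes per character), so "U8" takes 32 bytes per element. Below is a minimal sketch of how a caller might derive one of these strings from an R character vector before it reaches normalize_dtype(). infer_string_dtype() is a hypothetical helper, not part of pizzarr; it assumes the widest element sets the width, that pure-ASCII data can use "|S<n>", and that pizzarr's Dtype accepts these strings the way zarr-python does.

```r
# Hypothetical helper (not part of pizzarr): derive a NumPy-style
# fixed-width string dtype from an R character vector.
infer_string_dtype <- function(x) {
  stopifnot(is.character(x))
  # Ignore missing values and normalize everything to UTF-8 first,
  # so byte counts are comparable across inputs.
  values <- enc2utf8(x[!is.na(x)])
  # Width = longest element in characters, with a minimum of 1.
  width <- max(1L, nchar(values, type = "chars"))
  # In UTF-8, a string is pure ASCII iff its byte count equals its char count.
  is_ascii <- all(nchar(values, type = "bytes") == nchar(values, type = "chars"))
  if (is_ascii) {
    # "|S<n>": n raw bytes per element; byte order does not apply
    paste0("|S", width)
  } else {
    # "<U<n>": n little-endian UCS-4 code points per element
    paste0("<U", width)
  }
}

infer_string_dtype(rep("a", 10))            # "|S1"
infer_string_dtype(c("abc", "h\u00e9llo"))  # "<U5"
```

The resulting string could then be passed explicitly as the dtype when creating the dataset, assuming create_dataset() forwards a dtype argument as zarr-python does. Choosing "S" only for ASCII data keeps the on-disk size small while avoiding silent truncation of multi-byte characters.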

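The typical scenario would be that one passes a full character array as data and the dtype is set appropriately from it. However, creating a dataset from a character array without an explicit dtype currently fails when the data is read back: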

zarr.array <- pizzarr::zarr_open(store = "data/string_test.zarr")
z1 <- zarr.array$create_dataset(name = "assay", data = array(rep("a", 10), dim = 10), shape = 10)
zarr.array$get_item("assay")$get_item("...")$data
[1] "Buffer has ${numDataElements} of dtype ${dtype}, shape is too large or small"
Error in private$chunk_getitem_part2(part1_result, proj$chunk_coords,  : 
  Different type of error - rethrow

It looks like the dtype currently defaults to "<f8" (little-endian 64-bit float) when the user does not provide one.
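One way to avoid the "<f8" fallback for character data would be to derive a default dtype from the R data itself before normalize_dtype() is reached. The sketch below is hypothetical: default_dtype_from_data() is not a pizzarr function, and its R-to-NumPy mapping is an assumption (R doubles are 64-bit IEEE floats, R integers are 32-bit signed, logicals are mapped to NumPy's 1-byte bool, and character data always uses "<U<n>").

```r
# Hypothetical helper (not part of pizzarr): pick a NumPy-style dtype string
# from the storage type of an R vector or array, instead of defaulting to "<f8".
default_dtype_from_data <- function(data) {
  switch(typeof(data),
    double = "<f8",   # R doubles are IEEE 754 64-bit floats
    integer = "<i4",  # R integers are 32-bit signed
    logical = "|b1",  # assumption: store logicals as NumPy bool
    character = {
      # assumption: always use Unicode "<U<n>" sized to the longest element
      values <- enc2utf8(data[!is.na(data)])
      width <- max(1L, nchar(values, type = "chars"))
      paste0("<U", width)
    },
    stop("no default dtype mapping for R type '", typeof(data), "'")
  )
}

default_dtype_from_data(array(rep("a", 10), dim = 10))  # "<U1"
default_dtype_from_data(array(1:10, dim = 10))          # "<i4"
```

Failing loudly for unmapped types (lists, complex, raw) seems preferable to guessing, since a wrong dtype only surfaces later as a buffer-size error like the one above.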

Artur-man changed the title "Normalization (and automatization) of data types ?" to "Normalization (and automated conversion R/Python) of data types ?" on Sep 10, 2024.