Returning list structures for the same array and group metadata are not identical #775

Open
cgiachalis opened this issue Oct 22, 2024 · 1 comment

@cgiachalis
Contributor

cgiachalis commented Oct 22, 2024

Issue

When the same metadata are put on an array and on a group and then retrieved back into R, the returned objects are equivalent but not identical.

When retrieving all metadata:

  • the array getter returns a named list of class tiledb_metadata (used by the print method)
  • the group getter returns a named list that is not classed, and each element carries an attribute named "key"

Is this intentional? I found no documentation or usage explaining why group metadata require an extra attribute on each element.

Here's a reproducible example:

R Code - reprex
library(tiledb) # version 0.30.2

# metadata for array and group
md <- list("a1" = 1, "b2" = 2)
nms <- names(md)

# Array metadata ------------------------

uri_arr <- tempfile("arr1")
fromDataFrame(data.frame(a = "foo"), uri_arr)
arr_handle <- tiledb_array(uri_arr)

arr_handle <- tiledb_array_open(arr_handle, type = "WRITE")

# Put metadata

status <- mapply(
  key = nms,
  val = md,
  FUN = function(key, val) {tiledb_put_metadata(arr_handle, key, val)})

all(status) # check all OK
#> [1] TRUE

arr_handle <- tiledb_array_close(arr_handle)
arr_handle <- tiledb_array_open(arr_handle, type = "READ")

arr_metadata <- tiledb_get_all_metadata(arr_handle)

# Group metadata ------------------------

uri_grp <- tempfile("grp1")
grp <- tiledb_group_create(uri_grp)
grp <- tiledb_group(grp, type = "WRITE")

# Put metadata
status <- mapply(
  key = nms,
  val = md,
  FUN = function(key, val) {tiledb_group_put_metadata(grp, key, val)})

all(status) # check all OK
#> [1] TRUE

grp <- tiledb_group_close(grp)
grp <- tiledb_group_open(grp, type = "READ")

grp_metadata <- tiledb_group_get_all_metadata(grp)

Results

# What ??? :(
all.equal(arr_metadata, grp_metadata)
 [1] "Attributes: < names for target but not for current >"             
 [2] "Attributes: < Length mismatch: comparison on first 0 components >"
 [3] "Component \"a1\": Attributes: < target is NULL, current is list >"
 [4] "Component \"b2\": Attributes: < target is NULL, current is list >"

# OK
all.equal(arr_metadata, grp_metadata, check.attributes = FALSE)
[1] TRUE

# Object structure
str(arr_metadata)
 List of 2
  $ a1: num 1
  $ b2: num 2
  - attr(*, "class")= chr "tiledb_metadata"
  
str(grp_metadata)
 List of 2
  $ a1: num 1
   ..- attr(*, "key")= chr "a1"
  $ b2: num 2
   ..- attr(*, "key")= chr "b2"


# Print to console
arr_metadata
a1:	1
b2:	2

grp_metadata
$a1
[1] 1
attr(,"key")
[1] "a1"

$b2
[1] 2
attr(,"key")
[1] "b2"

Comments/Notes/Fin

In practice, I strip off the "key" attribute to get an identical output structure, which also helps in unit testing or when mixing array and group metadata.
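The workaround described above can be sketched in base R. The helper name normalise_group_metadata() is hypothetical (it is not part of the tiledb package), and the sketch assumes, based only on the str() output in the reprex, that the group result differs from the array result solely by the per-element "key" attribute and the missing tiledb_metadata class. Mock objects stand in for the TileDB results so the snippet runs without a TileDB installation.

```r
# Hypothetical helper, not part of the tiledb package: reshape the list
# returned by tiledb_group_get_all_metadata() so that it matches the
# structure returned by tiledb_get_all_metadata() for arrays.
normalise_group_metadata <- function(md) {
  # Drop the per-element "key" attribute; lapply() keeps the list names
  out <- lapply(md, function(x) {
    attr(x, "key") <- NULL
    x
  })
  # Add the class the array getter uses for its print method
  class(out) <- "tiledb_metadata"
  out
}

# Mock objects mirroring the str() output in the reprex (no TileDB needed)
grp_md <- list(a1 = structure(1, key = "a1"), b2 = structure(2, key = "b2"))
arr_md <- structure(list(a1 = 1, b2 = 2), class = "tiledb_metadata")

identical(arr_md, normalise_group_metadata(grp_md))
#> [1] TRUE
```

With the real objects from the reprex, identical(arr_metadata, normalise_group_metadata(grp_metadata)) should behave the same way, provided the assumption above holds for other value types too.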

Other notes and observations:

  • The array equivalent of tiledb_group_get_metadata_from_index() has no R-level function, although it exists in C++ (tiledb:::libtiledb_array_get_metadata_from_index()).

  • tiledb_group_get_all_metadata() is written in R, whereas tiledb_get_all_metadata() loops in C++ under the hood (see libtiledb_array_get_metadata_list). This is not an issue beyond memory efficiency, but writing the group version in C++ as well (e.g., libtiledb_group_get_metadata_list) would make the two implementations identical.

  • Metadata-related functions could perhaps get a roxygen @family metadata tag, which would make the extensive documentation easier to navigate via the auto-generated "See also" links.

  • There are no vacuum/consolidation operations for group metadata.

I hope the above helps towards a consistent metadata interface (structure, class, print method, functionality) :)

Thanks

@johnkerl self-assigned this Oct 22, 2024
@cgiachalis
Contributor Author

As a last note, it seems that at the C++ level the group getter assigns a 'key' attribute whereas the array getter assigns 'names', although the code logic is otherwise identical.

libtiledb_array_get_metadata_from_index

TileDB-R/src/libtiledb.cpp, lines 2878 to 2879 in c2ba622

RObject vec = _metadata_to_sexp(v_type, v_num, v);
vec.attr("names") = Rcpp::CharacterVector::create(key);

libtiledb_group_get_metadata_from_index

TileDB-R/src/libtiledb.cpp, lines 5434 to 5435 in c2ba622

RObject vec = _metadata_to_sexp(v_type, v_num, v);
vec.attr("key") = Rcpp::CharacterVector::create(key);
