Add tiledb_vfs_ls_recursive() (#691)
* Add tiledb_vfs_ls_recursive()

* Add unit tests conditional on having AWS_ACCESS_KEY_ID

* Update NEWS (+ fix typos), roll micro version, more typos [ci skip]
eddelbuettel authored Apr 11, 2024
1 parent 6fa1cb3 commit 4806938
Showing 10 changed files with 115 additions and 10 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: tiledb
Type: Package
Version: 0.25.0.6
Version: 0.25.0.7
Title: Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays
Authors@R: c(person("TileDB, Inc.", role = c("aut", "cph")),
person("Dirk", "Eddelbuettel", email = "dirk@tiledb.com", role = "cre"))
1 change: 1 addition & 0 deletions NAMESPACE
@@ -290,6 +290,7 @@ export(tiledb_vfs_is_dir)
export(tiledb_vfs_is_empty_bucket)
export(tiledb_vfs_is_file)
export(tiledb_vfs_ls)
export(tiledb_vfs_ls_recursive)
export(tiledb_vfs_move_dir)
export(tiledb_vfs_move_file)
export(tiledb_vfs_open)
16 changes: 10 additions & 6 deletions NEWS.md
@@ -4,10 +4,12 @@

## Improvements

* The display of a `filter_list` not labels is correctly as a filter list (@cgiachalis in #681)
* The display of a `filter_list` is now labeled correctly as a filter list (@cgiachalis in #681 addressing #678)

* The Arrow integration has been simplified using [nanoarrow](https://github.com/apache/arrow-nanoarrow) returning a single `nanoarrow` object; an unexported helper function `nanoarrow2list()` is provided to match the previous interface (#682, #685)

* A new accessor `tiledb_vfs_ls_recursive()` for recursive listings of (currently S3-only) URIs is now available (with TileDB Core >= 2.21.0) (#691)

## Bug Fixes

* The column headers now correspond to the column content in the two-column `data.frame` returned by `tiledb_object_walk` (#684 closing #683)
@@ -16,6 +18,8 @@

* The `configure` and `Makevars.in` received a minor update correcting small issues (#680)

* The nightly valgrind run was updated to include release 2.22 (#687)

## Documentation

* A number of minor typographical and grammar errors in the function documentation have been corrected (@cgiachalis in #681)
@@ -41,15 +45,15 @@

* The `tiledb_get_query_range_var()` accessor now correctly calls the range getter for variable-sized dimensions (#662)

* The nighly valgrind check now installs to require `nanoarrow` package (#664)
* The nightly valgrind check now installs the required `nanoarrow` package (#664)

* Variable cell numbers can now be set consistently for all attribute types (#670)

* Object walk traversal order detection has been corrected (#671)

## Build and Test Systems

* The nighly valgrind run was updated to include release 2.21 (#669)
* The nightly valgrind run was updated to include release 2.21 (#669)

* Unit tests have been added for the TileDB 'object' functions (#671, #672)

@@ -76,7 +80,7 @@

## Build and Test Systems

* The nighly valgrind run was updated to include release 2.20 (#649)
* The nightly valgrind run was updated to include release 2.20 (#649)

## Documentation

@@ -149,7 +153,7 @@

## Build and Test Systems

* The nighly valgrind run was updated to include release 2.18 (#615)
* The nightly valgrind run was updated to include release 2.18 (#615)

## Documentation

@@ -194,7 +198,7 @@

## Build and Test Systems

* The nighly valgrind run was updated to include release 2.17 (#603)
* The nightly valgrind run was updated to include release 2.17 (#603)



4 changes: 2 additions & 2 deletions R/Init.R
@@ -36,7 +36,7 @@
return("")
}

.onLoad <- function(libname, pkgName) {
.onLoad <- function(libname, pkgname) {
## create a slot for ctx in the per-package environment but do not fill it yet to allow 'lazy load'
## this entry is generally accessed with a (non-exported) getter and setter in R/Ctx.R
.pkgenv[["ctx"]] <- NULL
@@ -57,7 +57,7 @@
.set_compile_link_options()
}

.onAttach <- function(libname, pkgName) {
.onAttach <- function(libname, pkgname) {
if (interactive()) {
packageStartupMessage("TileDB R ", packageVersion("tiledb"),
" with TileDB Embedded ", format(tiledb_version(TRUE)),
4 changes: 4 additions & 0 deletions R/RcppExports.R
@@ -948,6 +948,10 @@ libtiledb_vfs_fh_free <- function(fhxp) {
invisible(.Call(`_tiledb_libtiledb_vfs_fh_free`, fhxp))
}

libtiledb_vfs_ls_recursive <- function(ctx, vfs, uri) {
.Call(`_tiledb_libtiledb_vfs_ls_recursive`, ctx, vfs, uri)
}

libtiledb_stats_enable <- function() {
invisible(.Call(`_tiledb_libtiledb_stats_enable`))
}
20 changes: 19 additions & 1 deletion R/VFS.R
@@ -1,6 +1,6 @@
# MIT License
#
# Copyright (c) 2017-2023 TileDB Inc.
# Copyright (c) 2017-2024 TileDB Inc.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
@@ -451,3 +451,21 @@ tiledb_vfs_copy_file <- function(file, uri, vfs = tiledb_get_vfs()) {
is.character(uri) && file.exists(file))
libtiledb_vfs_copy_file(vfs@ptr, file, uri)
}

#' Recursively list objects from a given URI
#'
#' This functionality is currently limited to S3 URIs.
#'
#' @param uri Character variable with a URI describing a file path
#' @param vfs (optional) A TileDB VFS object; default is to use a cached value.
#' @param ctx (optional) A TileDB Ctx object
#' @return A data.frame object with two columns for the full path and the object
#' size in bytes
#' @export
tiledb_vfs_ls_recursive <- function(uri, vfs = tiledb_get_vfs(), ctx = tiledb_get_context()) {
stopifnot("Argument 'vfs' must be a tiledb_vfs object" = is(vfs, "tiledb_vfs"),
"Argument 'ctx' must be a tiledb_ctx object" = is(ctx, "tiledb_ctx"),
"Argument 'uri' must be a character variable" = is.character(uri),
"This function needs TileDB 2.21.0 or later" = tiledb_version(TRUE) >= "2.21.0")
libtiledb_vfs_ls_recursive(ctx@ptr, vfs@ptr, uri)
}
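The runtime guard in `tiledb_vfs_ls_recursive()` has to use the same threshold that its error message states and that the C++ code checks at compile time (2.21.0, per the NEWS entry and the `TILEDB_VERSION` guard). R compares `package_version` objects component by component. The minimal C++ sketch below illustrates that kind of comparison and why a 2.17.0 threshold would admit releases without the 2.21.0 API; `parse_version()` and `version_at_least()` are hypothetical helpers for illustration, not part of the package.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: split "major.minor.patch" into three integers.
// Missing components default to 0; a non-numeric field makes std::stoi throw.
std::array<int, 3> parse_version(const std::string& text) {
    std::array<int, 3> parts{0, 0, 0};
    std::istringstream in(text);
    std::string field;
    for (std::size_t i = 0; i < parts.size() && std::getline(in, field, '.'); ++i) {
        parts[i] = std::stoi(field);
    }
    return parts;
}

// std::array compares element by element, so this is a numeric,
// component-wise comparison: 2.9.0 < 2.21.0 and 2.20.0 < 2.21.0.
bool version_at_least(const std::string& have, const std::string& need) {
    return parse_version(have) >= parse_version(need);
}
```

With these helpers, `version_at_least("2.20.0", "2.17.0")` is true while `version_at_least("2.20.0", "2.21.0")` is false, which is exactly the gap a mismatched threshold leaves open. Comparing raw strings would also be wrong: `"2.9.0" >= "2.21.0"` holds lexicographically because `'9' > '2'`.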
6 changes: 6 additions & 0 deletions inst/tinytest/test_vfs.R
@@ -64,3 +64,9 @@ if (requireNamespace("palmerpenguins", quietly=TRUE)) {

expect_equal(pp, tiledb_vfs_unserialize(uriser))
}

if (tiledb_version(TRUE) >= '2.21.0' && nzchar(Sys.getenv("AWS_ACCESS_KEY_ID"))) {
expect_silent(dat <- tiledb::tiledb_vfs_ls_recursive("s3://tiledb-test-arrays/1.4/customer"))
expect_true(inherits(dat, "data.frame"))
expect_true(nrow(dat) > 400)
}
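The new test block runs only when TileDB is recent enough and `AWS_ACCESS_KEY_ID` is non-empty, so machines without S3 credentials skip it instead of failing. A minimal C++ sketch of the same gate follows; `env_is_set()` and `should_run_s3_tests()` are hypothetical names. Note that the presence of the key only suggests that credentials are configured; it does not verify access to the bucket.

```cpp
#include <cassert>
#include <cstdlib>

// Mirrors nzchar(Sys.getenv(name)) in R: true only if the variable
// exists and holds a non-empty value.
bool env_is_set(const char* name) {
    const char* value = std::getenv(name);
    return value != nullptr && value[0] != '\0';
}

// Hypothetical gate for a network-dependent test: both the library
// version check and the credential hint must pass.
bool should_run_s3_tests(bool library_new_enough) {
    return library_new_enough && env_is_set("AWS_ACCESS_KEY_ID");
}
```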
26 changes: 26 additions & 0 deletions man/tiledb_vfs_ls_recursive.Rd

14 changes: 14 additions & 0 deletions src/RcppExports.cpp
@@ -2844,6 +2844,19 @@ BEGIN_RCPP
return R_NilValue;
END_RCPP
}
// libtiledb_vfs_ls_recursive
Rcpp::DataFrame libtiledb_vfs_ls_recursive(XPtr<tiledb::Context> ctx, XPtr<tiledb::VFS> vfs, const std::string& uri);
RcppExport SEXP _tiledb_libtiledb_vfs_ls_recursive(SEXP ctxSEXP, SEXP vfsSEXP, SEXP uriSEXP) {
BEGIN_RCPP
Rcpp::RObject rcpp_result_gen;
Rcpp::RNGScope rcpp_rngScope_gen;
Rcpp::traits::input_parameter< XPtr<tiledb::Context> >::type ctx(ctxSEXP);
Rcpp::traits::input_parameter< XPtr<tiledb::VFS> >::type vfs(vfsSEXP);
Rcpp::traits::input_parameter< const std::string& >::type uri(uriSEXP);
rcpp_result_gen = Rcpp::wrap(libtiledb_vfs_ls_recursive(ctx, vfs, uri));
return rcpp_result_gen;
END_RCPP
}
// libtiledb_stats_enable
void libtiledb_stats_enable();
RcppExport SEXP _tiledb_libtiledb_stats_enable() {
@@ -3779,6 +3792,7 @@ static const R_CallMethodDef CallEntries[] = {
{"_tiledb_libtiledb_vfs_ls", (DL_FUNC) &_tiledb_libtiledb_vfs_ls, 2},
{"_tiledb_libtiledb_vfs_copy_file", (DL_FUNC) &_tiledb_libtiledb_vfs_copy_file, 3},
{"_tiledb_libtiledb_vfs_fh_free", (DL_FUNC) &_tiledb_libtiledb_vfs_fh_free, 1},
{"_tiledb_libtiledb_vfs_ls_recursive", (DL_FUNC) &_tiledb_libtiledb_vfs_ls_recursive, 3},
{"_tiledb_libtiledb_stats_enable", (DL_FUNC) &_tiledb_libtiledb_stats_enable, 0},
{"_tiledb_libtiledb_stats_disable", (DL_FUNC) &_tiledb_libtiledb_stats_disable, 0},
{"_tiledb_libtiledb_stats_reset", (DL_FUNC) &_tiledb_libtiledb_stats_reset, 0},
32 changes: 32 additions & 0 deletions src/libtiledb.cpp
@@ -4528,6 +4528,38 @@ void libtiledb_vfs_fh_free(XPtr<vfs_fh_t> fhxp) {
#endif
}

// [[Rcpp::export]]
Rcpp::DataFrame libtiledb_vfs_ls_recursive(XPtr<tiledb::Context> ctx,
XPtr<tiledb::VFS> vfs,
const std::string& uri) {
check_xptr_tag<tiledb::Context>(ctx);
check_xptr_tag<tiledb::VFS>(vfs);

#if TILEDB_VERSION >= TileDB_Version(2,21,0)
// default list object (a vector of pair<std::string, uint64_t>) and callback
tiledb::VFSExperimental::LsObjects ls_objects;
tiledb::VFSExperimental::LsCallback cb = [&](const std::string_view& path, uint64_t size) {
ls_objects.emplace_back(path, size);
return true; // Continue traversal to next entry.
};
tiledb::VFSExperimental::ls_recursive(*ctx.get(), *vfs.get(), uri, cb);

size_t n = ls_objects.size();
Rcpp::CharacterVector path(n);
std::vector<int64_t> size(n);
for (size_t i=0; i<n; i++) {
const auto& obj = ls_objects[i];
path[i] = obj.first;
size[i] = static_cast<int64_t>(obj.second);
}
return Rcpp::DataFrame::create(Rcpp::Named("path") = path,
Rcpp::Named("size") = Rcpp::toInteger64(size));
#else
return Rcpp::DataFrame::create(Rcpp::Named("path") = Rcpp::CharacterVector(),
Rcpp::Named("size") = Rcpp::NumericVector());
#endif
}

/**
* Stats
*/
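`libtiledb_vfs_ls_recursive()` buffers each `(path, size)` pair reported by the traversal callback and then splits the rows into a character column and an integer64 column. The self-contained sketch below shows that buffer-then-split pattern without TileDB: `local_ls_recursive()` is a hypothetical stand-in that walks a local directory with `std::filesystem` instead of calling `tiledb::VFSExperimental::ls_recursive`, and `Listing` / `collect_listing()` are illustrative names, not package API.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <functional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Same contract as the lambda in the diff: receive (path, size), return true to continue.
using LsCallback = std::function<bool(std::string_view, std::uint64_t)>;
// Row-wise buffer, analogous to VFSExperimental::LsObjects in the diff.
using LsObjects = std::vector<std::pair<std::string, std::uint64_t>>;

// Column-wise result, playing the role of the two-column data.frame.
struct Listing {
    std::vector<std::string> path;
    std::vector<std::int64_t> size;
};

// Hypothetical stand-in lister: walks a local directory tree instead of an
// S3 prefix, reports every regular file, and stops if the callback returns false.
void local_ls_recursive(const fs::path& root, const LsCallback& cb) {
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;
        if (!cb(entry.path().string(), static_cast<std::uint64_t>(entry.file_size()))) return;
    }
}

// Runs any callback-driven lister, buffers the rows, then splits them into columns.
Listing collect_listing(const std::function<void(const LsCallback&)>& ls) {
    LsObjects rows;
    ls([&rows](std::string_view p, std::uint64_t sz) {
        rows.emplace_back(p, sz);  // copies the view into an owned std::string
        return true;               // keep traversing
    });

    Listing out;
    out.path.reserve(rows.size());
    out.size.reserve(rows.size());
    for (const auto& [p, sz] : rows) {  // const reference: no per-row copy
        out.path.push_back(p);
        // Sizes above INT64_MAX would not fit; the diff makes the same narrowing cast.
        out.size.push_back(static_cast<std::int64_t>(sz));
    }
    assert(out.path.size() == out.size.size());
    return out;
}
```

Keeping the callback trivial (append and return `true`) and doing the column conversion afterwards keeps the traversal independent of the output format; in the package, the second step is where `Rcpp::CharacterVector` and `Rcpp::toInteger64` come in.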