[r] Port blockwise iterator/reader to R #2152
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #2152 +/- ##
==========================================
+ Coverage 65.46% 69.73% +4.26%
==========================================
Files 143 55 -88
Lines 12805 4794 -8011
Branches 510 0 -510
==========================================
- Hits 8383 3343 -5040
+ Misses 4334 1451 -2883
+ Partials 88 0 -88
b019048 to 0a4cd6c
# return(NULL)
#}
message("blockwise read next")
if (is.null(private$soma_reader_pointer)) {
Nice catch.
[sc-42211]
This pull request has been linked to Shortcut Story #42211: [r] Port blockwise sparse iterator from Python to R.
0a5ac3b to 15f50f9
a1b24a5 to f975c61
apis/r/R/SOMASparseNDArrayRead.R
Outdated
"'size' must be a single integer value" = is.null(size) ||
rlang::is_integerish(size, 1L, finite = TRUE) ||
(inherits(size, 'integer64') && length(size) == 1L && is.finite(size)),
"'reindex_disable_on_axis' must be avector of integers" = is.null(reindex_disable_on_axis) ||
- "'reindex_disable_on_axis' must be avector of integers" = is.null(reindex_disable_on_axis) ||
+ "'reindex_disable_on_axis' must be a vector of integers" = is.null(reindex_disable_on_axis) ||
@mojaveazure @eddelbuettel Thanks for all the work on this! One high-level question at this stage: The Python In Python (scipy) However, also in Python, the internal representation of I emphasized in Python because of course I don't know if we're working under the same design constraints in R.
@mlin There is a (currently unused) argument modified apis/r/R/BlockwiseIter.R
@@ -193,7 +193,7 @@ BlockwiseSparseReadIter <- R6::R6Class(
coords,
axis,
...,
- repr = "T",
+ repr = c("T","R","C"),
reindex_disable_on_axis = NULL
) {
super$initialize(
permits return also of, respectively, a row- or column-compressed variant (dgRMatrix or dgCMatrix) corresponding to those compression formats.
@mlin I would vote for only returning a COO matrix. The existing I can see a case for removing the
Also, @eddelbuettel's fix isn't as simple as expanding the allowed values to
(Well it passed the unit test where we do But thumbs for 'simpler is better'. COO seems fine as default.
@mojaveazure @eddelbuettel Ok, no objection from me to COO-only in principle, subject to: What are we thinking about reindexing? As I mentioned, in Python it's practically essential, albeit because of an implementation detail in
Besides that peculiarity, it's still useful for the iterator to tell you the range of the major axis (the one being strided) you're getting in each block. Reindexing is one way, though not the only way, of providing that info. Without it, imagine getting the sparse matrix with a huge shape, and you know only one small block/stripe in it is populated, but you don't know where that is.
Then the minor axis: there are some use cases for which it is and isn't helpful to reindex that too. I'm less concerned about that.
Reindexing would be useful, and essential to offering CSC/CSR output in R as well. However, that has yet to be ported to R, so I don't think we can offer that now. As for knowing the range of the major axis, that poses a different problem in R, and one I hadn't thought of. R does not allow tuple outputs, and has no native unpacking like Python. The ways around this that I can think of are:
block <- read_next()
attr(block, "indices") <- indices
attr(block, "axis") <- axis
As for the minor axis, this PR currently doesn't do anything there, as reindexing is not a part of this PR.
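To make the trade-off concrete, here is a minimal base-R sketch of two ways a block could carry its major-axis range: attributes, as in the snippet above, or a named list. Everything in it is illustrative. `make_block()`, `tag_block()` and `wrap_block()` are hypothetical helpers, and a plain data frame of COO triplets stands in for the sparse matrix the real iterator returns.

```r
# Hypothetical stand-in for one block from read_next(): COO triplets in a
# data frame (the real iterator returns a sparse matrix, not a data frame).
make_block <- function(i, j, x) data.frame(i = i, j = j, x = x)

# Option A: attach the block's coordinates and strided axis as attributes
tag_block <- function(block, indices, axis) {
  attr(block, "indices") <- indices
  attr(block, "axis") <- axis
  block
}

# Option B: return a named list, the closest base-R analogue to a tuple
wrap_block <- function(block, indices, axis) {
  list(block = block, indices = indices, axis = axis)
}

block <- make_block(i = c(10L, 11L), j = c(0L, 3L), x = c(1.5, 2))
tagged <- tag_block(block, indices = 10:14, axis = 0L)
wrapped <- wrap_block(block, indices = 10:14, axis = 0L)

# Callers read the range back with attr() or `$`, respectively
attr(tagged, "indices")
wrapped$indices
```

Attributes keep the return type unchanged for existing callers but are easy to lose when the object is transformed; a list makes the extra values explicit at the cost of changing what `read_next()` returns.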
As we discussed, overall this looks great! I do think we need to think carefully about @mlin's comment:
and your suggestion to return a list with the values might be the way to go. Perhaps @bkmartinjr and/or @pablo-gar could weigh in since this is used in the cellxgene-census package.
…Iter$concat()`
Plumb through `BlockwiseTableIter$concat()` and `BlockwiseTableIter$private$soma_reader_transform()`
Slight rejiggering of `read_next()` to avoid multiple `$read_complete()` checks
Improve `BlockwiseReadIterBase$read_next()` checks
…vate`
Update docs
Delay registration of `nextElem.CoordsStrider()` and `hasNext.CoordsStrider()`
…r$next_element()`
Have `SparseReadIter$concat()` use new helper function
f4d27c1 to a1f726d
@mojaveazure Thanks for the rebase, that was on my TODO as well but it has been a busy day. Looks like we inherited some good state from
Bump develop version [ci skip]
Regarding
As per discussion above we decided collectively that the reindexer will be a follow-on PR, and the CI issue has been resolved via #2363 which has been merged to So I believe we are good to merge this PR.
Connect the re-indexer to the blockwise iterator, allowing reads to be re-indexed on-the-fly. This PR parallels #1792 and completes #2152 and #2637; in addition, it provides new shorthand for `reindex_disable_on_axis`:
- `TRUE`: disable re-indexing on all axes
- `FALSE`: re-index on all axes
- `NA`: re-index only on major axis, disable re-indexing on all minor axes (default)
`BlockwiseTableReadIter$concat()` and `BlockwiseSparseReadIter$concat()` are disabled when re-indexing is requested (paralleling Python)
`BlockwiseSparseReadIter` now accepts `repr = "R"` or `repr = "C"` under certain circumstances:
- axis 0 (`soma_dim_0`) must be re-indexed to allow `repr = "R"`
- axis 1 (`soma_dim_1`) must be re-indexed to allow `repr = "C"`
`repr` of `"T"` is allowed in all circumstances and continues to be the default
Two new fields are available to blockwise iterators:
- `$axes_to_reindex`: a vector of minor axes slated to be re-indexed
- `$reindexable`: status indicator stating if _any_ axis (major or minor) is slated to be re-indexed
resolves #2671
Co-authored-by: Paul Hoffman <mojaveazure@users.noreply.github.com>
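The `reindex_disable_on_axis` shorthand and the `repr` rules in the commit message above can be sketched in base R roughly as follows. This is an illustration of the described behaviour, not the package's code: `resolve_reindex_axes()` and `repr_allowed()` are hypothetical helpers, axes are assumed to be 0-based (as in `soma_dim_0`), and any non-logical value is assumed to be a vector of axes to skip.

```r
# Hypothetical helper: which axes end up re-indexed for a given
# `reindex_disable_on_axis` value (assumed semantics, 0-based axes).
resolve_reindex_axes <- function(disable = NA, major_axis = 0L, ndim = 2L) {
  all_axes <- seq_len(ndim) - 1L
  if (is.logical(disable) && length(disable) == 1L) {
    if (is.na(disable)) return(major_axis)  # NA (default): major axis only
    if (disable) return(integer(0))         # TRUE: re-index nothing
    return(all_axes)                        # FALSE: re-index every axis
  }
  # Otherwise treat `disable` as an explicit vector of axes to skip
  setdiff(all_axes, as.integer(disable))
}

# Hypothetical helper: "T" is always allowed; "R" needs axis 0 re-indexed,
# "C" needs axis 1 re-indexed.
repr_allowed <- function(repr, reindexed_axes) {
  switch(repr,
    T = TRUE,
    R = 0L %in% reindexed_axes,
    C = 1L %in% reindexed_axes,
    stop("unknown repr: ", repr)
  )
}

axes <- resolve_reindex_axes(NA, major_axis = 0L)
repr_allowed("R", axes)  # axis 0 is re-indexed, so row-compressed is allowed
repr_allowed("C", axes)  # axis 1 is not re-indexed, so column-compressed is not
```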
Implement the blockwise iterator and reader for the R API
This PR parallels #1792; it implements new classes for blockwise iteration through a SOMA sparse nd-array. Blockwise iteration is implemented through `SOMASparseNDArrayRead$blockwise()` (paralleling the Python implementation) and enabled for Arrow tables (`$blockwise()$tables()`) and COO sparse matrices (`$blockwise()$sparse_matrix()`)
New classes:
- `CoordsStrider`: new class to iterate through coordinates, similar to Python's `_coords_strider`
- `SOMASparseNDArrayReadBase`: base class for sparse array reads
- `SOMASparseNDArrayBlockwiseRead`: new reader class for blockwise iterated reads
- `BlockwiseReadIterBase`: base class for blockwise iteration
- `BlockwiseTableReadIter`: blockwise iterator returning Arrow tables
- `BlockwiseSparseReadIter`: blockwise iterator returning sparse matrices
New SOMA methods:
- `SOMASparseNDArrayRead$blockwise()`: perform a blockwise read
resolves #1853
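For readers unfamiliar with the striding idea behind `CoordsStrider`, the following base-R sketch walks a coordinate range in fixed-size chunks. It is only an illustration under assumptions: `make_strider()` and its `has_next()` / `next_element()` closures are hypothetical names, and the actual class in this PR is an R6 class whose interface may differ.

```r
# Hypothetical closure-based strider over the inclusive range [start, end],
# yielding at most `stride` consecutive coordinates per call.
make_strider <- function(start, end, stride) {
  stopifnot(start <= end, stride >= 1L)
  pos <- start
  has_next <- function() pos <= end
  next_element <- function() {
    if (!has_next()) stop("no more coordinates")
    # Clamp the last chunk so it never runs past `end`
    chunk <- seq(pos, min(pos + stride - 1L, end))
    pos <<- pos + stride
    chunk
  }
  list(has_next = has_next, next_element = next_element)
}

# Walk coordinates 0..9 in blocks of 4; the final block is shorter
strider <- make_strider(start = 0L, end = 9L, stride = 4L)
blocks <- list()
while (strider$has_next()) {
  blocks[[length(blocks) + 1L]] <- strider$next_element()
}
```

Each element of `blocks` is the set of major-axis coordinates a blockwise read would request for that step.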