[r] Add initial support for ragged array writing for Seurat v5 #2523

Open: mojaveazure wants to merge 5 commits into main from paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma

mojaveazure (Member) commented May 7, 2024

Seurat v5 adds support for ragged arrays, where not every X layer has exactly the same cells and features. To handle these ragged arrays on ingestion, the SOMA join IDs must be re-indexed so that each X layer is padded to the full domain of the SOMA measurement.

Implemented SOMA methods:

  • write_soma.Assay5(): write a Seurat v5 assay to a SOMA measurement. When writing X layers, if a layer is ragged:
    • cast layer to TsparseMatrix for COO representation
    • re-index Seurat's character IDs to SOMA join IDs
    • re-index COO coordinates to SOMA join IDs
    • write array using SOMASparseNDArray$private$.write_coo_dataframe()
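
The re-indexing steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `ragged_layer_to_coo()`, `all_cells`, and `all_features` are hypothetical names, join IDs are assumed to be the 0-based positions of each name in the full cell and feature vectors, and plain R integers stand in for 64-bit join IDs. The final write through `SOMASparseNDArray$private$.write_coo_dataframe()` is left out because its signature is not shown here.

```r
library(Matrix)

# Hypothetical helper (not part of tiledbsoma): turn one ragged layer into a
# COO data frame whose coordinates are join IDs over the full measurement.
# `layer` is assumed to be features x cells with dimnames, as in Seurat.
ragged_layer_to_coo <- function(layer, all_cells, all_features) {
  stopifnot(
    all(colnames(layer) %in% all_cells),
    all(rownames(layer) %in% all_features)
  )
  # Triplet (COO) form: slots `i` and `j` hold 0-based row/column indices
  coo <- as(layer, "TsparseMatrix")
  # Map the layer's own names to assumed 0-based join IDs on the full axes
  feature_ids <- match(rownames(layer), all_features) - 1L
  cell_ids <- match(colnames(layer), all_cells) - 1L
  # Re-index the COO coordinates; cells go on dim 0, features on dim 1
  data.frame(
    soma_dim_0 = cell_ids[coo@j + 1L],
    soma_dim_1 = feature_ids[coo@i + 1L],
    soma_data = coo@x
  )
}

# Toy example: the layer covers 2 of 3 features and 2 of 4 cells
all_cells <- c("c1", "c2", "c3", "c4")
all_features <- c("g1", "g2", "g3")
layer <- sparseMatrix(
  i = c(1, 2), j = c(1, 2), x = c(5, 7), dims = c(2, 2),
  dimnames = list(c("g1", "g3"), c("c2", "c4"))
)
df <- ragged_layer_to_coo(layer, all_cells, all_features)
```

In this toy case the value for `g1`/`c2` lands at coordinates (1, 0) and the value for `g3`/`c4` at (3, 2), so the layer occupies only part of the 4 x 3 domain and the remaining cells stay empty.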

Notes:

  • This PR does not implement ingestion of alternate matrix types (e.g. DelayedArray, BPCells)

SC-46644

resolves #2658

@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from b21a806 to 49e4edf Compare May 30, 2024 21:01
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from 49e4edf to 3f5bca1 Compare July 15, 2024 21:57
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from 3f5bca1 to c5a48a3 Compare August 1, 2024 19:08
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch 2 times, most recently from 6461bc9 to b692361 Compare August 14, 2024 20:53
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch 3 times, most recently from cb7147e to 86a7be1 Compare September 9, 2024 17:43
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from 86a7be1 to f80acdc Compare September 16, 2024 14:41
mojaveazure added a commit that referenced this pull request Sep 17, 2024
Seurat v5 adds support for ragged arrays, where not every `X` layer has
exactly the same cells and features. To handle this, ragged `X` layers
need to be re-indexed and re-shaped on ingestion to resize down to only
the data present

Modified SOMA methods:
 - `SOMAExperimentAxisQuery$to_seurat()` and
   `SOMAExperimentAxisQuery$to_seurat_assay()`: now read in as v5 assays

New SOMA methods:
 - `SOMAExperimentAxisQuery$private$.to_seurat_assay_v5()`: helper
   method to read in ragged and non-ragged arrays into a v5 assay; note
   this method only handles expression layers, all other assay-level
   information is handled by parent `$to_seurat_assay()` to share code
   with v3 assay outgestion

Requires #2523 and #3007

[SC-52261](https://app.shortcut.com/tiledb-inc/story/52261/)
@mojaveazure mojaveazure force-pushed the paulhoffman/sc-46644/add-support-for-ragged-arrays-in-write-soma branch from db6e539 to 9d0357a Compare September 17, 2024 18:02
@mojaveazure mojaveazure marked this pull request as ready for review September 17, 2024 18:02

Successfully merging this pull request may close these issues.

[r] Add support for ragged arrays in write_soma()