Merge branch 'main' into maximum_bayes_factor

scverse · Jan 26, 2025 · 15fc4b5
2 parents f2410ea + abfcbfc
commit 15fc4b5
Showing 26 changed files with 2,939 additions and 75 deletions.
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -41,7 +41,7 @@ repos:
           )$
 
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.9.1
+    rev: v0.9.2
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -12,10 +12,13 @@ to [Semantic Versioning]. Full commit history is available in the
 
 - Add {class}`scvi.external.Decipher` for dimensionality reduction and interpretable
     representation learning in single-cell RNA sequencing data {pr}`3015`, {pr}`3091`.
+- Add {class}`scvi.external.RESOLVI` for bias correction in single-cell resolved spatial
+    transcriptomics {pr}`3144`.
 
 #### Fixed
 
 - Fixed bug in distributed {class}`scvi.dataloaders.ConcatDataLoader` {pr}`3053`.
+- Fixed bug when loading Pyro-based models and scArches support for Pyro {pr}`3138`.
 
 #### Changed
 

diff --git a/docs/api/developer.md b/docs/api/developer.md
@@ -181,6 +181,7 @@ Module classes in the external API with respective generative and inference proc
    external.mrvi.MRVAE
    external.methylvi.METHYLVAE
    external.decipher.DecipherPyroModule
+   external.resolvi.RESOLVAE
 
 ```
 

diff --git a/docs/api/user.md b/docs/api/user.md
@@ -63,6 +63,7 @@ import scvi
    external.MRVI
    external.METHYLVI
    external.Decipher
+   external.RESOLVI
 ```
 
 ## Data loading

diff --git a/docs/references.bib b/docs/references.bib
@@ -313,3 +313,16 @@ @article{Zhang19
   pages = {1007--1015},
   publisher = {Nature Publishing Group}
 }
+
+@article{Ergen25,
+  title = {ResolVI - addressing noise and bias in spatial transcriptomics},
+  author = {Can Ergen and Nir Yosef},
+  doi = {...},
+  year = {2025},
+  month = sep,
+  journal = {biorxiv},
+  volume = {..},
+  number = {..},
+  pages = {...},
+  publisher = {...}
+}
diff --git a/docs/tutorials/index_spatial.md b/docs/tutorials/index_spatial.md
@@ -3,6 +3,7 @@
 ```{toctree}
 :maxdepth: 1
 
+notebooks/spatial/resolVI_tutorial
 notebooks/spatial/DestVI_tutorial
 notebooks/spatial/DestVI_in_R
 notebooks/spatial/gimvi_tutorial

diff --git a/docs/user_guide/index.md b/docs/user_guide/index.md
@@ -139,6 +139,9 @@ scvi-tools is composed of models that can perform one or many analysis tasks. In
    * - :doc:`/user_guide/models/tangram`
      - Deconvolution, single cell spatial mapping
      - :cite:p:`Biancalani21`
+   * - :doc:`/user_guide/models/resolvi`
+     - Generative model of single-cell resolved spatial transcriptomics
+     - :cite:p:`Ergen25`
 ```
 
 ## General purpose analysis

diff --git a/docs/user_guide/models/resolvi.md b/docs/user_guide/models/resolvi.md
@@ -0,0 +1,191 @@
+# ResolVI
+
+**resolVI** (Python class {class}`~scvi.external.RESOLVI`) is a generative model of single-cell resolved spatial
+transcriptomics that can subsequently be used for many common downstream tasks.
+
+The advantages of resolVI are:
+
+-   Addresses noise and bias in ST data due to wrong segmentation, unspecific background, and limited spatial resolution.
+-   Scalable to very large datasets (>1 million cells).
+
+The limitations of resolVI include:
+
+-   Effectively requires a GPU for fast inference.
+-   Latent space is not interpretable, unlike that of a linear method.
+-   Assumes single cells are observed and does not work with low-resolution ST like Visium or Slide-Seq.
+
+```{topic} Tutorials:
+
+-   {doc}`/tutorials/notebooks/spatial/resolVI_tutorial.ipynb`
+```
+
+## Preliminaries
+
+ResolVI takes as input spatially-resolved RNA-seq count matrices downstream of cellular segmentation and molecule
+assignments to cells. These counts can be either derived from sequencing spatially-resolved molecules or fluorescent
+imaging. ResolVI leverages the gene expression of neighboring cells and reassigns observed gene expression to neighboring
+cells as well as an unspecific background.
+
+ResolVI accepts as input the observed expression of the cell itself, its spatial neighbors and their gene expression
+as well as the distance between these cells. Additionally, a vector of categorical covariates $S$, representing
+batch, donor, etc, is an optional input to the model. ResolVI provides a semi-supervised mode, adjusting the prior in
+the latent space for different cell types and training a classifier to predict cell types from latent embeddings.
+
+## Generative process
+
+ResolVI posits that the observed expression of gene $g$ in cell $n$, $x_{ng}$, is generated by the following process:
+
+```{math}
+:nowrap: true
+
+\begin{align}
+    z &\sim \mathrm{MixtureOfGaussians}(\mu_1, \dots, \mu_K, \Sigma_1, \dots, \Sigma_K) \\
+    \alpha_n &\sim \mathrm{Dirichlet}(C) \\
+    r_{ng} &\sim \mathrm{Exponential}(R) \\
+    h_{ng} &=
+    \mathrm{Gamma}(r_{ng}, \frac{r_{ng}}{\alpha_0 f_\theta(z, b) + \alpha_1 \sum\limits_{{N(n)}} \beta_{N(n)} f_\theta(z_{N(n)}, b)}) + \alpha_2 bg\\
+    x_{ng} &\sim \mathrm{Poisson}(l_n h_{ng})
+\end{align}
+```
+
+In particular, $z$ and $z_{N(n)}$ are the latent embeddings of the cell itself as well as its spatial neighbors
+both of dimension $L$. ResolVI uses a mixture of Gaussians prior on $z$:
+
+```{math}
+:nowrap: true
+
+\begin{align}
+    c_n &\sim \textrm{Categorical}(
+        \pi_1, \pi_2, \dots, \pi_K
+    ), \\
+    z_n \mid c_n = c &\sim \mathcal{N}(\mu_c, \Sigma_c)
+\end{align}
+```
+
+In brief, we assume that the observed expression of gene $g$ in cell $n$ can be modelled as a sum of three
+components: expression truly originating from the cell (weight $\alpha_0$), expression originating from neighboring
+cells but wrongly assigned to $n$ (weight $\alpha_1$), and unspecific background (weight $\alpha_2$).
+The neighbor component is split across the individual neighboring cells $N(n)$ with weights $\beta_{N(n)n}$.
+Both the expression of cell $n$ and the expression of its neighbors $N(n)$ are generated by the same
+generative network $f_\theta$ from their respective latent codes $z_n$ and $z_{N(n)}$.
+This generative network is a neural network:
+
+```{math}
+:nowrap: true
+
+\begin{align}
+    f_{\theta}(z_{n}, s_n) &: \mathbb{R}^{L} \times \{0, 1\}^K \to \Delta^{G-1}
+\end{align}
+```
+
+which estimates the normalized gene expression of cell $n$. We use the observed counts per cell to scale these rates.
+
+The latent variables, along with their descriptions, are summarized in the following table:
+
+```{eval-rst}
+.. list-table::
+   :widths: 20 90 15 15
+   :header-rows: 1
+
+   * - Latent variable
+     - Description
+     - Code variable (if different)
+     - Prior
+   * - :math:`z_n \in \mathbb{R}^L`
+     - Low-dimensional representation capturing the state of a cell
+     - ``latent``
+     - Mixture-of-Gaussian
+   * - :math:`\beta_{N(n)} \in \Delta^{N(n) - 1}`
+     - Per-neighbor diffusion
+     - ``per_neighbor_diffusion``
+     - Dirichlet
+   * - :math:`\alpha_{n0 \dots 2} \in \Delta^{2}`
+     - Per cell true, diffusion and background proportion
+     - ``mixture_proportions``
+     - Dirichlet
+   * - :math:`bg_{ng} \in \Delta^{G - 1}`
+     - Per cell estimate of background
+     - ``background``
+     - None
+   * - :math:`background_{s} \in \mathbb{R}^G`
+     - Per sample background vector
+     - ``per_gene_background``
+     - Dirichlet
+   * - :math:`\rho_n \in \Delta^{G - 1}`
+     - Per cell rate of expression
+     - ``px_scale``
+     - None
+   * - :math:`\mu_n, \mu_{N(n)} \in \mathbb{R}^G`
+     - Per cell estimated expression
+     - ``px_rate`` and ``px_rate_n``
+     - None
+```
+
+
+## Inference
+
+ResolVI uses variational inference, specifically auto-encoding variational Bayes
+(see {doc}`/user_guide/background/variational_inference`) in Pyro to learn both the model parameters
+(the neural network parameters, dispersion parameters, etc.) and an approximate posterior distribution.
+We perform amortization using neural networks for $z_n$ and $\alpha_n$, while $\beta_{N(n)n}$ is estimated
+for each cell.
+
+## Tasks
+
+Here we provide an overview of some of the tasks that resolVI can perform. Please see {class}`scvi.external.RESOLVI`
+for the full API reference.
+
+### Dimensionality reduction
+
+For dimensionality reduction, the mean of the approximate posterior $q_\phi(z_n \mid x_n)$ is returned by default.
+This is achieved using the method:
+
+```
+>>> adata.obsm["X_resolvi"] = model.get_latent_representation()
+```
+
+Users may also return samples from this distribution, as opposed to the mean, by passing the argument `give_mean=False`.
+The latent representation can be used to create a nearest neighbor graph with scanpy with:
+
+```
+>>> import scanpy as sc
+>>> sc.pp.neighbors(adata, use_rep="X_resolvi")
+>>> adata.obsp["distances"]
+```
+
+### Transfer learning
+
+A resolVI model can be pre-trained on reference data and updated with query data using {meth}`~scvi.external.RESOLVI.load_query_data`, which then facilitates transfer of metadata like cell type annotations. $\beta_{N(n)n}$ is extended to the new cells and learned on these cells. The encoder by default does not see the batch covariate and $z_n$ can be predicted without performing query model training. See the {doc}`/user_guide/background/transfer_learning` guide for more information.
+
+### Estimation of true expression levels
+
+In {meth}`~scvi.external.RESOLVI.get_normalized_expression`, ResolVI returns the expected true expression value $\rho_n$ under the approximate posterior. For one cell $n$, this can be written as:
+
+```{math}
+:nowrap: true
+
+\begin{align}
+   \mathbb{E}_{q_\phi(z_n \mid x_n)}\left[f_{\theta}\left(z_{n}, s_n \right) \right]
+\end{align}
+```
+
+### Differential expression
+
+Differential expression analysis is achieved with {meth}`~scvi.external.RESOLVI.differential_expression`.
+ResolVI tests differences in expression levels $\rho_{n} = f_{\theta}\left(z_n, s_n\right)$.
+We allow for importance-based sampling using Pyro's built-in functionality.
+
+### Cell-type prediction
+
+Prediction of cell-type labels is performed with {meth}`~scvi.external.RESOLVI.predict`.
+A semisupervised model is necessary to perform this analysis as it leverages the cell-type classifier.
+For each cell $n$, ResolVI samples $z_n$ and computes $c_{n} = h_{\nu}\left(z_n\right)$ to yield
+the cell-type labels.
+
+### Differential niche abundance
+
+Differential niche abundance analysis is achieved with {meth}`~scvi.external.RESOLVI.differential_niche_abundance`.
+A semisupervised model is necessary to perform this analysis as it leverages the cell-type classifier.
+ResolVI tests for differences in the abundance of cell types in the neighborhood of a cell $n$, using the predictions
+$c_{n} = h_{\nu}\left(z_n\right)$. The cell-type prediction vectors are averaged, weighted by distance to the cell in question,
+and differential testing is performed on these averages.
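
Reviewer note on docs/user_guide/models/resolvi.md (not part of the commit): the generative process in the new user guide may be easier to follow with a toy simulation. The NumPy sketch below draws counts for a single cell under stated assumptions: `decoder` is a hypothetical linear-plus-softmax stand-in for $f_\theta$, the batch covariate is omitted, the latent codes come from a standard normal rather than the mixture-of-Gaussians prior, and the toy sizes, library size, and exponential rate are arbitrary. It does not use the scvi-tools API and is not the RESOLVAE implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_genes, n_latent, n_neighbors = 6, 4, 3  # toy sizes (assumptions)
library_size = 500.0                      # l_n, fixed here for illustration


def decoder(z, weights):
    """Hypothetical stand-in for f_theta: linear layer followed by a softmax."""
    logits = z @ weights
    logits = logits - logits.max(axis=-1, keepdims=True)
    expd = np.exp(logits)
    return expd / expd.sum(axis=-1, keepdims=True)


weights = rng.normal(size=(n_latent, n_genes))

# Latent codes of the cell and of its spatial neighbors (standard normal here,
# instead of the mixture-of-Gaussians prior, for brevity).
z_cell = rng.normal(size=n_latent)
z_neighbors = rng.normal(size=(n_neighbors, n_latent))

alpha = rng.dirichlet(np.ones(3))             # true / diffusion / background
beta = rng.dirichlet(np.ones(n_neighbors))    # per-neighbor diffusion weights
background = rng.dirichlet(np.ones(n_genes))  # bg, a point on the gene simplex
r = rng.exponential(scale=1.0, size=n_genes)  # r_ng; rate 1 is an assumption

rho_cell = decoder(z_cell, weights)                   # f_theta(z_n)
rho_neighbors = beta @ decoder(z_neighbors, weights)  # sum_j beta_j f_theta(z_j)

# Gamma(shape=r, rate=r/mean) has expectation `mean`; NumPy takes scale = 1/rate.
mean_cell = alpha[0] * rho_cell + alpha[1] * rho_neighbors
h = rng.gamma(shape=r, scale=mean_cell / r) + alpha[2] * background

x = rng.poisson(library_size * h)  # observed counts x_ng

# In expectation the three components sum to a point on the gene simplex.
expected_h = mean_cell + alpha[2] * background
```

Because $\alpha_n$, $\beta_{N(n)}$, the decoder output, and the background all lie on a simplex, `expected_h` sums to one, so `library_size` alone sets the expected total count of the cell.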
diff --git a/pyproject.toml b/pyproject.toml
@@ -82,7 +82,7 @@ docs = [
 docsbuild = ["scvi-tools[docs,optional]"]
 
 # scvi.autotune
-autotune = ["hyperopt>=0.2", "ray[tune]>=2.5.0"]
+autotune = ["hyperopt>=0.2", "ray[tune]"]
 # scvi.hub.HubModel.pull_from_s3
 aws = ["boto3"]
 # scvi.data.cellxgene

diff --git a/src/scvi/external/__init__.py b/src/scvi/external/__init__.py
@@ -5,6 +5,7 @@
 from .methylvi import METHYLVI
 from .mrvi import MRVI
 from .poissonvi import POISSONVI
+from .resolvi import RESOLVI
 from .scar import SCAR
 from .scbasset import SCBASSET
 from .solo import SOLO
@@ -27,4 +28,5 @@
     "VELOVI",
     "MRVI",
     "METHYLVI",
+    "RESOLVI",
 ]
diff --git a/src/scvi/external/resolvi/__init__.py b/src/scvi/external/resolvi/__init__.py
@@ -0,0 +1,4 @@
+from ._model import RESOLVI
+from ._module import RESOLVAE
+
+__all__ = ["RESOLVAE", "RESOLVI"]