Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'main' into maximum_bayes_factor
Showing 26 changed files with 2,939 additions and 75 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -63,6 +63,7 @@ import scvi | |
external.MRVI | ||
external.METHYLVI | ||
external.Decipher | ||
external.RESOLVI | ||
``` | ||
|
||
## Data loading | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,191 @@ | ||
# ResolVI | ||
|
||
**resolVI** (Python class {class}`~scvi.external.RESOLVI`) is a generative model of single-cell resolved spatial | ||
transcriptomics that can subsequently be used for many common downstream tasks. | ||
|
||
The advantages of resolVI are: | ||
|
||
- Addresses noise and bias in spatial transcriptomics (ST) data caused by incorrect segmentation, nonspecific background, and limited spatial resolution.
- Scalable to very large datasets (>1 million cells). | ||
|
||
The limitations of resolVI include: | ||
|
||
- Effectively requires a GPU for fast inference. | ||
- Latent space is not interpretable, unlike that of a linear method. | ||
- Assumes single cells are observed and does not work with low-resolution ST technologies such as Visium or Slide-seq.
|
||
```{topic} Tutorials: | ||
- {doc}`/tutorials/notebooks/spatial/resolVI_tutorial.ipynb` | ||
``` | ||
|
||
## Preliminaries | ||
|
||
ResolVI takes as input spatially resolved RNA-seq count matrices obtained after cellular segmentation and assignment
of molecules to cells. These counts can be derived either from sequencing of spatially resolved molecules or from
fluorescence imaging. ResolVI leverages the gene expression of neighboring cells and reassigns observed gene expression
to neighboring cells as well as to a nonspecific background.
|
||
ResolVI accepts as input the observed expression of the cell itself, the gene expression of its spatial neighbors,
and the distances between these cells. Additionally, a vector of categorical covariates $S$, representing
batch, donor, etc., is an optional input to the model. ResolVI also provides a semi-supervised mode, which adjusts the
prior in the latent space for different cell types and trains a classifier to predict cell types from latent embeddings.
|
||
## Generative process | ||
|
||
ResolVI posits that the observed expression of gene $g$ in cell $n$, $x_{ng}$, is generated by the following process:
|
||
```{math} | ||
:nowrap: true | ||
\begin{align} | ||
z &\sim \mathrm{MixtureOfGaussians}(\mu_1, \dots, \mu_K, \Sigma_1, \dots, \Sigma_K) \\ | ||
\alpha_n &\sim \mathrm{Dirichlet}(C) \\ | ||
r_{ng} &\sim \mathrm{Exponential}(R) \\ | ||
h_{ng} &= | ||
\mathrm{Gamma}(r_{ng}, \frac{r_{ng}}{\alpha_0 f_\theta(z, b) + \alpha_1 \sum\limits_{{N(n)}} \beta_{N(n)} f_\theta(z_{N(n)}, b)}) + \alpha_2 bg\\ | ||
x_{ng} &\sim \mathrm{Poisson}(l_n h_{ng}) | ||
\end{align} | ||
``` | ||
|
||
In particular, $z$ and $z_{N(n)}$ are the latent embeddings of the cell itself as well as its spatial neighbors | ||
both of dimension $L$. ResolVI uses a mixture of Gaussians prior on $z$: | ||
|
||
```{math} | ||
:nowrap: true | ||
\begin{align} | ||
c_n &\sim \textrm{Categorical}( | ||
\pi_1, \pi_2, \dots, \pi_K | ||
), \\ | ||
z_n \mid c_n = c &\sim \mathcal{N}(\mu_c, \Sigma_c)
\end{align} | ||
``` | ||
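The two-step prior above can be illustrated with a minimal standard-library sketch: draw a component, then draw the latent code from that component's Gaussian. The function name `sample_mog_prior`, the diagonal covariance, and all numeric values are assumptions made for illustration; they are not taken from the resolVI implementation.

```python
import random


def sample_mog_prior(weights, means, stds, rng):
    """Draw (c, z) from a mixture of Gaussians with diagonal covariance (assumed)."""
    # c ~ Categorical(pi_1, ..., pi_K)
    c = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # z | c ~ N(mu_c, diag(sigma_c^2)), sampled one latent dimension at a time
    z = [rng.gauss(m, s) for m, s in zip(means[c], stds[c])]
    return c, z


rng = random.Random(0)
weights = [0.5, 0.3, 0.2]                      # pi, K = 3 components (illustrative)
means = [[0.0, 0.0], [3.0, 3.0], [-3.0, 3.0]]  # mu_c, latent dimension L = 2
stds = [[1.0, 1.0], [0.5, 0.5], [0.5, 0.5]]    # per-dimension standard deviations
c, z = sample_mog_prior(weights, means, stds, rng)
```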
|
||
In brief, we assume that the observed expression of gene $g$ in cell $n$ can be modelled as a sum of three
components: expression truly originating from the cell (weighted by $\alpha_0$), expression originating from
neighboring cells but wrongly assigned to $n$ (weighted by $\alpha_1$), and nonspecific background (weighted by $\alpha_2$).
The neighbor component is distributed over the individual neighboring cells $N(n)$ according to the weights $\beta_{N(n)n}$.
Both the expression of cell $n$ and the expression of its neighboring cells $N(n)$ are generated by the same
generative network $f_\theta$ from their respective latent codes $z_n$ and $z_{N(n)}$.
This generative network is a neural network:
|
||
```{math} | ||
:nowrap: true | ||
\begin{align} | ||
f_{\theta}(z_{n}, s_n) &: \mathbb{R}^{L} \times \{0, 1\}^K \to \Delta^{G-1}
\end{align} | ||
``` | ||
|
||
which estimates the normalized gene expression of cell $n$. We use the observed counts per cell to scale these rates. | ||
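As a rough illustration of how the three components combine, the sketch below computes the expected normalized expression of one cell as the $\alpha$-weighted sum of its own profile, the $\beta$-weighted profiles of its neighbors, and the background, and then scales it by the cell's total counts. The function name `mixed_rate` and all numbers are hypothetical, and the Gamma and Poisson noise of the full model is deliberately left out.

```python
def mixed_rate(alpha, rho_self, rho_neighbors, beta, background):
    """Expected normalized expression per gene for one cell (noise-free sketch)."""
    rate = []
    for g in range(len(rho_self)):
        # Expression diffusing in from neighbors, weighted per neighbor by beta
        diffusion = sum(b * rho[g] for b, rho in zip(beta, rho_neighbors))
        rate.append(
            alpha[0] * rho_self[g]       # true expression of the cell
            + alpha[1] * diffusion       # wrongly assigned neighbor expression
            + alpha[2] * background[g]   # nonspecific background
        )
    return rate


alpha = [0.7, 0.2, 0.1]                               # true, diffusion, background
rho_self = [0.5, 0.3, 0.2]                            # stand-in for f_theta(z_n, s_n)
rho_neighbors = [[0.1, 0.1, 0.8], [0.2, 0.6, 0.2]]    # stand-ins for two neighbors
beta = [0.25, 0.75]                                   # per-neighbor diffusion weights
background = [1 / 3, 1 / 3, 1 / 3]

rate = mixed_rate(alpha, rho_self, rho_neighbors, beta, background)
library_size = 200                                    # observed counts of the cell (l_n)
expected_counts = [library_size * r for r in rate]
```

Because every input lies on a simplex, the mixed rate again sums to one, so scaling by the observed counts per cell yields expected counts with the same total.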
|
||
The latent variables, along with their descriptions, are summarized in the following table:
|
||
```{eval-rst} | ||
.. list-table::
   :widths: 20 90 15 15
   :header-rows: 1

   * - Latent variable
     - Description
     - Code variable (if different)
     - Prior
   * - :math:`z_n \in \mathbb{R}^L`
     - Low-dimensional representation capturing the state of a cell
     - ``latent``
     - Mixture of Gaussians
   * - :math:`\beta_{N(n)} \in \Delta^{|N(n)| - 1}`
     - Per-neighbor diffusion
     - ``per_neighbor_diffusion``
     - Dirichlet
   * - :math:`\alpha_{n0 \dots 2} \in \Delta^{2}`
     - Per-cell true, diffusion and background proportions
     - ``mixture_proportions``
     - Dirichlet
   * - :math:`bg_{ng} \in \Delta^{G - 1}`
     - Per-cell estimate of background
     - ``background``
     - None
   * - :math:`\mathrm{background}_{s} \in \mathbb{R}^G`
     - Per-sample background vector
     - ``per_gene_background``
     - Dirichlet
   * - :math:`\rho_n \in \Delta^{G - 1}`
     - Per-cell rate of expression
     - ``px_scale``
     - None
   * - :math:`\mu_n, \mu_{N(n)} \in \mathbb{R}^G`
     - Per-cell estimated expression
     - ``px_rate`` and ``px_rate_n``
     - None
``` | ||
|
||
|
||
## Inference | ||
|
||
ResolVI uses variational inference, specifically auto-encoding variational Bayes
(see {doc}`/user_guide/background/variational_inference`), implemented in Pyro, to learn both the model parameters
(the neural network parameters, dispersion parameters, etc.) and an approximate posterior distribution.
Inference for $z_n$ and $\alpha_n$ is amortized using neural networks, while $\beta_{N(n)n}$ is estimated
separately for each cell.
|
||
## Tasks | ||
|
||
Here we provide an overview of some of the tasks that resolVI can perform. Please see {class}`scvi.external.RESOLVI` | ||
for the full API reference. | ||
|
||
### Dimensionality reduction | ||
|
||
For dimensionality reduction, the mean of the approximate posterior $q_\phi(z_n \mid x_n)$ is returned by default.
This is achieved using the method:
|
||
``` | ||
>>> adata.obsm["X_resolvi"] = model.get_latent_representation() | ||
``` | ||
|
||
Users may also return samples from this distribution, as opposed to the mean, by passing the argument `give_mean=False`.
The latent representation can be used to build a nearest-neighbor graph with scanpy:
|
||
``` | ||
>>> import scanpy as sc | ||
>>> sc.pp.neighbors(adata, use_rep="X_resolvi") | ||
>>> adata.obsp["distances"] | ||
``` | ||
|
||
### Transfer learning | ||
|
||
A resolVI model can be pre-trained on reference data and updated with query data using {meth}`~scvi.external.RESOLVI.load_query_data`, which then facilitates transfer of metadata like cell type annotations. $\beta_{N(n)n}$ is extended to the new cells and learned on these cells. The encoder by default does not see the batch covariate and $z_n$ can be predicted without performing query model training. See the {doc}`/user_guide/background/transfer_learning` guide for more information. | ||
|
||
### Estimation of true expression levels | ||
|
||
{meth}`~scvi.external.RESOLVI.get_normalized_expression` returns the expected value of the true expression $\rho_n$ under the approximate posterior. For one cell $n$, this can be written as:
|
||
```{math} | ||
:nowrap: true | ||
\begin{align} | ||
\mathbb{E}_{q_\phi(z_n \mid x_n)}\left[f_{\theta}\left(z_{n}, s_n \right) \right] | ||
\end{align} | ||
``` | ||
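In practice an expectation of this form is usually approximated by Monte Carlo: sample $z_n$ from the approximate posterior, decode each sample, and average. The sketch below shows the idea with a toy decoder (a linear map followed by a softmax) standing in for $f_\theta$; the decoder, the diagonal Gaussian posterior, the omission of the covariate $s_n$, and all names and numbers are assumptions for illustration, not the resolVI code.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_decoder(z, weights):
    """Hypothetical stand-in for f_theta: linear map followed by softmax."""
    logits = [sum(w * zi for w, zi in zip(row, z)) for row in weights]
    return softmax(logits)


def expected_expression(q_mean, q_std, weights, n_samples, rng):
    """Monte Carlo estimate of E_q[f_theta(z)] for one cell."""
    acc = [0.0] * len(weights)
    for _ in range(n_samples):
        # z ~ q(z | x): diagonal Gaussian posterior (assumed)
        z = [rng.gauss(m, s) for m, s in zip(q_mean, q_std)]
        rho = toy_decoder(z, weights)
        acc = [a + r for a, r in zip(acc, rho)]
    return [a / n_samples for a in acc]


rng = random.Random(0)
decoder_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # 3 genes, 2 latent dims
rho_hat = expected_expression([0.5, -0.5], [0.2, 0.2], decoder_weights, 500, rng)
```

Each decoded sample lies on the simplex, so the averaged estimate does as well.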
|
||
### Differential expression | ||
|
||
Differential expression analysis is achieved with {meth}`~scvi.external.RESOLVI.differential_expression`.
ResolVI tests differences in expression levels $\rho_{n} = f_{\theta}\left(z_n, s_n\right)$.
Importance-based sampling is supported through Pyro's built-in functionality.
|
||
### Cell-type prediction | ||
|
||
Prediction of cell-type labels is performed with {meth}`~scvi.external.RESOLVI.predict`.
A semi-supervised model is required for this analysis, as it relies on the cell-type classifier.
For each cell $n$, ResolVI samples $z_n$ from the approximate posterior and applies the classifier,
$c_{n} = h_{\nu}\left(z_n\right)$, to yield the cell-type labels.
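The following sketch illustrates one way such a prediction can be formed: sample latent codes, apply a classifier to each sample, average the class probabilities, and take the most probable label. The linear classifier standing in for $h_\nu$, the averaging rule, and the function name `predict_label` are assumptions for illustration; the actual behavior is defined by the library's `predict` method.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(q_mean, q_std, classifier_weights, labels, n_samples, rng):
    """Average class probabilities over posterior samples and return the top label."""
    avg = [0.0] * len(labels)
    for _ in range(n_samples):
        # z ~ q(z | x): diagonal Gaussian posterior (assumed)
        z = [rng.gauss(m, s) for m, s in zip(q_mean, q_std)]
        # Hypothetical linear classifier standing in for h_nu
        logits = [sum(w * zi for w, zi in zip(row, z)) for row in classifier_weights]
        probs = softmax(logits)
        avg = [a + p / n_samples for a, p in zip(avg, probs)]
    best = max(range(len(labels)), key=lambda k: avg[k])
    return labels[best], avg


rng = random.Random(0)
labels = ["T cell", "B cell"]
classifier_weights = [[4.0, 0.0], [0.0, 4.0]]
label, probs = predict_label([2.0, -2.0], [0.1, 0.1], classifier_weights, labels, 100, rng)
```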
|
||
### Differential niche abundance | ||
|
||
Differential niche abundance analysis is achieved with {meth}`~scvi.external.RESOLVI.differential_niche_abundance`.
A semi-supervised model is required for this analysis, as it relies on the cell-type classifier.
ResolVI tests differences in the abundance of cell types in the neighborhood of a cell $n$, using the
predictions $c_{n} = h_{\nu}\left(z_n\right)$. The cell-type prediction vectors of the neighbors are averaged,
weighted by their distance to the cell, and the differential computation is performed on these averages.
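
A distance-weighted average of neighbor predictions can be sketched as follows. The exponential decay kernel, the `length_scale` parameter, and the function name `niche_composition` are assumptions chosen for illustration; the text above only states that the vectors are weighted by distance, not which weighting is used.

```python
import math


def niche_composition(neighbor_probs, distances, length_scale=1.0):
    """Distance-weighted average of neighbors' cell-type probability vectors."""
    # Closer neighbors receive larger weights (exponential decay is an assumption)
    weights = [math.exp(-d / length_scale) for d in distances]
    total = sum(weights)
    n_types = len(neighbor_probs[0])
    return [
        sum(w * p[t] for w, p in zip(weights, neighbor_probs)) / total
        for t in range(n_types)
    ]


# Two neighbors: a nearby one predicted as type 0, a distant one predicted as type 1
neighbor_probs = [[1.0, 0.0], [0.0, 1.0]]
distances = [1.0, 5.0]
composition = niche_composition(neighbor_probs, distances)
```

With normalized weights the result is itself a probability vector over cell types, which makes neighborhoods directly comparable between groups of cells.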
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
from ._model import RESOLVI | ||
from ._module import RESOLVAE | ||
|
||
__all__ = ["RESOLVAE", "RESOLVI"] |