diff --git a/README.md b/README.md
index d64910e..0e1cc18 100644
--- a/README.md
+++ b/README.md
@@ -1,68 +1,123 @@
-[![Build](https://github.com/mlo-lab/muvi/actions/workflows/build.yml/badge.svg)](https://github.com/mlo-lab/muvi/actions/workflows/build.yml/)
-[![Coverage](https://codecov.io/gh/mlo-lab/muvi/branch/release/graph/badge.svg)](https://codecov.io/gh/mlo-lab/muvi)
-
 # MuVI
 
 A multi-view latent variable model with domain-informed structured sparsity, that integrates noisy domain expertise in terms of feature sets.
 
-## Quick links
-
 [Examples](examples/1_basic_tutorial.ipynb) | [Paper](https://proceedings.mlr.press/v206/qoku23a/qoku23a.pdf) | [BibTeX](citation.bib)
 
-## Setup
+[![Build](https://github.com/mlo-lab/muvi/actions/workflows/build.yml/badge.svg)](https://github.com/mlo-lab/muvi/actions/workflows/build.yml/)
+[![Coverage](https://codecov.io/gh/mlo-lab/muvi/branch/release/graph/badge.svg)](https://codecov.io/gh/mlo-lab/muvi)
 
-We suggest using [conda](https://docs.conda.io/en/latest/miniconda.html) to manage your environments, and either [pip](https://pypi.org/project/pip/) or [poetry](https://python-poetry.org/) to install `muvi` as a python package. Follow these steps to get `muvi` up and running!
+## Basic usage
+
+The `MuVI` class is the main entry point for loading the data and performing the inference:
+
+```py
+import numpy as np
+import pandas as pd
+import anndata as ad
+import mudata as md
+import muvi
+
+# Load processed input data (missing values are allowed)
+# Matrix of dimensions n_samples x n_rna_features
+rna_df = pd.read_csv(...)
+# Matrix of dimensions n_samples x n_prot_features
+prot_df = pd.read_csv(...)
+
+# Load prior feature sets, e.g. gene sets
+gene_sets = muvi.fs.from_gmt(...)
+# Binary matrix of dimensions n_gene_sets x n_rna_features
+gene_sets_mask = gene_sets.to_mask(rna_df.columns)
+
+# Create a MuVI object by passing both input data and prior information
+model = muvi.MuVI(
+    observations={"rna": rna_df, "prot": prot_df},
+    prior_masks={"rna": gene_sets_mask},
+    ...,
+    device=device,
+)
+
+# Alternatively, create a MuVI model from AnnData (single-view)
+rna_adata = ad.AnnData(rna_df, dtype=np.float32)
+rna_adata.varm['gene_sets_mask'] = gene_sets_mask.T
+model = muvi.tl.from_adata(
+    rna_adata,
+    prior_mask_key="gene_sets_mask",
+    ...,
+    device=device
+)
+
+# Alternatively, create a MuVI model from MuData (multi-view)
+mdata = md.MuData({"rna": rna_adata, "prot": ad.AnnData(prot_df, dtype=np.float32)})
+model = muvi.tl.from_mdata(
+    mdata,
+    prior_mask_key="gene_sets_mask",
+    ...,
+    device=device
+)
+
+# Fit the model for a given number of training epochs
+model.fit(batch_size, n_epochs, ...)
+
+# Continue with the downstream analysis (see below)
+```
 
-### Remotely
+## Submodules
 
-1. Create a python environment in `conda`:
+The package consists of three additional submodules for analysing the results post-training:
 
-```bash
-conda create -n muvi python=3.9
-```
+- [`muvi.tl`](muvi/tools/utils.py) provides tools for downstream analysis, e.g.,
+  - compute `muvi.tl.variance_explained` across all factors and views
+  - `muvi.tl.test` the significance between the prior feature sets and the inferred factors
+  - apply clustering on the latent space such as `muvi.tl.leiden`
+  - `muvi.tl.save` the model in order to `muvi.tl.load` it at a later point in time
+- [`muvi.pl`](muvi/tools/plotting.py) works in tandem with `muvi.tl` by providing visualization methods such as
+  - `muvi.pl.variance_explained` (see above)
+  - plotting the latent space via `muvi.pl.tsne`, `muvi.pl.scatter` or `muvi.pl.stripplot`
+  - investigating factors in terms of their inferred loadings with `muvi.pl.inspect_factor`
+- [`muvi.fs`](muvi/tools/feature_sets.py) provides the data structure and methods for loading, processing and storing the prior information from feature sets
 
-2. Activate freshly created environment:
+## Tutorials
 
-```bash
-source activate muvi
-```
+Check out our [basic tutorial](examples/1_basic_tutorial.ipynb) to get familiar with `MuVI`, or jump straight to a [single-cell multiome](examples/3a_single-cell_multi-omics_integration.ipynb) analysis!
 
-3. Install `muvi` with `pip`:
+`R` users can readily export a trained `MuVI` model into `R` with a single line of code and resume the analysis with the [`MOFA2`](https://biofam.github.io/MOFA2) package.
 
-```bash
-python3 -m pip install git+https://github.com/MLO-lab/MuVI.git
+```py
+muvi.ext.save_as_hdf5(model, "muvi.hdf5", save_metadata=True)
 ```
 
-### Locally
+See [this vignette](https://raw.githack.com/MLO-lab/MuVI/master/examples/4_single-cell_multi-omics_integration_R.html) for more details!
 
-1. Clone repository:
+## Installation
 
-```bash
-git clone https://github.com/MLO-lab/MuVI.git
-```
+We suggest using [conda](https://docs.conda.io/en/latest/miniconda.html) to manage your environments, and [pip](https://pypi.org/project/pip/) to install `muvi` as a Python package. Follow these steps to get `muvi` up and running!
 
-2. Create a python environment in `conda`:
+1. Create a Python environment in `conda`:
 
 ```bash
 conda create -n muvi python=3.9
 ```
 
-3. Activate freshly created environment:
+2. Activate the freshly created environment:
 
 ```bash
 source activate muvi
 ```
 
-4. Install `muvi` with `poetry`:
+3. Install `muvi` with `pip`:
 
 ```bash
-cd MuVI
-poetry install
+python3 -m pip install muvi
 ```
 
-## Getting started
+4. Alternatively, install the latest version with `pip`:
+
+```bash
+python3 -m pip install git+https://github.com/MLO-lab/MuVI.git
+```
 
-Check out [basic tutorial](examples/1_basic_tutorial.ipynb) to get familiar with MuVI!
+If a GPU is available, install a GPU-enabled build of [PyTorch](https://pytorch.org/) to significantly speed up inference.
 
 ## Citation
 
diff --git a/muvi/core/models.py b/muvi/core/models.py
index c05b9c2..1041ea3 100755
--- a/muvi/core/models.py
+++ b/muvi/core/models.py
@@ -1131,13 +1131,13 @@ def fit(
             by default 0 (1000 // batch_size)
         learning_rate : float, optional
             Learning rate, by default 0.005
-        scale_elbo : bool
+        scale_elbo : bool, optional
             Whether to scale the ELBO across views, by default True
         optimizer : str, optional
             Optimizer as string, 'adam' or 'clipped', by default "clipped"
         callbacks : List[Callable], optional
             List of callbacks during training, by default None
-        verbose : bool
+        verbose : bool, optional
             Whether to log progress, by default True
         seed : int, optional
             Training seed, by default None
diff --git a/pyproject.toml b/pyproject.toml
index 6dec52d..d7923c5 100755
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -115,5 +115,6 @@ exclude_lines = [
     "pragma: no cover"
 ]
 omit = [
-    "**/tests/*"
+    "**/tests/*",
+    "**/muvi/tools/plotting.py"
 ]
\ No newline at end of file
diff --git a/tests/test_synthetic.py b/tests/test_synthetic.py
index 864304e..9c4bd94 100755
--- a/tests/test_synthetic.py
+++ b/tests/test_synthetic.py
@@ -33,6 +33,35 @@ def test_shapes(data_gen):
     )
 
 
+def test_generate_all_combs(data_gen):
+    data_gen.generate(all_combs=True)
+
+    four_variable_binary_table = np.array(
+        [
+            [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
+            [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
+            [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0],
+            [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
+        ]
+    )
+
+    np.testing.assert_equal(data_gen.view_factor_mask, four_variable_binary_table)
+
+
+def test_normalise(data_gen):
+    data_gen.generate()
+    data_gen.normalise(with_std=True)
+    for m in range(data_gen.n_views):
+        if data_gen.likelihoods[m] == "normal":
+            y = np.array(data_gen.ys[m], dtype=np.float32, copy=True)
+            np.testing.assert_almost_equal(
+                np.zeros_like(y.mean(axis=0)), y.mean(axis=0), decimal=3
+            )
+            np.testing.assert_almost_equal(
+                np.ones_like(y.std(axis=0)), y.std(axis=0), decimal=3
+            )
+
+
 def test_w_mask(data_gen):
     data_gen.generate()
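
Review note on `test_generate_all_combs`: the expected `four_variable_binary_table` appears to list every non-empty on/off combination of four views, one combination per column, counting down from all ones. The sketch below rebuilds that table with the standard library as a cross-check. It assumes this ordering is what `generate(all_combs=True)` intends; the helper name `all_nonempty_combinations` is hypothetical and is not part of `muvi`.

```python
from itertools import product


def all_nonempty_combinations(n_views):
    """Hypothetical helper: n_views x (2**n_views - 1) binary table as nested lists."""
    # product([1, 0], ...) enumerates tuples from all ones down to all zeros;
    # any(combo) drops the single all-zero combination.
    columns = [combo for combo in product([1, 0], repeat=n_views) if any(combo)]
    # Transpose so that each row is a view and each column is one combination.
    return [list(row) for row in zip(*columns)]


table = all_nonempty_combinations(4)
for row in table:
    print(row)
```

Deriving the expected mask this way instead of hard-coding it might let the test cover other numbers of views, provided the `data_gen` fixture supports that.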