Update readme

Add test cases for synthetic data generation
MLO-lab · Oct 24, 2023 · 9e5cf55
1 parent 09ac5bb

Showing 4 changed files with 119 additions and 34 deletions.
diff --git a/README.md b/README.md
@@ -1,68 +1,123 @@
-[![Build](https://github.com/mlo-lab/muvi/actions/workflows/build.yml/badge.svg)](https://github.com/mlo-lab/muvi/actions/workflows/build.yml/)
-[![Coverage](https://codecov.io/gh/mlo-lab/muvi/branch/release/graph/badge.svg)](https://codecov.io/gh/mlo-lab/muvi)
-
 # MuVI
 
 A multi-view latent variable model with domain-informed structured sparsity that integrates noisy domain expertise in the form of feature sets.
 
-## Quick links
-
 [Examples](examples/1_basic_tutorial.ipynb) | [Paper](https://proceedings.mlr.press/v206/qoku23a/qoku23a.pdf) | [BibTeX](citation.bib)
 
-## Setup
+[![Build](https://github.com/mlo-lab/muvi/actions/workflows/build.yml/badge.svg)](https://github.com/mlo-lab/muvi/actions/workflows/build.yml/)
+[![Coverage](https://codecov.io/gh/mlo-lab/muvi/branch/release/graph/badge.svg)](https://codecov.io/gh/mlo-lab/muvi)
 
-We suggest using [conda](https://docs.conda.io/en/latest/miniconda.html) to manage your environments, and either [pip](https://pypi.org/project/pip/) or [poetry](https://python-poetry.org/) to install `muvi` as a python package. Follow these steps to get `muvi` up and running!
+## Basic usage
+
+The `MuVI` class is the main entry point for loading the data and performing the inference:
+
+```py
+import numpy as np
+import pandas as pd
+import anndata as ad
+import mudata as md
+import muvi
+
+# Load processed input data (missing values are allowed)
+# Matrix of dimensions n_samples x n_rna_features
+rna_df = pd.read_csv(...)
+# Matrix of dimensions n_samples x n_prot_features
+prot_df = pd.read_csv(...)
+
+# Load prior feature sets, e.g. gene sets
+gene_sets = muvi.fs.from_gmt(...)
+# Binary matrix of dimensions n_gene_sets x n_rna_features
+gene_sets_mask = gene_sets.to_mask(rna_df.columns)
+
+# Create a MuVI object by passing both input data and prior information
+model = muvi.MuVI(
+    observations={"rna": rna_df, "prot": prot_df},
+    prior_masks={"rna": gene_sets_mask},
+    ...
+    device=device,
+)
+
+# Alternatively, create a MuVI model from AnnData (single-view)
+rna_adata = ad.AnnData(rna_df, dtype=np.float32)
+rna_adata.varm['gene_sets_mask'] = gene_sets_mask.T
+model = muvi.tl.from_adata(
+    rna_adata,
+    prior_mask_key="gene_sets_mask", 
+    ..., 
+    device=device
+)
+
+# Alternatively, create a MuVI model from MuData (multi-view)
+mdata = md.MuData({"rna": rna_adata, "prot": prot_adata})
+model = muvi.tl.from_mdata(
+    mdata, 
+    prior_mask_key="gene_sets_mask", 
+    ..., 
+    device=device
+)
+
+# Fit the model for a given number of training epochs
+model.fit(batch_size, n_epochs, ...)
+
+# Continue with the downstream analysis (see below)
+```
 
-### Remotely
+## Submodules
 
-1. Create a python environment in `conda`:
+The package provides three additional submodules for handling prior information and analysing the results after training:
 
-```bash
-conda create -n muvi python=3.9
-```
+- [`muvi.tl`](muvi/tools/utils.py) provides tools for downstream analysis, e.g.,
+  - compute `muvi.tl.variance_explained` across all factors and views
+  - `muvi.tl.test` the significance of the association between the prior feature sets and the inferred factors
+  - apply clustering on the latent space such as `muvi.tl.leiden`
+  - `muvi.tl.save` the model in order to `muvi.tl.load` it at a later point in time
+- [`muvi.pl`](muvi/tools/plotting.py) works in tandem with `muvi.tl` by providing visualization methods such as
+  - `muvi.pl.variance_explained` (see above)
+  - plotting the latent space via `muvi.pl.tsne`, `muvi.pl.scatter` or `muvi.pl.stripplot`
+  - investigating factors in terms of their inferred loadings with `muvi.pl.inspect_factor`
+- [`muvi.fs`](muvi/tools/feature_sets.py) provides the data structure and methods for loading, processing and storing the prior information from feature sets
 
-2. Activate freshly created environment:
+## Tutorials
 
-```bash
-source activate muvi
-```
+Check out our [basic tutorial](examples/1_basic_tutorial.ipynb) to get familiar with `MuVI`, or jump straight to a [single-cell multiome](examples/3a_single-cell_multi-omics_integration.ipynb) analysis!
 
-3. Install `muvi` with `pip`:
+`R` users can readily export a trained `MuVI` model into `R` with a single line of code and resume the analysis with the [`MOFA2`](https://biofam.github.io/MOFA2) package.
 
-```bash
-python3 -m pip install git+https://github.com/MLO-lab/MuVI.git
+```py
+muvi.ext.save_as_hdf5(model, "muvi.hdf5", save_metadata=True)
 ```
 
-### Locally
+See [this vignette](https://raw.githack.com/MLO-lab/MuVI/master/examples/4_single-cell_multi-omics_integration_R.html) for more details!
 
-1. Clone repository:
+## Installation
 
-```bash
-git clone https://github.com/MLO-lab/MuVI.git
-```
+We suggest using [conda](https://docs.conda.io/en/latest/miniconda.html) to manage your environments, and [pip](https://pypi.org/project/pip/) to install `muvi` as a Python package. Follow these steps to get `muvi` up and running!
 
-2. Create a python environment in `conda`:
+1. Create a python environment in `conda`:
 
 ```bash
 conda create -n muvi python=3.9
 ```
 
-3. Activate freshly created environment:
+2. Activate freshly created environment:
 
 ```bash
 conda activate muvi
 ```
 
-4. Install `muvi` with `poetry`:
+3. Install `muvi` with `pip`:
 
 ```bash
-cd MuVI
-poetry install
+python3 -m pip install muvi
 ```
 
-## Getting started
+4. Alternatively, install the latest development version directly from GitHub with `pip`:
+
+```bash
+python3 -m pip install git+https://github.com/MLO-lab/MuVI.git
+```
 
-Check out [basic tutorial](examples/1_basic_tutorial.ipynb) to get familiar with MuVI!
+Make sure to install a GPU version of [PyTorch](https://pytorch.org/) to significantly speed up the inference.
 
 ## Citation
 

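The README's basic-usage example relies on a binary prior mask with one row per feature set and one column per feature (`gene_sets.to_mask(rna_df.columns)`), and transposes it before storing it in `rna_adata.varm`, because AnnData's `.varm` entries are indexed by features. The sketch below illustrates that orientation with plain pandas. `feature_sets_to_mask` is a hypothetical helper, not part of `muvi`, and the gene names are toy values chosen only for illustration.

```python
import pandas as pd


# Hypothetical helper (not the muvi API): builds a boolean mask with one row
# per feature set and one column per feature, i.e. n_feature_sets x n_features.
def feature_sets_to_mask(feature_sets: dict[str, set[str]], features: pd.Index) -> pd.DataFrame:
    mask = pd.DataFrame(False, index=list(feature_sets), columns=features)
    for name, members in feature_sets.items():
        # Set members that are absent from the data are simply ignored.
        mask.loc[name] = features.isin(list(members))
    return mask


# Toy example: four measured genes and two gene sets.
features = pd.Index(["CD3E", "CD4", "CD8A", "MS4A1"])
gene_sets = {"T_cell": {"CD3E", "CD4", "CD8A", "GZMB"}, "B_cell": {"MS4A1", "CD19"}}

mask = feature_sets_to_mask(gene_sets, features)
print(mask.shape)  # (2, 4): n_feature_sets x n_features

# AnnData .varm entries have one row per feature, hence the transpose.
varm_mask = mask.T
print(varm_mask.shape)  # (4, 2): n_features x n_feature_sets
```

If the mask is built the other way round, the shapes no longer match the number of features, which is why the transpose in the README matters.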
diff --git a/muvi/core/models.py b/muvi/core/models.py
@@ -1131,13 +1131,13 @@ def fit(
             by default 0 (1000 // batch_size)
         learning_rate : float, optional
             Learning rate, by default 0.005
-        scale_elbo : bool
+        scale_elbo : bool, optional
             Whether to scale the ELBO across views, by default True
         optimizer : str, optional
             Optimizer as string, 'adam' or 'clipped', by default "clipped"
         callbacks : List[Callable], optional
             List of callbacks during training, by default None
-        verbose : bool
+        verbose : bool, optional
             Whether to log progress, by default True
         seed : int, optional
             Training seed, by default None

diff --git a/pyproject.toml b/pyproject.toml
@@ -115,5 +115,6 @@ exclude_lines = [
     "pragma: no cover"
 ]
 omit = [
-    "**/tests/*"
+    "**/tests/*",
+    "**/muvi/tools/plotting.py"
 ]
diff --git a/tests/test_synthetic.py b/tests/test_synthetic.py
@@ -33,6 +33,35 @@ def test_shapes(data_gen):
         )
 
 
+def test_generate_all_combs(data_gen):
+    data_gen.generate(all_combs=True)
+
+    four_variable_binary_table = np.array(
+        [
+            [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
+            [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
+            [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0],
+            [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
+        ]
+    )
+
+    np.testing.assert_equal(data_gen.view_factor_mask, four_variable_binary_table)
+
+
+def test_normalise(data_gen):
+    data_gen.generate()
+    data_gen.normalise(with_std=True)
+    for m in range(data_gen.n_views):
+        if data_gen.likelihoods[m] == "normal":
+            y = np.array(data_gen.ys[m], dtype=np.float32, copy=True)
+            np.testing.assert_almost_equal(
+                np.zeros_like(y.mean(axis=0)), y.mean(axis=0), decimal=3
+            )
+            np.testing.assert_almost_equal(
+                np.ones_like(y.std(axis=0)), y.std(axis=0), decimal=3
+            )
+
+
 def test_w_mask(data_gen):
     data_gen.generate()
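The two new tests check properties that can be reproduced independently. In `test_generate_all_combs`, each column of the expected table is one non-empty subset of the four views (presumably the fixture has four views, giving 2^4 - 1 = 15 columns). In `test_normalise`, each feature of a normally distributed view should have mean 0 and standard deviation 1 after normalisation. The sketch below rebuilds both with NumPy. `all_view_combinations` and `standardise` are hypothetical helpers, not the implementation of the synthetic data generator; `standardise` uses `nanmean`/`nanstd` on the assumption that missing values should be ignored.

```python
import itertools

import numpy as np


def all_view_combinations(n_views: int) -> np.ndarray:
    # itertools.product([1, 0], ...) enumerates all 0/1 patterns in descending
    # order; the last one is all zeros (active in no view) and is dropped.
    combs = list(itertools.product([1, 0], repeat=n_views))[:-1]
    # One column per combination: shape (n_views, 2**n_views - 1).
    return np.array(combs).T


def standardise(y: np.ndarray, with_std: bool = True) -> np.ndarray:
    # Centre each feature (column), ignoring NaNs when computing the mean.
    y = y - np.nanmean(y, axis=0)
    if with_std:
        # Scale to unit (population) standard deviation; constant columns
        # would divide by zero and are not handled in this sketch.
        y = y / np.nanstd(y, axis=0)
    return y


table = all_view_combinations(4)
print(table.shape)  # (4, 15)

rng = np.random.default_rng(0)
y = standardise(rng.normal(loc=3.0, scale=2.0, size=(200, 5)))
np.testing.assert_almost_equal(y.mean(axis=0), np.zeros(5), decimal=3)
np.testing.assert_almost_equal(y.std(axis=0), np.ones(5), decimal=3)
```

Enumerating the combinations with `itertools.product([1, 0], ...)` yields exactly the column order hard-coded in `four_variable_binary_table`, so the expected table can be derived rather than typed out by hand.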