Commit

Update readme
Add test cases for synthetic data generation
arberqoku committed Oct 24, 2023
1 parent 09ac5bb commit 9e5cf55
Showing 4 changed files with 119 additions and 34 deletions.
117 changes: 86 additions & 31 deletions README.md
@@ -1,68 +1,123 @@
-[![Build](https://github.com/mlo-lab/muvi/actions/workflows/build.yml/badge.svg)](https://github.com/mlo-lab/muvi/actions/workflows/build.yml/)
-[![Coverage](https://codecov.io/gh/mlo-lab/muvi/branch/release/graph/badge.svg)](https://codecov.io/gh/mlo-lab/muvi)
-
 # MuVI
 
 A multi-view latent variable model with domain-informed structured sparsity, that integrates noisy domain expertise in terms of feature sets.
 
-## Quick links
-
 [Examples](examples/1_basic_tutorial.ipynb) | [Paper](https://proceedings.mlr.press/v206/qoku23a/qoku23a.pdf) | [BibTeX](citation.bib)
 
-## Setup
+[![Build](https://github.com/mlo-lab/muvi/actions/workflows/build.yml/badge.svg)](https://github.com/mlo-lab/muvi/actions/workflows/build.yml/)
+[![Coverage](https://codecov.io/gh/mlo-lab/muvi/branch/release/graph/badge.svg)](https://codecov.io/gh/mlo-lab/muvi)
 
-We suggest using [conda](https://docs.conda.io/en/latest/miniconda.html) to manage your environments, and either [pip](https://pypi.org/project/pip/) or [poetry](https://python-poetry.org/) to install `muvi` as a python package. Follow these steps to get `muvi` up and running!
+## Basic usage
+
+The `MuVI` class is the main entry point for loading the data and performing the inference:
+
+```py
+import numpy as np
+import pandas as pd
+import anndata as ad
+import mudata as md
+import muvi
+
+# Load processed input data (missing values are allowed)
+# Matrix of dimensions n_samples x n_rna_features
+rna_df = pd.read_csv(...)
+# Matrix of dimensions n_samples x n_prot_features
+prot_df = pd.read_csv(...)
+
+# Load prior feature sets, e.g. gene sets
+gene_sets = muvi.fs.from_gmt(...)
+# Binary matrix of dimensions n_gene_sets x n_rna_features
+gene_sets_mask = gene_sets.to_mask(rna_df.columns)
+
+# Create a MuVI object by passing both input data and prior information
+model = muvi.MuVI(
+    observations={"rna": rna_df, "prot": prot_df},
+    prior_masks={"rna": gene_sets_mask},
+    ...
+    device=device,
+)
+
+# Alternatively, create a MuVI model from AnnData (single-view)
+rna_adata = ad.AnnData(rna_df, dtype=np.float32)
+rna_adata.varm['gene_sets_mask'] = gene_sets_mask.T
+model = muvi.tl.from_adata(
+    adata,
+    prior_mask_key="gene_sets_mask",
+    ...,
+    device=device
+)
+
+# Alternatively, create a MuVI model from MuData (multi-view)
+mdata = md.MuData({"rna": rna_adata, "prot": prot_adata})
+model = muvi.tl.mdata(
+    mdata,
+    prior_mask_key="gene_sets_mask",
+    ...,
+    device=device
+)
+
+# Fit the model for a given number of training epochs
+model.fit(batch_size, n_epochs, ...)
+
+# Continue with the downstream analysis (see below)
+```
 
-### Remotely
+## Submodules
 
-1. Create a python environment in `conda`:
+The package consists of three additional submodules for analysing the results post-training:
 
-```bash
-conda create -n muvi python=3.9
-```
+- [`muvi.tl`](muvi/tools/utils.py) provides tools for downstream analysis, e.g.,
+    - compute `muvi.tl.variance_explained` across all factors and views
+    - `muvi.tl.test` the significance between the prior feature sets and the inferred factors
+    - apply clustering on the latent space such as `muvi.tl.leiden`
+    - `muvi.tl.save` the model in order to `muvi.tl.load` it at a later point in time
+- [`muvi.pl`](muvi/tools/plotting.py) works in tandem with `muvi.tl` by providing visualization methods such as
+    - `muvi.pl.variance_explained` (see above)
+    - plotting the latent space via `muvi.pl.tsne`, `muvi.pl.scatter` or `muvi.pl.stripplot`
+    - investigating factors in terms of their inferred loadings with `muvi.pl.inspect_factor`
+- [`muvi.fs`](muvi/tools/feature_sets.py) serves the data structure and methods for loading, processing and storing the prior information from feature sets
 
-2. Activate freshly created environment:
+## Tutorials
 
-```bash
-source activate muvi
-```
+Check out our [basic tutorial](examples/1_basic_tutorial.ipynb) to get familiar with `MuVI`, or jump straight to a [single-cell multiome](examples/3a_single-cell_multi-omics_integration.ipynb) analysis!
 
-3. Install `muvi` with `pip`:
+`R` users can readily export a trained `MuVI` model into `R` with a single line of code and resume the analysis with the [`MOFA2`](https://biofam.github.io/MOFA2) package.
 
-```bash
-python3 -m pip install git+https://github.com/MLO-lab/MuVI.git
+```py
+muvi.ext.save_as_hdf5(model, "muvi.hdf5", save_metadata=True)
 ```
 
-### Locally
+See [this vignette]([examples/4_single-cell_multi-omics_integration_R.html](https://raw.githack.com/MLO-lab/MuVI/master/examples/4_single-cell_multi-omics_integration_R.html)) for more details!
 
-1. Clone repository:
+## Installation
 
-```bash
-git clone https://github.com/MLO-lab/MuVI.git
-```
+We suggest using [conda](https://docs.conda.io/en/latest/miniconda.html) to manage your environments, and [pip](https://pypi.org/project/pip/) to install `muvi` as a python package. Follow these steps to get `muvi` up and running!
 
-2. Create a python environment in `conda`:
+1. Create a python environment in `conda`:
 
 ```bash
 conda create -n muvi python=3.9
 ```
 
-3. Activate freshly created environment:
+2. Activate freshly created environment:
 
 ```bash
 source activate muvi
 ```
 
-4. Install `muvi` with `poetry`:
+3. Install `muvi` with `pip`:
 
 ```bash
-cd MuVI
-poetry install
+python3 -m pip install muvi
 ```
 
-## Getting started
+4. Alternatively, install the latest version with `pip`:
+
+```bash
+python3 -m pip install git+https://github.com/MLO-lab/MuVI.git
+```
 
-Check out [basic tutorial](examples/1_basic_tutorial.ipynb) to get familiar with MuVI!
+Make sure to install a GPU version of [PyTorch](https://pytorch.org/) to significantly speed up the inference.
 
 ## Citation
 
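The README's usage example calls `gene_sets.to_mask(rna_df.columns)` to obtain a binary matrix of dimensions n_gene_sets x n_rna_features. As a minimal sketch of that layout, assuming the feature sets are available as a plain dict from set name to member features, the hypothetical helper `feature_sets_to_mask` below (not part of `muvi.fs`) marks entry (i, j) when feature j belongs to set i and ignores members that were not measured:

```python
import pandas as pd


def feature_sets_to_mask(feature_sets: dict, features) -> pd.DataFrame:
    """Hypothetical helper: binary mask of shape (n_sets, n_features).

    Entry (i, j) is True if feature j belongs to feature set i.
    Set members that are missing from `features` are ignored.
    """
    features = pd.Index(features)
    mask = pd.DataFrame(False, index=list(feature_sets), columns=features)
    for name, members in feature_sets.items():
        # Keep only the members that were actually measured
        present = features.intersection(pd.Index(members))
        mask.loc[name, present] = True
    return mask


# Toy gene sets; "NOT_MEASURED" is absent from the feature list
gene_sets = {
    "set_a": ["CD3E", "CD4", "NOT_MEASURED"],
    "set_b": ["MS4A1"],
}
mask = feature_sets_to_mask(gene_sets, ["CD3E", "CD4", "MS4A1", "LYZ"])
print(mask.astype(int))
```

Transposing the result with `.T` gives the features x sets orientation that the AnnData example stores in `rna_adata.varm['gene_sets_mask']`.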
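The `muvi.tl` bullet list mentions computing `muvi.tl.variance_explained` across all factors and views. One common definition for a linear factor model Y ≈ Z W^T is a per-view coefficient of determination, R^2 = 1 - ||Y - Z W^T||^2 / ||Y||^2, evaluated with all factors together or with one factor at a time. The NumPy sketch below implements that definition for a single centred, fully observed view; the function name and the exact formula are assumptions, not necessarily what MuVI computes, and unlike the README's input data this sketch does not handle missing values:

```python
import numpy as np


def variance_explained(z: np.ndarray, w: np.ndarray, y: np.ndarray):
    """Hypothetical helper: R^2 of the linear model y ~ z @ w.T for one view.

    z: (n_samples, n_factors) latent factors
    w: (n_features, n_factors) loadings
    y: (n_samples, n_features) centred observations without missing values
    Returns (R^2 using all factors, array of R^2 per individual factor).
    """
    ss_total = np.sum(y**2)
    r2_total = 1.0 - np.sum((y - z @ w.T) ** 2) / ss_total
    # Reconstruct the view from a single factor k at a time
    r2_per_factor = np.array(
        [
            1.0 - np.sum((y - np.outer(z[:, k], w[:, k])) ** 2) / ss_total
            for k in range(z.shape[1])
        ]
    )
    return r2_total, r2_per_factor


# Toy view with two factors acting on disjoint features
z = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
w = np.array([[1.0, 0.0], [0.0, 2.0]])
y = z @ w.T
r2_total, r2_per_factor = variance_explained(z, w, y)
print(r2_total, r2_per_factor)
```

In this noise-free toy view the two factors reconstruct Y exactly, so the total R^2 is 1; individually they explain 0.2 and 0.8, because the features they drive have sums of squares 2 and 8 out of a total of 10.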
4 changes: 2 additions & 2 deletions muvi/core/models.py
@@ -1131,13 +1131,13 @@ def fit(
             by default 0 (1000 // batch_size)
         learning_rate : float, optional
             Learning rate, by default 0.005
-        scale_elbo : bool
+        scale_elbo : bool, optional
             Whether to scale the ELBO across views, by default True
         optimizer : str, optional
             Optimizer as string, 'adam' or 'clipped', by default "clipped"
         callbacks : List[Callable], optional
             List of callbacks during training, by default None
-        verbose : bool
+        verbose : bool, optional
             Whether to log progress, by default True
         seed : int, optional
             Training seed, by default None
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -115,5 +115,6 @@ exclude_lines = [
     "pragma: no cover"
 ]
 omit = [
-    "**/tests/*"
+    "**/tests/*",
+    "**/muvi/tools/plotting.py"
 ]
29 changes: 29 additions & 0 deletions tests/test_synthetic.py
@@ -33,6 +33,35 @@ def test_shapes(data_gen):
     )
 
 
+def test_generate_all_combs(data_gen):
+    data_gen.generate(all_combs=True)
+
+    four_variable_binary_table = np.array(
+        [
+            [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
+            [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
+            [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0],
+            [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
+        ]
+    )
+
+    np.testing.assert_equal(data_gen.view_factor_mask, four_variable_binary_table)
+
+
+def test_normalise(data_gen):
+    data_gen.generate()
+    data_gen.normalise(with_std=True)
+    for m in range(data_gen.n_views):
+        if data_gen.likelihoods[m] == "normal":
+            y = np.array(data_gen.ys[m], dtype=np.float32, copy=True)
+            np.testing.assert_almost_equal(
+                np.zeros_like(y.mean(axis=0)), y.mean(axis=0), decimal=3
+            )
+            np.testing.assert_almost_equal(
+                np.ones_like(y.std(axis=0)), y.std(axis=0), decimal=3
+            )
+
+
 def test_w_mask(data_gen):
     data_gen.generate()
 
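`test_generate_all_combs` expects `view_factor_mask` to equal a 4 x 15 binary table. Assuming rows index views and columns index factors, as the attribute name suggests, the columns enumerate every non-empty subset of the four views: read top to bottom as 4-bit numbers, they count down from 15 to 1. A NumPy sketch that builds the same table for any number of views; `all_view_combinations` is a hypothetical name, not necessarily how MuVI's data generator constructs the mask:

```python
import numpy as np


def all_view_combinations(n_views: int) -> np.ndarray:
    """Hypothetical helper: binary (n_views, 2**n_views - 1) matrix.

    Column j is the binary expansion of 2**n_views - 1 - j, with the most
    significant bit in row 0, so the columns run from "all views active"
    down to "only the last view active".
    """
    codes = np.arange(2**n_views - 1, 0, -1)  # 2^n - 1, ..., 1
    shifts = np.arange(n_views - 1, -1, -1)   # bit position for each row
    return (codes[np.newaxis, :] >> shifts[:, np.newaxis]) & 1


table = all_view_combinations(4)
print(table)
```

Enumerating integer codes keeps the construction to a single broadcasted shift, and the descending order reproduces the column order the test hard-codes.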

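`test_normalise` checks that, after `data_gen.normalise(with_std=True)`, each view with a `"normal"` likelihood has per-feature means of 0 and standard deviations of 1 to three decimal places. A minimal column-wise standardisation that passes the same check, assuming a dense samples x features NumPy array; `standardise` is a hypothetical helper, not the generator's own method, and it leaves constant columns at zero instead of dividing by a zero standard deviation:

```python
import numpy as np


def standardise(y: np.ndarray, with_std: bool = True) -> np.ndarray:
    """Hypothetical helper: centre each feature (column), optionally scale to unit variance."""
    y = y - y.mean(axis=0, keepdims=True)
    if with_std:
        std = y.std(axis=0, keepdims=True)  # population std (ddof=0)
        # Constant columns are already zero after centring; divide them by 1
        y = y / np.where(std > 0, std, 1.0)
    return y


rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=3.0, size=(200, 10))
y_std = standardise(y)

# Same tolerance as the test above
np.testing.assert_almost_equal(np.zeros(10), y_std.mean(axis=0), decimal=3)
np.testing.assert_almost_equal(np.ones(10), y_std.std(axis=0), decimal=3)
```

The sketch uses NumPy's default population standard deviation (`ddof=0`), the same estimator as the test's `y.std(axis=0)` call, so the scaled columns have a standard deviation of 1 up to floating-point error.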