README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# scWGCNA V 1.0.0

<!-- badges: start -->
<!-- badges: end -->

scWGCNA is an adaptation of WGCNA to work with single-cell datasets.
The functionality is presented in  <a href="http://doi.org/10.1002/dvdy.384" target="_blank">Feregrino & Tschopp 2021</a>

The new version of the package allows for a better workflow, and more interaction with the data.
The package no longer produces markdown html reports, nor it saves files by itself.

scWGCNA works with Seurat objects.

## References

<a href="https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-559" target="_blank">Langfelder, P., Horvath, S. (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559</a>

<a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1001057" target="_blank">Langfelder P, Luo R, Oldham MC, Horvath S (2011) Is My Network Module Preserved and Reproducible?. PLOS Computational Biology 7(1): e1001057</a>

<a href="https://doi.org/10.1101/2021.02.09.430383" target="_blank">Feregrino, C, Tschopp, P (2021) Assessing evolutionary and developmental transcriptome dynamics in homologous cell types. bioRxiv 2021.02.09.430383</a>


## Installation

To install scWGCNA run in R:

```{r eval = FALSE}

devtools::install_github("cferegrino/scWGCNA", ref="main")

```

## Basic scWGCNA workflow

**A few words on data**:  

* This package is an adaptation of WGCNA, which was originally intended to find modules of co-expression in RNA-seq bulk data. We are therefore not sure of how it would behave with multi-batch data, and batch effects. We suggest to calculate modules based on one sample / library.  

* If you are planning to compare the modules of co-expression across samples (conditions / species) try to restrict the analysis to genes that are present and expressed across all samples, to obtain the best results.  

* This package works with Seurat objects, we therefore suggest to bare in mind and control what your default assay is, since this is used by the different functions.

### Pseudocell calculation

* Our first step is to calculate pseudocells from a Seurat object with pre-calculated PCA (or other reductions) and cell clusters.
For this example we are using a small subset of one of the datasets we use in our paper.  
**Warning**: This might be very time or memory consuming depending on the size of your dataset. (ca. 15 minutes for 10k cells). Consider to run this on a script for a dedicated job.

```{r}

library(scWGCNA)

# A seurat object we use as example
my.small_MmLimbE155

# Calculate the pseudocells
MmLimbE155.pcells = calculate.pseudocells(s.cells = my.small_MmLimbE155, # Single cells in Seurat object
                                          seeds=0.2, # Fraction of cells to use as seeds to aggregate pseudocells
                                          nn = 10, # Number of neighbors to aggregate
                                          reduction = "pca", # Reduction to use
                                          dims = 1:10) # The dimensions to use

```

### single-cell WGCNA

* Our next step is to perform WCGNA analyses
From this, we will obtain a scWGNCA object, which is a list with information obtained from the analyses.
For this, we suggest to use the pseudocells, calculated as shown above.
For the sake of this example, we ask for the analysis to be started with all the genes in our example dataset, as this is very small.  Normally, the function would find variable genes for us.  
The function prints some long messages, with important information about the analysis.  
**Important** If we obtain many modules with apparent expression in only 1, or a couple of cells, we can use the option "less=T" in this function, to remove such modules.

```{r}

# Run scWGCNA
MmLimbE155.scWGCNA = run.scWGCNA(p.cells = MmLimbE155.pcells, # Pseudocells (recommended), or Seurat single cells
                                 s.cells = my.small_MmLimbE155, # single cells in Seurat format
                                 is.pseudocell = T, # We are using single cells twice this time
                                 features = rownames(my.small_MmLimbE155)) # Recommended: variable genes

```

### Modules of co-expression

* We can now see which modules were detected by WGCNA after the iterations
* Very important, we can see what is the **average expression** of each module per cell, or pseudocell.

```{r}
# Plot the gene / modules dendrogram
scW.p.dendro(scWGCNA.data = MmLimbE155.scWGCNA)

#Look at the membership tables
names(MmLimbE155.scWGCNA$modules)

#Let's look at the first module, "blue"
head(MmLimbE155.scWGCNA$modules$`1_blue`)

# We use gene unique identifiers. For this reason we need to translate them to names
my.module = MmLimbE155.scWGCNA$modules$`1_blue`

# We have gene names translation in the misc slot of this seurat object.
my.gnames = my.small_MmLimbE155@misc$gnames
head(my.gnames)

# Look at the table again
my.module$gname = my.gnames[rownames(my.module), "Gene.name"]
head(my.module)

# Here we can see what is the expression of each co-expression module, per cell. This is in one of the list items
head(MmLimbE155.scWGCNA[["sc.MEList"]]$averageExpr)

# If we want to see the average module expression per pseudocell instead of single cells, we find it here
head(MmLimbE155.scWGCNA[["MEList"]]$averageExpr)

```

* Using the information of the modules we calculated, we can also calculate what's the average expression of each module in any other set of cells. This, as long as we have a Seurat object with the same gene names as the ones used to calculate the modules. This is very useful to calculate expression and module activity in another sample or dataset, and then use the expression as if it was another gene.
* Moreover, we can calculate the membership of each gene to its module, taking as a reference the expression of any other expression dataset. These memberships, can be then compared across samples.

```{r}
# Here we can calculate the different eigengenes, using the single cell data from the mouse.
MmLimb.eigengenes.sc = scW.eigen(modules = MmLimbE155.scWGCNA, seurat = my.small_MmLimbE155)

# We can then find the expression of the different modules in this slot
head(MmLimb.eigengenes.sc$Module.Eigengenes$averageExpr)

# we can explore the membership of each gene to its module, by examining the scWGCNA data list object
head(MmLimbE155.scWGCNA[["modules"]]$`1_blue`)

# And we can access the membership and pvalues calculated from another data set, by looking at these tables
head(MmLimb.eigengenes.sc$Gene.Module.Membership)

head(MmLimb.eigengenes.sc$Module.Membership.Pvalue)

```

* We can also plot what's the mean expression of all, or each module, across the single cells

```{r}
# Plot the expression of all modules at once
scW.p.expression(s.cells = my.small_MmLimbE155, # Single cells in Seurat format
                       scWGCNA.data = MmLimbE155.scWGCNA, # scWGCNA list dataset
                       modules = "all", # Which modules to plot?
                       reduction = "tsne", # Which reduction to plot?
                       ncol=3) # How many columns to use, in case we're plotting several?

#Plot only the expression of the first module, "blue"
scW.p.expression(s.cells = my.small_MmLimbE155,
                       scWGCNA.data = MmLimbE155.scWGCNA,
                       modules = 1)

```

* We can also plot the different modules in a network visualization, to observe the relationships between the genes.  

```{r}
# First generate the networks in the scWCGNA object
MmLimbE155.scWGCNA = scWGCNA.networks(scWGCNA.data = MmLimbE155.scWGCNA)

# Plot the module "blue" as a network
scW.p.network(MmLimbE155.scWGCNA, module=1)

# Plot the module "blue" as a network. Again, we are using gene unique identifiers, not very informative.
# For this, we use the gene names translation we had before.
scW.p.network(MmLimbE155.scWGCNA, module=1, gnames = my.gnames)

```

### Module comparison accross samples

* We can now run a comparative analysis against other samples (different species, treatments, etc.).  
This is basically running the modulePreservation function from WGCNA, using the data we have generated so far.
For this, we use data from another species as the test, which will be compared against the reference (our calculated scWGCNA data).
For this, in case we have different species, we need a table of 1-to-1 orthologues.

```{r warning=FALSE}
# A Seurat object with chicken limb cells
my.small_GgLimbHH29

# We calculate pseudocells for this object
Gg.ps=calculate.pseudocells(my.small_GgLimbHH29, dims = 1:10)

# Our Seurat objects contain gene names equivalencies in the misc slot.
# We use them to biuld a table of orthologous genes
my.ortho = merge(my.small_MmLimbE155@misc$gnames,my.small_GgLimbHH29@misc$gnames, by = "Gene.name")
head(my.ortho)
my.ortho=my.ortho[,2:3]

# We then run the comparative analysis
MmvGg.comparative = scWGNA.compare(scWGCNA.data = MmLimbE155.scWGCNA,
                                   test.list = list(Gg.ps),
                                   test.names = c("Gg"),
                                   ortho = my.ortho, # not needed unless reference and tests have different gene names
                                   ortho.sp = c(2))

```

* We can now check if any genes were "lost", due to not being present as 1-to-1 orthologues (if it's the case), or not being expressed in all samples.

* **Very important**, we suggest to make the calculation of modules, with genes that you will already find in other samples from the very beginning. Nonetheless, we show here how to go about in case you didn't.

```{r}
# We can see how many genes are present in each module, at each category. In the misc slot of the scWGCNA comparative list object
MmvGg.comparative$misc$modulefrac

# And the identity of the lost genes is found under the same slot
MmvGg.comparative$misc$geneslost

# We can also plot these fractions as barplots, for each module we used for comparisons
scW.p.modulefrac(MmvGg.comparative)
```

* Finally, we can plot the different aspects of the module conservation ("preservation", as the authors of WGCNA refere to).

Here, we can plot 4 different aspects, the zscore of the overall preservation, which according to the original authors of WGCNA, is understood as a threshold. Values under 2 mean no evidence of preservation in the test sample, over 10 mean strong evidence of preservation. Moreover, we can plot the median rank, which we can use to compare the preservation of the different modules. And, to go into details of the modules, we can plot the zscore of density and connectivity preservation.  


```{r}
# Here we can plot the overall preservation and median rank.
scW.p.preservation(scWGCNA.comp.data = MmvGg.comparative,
                   to.plot=c("preservation", "median.rank"))

# We can also plot the preservation of density and connectivity.
scW.p.preservation(scWGCNA.comp.data = MmvGg.comparative,
                   to.plot=c("density", "connectivity"))
```

While these are the only aspects we plot here, all other aspects of the preservation can be explored in the list object we obtained. We refer the user to read the original publications:
<a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1001057" target="_blank">Langfelder P, Luo R, Oldham MC, Horvath S (2011) Is My Network Module Preserved and Reproducible?. PLOS Computational Biology 7(1): e1001057</a>