Skip to content

Latest commit

 

History

History
213 lines (161 loc) · 6.48 KB

README.md

File metadata and controls

213 lines (161 loc) · 6.48 KB

ppEffect

Codecov test coverage

The goal of ppEffect is to systematically evaluate protoplasting effect (enzymatic effect) in scRNA-seq datasets

Installation

You can install the development version of ppEffect from GitHub with:

# install.packages("devtools")
devtools::install_github("YAOJ-bioin/ppEffect")

Example

This is a basic example which shows you how to use this package:

Load packages

library(ppEffect)
library(rmarkdown)
library(markdown)
library(Seurat)
library(dplyr)
library(cowplot)
library(plotly)
library(ggvenn)
library(RColorBrewer)

Load your scRNA dataset

Prepare your scRNA dataset and load it, seurat object is compatible in our package.

## load your scRNA dataset, and create a Seurat object
data_obj <- readRDS("../data_dir/GSE123818_at_root_anno_simple.rds")

## Load 10X data from cellRanger results
# raw.data <- Read10X(data.dir = "./data/filtered_gene_bc_matrices/GSE123818_at_root_anno/")
## Initialize the Seurat object with the raw (non-normalized data).
# data_obj<- CreateSeuratObject(counts = raw.data, project = "at_root", min.cells = 3, min.features = 200)
# data_obj

Standard pre-processing workflow

The pre-processing workflow is as same as the basic pipeline on seurat.

data_obj <- SCTransform(data_obj, verbose = FALSE)

data_obj <- RunPCA(data_obj, verbose = FALSE, approx = FALSE, npcs = 10, seed.use = NULL)
data_obj <- RunUMAP(data_obj, dims = 1:10)
data_obj <- RunTSNE(data_obj, dims = 1:10)
data_obj <- FindNeighbors(data_obj, reduction = "pca", dims = 1:10)
data_obj <- FindClusters(data_obj, resolution = 0.8)

## Find marker genes is necessary, we require to submit the data.frame of marker genes in the module of evaluation.
Markers <- FindAllMarkers(data_obj)

Our package mainly contains three modules: Data, Evaluation, and Method.

  • Data : A warehouse of ppDEGs from different species. We collected totally 6 gene sets, from 4 species and 4 tissues or organs. All data can be easily obtained by using ppDEGs_DB.

  • Evaluation : We bulit the function eval_ppEffect to help your systematically evaluate the ppEffect in your datasets conveniently. And a ppEffect Report (.html) will be produced automatically.

  • Method : In this module, we provided several Method to help your correct the ppEffects in your datasets.

Module one: Data

The overview of ppDEGs_DB, you can choose dataset by ID.

# Check the overview of ppDEGs_DB, and confirm which ppDEGs dataset your will choose.
Overview(ppDEGs_DB)

# *** Protoplasting induced gene sets (ppDEGs) from bulk RNA-seq analysis 
# between protoplasted cells and un-protoplasted sample.*** 
# 
# ID  |      Dataset name 
# 01  |      At_root_Denyer_2019 
# 02  |      At_leaf_Kim_2021 
# 03  |      Zm_ear_Xu_2021 
# 04  |      Zm_leaf_Bezrutczyk_2021 
# 05  |      Os_root_Liu_2021 
# 06  |      Nt_BY2_Yao_2023 

Here, you can use Details to check the specific informations for each dataset. Input the ID number as a parameter. The basic information includes dataset’s ID, name, species, sample type, experiment treatment,reference, and ppDEGs. All gene set have been pre-processed, using the threshold as log2FC>=2 and p_adj<=0.05

# Show details about the specific dataset, your can choose it by ID
Details(ppDEGs_DB, ID ="01")

# *************************
# 
# ID: 01 
# name: At_root_Denyer_2019 
# species: Arabidopsis thaliana 
# sample: root 
# treatment: 6-day-old; 120mins 
# ref: Denyer et al., 2019 
# ppDEGs: total 3435 genes in this dataset, E.g AT5G62520 AT2G24850 ; 
#         among then, 917 genes were up-regulated, and 2518 were down-regulated. 
# 
# *************************

ppDEGs can be extracted by the function ppDEGsExtra.

# select and extra  a data.frame about ppDEGs, by ID displayed .
# Or you can provide your own ppDEGs.
UP_ppDEGs <- ppDEGsExtra(ppDEGs_DB, ID ="01", type = "Up_ppDEGs") # up-regulated genes
Down_ppDEGs <- ppDEGsExtra(ppDEGs_DB, ID ="01", type = "Down_ppDEGs") # down-related genes
# Extract genes directly:
UP_ppDEGs <- UP_ppDEGs$Gene
Down_ppDEGs <- Down_ppDEGs$Gene


## Or, you can filter the genes by the parameter: log2FoldChange, baseMean, p_adj, or pvalue.
### baseMean: mean of normalized counts for all samples.

## For example:
ppDEGs <- UP_ppDEGs %>% filter(log2FoldChange >=3, pvalue<0.05)

Module two: Evaluation

This step may cost several minutes, and a ppEffect Report will be produced.

data_obj <- eval_ppEffect(
  object = data_obj,
  Up_ppDEGs = Up_ppDEGs,
  Down_ppDEGs = Down_ppDEGs,
  marker_genes = Markers,
  report_dir = "/home/your absolute output path/ppEffect_eval_report-example.html"
)

# Result of pp.Score were stored here.
data_obj$ppDEGs

An example report can be obtained here: ppEffect_eval_report-example (Maybe github can’t show files that are this big right now. Please download the file to open it.)

Module three: Method

Here we provided different methods to help your correct the ppEffect. You can choose the optimal one depending on your data specificity.

## return a new seurat_object after ppEffect correction.
### method 1 "regress.out.ppDEGs"
object_new1 <- corr_ppEffect(object = object,ppDEGs = ppDEGs, method = "regress.out.ppDEGs")
# Completed successfully!
# We have corrected ppEffect by the method "regress.out.ppDEGs".
# The dimensional reduction should be re-runned following up
### method 2 "remove.ppCells"
object_new2 <- corr_ppEffect(object,ppDEGs = ppDEGs, method = "remove.ppCells")

# [1] "Completed successfully!"
# We have corrected ppEffect by the method 'remove.ppCells'.
# 1177 cell has been removed
# Please re-run the pre-process pipeline following up.
### method 3 "remove.ppDEGs"
object_new3 <- corr_ppEffect(object,ppDEGs = ppDEGs, method = "remove.ppDEGs")

# [1] "Completed successfully!"
# We have corrected ppEffect by the method 'remove.ppDEGs'.
# Please re-run the pre-process pipeline following up.