Single-Cell ATAC-seq analysis via Latent feature Extraction

Installation

The SCALE neural network is implemented in the PyTorch framework.
Running SCALE on a CUDA-capable GPU is recommended if one is available.

install from GitHub

git clone https://github.com/jsxlei/SCALE.git
cd SCALE
python setup.py install

Installation only requires a few minutes.

Quick Start

Input

Either a count matrix file:
- rows are peaks and columns are barcodes, in txt / tsv (sep="\t") or csv (sep=",") format

or a folder containing three files:
- count file: counts in mtx format; filename contains the keyword "count" or "matrix"
- peak file: one column of peaks as chr_start_end; filename contains the keyword "peak"
- barcode file: one column of barcodes; filename contains the keyword "barcode"
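
The folder convention above can be illustrated with a small sketch. The helper `classify_input_files` is hypothetical (it is not part of the SCALE package) and only mirrors the keyword rules listed above; how SCALE itself resolves ambiguous filenames is not specified here.

```python
from pathlib import Path

# Keywords taken from the input description; the helper itself is hypothetical.
KEYWORDS = {
    "count": ("count", "matrix"),
    "peak": ("peak",),
    "barcode": ("barcode",),
}


def classify_input_files(filenames):
    """Map each role (count / peak / barcode) to the first filename containing its keyword."""
    roles = {}
    for name in sorted(filenames):
        lowered = Path(name).name.lower()
        for role, keys in KEYWORDS.items():
            if role not in roles and any(key in lowered for key in keys):
                roles[role] = name
                break
    missing = [role for role in KEYWORDS if role not in roles]
    if missing:
        raise ValueError(f"missing input files for: {', '.join(missing)}")
    return roles
```

Calling `classify_input_files(["barcodes.txt", "peaks.txt", "count.mtx"])` returns one filename per role, and a folder lacking any of the three files raises a `ValueError` before a run is started.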

Run

with known cluster number k:

SCALE.py -d [input] -k [k]

with the cluster number k estimated by SCALE if k is unknown:

SCALE.py -d [input]
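
When SCALE is launched from a Python script, the two invocations above can be assembled programmatically. This is a minimal sketch: `build_scale_command` is a hypothetical helper, the path `data/Forebrain` is a placeholder, and only the `-d` and `-k` flags shown above are used.

```python
import shlex


def build_scale_command(input_path, k=None, extra=()):
    """Return the SCALE.py argument list; omit -k to let SCALE estimate k."""
    cmd = ["SCALE.py", "-d", str(input_path)]
    if k is not None:
        cmd += ["-k", str(k)]
    cmd += list(extra)  # any further documented options, passed through unchanged
    return cmd


def as_shell_string(cmd):
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(cmd)
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, which assumes `SCALE.py` is on the `PATH` after installation.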

Output

Results are saved in the output folder, which includes:

- model.pt: saved model; use it with the option --pretrain to reproduce results
- feature.txt: latent feature representations of each cell, used for clustering or visualization
- cluster_assignments.txt: clustering assignments of each cell
- tsne.txt: 2d t-SNE embeddings of each cell
- tsne.pdf: visualization of the 2d t-SNE embeddings of each cell
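
For downstream analysis, the cluster assignments can be read back into Python. The sketch below assumes that cluster_assignments.txt holds one tab-separated `barcode<TAB>cluster` pair per line with no header; that layout is an assumption, so check it against an actual output file first. Both function names are hypothetical.

```python
import csv
from collections import Counter


def read_cluster_assignments(lines):
    """Parse 'barcode<TAB>cluster' rows into a dict (file layout is an assumption)."""
    assignments = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        assignments[row[0]] = row[1]
    return assignments


def cluster_sizes(assignments):
    """Count how many cells fall into each cluster label."""
    return Counter(assignments.values())
```

An open file handle can be passed directly as `lines`, for example `read_cluster_assignments(open("output/cluster_assignments.txt"))`, where the `output/` path is a placeholder for the chosen output folder.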

Imputation

Get binary imputed data in the folder binary_imputed with the option --binary (recommended to save storage):

SCALE.py -d [input] --binary

or get numerical imputed data in the file imputed_data.txt with the option --impute:

SCALE.py -d [input] --impute

Useful options

- save results in a specific folder: [-o] or [--outdir]
- filter rare peaks if peak quality is poor or there are too many peaks, default is 0.01: [-x]
- filter low-quality cells by number of valid peaks, default is 100: [--min_peaks]
- modify the initial learning rate, default is 0.002: [--lr]
- change the number of iterations by watching the convergence of the loss, default is 30000: [-i] or [--max_iter]
- change the random seed for parameter initialization, default is 18: [--seed]
- binarize the imputation values: [--binary]
- run with an scRNA-seq dataset: [--log_transform]
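
To make the two filtering options concrete, here is a minimal pure-Python sketch of one plausible reading: -x is treated as the minimum fraction of cells in which a peak must be open, and --min_peaks as the minimum number of open peaks per cell. The function `filter_counts`, the order of the two steps, and the exact thresholds' semantics are assumptions, not SCALE's confirmed implementation.

```python
def filter_counts(matrix, min_peak_fraction=0.01, min_peaks=100):
    """Filter a peaks-by-cells count matrix given as a list of rows (one row per peak)."""
    n_cells = len(matrix[0]) if matrix else 0

    # Keep cells (columns) with at least `min_peaks` non-zero peaks.
    kept_cells = [
        j for j in range(n_cells)
        if sum(1 for row in matrix if row[j] > 0) >= min_peaks
    ]

    # Keep peaks (rows) open in at least `min_peak_fraction` of the kept cells.
    kept_peaks = [
        i for i, row in enumerate(matrix)
        if kept_cells
        and sum(1 for j in kept_cells if row[j] > 0) / len(kept_cells) >= min_peak_fraction
    ]

    filtered = [[matrix[i][j] for j in kept_cells] for i in kept_peaks]
    return filtered, kept_peaks, kept_cells
```

Raising either threshold shrinks the matrix, which is the effect the stricter -x and --min_peaks settings are meant to have on noisy data.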

Note

If you encounter a NaN loss:

- try another random seed
- filter peaks with a stricter threshold, e.g. -x 0.04 or 0.06
- filter low-quality cells, e.g. --min_peaks 400 or 600
- lower the initial learning rate, e.g. --lr 0.0002

Help

See more usage options for SCALE:

SCALE.py --help

Use functions from the SCALE package:

import scale
from scale import *
from scale.plot import *
from scale.utils import *

Running time

Data availability

Download all the provided datasets [Download]

Tutorial

Tutorial Forebrain: run SCALE on the dense-matrix Forebrain dataset (k=8, 2088 cells)

Tutorial Mouse Atlas: run SCALE on the sparse-matrix Mouse Atlas dataset (k=30, ~80,000 cells)

Reference

Lei Xiong, Kui Xu, Kang Tian, Yanqiu Shao, Lei Tang, Ge Gao, Michael Zhang, Tao Jiang & Qiangfeng Cliff Zhang. SCALE method for single-cell ATAC-seq analysis via latent feature extraction. Nature Communications, (2019). https://www.nature.com/articles/s41467-019-12630-7
