Commit

Merge pull request #16 from bodegalab/dev
Merge 1.1.0-beta.2
bepoli authored Feb 15, 2024
2 parents 7849b5a + 995d00a commit 9224900
Showing 12 changed files with 601 additions and 559 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2022 Benedetto Polimeni
Copyright (c) 2022-2024 Benedetto Polimeni

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
35 changes: 22 additions & 13 deletions README.md
@@ -8,7 +8,7 @@
# IRescue - <ins>I</ins>nterspersed <ins>Re</ins>peats <ins>s</ins>ingle-<ins>c</ins>ell q<ins>u</ins>antifi<ins>e</ins>r

<img align="right" height="160" src="docs/logo.png">
IRescue is a software for quantifying the expression of transposable elements (TEs) subfamilies in single cell RNA sequencing (scRNA-seq) data. The core feature of IRescue is to consider all multiple alignments (i.e. non-primary alignments) of reads/UMIs mapping on multiple TEs in a BAM file, to accurately infer the TE subfamily of origin. IRescue implements a UMI error-correction, deduplication and quantification strategy that includes such alignment events. IRescue's output is compatible with most scRNA-seq analysis toolkits, such as Seurat or Scanpy.
IRescue quantifies the expression of transposable element (TE) subfamilies in single-cell RNA sequencing (scRNA-seq) data. It performs UMI deduplication with sequencing error correction and probabilistic assignment of multi-mapping reads by an expectation-maximization (EM) procedure. TE counts are written to a sparse matrix (similar to Cell Ranger's output) compatible with Seurat, Scanpy and other toolkits.

## Content

@@ -34,7 +34,7 @@ conda create -n irescue -c conda-forge -c bioconda irescue

### <a name="pip"></a>Using pip

If for any reason it's not possible or desirable to use conda, it can be installed with pip and the following requirements must be installed manually: `python>=3.7`, `samtools>=1.12`, `bedtools>=2.30.0`, and fairly recent versions of the GNU utilities are required, specifically `coreutils>=8.30` and `gzip>=1.10` (older versions are untested).
If for any reason it's not possible or desirable to use conda, IRescue can be installed with pip. In that case the following requirements must be installed manually: `python>=3.7`, `samtools>=1.12`, `bedtools>=2.30.0`, and fairly recent versions of the GNU utilities, specifically `gawk>=5.0.1`, `coreutils>=8.30` and `gzip>=1.10` (older versions are untested).

```bash
pip install irescue
@@ -57,29 +57,36 @@ singularity exec https://depot.galaxyproject.org/singularity/irescue:$TAG irescu

## <a name="usage"></a>Usage

### <a name="quick_start"></a>Quick start
```sh
irescue --help
```

The only required input is a BAM file annotated with cell barcode and UMI sequences as tags (by default, `CB` tag for cell barcode and `UR` tag for UMI; override with `--cb-tag` and `--umi-tag`).

The only required input is a BAM file annotated with cell barcode and UMI sequences as tags (by default, `CB` tag for cell barcode and `UR` tag for UMI; override with `--CBtag` and `--UMItag`). You can obtain it by aligning your reads using [STARsolo](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md).
You can obtain it by aligning your reads using [STARsolo](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md). It is advised to keep secondary alignments in the BAM file, as they are used in the EM procedure to assign multi-mapping reads (e.g. `--outFilterMultimapNmax 100 --winAnchorMultimapNmax 100` or more), and to output all the needed SAM attributes (e.g. `--outSAMattributes NH HI AS nM NM MD jM jI XS MC ch cN CR CY UR UY GX GN CB UB sM sS sQ`).
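
To illustrate why secondary alignments matter, here is a minimal, generic sketch of EM-based fractional assignment. It is not IRescue's actual implementation: it assumes each deduplicated UMI has already been reduced to an equivalence class (the set of TE subfamilies it aligns to), and the function name `em_assign` and its parameters are hypothetical.

```python
def em_assign(equivalence_classes, n_iter=100, tol=1e-6):
    """Fractionally assign UMIs to features with a plain EM loop.

    equivalence_classes: list of non-empty lists of feature names, one per UMI.
    Returns a dict mapping each feature to its expected UMI count.
    Illustrative only; not IRescue's implementation.
    """
    features = sorted({f for ec in equivalence_classes for f in ec})
    # Start from uniform relative abundances.
    abundance = {f: 1.0 / len(features) for f in features}
    counts = dict.fromkeys(features, 0.0)

    for _ in range(n_iter):
        # E-step: split each UMI among its compatible features,
        # proportionally to the current abundance estimates.
        counts = dict.fromkeys(features, 0.0)
        for ec in equivalence_classes:
            total = sum(abundance[f] for f in ec)
            for f in ec:
                counts[f] += abundance[f] / total

        # M-step: re-estimate abundances from the expected counts.
        n_umis = sum(counts.values())
        updated = {f: counts[f] / n_umis for f in features}

        # Stop once the estimates no longer change appreciably.
        delta = max(abs(updated[f] - abundance[f]) for f in features)
        abundance = updated
        if delta < tol:
            break

    return counts
```

With the classes `[["A"], ["A"], ["A", "B"]]`, the ambiguous UMI ends up almost entirely assigned to `A`, because the uniquely mapped UMIs support `A` only. Without the secondary alignments, that third UMI would carry no record of its compatibility with `B` at all.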

RepeatMasker annotation will be automatically downloaded for the chosen genome assembly (e.g. `-g hg38`). Alternatively, provide your own annotation in BED format (e.g. `-r TE.bed`).

```bash
```sh
irescue -b genome_alignments.bam -g hg38
```

If you already obtained gene-level counts (using STARsolo, Cell Ranger, Alevin, Kallisto or other tools), it is advised to provide the whitelisted cell barcodes list as a text file, e.g.: `-w barcodes.tsv`. This will significantly improve performance.
If you already obtained gene-level counts (using STARsolo, Cell Ranger, Alevin, Kallisto or other tools), it is advised to provide the whitelisted cell barcodes list as a text file (`-w barcodes.tsv`). This will significantly improve performance by processing viable cells only.

IRescue performs best using at least 4 threads, e.g.: `-p 8`.
For optimal run time, use at least 4 threads, e.g.: `-p 8`.

### <a name="output_files"></a>Output files

IRescue generates TE counts in a sparse matrix format, readable by [Seurat](https://github.com/satijalab/seurat) or [Scanpy](https://github.com/scverse/scanpy):
IRescue writes TE counts to a `counts/` subdirectory as a sparse matrix readable by [Seurat](https://github.com/satijalab/seurat) or [Scanpy](https://github.com/scverse/scanpy). Optional outputs include a description of equivalence classes with UMI deduplication stats (`ec_dump.tsv.gz`) and a subdirectory of temporary files (`tmp/`) for debugging purposes. Detailed logging is enabled by `--verbose` and written to standard error.

```
IRescue_out/
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz
irescue_out/
├── counts/
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── ec_dump.tsv.gz
└── tmp/
```
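
As a rough sketch, the files in `counts/` can also be read with the Python standard library alone. This assumes Cell Ranger-style conventions (Matrix Market coordinate format, features as rows, barcodes as columns, feature name in the first column of `features.tsv.gz`); `read_lines` and `load_counts` are hypothetical helper names, not part of IRescue.

```python
import gzip
from pathlib import Path


def read_lines(path):
    """Return the non-empty lines of a gzip-compressed text file."""
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def load_counts(counts_dir):
    """Load barcodes, features and non-zero counts from a counts directory.

    Assumes rows are features and columns are barcodes (Cell Ranger style).
    Returns (features, barcodes, entries), where entries maps
    (feature, barcode) pairs to their counts.
    """
    counts_dir = Path(counts_dir)
    barcodes = read_lines(counts_dir / "barcodes.tsv.gz")
    # Keep only the first tab-separated column as the feature name.
    features = [
        line.split("\t")[0]
        for line in read_lines(counts_dir / "features.tsv.gz")
    ]

    entries = {}
    with gzip.open(counts_dir / "matrix.mtx.gz", "rt") as handle:
        # Skip the banner and comment lines, which start with '%'.
        body = (ln for ln in handle if ln.strip() and not ln.startswith("%"))
        # The first remaining line holds the matrix dimensions.
        n_rows, n_cols, _n_entries = (int(x) for x in next(body).split())
        if n_rows != len(features) or n_cols != len(barcodes):
            raise ValueError("matrix shape does not match features/barcodes")
        for line in body:
            row, col, value = line.split()
            # Matrix Market indices are 1-based.
            key = (features[int(row) - 1], barcodes[int(col) - 1])
            entries[key] = float(value)

    return features, barcodes, entries
```

For example, `features, barcodes, entries = load_counts("irescue_out/counts")` would return the TE subfamily names, the cell barcodes and a dictionary of non-zero counts. For real analyses, the Seurat or Scanpy readers remain the more practical choice.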

### <a name="seurat"></a>Load IRescue data with Seurat
@@ -108,8 +115,10 @@ Active assay: RNA (31078 features, 0 variable features)
1 other assay present: TE
```

From here, TE expression can be normalized, and dimensional reductions can be computed from either TE or gene expression.

## <a name="cite"></a>Cite

Polimeni B, Marasca F, Ranzani V, Bodega B.
IRescue: single cell uncertainty-aware quantification of transposable elements expression.
*IRescue: uncertainty-aware quantification of transposable elements expression at single cell level.*
bioRxiv 2022.09.16.508229; doi: https://doi.org/10.1101/2022.09.16.508229
2 changes: 1 addition & 1 deletion irescue/_version.py
@@ -1 +1 @@
__version__ = '1.1.0-beta.1'
__version__ = '1.1.0-beta.2'