Skip to content

Commit

Permalink
0.4.0 release
Browse files Browse the repository at this point in the history
  • Loading branch information
alshai committed Mar 26, 2021
2 parents 1b0ee29 + fb68b69 commit fbda055
Show file tree
Hide file tree
Showing 5 changed files with 184 additions and 90 deletions.
23 changes: 20 additions & 3 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -6,16 +6,33 @@ set(PROJECT_URL "https://github.com/alshai/leviosam")
set(PROJECT_DESCRIPTION "leviosam: library for moving alignments between aligned genomes")
set(CMAKE_CXX_STANDARD 11)

set(HTS_LIB_DIR "" CACHE FILEPATH "path to htslib lib dir")
set(HTS_INC_DIR "" CACHE FILEPATH "path to htslib include dir")
set(SDSL_LIB_DIR "" CACHE FILEPATH "path to sdsl-lite lib dir")
set(SDSL_INC_DIR "" CACHE FILEPATH "path to sdsl-lite include dir")

if(EXISTS ${HTS_INC_DIR})
include_directories(${HTS_INC_DIR})
endif()
if(EXISTS ${SDSL_INC_DIR})
include_directories(${SDSL_INC_DIR})
endif()
if(EXISTS ${HTS_LIB_DIR})
link_directories(${HTS_LIB_DIR})
endif()
if(EXISTS ${SDSL_LIB_DIR})
link_directories(${SDSL_LIB_DIR})
endif()


# check for hts divsufsort lsdsl and install if not found
find_library(HTS_LIB hts)
if(NOT HTS_LIB)
message(FATAL_ERROR "htslib not found. Please make sure libhts.a or libhts.so is found in \$LD_LIBRARY_PATH")
message(FATAL_ERROR "htslib library not found. Please specify -D HTS_LIB_DIR and -D HTS_INC_DIR in the CMake command")
endif()

find_library(SDSL_LIB sdsl)
if(NOT SDSL_LIB)
message(FATAL_ERROR "sdsl-lite not found. Please make sure libsdsl.a is found in \$LD_LIBRARY_PATH")
message(FATAL_ERROR "sdsl-lite library not found. Please specify -D SDSL_LIB_DIR and -D SDSL_INC_DIR in the CMake command")
endif()


Expand Down
88 changes: 88 additions & 0 deletions INSTALL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
# Installation instructions for levioSAM

levioSAM supports a variety of methods for installation:

- Conda (*highly recommended*)
- Docker
- CMake
- Make

## Conda

This is our recommended method of installing levioSAM.

To install, simpy run this command:

```
conda install -c conda-forge -c bioconda leviosam
```

## Docker

You can obtain a Docker image of the latest version from Docker hub:

```
docker pull alshai/leviosam
```

## CMake and Make


Make sure the following prerequisite libraries are installed on your system.

- [htslib v1.10+](https://github.com/samtools/htslib)
- [sdsl-lite v2.1.1+](https://github.com/simongog/sdsl-lite/)

An easy way to install these dependencies is to use your OS's existing package system:
```
apt-get install libhts-dev libsdsl-dev # Debian/Ubuntu
brew tap brewsci/bio; brew install htslib sdsl-lite # MacOS
```

If using RedHat or Fedora, then you must install sdsl-lite manually. But you can install htslib through yum:
```
yum install htslib
```

Or you can choose to install them manually by following the install instructions on their respective pages.

### CMake

Once the prerequisite packages are installed and the locations of their installations are known, specify their locations
to CMake by running the following commands:

```
mkdir build
cd build
cmake ..
make
```

If you installed the dependencies manually, you might have to modify the `cmake` command to specify their library and
include directory locations like so:
```
cmake -DHTS_LIB_DIR=<htslib lib directory> \
-DHTS_INC_DIR=<htslib include dir> \
-DSDSL_LIB_DIR=<sdsl-lite lib directory> \
-DSDSL_INC_DIR=<sdsl-lite include dir> \
..
```

### Make

Update `LD_LIBRARY_PATH` and `CPLUS_INCLUDE_PATH` paths after installing sdsl-lite and htslib and install with `make`:

```
export LD_LIBRARY_PATH=<path/to/lib>:$LD_LIBRARY_PATH
export C_INCLUDE_PATH=<path/to/include>:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=<path/to/include>:$CPLUS_INCLUDE_PATH
make
```

## Test

We provide an end-to-end test and a set of unit tests for levioSAM.

- The end-to-end test can be run with `python leviosam-test.py`. This test includes running levioSAM on several test files in `testdata`. We also use `picard` to test if the lifted SAM files are valid.

- The unit test can be run `cd build; ctest` if you use cmake to build levioSAM; or `make gtest; cd testdata; ../gtest` if you use make to build levioSAM.
133 changes: 60 additions & 73 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,119 +1,116 @@
# levioSAM lifts alignments to the reference genome
# levioSAM lifts variant-aware alignments to the reference genome

Use a VCF file containing alternative haplotype information to lift SAM/BAM alignments
from that haplotype to the reference sequence
to the reference sequence

## Building

First, clone the repository:
## Supported Features:

```
git clone https://github.com/alshai/levioSAM
```
- Serialization of VCF file for faster parsing

### Dependencies
- Converting SAM/BAM records from haplotype to reference, including:
- CIGAR string
- MD:Z and NM:i tag
- paired read information

- [sdsl-lite](https://github.com/simongog/sdsl-lite)
- [htslib](https://github.com/samtools/htslib)
- Multithreading support

You can install these dependencies using conda:
## Building and installation

```
conda install -c conda-forge sdsl-lite
conda install -c bioconda htslib
```
The easiest way to install levioSAM is by using [Conda](https://docs.conda.io/en/latest/):

Or homebrew (for Macs)
```
brew install htslib
brew install sdsl-lite
conda install -c conda-forge -c bioconda leviosam
```

[CMake](https://cmake.org) or `make` can be used to build `levioSAM`.
We support a variety of other methods for getting levioSAM to work, including Docker, CMake and Make. See
[INSTALL.md](INSTALL.md) for more details.

## Usage (command line)

### Using CMake

Our `CMakeLists.txt` file expectes to use `PkgConfig` to load the libraries; you may need to add the `pkgconfig`
subdirectories of the `sdsl-lite` and `libhts` libraries to the `CMAKE_PREFIX_PATH` environment variable yourself.

We highly suggest you normalize and left-align your VCF file before using it to lift your alignments:
```
mkdir build
cd build
cmake ..
make
bcftools norm <VCF> > <output VCF>
```

### Using make

Update `LD_LIBRARY_PATH` and `CPLUS_INCLUDE_PATH` paths after installing sdsl-lite and htslib and install with `make`:

Run this command to lift your alignments:
```
export LD_LIBRARY_PATH=<path/to/lib>:$LD_LIBRARY_PATH
export C_INCLUDE_PATH=<path/to/include>:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=<path/to/include>:$CPLUS_INCLUDE_PATH
make
$ ./levioSAM lift -t <nthreads> -a <sam> -v <vcf> -s <sample_name> -g <haplotype (0 or 1)> -p <output prefix>
```

`<sample_name>` is the name of the individual in your VCF whose genotype you want to use for the lift-over.
if `-s` is not specified, then it will be assumed that all the alternate alleles in all VCF are present in the variant-aware
reference.

## Usage (command line)

**DISCLAIMER**: normalize and left-align your VCF files before lifting alignments!

- We suggest using `bcftools norm` to do this.

To serialize lift-over information between a reference sequence and given alternative sequence described in a VCF file:
The lifted coordinates will be saved in SAM format to `<output prefix>.sam`. If `-p` is not specified then the SAM file
will be output to `stdout`.

You can speed up the lift-over by serializing the lift-over structure beforehand to avoid re-parsing the entire VCF. This
is useful if you want to use the same VCF to run the command multiple times.
```
$ ./levioSAM serialize -v <vcf> -s <sample_name> -p <output prefix>
```
The levioSAM file will saved to `<output prefix>.lft`.

The levioSAM file saved to `<output prefix>.lft`.

To lift over coordinates given from a SAM/BAM file using a serialized `.lft` file:

You can then pass the serialized structure into lift-over by using the `-l` option instead of the `-v` option:
```
$ ./levioSAM lift -a <sam> -l <lft> -p <output prefix>
```

The lifted coordinates will be saved to `<output prefix>.sam`.

To lift over coordinates without serializing (note: this will be slower):

```
$ ./levioSAM lift -a <sam> -v <vcf> -s <sample_name> -p <output prefix>
```

Some common optional usages:
A list of common options:

- To read from stdin, use `-a -` or exclude `-a`.
- To use multiple threads, use `-t <threads>`.
- To lift `NM:i` and `MD:z` tags, add `-m`.
- To lift `NM:i` and `MD:z` tags, add `-m` (this will be slow if your alignments are not sorted).
- To write output as a BAM file, use `-O bam` (the lifted file will be `<output prefix>.bam`).

## Example (Command line)

The `testdata` directory contains some toy data that you can play around with. It also
contains a `.lft` file that will let you lift alignments from the [major-allele reference](https://doi.org/10.1371/journal.pgen.1002280) to GRCh38.
Though we recommend you build your own `.lft` file from a VCF for analyses.

Here's an example using a small set of Bowtie 2 paired-end alignments against the major-allele reference

## Usage (C++)
```
leviosam -a testdata/bt2-paired_end-major.sam -l testdata/wg-maj.lft > out.sam
```

We provide instructions of how to use levioSAM in common variant-aware reference pipelines (major-allele reference and personalized reference) in the [levioSAM wiki](https://github.com/alshai/levioSAM/wiki/Alignment-with-variant-aware-reference-genomes).

## Example (C++)

Use the `LiftMap` class to generate lift-over information between a reference genome and an alternative genotype.
A `VCF` file containing the `FMT/GT` field for a specified sample must be provided.

```
#include <cstdio>
#include <tuple>
#include <unordered_map>
#include <htslib/vcf.h>
#include <vector>
#include "levioSAM.hpp"
int main() {
const char* fname = "data/dna.vcf";
vcfFile* fp = bcf_open(fname, "r");
bcf_hdr_t* hdr = bcf_hdr_read(fp);
lift::LiftMap l(fp, hdr, "<sample_name>");
std::string sample_name = ""; // GT=1 will be assumed for all variants if left blank
std::string haplotype = "0"; // set this to "0" or "1" if sample_name is not blank
// populate name_map if the contig names in the VCF file do not match that of the genome
// (e.g.. chr1 vs 1).
std::vector<std::pair<std::string,std::string>> name_map;
// populate contig_lengths if the chromsome lengths are not specified in the VCF file.
std::unordered_map<std::string> contig_lengths;
lift::LiftMap l(fp, hdr, sample_name, haplotype, name_map, contig_length);
}
```

To query the equivalent reference position for a given haplotype position:

```
l.alt_to_ref("contig_name", 8)); // give the contig name and the position on the contig
std::vector<std::string> c; // for compatibility with multi-thread
std::mutex m; // for compatibility with multi-thread
l.s2_to_s1("contig_name", 8, &c, &m)); // give the contig name and the position on the contig
```

To serialize lift-over information to a file:
Expand All @@ -132,16 +129,6 @@ To load from a serialized file
in.close();
```

## Publications

## Features Currently Support:

- Serialized lift-over information VCF file w/ FMT/GT field for a specified sample.
- Convert SAM/BAM records from haplotype to reference.
- Multithreading support.
- Recalculate read-pair information.
- Recalculate `MD:z` and `NM:i` tags with `-m` (optionally).

## TODO

- Recalculate MAPQ
- Support chain format
Mun T., Chen N., Langmead B. ["LevioSAM: Fast lift-over of alternate reference alignments"](https://doi.org/10.1101/2021.02.05.429867). [BioRxiv](https://www.biorxiv.org/). 2021
16 changes: 16 additions & 0 deletions docker/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# image: alshai/leviosam
FROM ubuntu:20.04
MAINTAINER tmun1@jhu.edu

ENV TZ=America/New_York
ENV VERSION 0.4.0
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN apt-get update && apt-get install -y curl git build-essential cmake libhts-dev libsdsl-dev
RUN curl -k -L https://github.com/alshai/levioSAM/archive/refs/tags/v${VERSION}.tar.gz -o leviosam-v${VERSION}.tar.gz && \
tar -xzf leviosam-v${VERSION}.tar.gz && \
cd levioSAM-${VERSION} && \
mkdir build && \
cd build && \
cmake .. && \
make && \
make install
14 changes: 0 additions & 14 deletions updates.md

This file was deleted.

0 comments on commit fbda055

Please sign in to comment.