Usage (Python Pipeline)

Saki Tojo edited this page May 12, 2023 · 12 revisions

HANA provides a Python library that calls the HANA binaries, which lets you run the HANA pipeline in a much more flexible way.

Install

Install the hana_scaffold package from PyPI or from source. To check whether the package is installed, launch the Python interpreter and run:

import hana_scaffold

If the import fails, the package is not installed; see the Python Package section of Compile & Install.
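
To run the same check from a script without triggering an ImportError, a minimal sketch using only the Python standard library is shown here. The helper name is_installed is illustrative and is not part of hana_scaffold.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("hana_scaffold installed:", is_installed("hana_scaffold"))
```

This only reports whether Python can locate the package; it does not verify that the HANA binaries themselves are available.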

Quick Start Guide

Make sure the HANA standard binaries can be called from a terminal (i.e. the HANA standard modules can be found on the system PATH). If you cannot install HANA in such a location, add the directory that contains all the HANA standard module binaries with:

hana_scaffold.add_search_path(r'/path/to/HANA/standard/module/binaries/')
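
Before running a pipeline, it can help to confirm that the binaries are actually reachable. The following is a rough sketch using only the standard library; the function find_missing_binaries is a hypothetical helper, not part of hana_scaffold, and this page does not list the binary names, so you would need to supply the names from your own HANA installation.

```python
import os
import shutil


def find_missing_binaries(names, extra_dirs=()):
    """Return the names that cannot be found in extra_dirs or on PATH."""
    # Search the extra directories first, then the regular system PATH.
    search_path = os.pathsep.join([*extra_dirs, os.environ.get("PATH", "")])
    return [name for name in names
            if shutil.which(name, path=search_path) is None]
```

An empty result means every listed name resolved to an executable. Any directory passed in extra_dirs would also have to be registered with hana_scaffold.add_search_path for the pipeline to use it.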

To scaffold the contigs of a diploid genome with 12 chromosomes from Hi-C libraries prepared with the MboI restriction enzyme:

from hana_scaffold.pipeline import DiploidPipeline
pipeline = DiploidPipeline(threads=16)
pipeline(contig_path='contigs.fasta',
         lib_files=[('forward.fastq.gz', 'reverse.fastq.gz')],
         enzyme='MboI', output_dir='any_output_directory',
         groups=12)
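
The lib_files argument above is a list of (forward, reverse) tuples. A small sketch of building such a list from a directory of paired FASTQ files follows. The helper pair_fastq_files and the _R1/_R2 naming convention are assumptions made for illustration, and this page does not confirm whether the pipeline accepts more than one pair.

```python
from pathlib import Path


def pair_fastq_files(directory, fwd_tag="_R1", rev_tag="_R2", suffix=".fastq.gz"):
    """Pair forward/reverse FASTQ files that differ only by fwd_tag/rev_tag."""
    fwd_ending = fwd_tag + suffix
    pairs = []
    for fwd in sorted(Path(directory).glob(f"*{fwd_ending}")):
        # Swap the forward ending for the reverse ending to find the mate.
        rev = fwd.with_name(fwd.name[: -len(fwd_ending)] + rev_tag + suffix)
        if not rev.exists():
            raise FileNotFoundError(f"no reverse mate found for {fwd}")
        pairs.append((str(fwd), str(rev)))
    return pairs
```

The returned list has the same shape as the lib_files value in the example, so it could be passed as lib_files=pair_fastq_files('reads/'), where 'reads/' is a placeholder directory.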

To specify the enzyme directly by its recognition sequence (for example "GATC"), change the enzyme parameter:

pipeline(contig_path='contigs.fasta',
         lib_files=[('forward.fastq.gz', 'reverse.fastq.gz')],
         enzyme='GATC', output_dir='any_output_directory',
         groups=12)

To scaffold from a mapped .bam file, use mapping_files instead of lib_files:

pipeline(contig_path='contigs.fasta',
         mapping_files=['sample.bwa_mem.bam'],
         enzyme='AGTCCT', output_dir='any_output_directory',
         groups=12)

To change the parameters of the standard pipelines (e.g. to set the random seed used for ordering to 870806), pass them when constructing the pipeline:

pipeline = DiploidPipeline(threads=16, ea_seed=870806)

HANA Module wrapper

TBD

HANA Pipeline

TBD