Usage (Python Pipeline)
HANA provides a Python library that calls the HANA binaries, which lets you run the HANA pipeline in a much more flexible way.
You have to install the hana_scaffold package from PyPI or from source. To check whether the package is installed, launch the Python interpreter and run:
import hana_scaffold
If the package is not installed, please check the Python Package section of Compile & Install.
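The same check can be scripted without triggering an ImportError, for example at the top of a driver script. This is a minimal sketch using only the standard library; `is_installed` is a helper name introduced here for illustration and is not part of hana_scaffold.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be found by the import system."""
    # find_spec returns None for a missing top-level package instead of raising.
    return importlib.util.find_spec(package_name) is not None


if not is_installed("hana_scaffold"):
    print("hana_scaffold is not installed; see Compile & Install.")
```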
Please make sure the HANA standard binaries can be called from the terminal (i.e. the HANA standard modules can be found on the PATH of the system environment). If you cannot install HANA to such a location, you can add the directory that contains all the HANA standard module binaries with:
hana_scaffold.add_search_path(r'/path/to/HANA/standard/module/binaries/')
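To find out whether the extra search path is needed at all, you can test PATH lookup from Python first. The sketch below assumes nothing about HANA itself: `missing_binaries` is a hypothetical helper, and the two binary names are placeholders to be replaced with the actual HANA module binaries.

```python
import shutil


def missing_binaries(binary_names):
    """Return the names in `binary_names` that cannot be found on PATH."""
    return [name for name in binary_names if shutil.which(name) is None]


# Placeholder names: replace them with the actual HANA module binaries.
required = ["hana_module_a", "hana_module_b"]
missing = missing_binaries(required)
if missing:
    print("Not found on PATH:", ", ".join(missing))
```

If anything is reported as missing, calling hana_scaffold.add_search_path with the directory that holds the binaries, as described above, is the intended remedy.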
To scaffold the contigs of a diploid genome with 12 chromosomes from Hi-C libraries prepared with the MboI restriction enzyme:
from hana_scaffold.pipeline import DiploidPipeline
pipeline = DiploidPipeline(threads=16)
pipeline(contig_path='contigs.fasta',
         lib_files=[('forward.fastq.gz', 'reverse.fastq.gz')],
         enzyme='MboI', output_dir='any_output_directory',
         groups=12)
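The lib_files argument is a list of (forward, reverse) tuples, which suggests that several libraries can be listed, although only a single pair is shown here. When the reads sit in one directory, the list can be built programmatically. The following is a sketch under an assumed naming scheme (mates differ only by `_R1` versus `_R2` in the file name); `collect_read_pairs` is a hypothetical helper, not part of hana_scaffold.

```python
from pathlib import Path


def collect_read_pairs(directory, forward_tag="_R1", reverse_tag="_R2"):
    """Pair forward and reverse FASTQ files found in `directory`.

    Assumes mates differ only by `forward_tag` versus `reverse_tag` in the
    file name, e.g. lib1_R1.fastq.gz and lib1_R2.fastq.gz.
    """
    pairs = []
    for forward in sorted(Path(directory).glob(f"*{forward_tag}*.fastq.gz")):
        # Derive the expected mate name by swapping the first tag occurrence.
        reverse = forward.with_name(forward.name.replace(forward_tag, reverse_tag, 1))
        if not reverse.exists():
            raise FileNotFoundError(f"No mate found for {forward}")
        pairs.append((str(forward), str(reverse)))
    return pairs
```

The returned list can then be passed as lib_files in the call above.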
To specify the enzyme directly by its recognition sequence (for example "GATC"), just modify the enzyme parameter:
pipeline(contig_path='contigs.fasta',
         lib_files=[('forward.fastq.gz', 'reverse.fastq.gz')],
         enzyme='GATC', output_dir='any_output_directory',
         groups=12)
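MboI recognizes GATC, so the two calls above should describe the same restriction site. If you want to validate the value before starting a long run, a small lookup can help. The table and the `recognition_site` helper below are illustrative only and are not taken from the HANA source code; how HANA itself resolves enzyme names is not shown here.

```python
# Recognition sequences of a few enzymes commonly used for Hi-C.
# This table is illustrative and is not taken from the HANA source code.
RECOGNITION_SITES = {
    "MboI": "GATC",
    "DpnII": "GATC",
    "Sau3AI": "GATC",
    "HindIII": "AAGCTT",
}


def recognition_site(enzyme):
    """Return the recognition sequence for an enzyme name or a raw sequence."""
    if enzyme in RECOGNITION_SITES:
        return RECOGNITION_SITES[enzyme]
    sequence = enzyme.upper()
    # Accept any non-empty string made only of the four DNA bases.
    if sequence and set(sequence) <= set("ACGT"):
        return sequence
    raise ValueError(f"Unknown enzyme or invalid sequence: {enzyme!r}")
```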
To scaffold from a mapped .bam file, please use mapping_files instead of lib_files:
pipeline(contig_path='contigs.fasta',
         mapping_files=['sample.bwa_mem.bam'],
         enzyme='AGTCCT', output_dir='any_output_directory',
         groups=12)
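Before handing files to mapping_files, it can be worth confirming that each one really is a BAM file and not, say, an uncompressed SAM file. The sketch below relies on two facts about the format: BAM is BGZF-compressed (readable by Python's gzip module), and the decompressed stream begins with the magic bytes `BAM\x01`. `looks_like_bam` is a hypothetical helper and performs no deeper validation.

```python
import gzip


def looks_like_bam(path):
    """Return True if `path` starts with the BAM magic bytes after decompression."""
    try:
        # BAM files are BGZF-compressed, which the gzip module can read.
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Not gzip-compressed, truncated, or unreadable.
        return False
```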
To change the parameters of the standard pipelines (e.g. to change the random seed used for ordering to 870806):
pipeline = DiploidPipeline(threads=16, ea_seed=870806)
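Because the seed is a constructor argument, comparing several seeds amounts to building one pipeline per seed and giving each run its own output directory. The following is a sketch, not a documented HANA workflow: `run_with_seeds` and the `seed_<n>` directory layout are assumptions introduced here, and the helper only relies on the call pattern shown in the examples above.

```python
import os


def run_with_seeds(make_pipeline, seeds, output_root, **run_kwargs):
    """Run one pipeline per seed, each into its own output directory.

    `make_pipeline` is any callable that accepts a seed and returns a
    pipeline object callable with keyword arguments.
    """
    output_dirs = []
    for seed in seeds:
        output_dir = os.path.join(output_root, f"seed_{seed}")
        pipeline = make_pipeline(seed)
        pipeline(output_dir=output_dir, **run_kwargs)
        output_dirs.append(output_dir)
    return output_dirs


# Usage sketch (requires hana_scaffold, so it is left commented out):
# from hana_scaffold.pipeline import DiploidPipeline
# run_with_seeds(lambda seed: DiploidPipeline(threads=16, ea_seed=seed),
#                seeds=[1, 2, 3], output_root='seed_runs',
#                contig_path='contigs.fasta',
#                lib_files=[('forward.fastq.gz', 'reverse.fastq.gz')],
#                enzyme='MboI', groups=12)
```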
TBD