Usage (Binaries)
This page shows how to use the HANA binaries directly, for example when building a bash script or a third-party pipeline.
This is a demonstration of assembling the Nipponbare genome data. Assume we already have the following two files:
- contigs.fasta, which stores all the contigs
- sample.bwa_mem.bam, which contains the mapping results from bwa
To assemble the data and generate the FASTA of the chromosome, please follow the steps below.
hana_extract -f contigs_sim.fasta -m sample.bwa_mem.bam -o bare -e GATC -t 16 -q 40
This should generate bare.hmr_nodes and bare.hmr_reads.
hana_draft -n bare.hmr_nodes -r bare.hmr_reads -o bare
This should generate bare.hmr_edges and bare.hmr_nodes_invalid.
hana_partition -n bare.hmr_nodes -e bare.hmr_edges -g 12 -o bare
This should generate bare_1g12.hmr_group through bare_12g12.hmr_group.
hana_ordering -n bare.hmr_nodes -e bare.hmr_edges -g bare_1g12.hmr_group -o bare_1g12.hmr_seq -t 16
hana_ordering -n bare.hmr_nodes -e bare.hmr_edges -g bare_2g12.hmr_group -o bare_2g12.hmr_seq -t 16
hana_ordering -n bare.hmr_nodes -e bare.hmr_edges -g bare_3g12.hmr_group -o bare_3g12.hmr_seq -t 16
hana_ordering -n bare.hmr_nodes -e bare.hmr_edges -g bare_4g12.hmr_group -o bare_4g12.hmr_seq -t 16
hana_ordering -n bare.hmr_nodes -e bare.hmr_edges -g bare_5g12.hmr_group -o bare_5g12.hmr_seq -t 16
hana_ordering -n bare.hmr_nodes -e bare.hmr_edges -g bare_6g12.hmr_group -o bare_6g12.hmr_seq -t 16
hana_ordering -n bare.hmr_nodes -e bare.hmr_edges -g bare_7g12.hmr_group -o bare_7g12.hmr_seq -t 16
hana_ordering -n bare.hmr_nodes -e bare.hmr_edges -g bare_8g12.hmr_group -o bare_8g12.hmr_seq -t 16
hana_ordering -n bare.hmr_nodes -e bare.hmr_edges -g bare_9g12.hmr_group -o bare_9g12.hmr_seq -t 16
hana_ordering -n bare.hmr_nodes -e bare.hmr_edges -g bare_10g12.hmr_group -o bare_10g12.hmr_seq -t 16
hana_ordering -n bare.hmr_nodes -e bare.hmr_edges -g bare_11g12.hmr_group -o bare_11g12.hmr_seq -t 16
hana_ordering -n bare.hmr_nodes -e bare.hmr_edges -g bare_12g12.hmr_group -o bare_12g12.hmr_seq -t 16
These should generate bare_1g12.hmr_seq through bare_12g12.hmr_seq.
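The twelve ordering commands differ only in the group index, so they can be generated in a loop. The following is a minimal POSIX shell sketch, not part of HANA itself: the names PREFIX, NGROUPS, THREADS, DRY_RUN, order_group and order_all are assumptions made for this example. By default it only prints the commands so they can be reviewed first.

```shell
# Illustrative settings; these variable names are chosen for this sketch only.
PREFIX=${PREFIX:-bare}
NGROUPS=${NGROUPS:-12}
THREADS=${THREADS:-16}

# order_group I: build the hana_ordering command for group I.
# With DRY_RUN=1 (the default here) the command is printed, not executed.
order_group() (
  base="${PREFIX}_${1}g${NGROUPS}"
  set -- hana_ordering -n "${PREFIX}.hmr_nodes" -e "${PREFIX}.hmr_edges" \
    -g "${base}.hmr_group" -o "${base}.hmr_seq" -t "${THREADS}"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
)

# order_all: handle groups 1..NGROUPS in turn, stopping at the first failure.
order_all() (
  i=1
  while [ "$i" -le "$NGROUPS" ]; do
    order_group "$i" || exit 1
    i=$((i + 1))
  done
)
```

Calling order_all prints the twelve commands shown above. Setting DRY_RUN=0 before the call runs them one after another instead.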
hana_orientation -n bare.hmr_nodes -r bare.hmr_reads -s bare_1g12.hmr_seq bare_2g12.hmr_seq bare_3g12.hmr_seq bare_4g12.hmr_seq bare_5g12.hmr_seq bare_6g12.hmr_seq bare_7g12.hmr_seq bare_8g12.hmr_seq bare_9g12.hmr_seq bare_10g12.hmr_seq bare_11g12.hmr_seq bare_12g12.hmr_seq
This should generate bare_1g12.hmr_chromo through bare_12g12.hmr_chromo.
hana_build -f contigs_sim.fasta -c bare_1g12.hmr_chromo bare_2g12.hmr_chromo bare_3g12.hmr_chromo bare_4g12.hmr_chromo bare_5g12.hmr_chromo bare_6g12.hmr_chromo bare_7g12.hmr_chromo bare_8g12.hmr_chromo bare_9g12.hmr_chromo bare_10g12.hmr_chromo bare_11g12.hmr_chromo bare_12g12.hmr_chromo -o chromosome.fasta
This should generate chromosome.fasta.
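The orientation and build steps each take one file per group, which makes the command lines long. A small helper can generate those file lists from the naming pattern used in this demonstration (PREFIX_IgN.EXT). This is a sketch under that assumption; group_files is a hypothetical helper name, not a HANA command. The -s option of hana_orientation is documented as taking .hmr_seq files, so the example uses that extension.

```shell
# group_files PREFIX N EXT: print PREFIX_1gN.EXT ... PREFIX_NgN.EXT,
# separated by single spaces, on one line.
group_files() (
  prefix=$1
  n=$2
  ext=$3
  i=1
  out=""
  while [ "$i" -le "$n" ]; do
    # ${out:+ } adds a separating space only after the first name.
    out="${out}${out:+ }${prefix}_${i}g${n}.${ext}"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
)

# Possible use (left unquoted on purpose so the list splits into separate
# arguments; this assumes the file names contain no spaces):
#   hana_orientation -n bare.hmr_nodes -r bare.hmr_reads -s $(group_files bare 12 hmr_seq)
#   hana_build -f contigs_sim.fasta -c $(group_files bare 12 hmr_chromo) -o chromosome.fasta
```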
hana_extract extracts the useful information from the FASTA and BAM files. It is expected to generate the following files:
- .hmr_nodes describes the contig lengths and restriction site counts.
- .hmr_reads describes the necessary read information (contig id, position, paired contig id, paired position).
- .hmr_allele_table describes the contig indices of the allele table, if an allele table is provided.
usage: hana_extract [-h] -f FASTA -a ALLELE -m MAPPING 1, MAPPING 2... -o OUTPUT -e ENZYME -t THREADS -r ENZYME_RANGE -q MAPQ --search-buffer SEARCH_BUF_SIZE --mapping-buffer MAP_BUF_SIZE --no-flag --no-range
-h, --help Show this help message and exit
-f FASTA, --fasta FASTA
Contig FASTA file (.fasta/.fasta.gz)
-a ALLELE, --allele ALLELE
Contig allele table file (.ctg.table)
-m MAPPING 1, MAPPING 2..., --mapping MAPPING 1, MAPPING 2...
Hi-C reads mapping files (.bam/.pairs)
-o OUTPUT, --output OUTPUT
Output file prefix
-e ENZYME, --enzyme ENZYME
Enzyme to find in the sequence
-t THREADS, --threads THREADS
Number of threads (default: 1)
-r ENZYME_RANGE, --range ENZYME_RANGE
Hi-C pairs valid distance around enzyme (default: 1000)
-q MAPQ, --mapq MAPQ BAM minimum mapping quality (default: 40)
--search-buffer SEARCH_BUF_SIZE
FASTA searching buffer size per thread (default: 32)
--mapping-buffer MAP_BUF_SIZE
Mapping parse buffer size (unit: K, default: 512)
--no-flag Skip the flag checking
--no-range Skip the enzyme range checking
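When the binaries are chained in a script, it can help to confirm that a step wrote its expected files before starting the next one. The sketch below is a generic check and not a HANA feature; check_outputs is a hypothetical helper name, and the file names in the usage note follow the demonstration prefix bare.

```shell
# check_outputs FILE...: report every expected file that is missing or empty.
# Succeeds only when all listed files exist and are non-empty.
check_outputs() (
  status=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  # The function body is a subshell, so exit sets the function's return status.
  exit "$status"
)
```

For example, after running hana_extract with -o bare, a script could call check_outputs bare.hmr_nodes bare.hmr_reads and stop if it fails.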
hana_draft calculates the edge counts and weights used for partitioning, based on the contig and read information. It is expected to generate the following files:
- .hmr_edges describes the relationships between the contigs in the .hmr_nodes file.
- .hmr_nodes_invalid describes the contigs that failed to meet the constraints.
usage: hana_draft [-h] -n NODES -r READS -a ALLELE_TABLE -o OUTPUT -b BUFFER_SIZE --min-links MIN_LINKS --min-re MIN_RE --max-link-density MAX_DENSITY
-h, --help Show this help message and exit
-n NODES, --nodes NODES
HMR contig node file (.hmr_nodes)
-r READS, --reads READS
HMR paired-reads file (.hmr_reads)
-a ALLELE_TABLE, --allele-table ALLELE_TABLE
Allele contig table (.hmr_allele_table)
-o OUTPUT, --output OUTPUT
Output graph prefix
-b BUFFER_SIZE, --buffer-size BUFFER_SIZE
HMR paired-reads buffer size (unit: K, default: 512)
--min-links MIN_LINKS
Minimum number of links for contig pair (default: 3)
--min-re MIN_RE Minimum number of RE sites in a contig (default: 10)
--max-link-density MAX_DENSITY
Maximum allowed link density (default: 2)
hana_partition clusters contigs based on the edge information generated in the draft step, using a Louvain-like community detection algorithm. It is expected to generate one file of the following type per group:
- .hmr_group describes the contigs that should belong to the same chromosome.
usage: hana_partition [-h] -n NODES -e EDGES -a ALLELE_TABLE -g GROUP -o OUTPUT -b BUFFER_SIZE --non-informative-ratio NON_INFO_RATIO
-h, --help Show this help message and exit
-n NODES, --nodes NODES
HMR contig node file (.hmr_nodes)
-e EDGES, --edges EDGES
HMR edges file (.hmr_edges)
-a ALLELE_TABLE, --allele ALLELE_TABLE
HMR allele table file (.hmr_allele_table)
-g GROUP, --group GROUP
Number of groups to be partitioned
-o OUTPUT, --output OUTPUT
Output partition file (.hmr_partition)
-b BUFFER_SIZE, --buffer-size BUFFER_SIZE
HMR edge buffer size (unit: K, default: 512)
--non-informative-ratio NON_INFO_RATIO
Skipped contigs recover cutoff (default: 3)
hana_ordering is the first step of the AllHiC optimize stage. It reads one group generated by the partition step and uses an evolutionary algorithm to find the contig order. It is expected to generate the following file:
- .hmr_seq describes the order of the contigs in the group.
usage: hana_ordering [-h] -n NODES -e EDGE -g GROUP -o OUTPUT -t THREADS -b BUFFER_SIZE -s SEED --mutapb MUTATION --ngen NUM_OF_GENERATION --max-gen MAX_GENERATION --npop NUM_OF_POP
-h, --help Show this help message and exit
-n NODES, --nodes NODES
HMR contig node file (.hmr_nodes)
-e EDGE, --edge EDGE HMR edge file (.hmr_edges)
-g GROUP, --group GROUP
HMR contig group file (.hmr_group)
-o OUTPUT, --output OUTPUT
Output ordered contig sequence file (.hmr_seq)
-t THREADS, --threads THREADS
Number of threads (default: 1)
-b BUFFER_SIZE, --buffer-size BUFFER_SIZE
HMR edge buffer size (unit: K, default: 512)
-s SEED, --seed SEED Fixed random seed, 0 for no special seed (default: 0)
--mutapb MUTATION Mutation probability (default: 0.2)
--ngen NUM_OF_GENERATION
Number of generations for convergence (default: 5000)
--max-gen MAX_GENERATION
Limits of trial generations (default: 1000000)
--npop NUM_OF_POP Candidate sequences size (default: 100)
hana_orientation is the second step of the AllHiC optimize stage. It reads all the sorted group sequences and uses the gradient of each contig to decide its orientation. It is expected to generate one file of the following type per group:
- .hmr_chromo describes the order of the contigs and their orientations.
usage: hana_orientation [-h] -n NODES -r READS -s SEQ 1, SEQ 2... -b BUFFER_SIZE
-h, --help Show this help message and exit
-n NODES, --nodes NODES
HMR contig node file (.hmr_nodes)
-r READS, --reads READS
HMR paired-reads file (.hmr_reads)
-s SEQ 1, SEQ 2..., --seq SEQ 1, SEQ 2...
HMR sorted contig sequence file (.hmr_seq)
-b BUFFER_SIZE, --buffer-size BUFFER_SIZE
HMR paired-reads buffer size (unit: K, default: 512)
hana_build converts the chromosome results into a FASTA file. It is expected to generate the following file:
- .fasta contains the assembled sequences.
usage: hana_build [-h] -f FASTA -c CHROMO 1, CHROMO 2... -o OUTPUT
optional arguments:
-h, --help Show this help message and exit
-f FASTA, --fasta FASTA
Contig FASTA file (.fasta/.fasta.gz)
-c CHROMO 1, CHROMO 2..., --chromosome CHROMO 1, CHROMO 2...
HMR chromosome sequence files (.hmr_chromo)
-o OUTPUT, --output OUTPUT
Output build fasta file (.fasta)