Nextflow pipeline for scaffolding genome assemblies with Hi-C reads
wget http://hicfiles.tc4ga.com.s3.amazonaws.com/public/juicer/juicer_tools_1.11.09_jcuda.0.8.jar
This pipeline requires the following inputs:
- A fasta file containing assembled contigs (--contigs)
- Hi-C reads in paired-end fastq(.gz) format (--r1Reads and --r2Reads)
It then performs the following tasks:
- Aligns the Hi-C reads to the contigs using chromap
- Scaffolds the contigs using yahs
- Prepares all the files you need to do manual curation in Juicebox
and produces the following outputs:
- Alignments in bam format (out/chromap/aligned.bam)
- A scaffolded assembly in both agp and fasta formats
  (out/scaffolds/yahs.out_scaffolds_final.[agp,fa])
- .hic and .assembly files for loading in Juicebox Assembly Tools
  (out/juicebox_input/out_JBAT.[hic,assembly])
If you're running this on the Lewis cluster, I've already got a profile set up
with everything you need, so just add -profile lewis to the command and
you're good to go.
This pipeline has the following dependencies:
- Nextflow
- chromap
- yahs
- JuicerTools
Nextflow must be in your path. You can get nextflow to make a conda environment
containing chromap and yahs for you with -profile conda (note one dash!).
JuicerTools is distributed as a jar file, so you need to tell the pipeline
where it is by adding the argument --juicer-tools-jar /path/to/jar (note two
dashes!). You can also add these options to a config file called nextflow.config
in the directory from which you're running the pipeline (see nextflow
documentation).
nextflow run WarrenLab/hic-scaffolding-nf \
--contigs contigs.fa \
--r1Reads hic_reads_R1.fastq.gz \
--r2Reads hic_reads_R2.fastq.gz
Kutral example
nextflow run hic-scaffolding-nf/main.nf \
--contigs sl_female_ont_purge_r2.fasta \
--r1Reads DDU_AAOSDF_4_1_HFYVJDSX7.UDI488_clean.fastq.gz \
--r2Reads DDU_AAOSDF_4_2_HFYVJDSX7.UDI488_clean.fastq.gz \
-profile uoh
You'll need to add a couple of options depending on your configuration (see the dependencies section above).
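As a rough sketch of what the full command might look like once those options are added, the snippet below assembles the basic run command together with -profile conda and --juicer-tools-jar and prints it for review. The build_cmd helper is hypothetical (it is not part of the pipeline), and the file names and jar path are placeholders to swap for your own.

```shell
#!/usr/bin/env bash
# Sketch only: build_cmd is a hypothetical helper, not something the pipeline ships.
# It prints the nextflow command with the optional flags described above,
# so you can check it before launching anything.

build_cmd() {
    local contigs=$1 r1=$2 r2=$3 jar=$4
    # Flags with two dashes go to the pipeline; -profile (one dash) goes to nextflow.
    printf '%s ' nextflow run WarrenLab/hic-scaffolding-nf \
        --contigs "$contigs" \
        --r1Reads "$r1" \
        --r2Reads "$r2" \
        --juicer-tools-jar "$jar" \
        -profile conda
    printf '\n'
}

# Placeholder inputs; replace with your own contigs, reads, and jar location.
build_cmd contigs.fa hic_reads_R1.fastq.gz hic_reads_R2.fastq.gz /path/to/jar
```

If the printed command looks right, paste it into your terminal to start the run. On the Lewis cluster, you would swap -profile conda for -profile lewis instead.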