QUESTIONS in code of SV genotyping #9

Open
NMUzhoujun opened this issue Mar 21, 2022 · 2 comments

@NMUzhoujun

Hi,

While converting a CRAM file to FASTQ, I found this command in your WDL workflow:

seq 0 ~{in_nb_chunks} | head -n ~{in_max_chunks} | parallel -j ~{in_cram_convert_cores} "samtools collate -k {} -K ~{in_nb_chunks} --reference ~{in_ref_file} -Ouf ~{in_cram_file} {} | samtools fastq -1 reads.{}.R1.fastq.gz -2 reads.{}.R2.fastq.gz -0 reads.{}.o.fq.gz -s reads.{}.s.fq.gz -c 1 -N -"

However, samtools collate does not seem to have a "-k" or "-K" option. Could you please explain this and check which options this step actually uses?

Thanks!

@glennhickey

This line seems to come from vg_mapgaffe_call_sv_cram.wdl:

 seq 0 ~{in_nb_chunks} | head -n ~{in_max_chunks} | parallel -j ~{in_cram_convert_cores} "samtools collate -k {} -K ~{in_nb_chunks} --reference ~{in_ref_file} -Ouf ~{in_cram_file} {} | samtools fastq -1 reads.{}.R1.fastq.gz -2 reads.{}.R2.fastq.gz -0 reads.{}.o.fq.gz -s reads.{}.s.fq.gz -c 1 -N -"
    >>>
    output {
        Array[File] output_read_chunks_1 = glob("reads.*.R1.fastq.gz")
        Array[File] output_read_chunks_2 = glob("reads.*.R2.fastq.gz")
    }
    runtime {
        cpu: in_cram_convert_cores
        memory: "50 GB"
        disks: "local-disk " + in_cram_convert_disk + " SSD"
        docker: "jmonlong/samtools-jm:release-1.19jm0.2.2"
        preemptible: in_preemptible
    }

The task specifies this image: docker: "jmonlong/samtools-jm:release-1.19jm0.2.2". The samtools collate in that image does have the -k and -K options:

docker run jmonlong/samtools-jm:release-1.19jm0.2.2 samtools collate
Usage: samtools collate [-Ou] [-o <name>] [-n nFiles] [-l cLevel] <in.bam> [<prefix>]

Options:
      -O       output to stdout
      -o       output file name (use prefix if not set)
      -u       uncompressed BAM output
      -f       fast (only primary alignments)
      -r       working reads stored (with -f) [10000]
      -l INT   compression level [1]
      -n INT   number of temporary files [64]
      -k INT   the read chunk to output during CRAM conversion. In [0,N-1]. Used if N>0.
      -K INT   the number of read chunks to consider during CRAM conversion. 0 (default) means no chunking.
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
      --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
  -@, --threads INT
               Number of additional threads to use [0]
  <prefix> is required unless the -o or -O options are used.
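
To see how the -k and -K options listed above line up with the workflow command, here is a minimal dry-run sketch. It reuses the seq/head index generation from the WDL line but only prints each invocation instead of running it. The function name chunk_commands and its two arguments are hypothetical stand-ins for the WDL inputs in_nb_chunks and in_max_chunks, and the remaining samtools arguments are left out for brevity.

```shell
# Hypothetical helper, not part of the workflow: print one line per chunk
# that the seq/head pipeline would hand to parallel.
#   $1 = number of chunks (stand-in for in_nb_chunks)
#   $2 = maximum number of chunks to convert (stand-in for in_max_chunks)
chunk_commands() {
    nb_chunks=$1
    max_chunks=$2
    # Same index generation as the workflow: seq emits 0..nb_chunks,
    # head keeps only the first max_chunks indices.
    seq 0 "$nb_chunks" | head -n "$max_chunks" | while read -r k; do
        echo "samtools collate -k $k -K $nb_chunks"
    done
}

# Example: 4 chunks in total, convert at most 3 of them.
chunk_commands 4 3
```

Note that seq 0 N emits N+1 values, so in this sketch an index equal to N appears unless head trims it, while the usage text documents -k as lying in [0,N-1]. How the customized samtools treats such an index is not stated in this thread.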

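This is a customized samtools, not the upstream release, which is why -k and -K do not appear in the standard samtools documentation: https://github.com/jmonlong/samtools-jm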
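
For anyone running this step outside the workflow's Docker image, a quick check of the local samtools can save some confusion. The sketch below assumes the chunking options show up in the usage text in the same "-k INT" / "-K INT" form as in the output above. The function name has_chunk_flags is hypothetical.

```shell
# Hypothetical helper: reads a samtools collate usage text on stdin and
# succeeds (exit status 0) only if it documents both -k INT and -K INT.
has_chunk_flags() {
    usage=$(cat)
    # grep is case-sensitive by default, so -k and -K are checked separately.
    printf '%s\n' "$usage" | grep -Eq '^[[:space:]]*-k[[:space:]]+INT' &&
    printf '%s\n' "$usage" | grep -Eq '^[[:space:]]*-K[[:space:]]+INT'
}

# Usage (stderr is merged in case the usage text is printed there):
#   samtools collate 2>&1 | has_chunk_flags && echo "chunking options available"
#   docker run jmonlong/samtools-jm:release-1.19jm0.2.2 samtools collate 2>&1 | has_chunk_flags
```

If the check fails, the samtools on the PATH is most likely a build without the chunking options, and the command from the workflow will reject -k and -K.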

@NMUzhoujun

Thank you. The problem has been solved.
