qiukunlong/seeksv

seeksv

an accurate tool for structural variation and virus integration detection

Usage-Single sample SV

Step 1: Get soft-clipped reads from the original bam file.

Inputs

  • input.bam
    • The bam file should be sorted by alignment coordinate. Marking duplicate reads with picard is recommended: if duplicates are marked in the bam file, seeksv ignores them.

Command

seeksv getclip -o /path/to/outputs/prefix input.bam

Outputs

  • prefix.clip.fq.gz
    • Fastq file of clipped sequences.
  • prefix.clip.gz
    • Soft-clipped reads file, used as input to Step 3 and to seeksv somatic.
  • prefix.unmapped_1.fq.gz
    • This file is for further analysis
  • prefix.unmapped_2.fq.gz
    • This file is for further analysis
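
Because Step 1 expects a coordinate-sorted bam file, it can help to check the header before running getclip. The sketch below is a minimal check, assuming the header carries an @HD line with an SO tag as described in the SAM specification. The function name is_coordinate_sorted is hypothetical and is not part of seeksv.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of seeksv): reads a SAM/BAM header from stdin
# and succeeds only if the @HD line carries the tag SO:coordinate.
is_coordinate_sorted() {
    awk -F'\t' '
        $1 == "@HD" { for (i = 2; i <= NF; i++) if ($i == "SO:coordinate") found = 1 }
        END { exit !found }
    '
}

# Example usage (assumes samtools and seeksv are installed and input.bam exists):
#   samtools view -H input.bam | is_coordinate_sorted \
#       && seeksv getclip -o /path/to/outputs/prefix input.bam
```

A header without an @HD line is treated as not sorted, which errs on the side of caution.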

Step 2: Align the clipped sequences to the reference genome

Use bwa to align the clipped sequences obtained from the original bam file to the reference genome, then use samtools to convert the alignments to bam. The reference genome must be the same one that was used to produce the original bam file.

Inputs

  • reference.fa
  • prefix.clip.fq.gz
    • This file is generated by Step 1

Command

bwa mem /path/to/reference.fa /path/to/prefix.clip.fq.gz | \
    samtools view -Sb -o /path/to/outputs/prefix.clip.bam -

Outputs

  • prefix.clip.bam

Step 3: Get final SVs.

Inputs

  • prefix.clip.bam
    • Generated by Step 2
  • input.bam
    • Original bam file
  • prefix.clip.gz
    • Generated by Step 1

Command

seeksv getsv /path/to/prefix.clip.bam \
             /path/to/input.bam \
             /path/to/prefix.clip.gz \
             /path/to/outputs/output.sv.txt \
             /path/to/outputs/output.unmapped.clip.fq.gz

Outputs

  • output.sv.txt
    • Final SV output
  • output.unmapped.clip.fq.gz
    • This file is for further analysis
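
The three steps above can be chained in one script. The following is a sketch only: the function names run and seeksv_single_sample, the DRY_RUN variable, and the choice to derive the Step 3 output names from the same prefix are all assumptions made for illustration. The seeksv, bwa and samtools invocations are the ones shown in Steps 1 to 3.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around Steps 1-3; names and DRY_RUN are illustrative.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: seeksv_single_sample input.bam reference.fa /path/to/outputs/prefix
seeksv_single_sample() {
    local bam=$1 ref=$2 prefix=$3

    # Step 1: extract soft-clipped reads.
    run seeksv getclip -o "$prefix" "$bam" || return 1

    # Step 2: align the clipped sequences and convert the result to bam.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "bwa mem ${ref} ${prefix}.clip.fq.gz | samtools view -Sb -o ${prefix}.clip.bam -"
    else
        # pipefail is set in a subshell so that a bwa failure is not masked.
        ( set -o pipefail
          bwa mem "$ref" "${prefix}.clip.fq.gz" |
              samtools view -Sb -o "${prefix}.clip.bam" - ) || return 1
    fi

    # Step 3: call SVs. Output names here are a choice of this sketch.
    run seeksv getsv "${prefix}.clip.bam" "$bam" "${prefix}.clip.gz" \
        "${prefix}.sv.txt" "${prefix}.unmapped.clip.fq.gz" || return 1
}

# Example (prints the commands without running them):
#   DRY_RUN=1 seeksv_single_sample input.bam reference.fa /path/to/outputs/prefix
```

Running with DRY_RUN=1 first is a cheap way to confirm that the paths line up before starting a long alignment.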

Call Somatic SV

Inputs

  • normal.bam
    • Original bam file of the normal sample (control sample)
  • normal.clip.gz
    • Soft-clipped reads file of the normal sample, generated by seeksv getclip
  • tumor.sv.txt
    • SV results of the tumor sample, generated by seeksv getsv

Command

seeksv somatic /path/to/normal.bam \
               /path/to/normal.clip.gz \
               /path/to/tumor.sv.txt \
               /path/to/outputs/tumor.somatic.sv.txt

Outputs

  • tumor.somatic.sv.txt
    • Final Somatic SVs
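
As the inputs above imply, the normal sample only needs seeksv getclip, while the tumor sample needs the full single-sample workflow before seeksv somatic is run. A minimal sketch of the normal-side steps follows. The function names run and call_somatic and the DRY_RUN variable are hypothetical, and tumor.sv.txt is assumed to exist already.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for somatic calling; names and DRY_RUN are illustrative.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: call_somatic normal.bam normal_prefix tumor.sv.txt tumor.somatic.sv.txt
call_somatic() {
    local normal_bam=$1 normal_prefix=$2 tumor_sv=$3 out=$4

    # Produce <normal_prefix>.clip.gz from the normal sample.
    run seeksv getclip -o "$normal_prefix" "$normal_bam" || return 1

    # Filter the tumor SVs against the normal sample.
    run seeksv somatic "$normal_bam" "${normal_prefix}.clip.gz" "$tumor_sv" "$out" || return 1
}

# Example (prints the commands without running them):
#   DRY_RUN=1 call_somatic normal.bam /path/to/outputs/normal tumor.sv.txt tumor.somatic.sv.txt
```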

Call Virus Integration

seeksv can be used to detect virus integration, such as HBV, EBV, or HPV integration. First, merge the human reference genome and the virus reference genome into a hybrid reference genome; the Linux command cat is sufficient for this. Then use bwa to align the reads to the hybrid reference genome, and use picard to sort the resulting bam file and mark duplicates. This gives you the original bam file for calling virus integration. Finally, use seeksv to call virus integration. The commands are the same as for calling single-sample SVs; please refer to "Usage-Single sample SV". Note that the reference genome you use must be the hybrid reference genome.
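
Building the hybrid reference might look like the sketch below. It assumes both inputs are uncompressed FASTA files that end with a newline. The function name build_hybrid_reference is hypothetical, and the duplicate-name check is an extra safeguard added here, not something seeksv requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of seeksv): concatenate a human and a virus
# FASTA into one hybrid reference and reject duplicate sequence names.
# Usage: build_hybrid_reference human.fa virus.fa hybrid.fa
build_hybrid_reference() {
    local human_fa=$1 virus_fa=$2 hybrid_fa=$3

    # A missing final newline would glue the first virus header onto the
    # last human sequence line, so refuse such input.
    if [ -n "$(tail -c 1 "$human_fa")" ]; then
        echo "error: $human_fa does not end with a newline" >&2
        return 1
    fi

    cat "$human_fa" "$virus_fa" > "$hybrid_fa" || return 1

    # Names are the first word after '>' on each header line.
    local dup
    dup=$(grep '^>' "$hybrid_fa" | awk '{ print substr($1, 2) }' | sort | uniq -d)
    if [ -n "$dup" ]; then
        echo "error: duplicate sequence names: $dup" >&2
        return 1
    fi
}

# Example (assumes bwa is installed):
#   build_hybrid_reference human.fa virus.fa hybrid.fa && bwa index hybrid.fa
```

The resulting hybrid.fa then takes the place of reference.fa in Step 2.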