EM-MUL

  EM-MUL is a tool that resolves ambiguously mapped bisulfite-treated reads (multireads) by making use of the information available from the uniquely mapped reads that overlap them.

To run this program, you first need samtools, perl, bedtools, and g++ installed.
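
Before starting, it can help to confirm that these tools are reachable. The snippet below is a minimal sketch, not part of EM-MUL: `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and it assumes each tool is expected on `PATH`. If bedtools is run from its own folder, as in the intersectBed step later in this README, it may be reported as missing even though it is usable.

```python
import shutil

# Executables named in this README (assumption: each should be on PATH).
REQUIRED_TOOLS = ["samtools", "perl", "bedtools", "g++"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```
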
The tool takes four inputs.

  • -r is the reference genome that the reads were aligned to.
  • -u is the file of unique reads.
  • -m is the file of multireads, which align ambiguously to multiple locations in the reference genome.
  • -o is the overlap file, which lists the unique reads that overlap the multireads.
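
To make the four-option interface concrete, here is a small sketch of a parser that accepts the same short flags. It is an illustration only and not EM-MUL's actual argument handling: the `dest` names, the program name, and the choice to make every flag required are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical parser: only the four short flags come from this README.
    parser = argparse.ArgumentParser(prog="em-mul-sketch")
    parser.add_argument("-r", dest="reference", required=True,
                        help="reference genome the reads were aligned to")
    parser.add_argument("-u", dest="unique_reads", required=True,
                        help="unique reads")
    parser.add_argument("-m", dest="multireads", required=True,
                        help="multireads (ambiguously aligned reads)")
    parser.add_argument("-o", dest="overlap_file", required=True,
                        help="unique reads that overlap the multireads")
    return parser


# Example values taken from the scoring command in this README.
args = build_parser().parse_args([
    "-r", "hg38",
    "-u", "unique_reads_nodup.sam",
    "-m", "multireads.sam",
    "-o", "overlapfile.txt",
])
print(args.reference, args.unique_reads, args.multireads, args.overlap_file)
```
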

Among them, the unique reads and the multireads are obtained by aligning the original BS reads with Bismark. The overlap file is derived from the unique reads and the multireads; the processing flow follows BAM_ABS, and the commands are:

  • Convert all_unique_reads.sam to all_unique_reads.bam.

    • samtools view -bS all_unique_reads.sam > all_unique_reads.bam
  • Run Convert_to_bed_unite.pl with the --ambiguous option to convert the ambiguous read file to BED format.

    • perl Convert_to_bed_unite.pl --ambiguous ambiguous_file.sam
  • Run samtools to get the overlapping unique reads in SAM format.

    • samtools view -q 20 -L ambiguous_file.bed all_unique_reads.bam > unique_reads.sam
  • Remove duplicates from the unique reads.

    • sort -n -r -k3,3 -k4,4 -k5,5 unique_reads.sam|uniq -u > unique_reads_nodup.sam
  • Convert the unique read file to BED format with the --unique option.

    • perl Convert_to_bed_unite.pl --unique unique_reads_nodup.sam
  • Use bedtools to find the unique reads that overlap the multireads. Run the following command in the bedtools folder to produce the overlap file that is passed to EM-MUL with -o.

    • ./intersectBed -a ambiguous_file.bed -b unique_reads_nodup.bed -wb -wa > overlapfile.txt
  • Score the multireads using EM-MUL.

    • python3.6 new_score_all_and_coverage_human -r hg38 -u unique_reads_nodup.sam -m multireads.sam -o overlapfile.txt
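
The preprocessing steps above (everything before the scoring step) can be collected into a small driver script. This is a sketch under stated assumptions, not part of EM-MUL: `preprocessing_commands`, `run_pipeline`, and the `intersect_bed` parameter are hypothetical names, the file names are the ones used in this README, and it assumes Convert_to_bed_unite.pl writes the .bed files that the later steps read. The samtools options are placed before the input file, following samtools' documented usage. By default the script only prints the commands.

```python
import subprocess


def preprocessing_commands(intersect_bed="./intersectBed"):
    """Return the README's preprocessing commands as shell strings, in order."""
    return [
        "samtools view -bS all_unique_reads.sam > all_unique_reads.bam",
        "perl Convert_to_bed_unite.pl --ambiguous ambiguous_file.sam",
        "samtools view -q 20 -L ambiguous_file.bed all_unique_reads.bam"
        " > unique_reads.sam",
        "sort -n -r -k3,3 -k4,4 -k5,5 unique_reads.sam | uniq -u"
        " > unique_reads_nodup.sam",
        "perl Convert_to_bed_unite.pl --unique unique_reads_nodup.sam",
        f"{intersect_bed} -a ambiguous_file.bed -b unique_reads_nodup.bed"
        " -wb -wa > overlapfile.txt",
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; run it through the shell unless dry_run is True."""
    for number, command in enumerate(commands, start=1):
        print(f"[{number}/{len(commands)}] {command}")
        if not dry_run:
            # Raises CalledProcessError if the shell reports a non-zero exit status.
            subprocess.run(command, shell=True, check=True)


if __name__ == "__main__":
    run_pipeline(preprocessing_commands())
```

Calling `run_pipeline(preprocessing_commands(), dry_run=False)` would execute the steps in the current working directory, so the input files, the Perl script, and intersectBed all need to be reachable from there.
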