-
Notifications
You must be signed in to change notification settings - Fork 0
License
lufuhao/ReadCleaner4Scaffolding
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
ReadCleaner4Scaffolding Description: This pipeline and the included scripts are used to filter useful reads for SSPACE 1. Remove duplicates 2. Remove high-depth reads (needs to define threshold) Please refer to SOPRA to defined your data-specific cut-off threshold 3. Apply Window size for a group of reads to remove low-confidence reads and contigs Requirements: Linux: perl Scripts: picardrmdup Source: https://github.com/lufuhao/FuhaoBin Location: FuhaoBin/FuhaoBash/ bowtie_dedup_mates_separately.sh Source: https://github.com/lufuhao/FuhaoBin Location: FuhaoBin/FuhaoBash/ bam_extract_readname_using_region.pl Source: https://github.com/lufuhao/FuhaoBin Location: FuhaoBin/FuhaoPerl/ bam_filter_by_readname_file.pl Source: https://github.com/lufuhao/FuhaoBin Location: FuhaoBin/FuhaoPerl/ bam_BamKeepBothMates.pl Source: https://github.com/lufuhao/FuhaoBin Location: FuhaoBin/FuhaoPerl/ SizeCollectBin_luf.pl Source: https://github.com/lufuhao/FuhaoBin Location: FuhaoBin/FuhaoPerl/ list2_compar.pl Source: https://github.com/lufuhao/FuhaoBin Location: FuhaoBin/FuhaoPerl/ bam_scaffolding_separately.stats_noBioDBSAM.pl Included bam_exclude_reads_by_windows_size_and_numpairs.pl Included sspace_seqid_conversion.pl Source: https://github.com/lufuhao/FuhaoBin Location: FuhaoBin/FuhaoPerl/ sspace_link_scaffolding_ID_to_contig_IDs.pl Source: https://github.com/lufuhao/FuhaoBin Location: FuhaoBin/FuhaoPerl/ Perl Modules: FuhaoPerl5Lib Source: https://github.com/lufuhao/FuhaoPerl5Lib Programs: SSPACE https://www.baseclear.com/services/bioinformatics/basetools/sspace-standard/ Bowtie http://bowtie-bio.sourceforge.net/index.shtml SAMtools https://github.com/samtools bedtools http://bedtools.readthedocs.io/en/latest/ bamaddrg https://github.com/ekg/bamaddrg seqtk https://github.com/lh3/seqtk Main script: readCleaner4Scaffolding.sh. Options: -h Print this help message -1 Fastq R1 files -2 Fastq R2 files -l Libraries names -x Bowtie index -r Reference fasta -e Maximum depth threshold; -m Maximum insert size between mates for bowtie -X, default: 300 -p Output file prefix, without path -d Delete temporary files -t Number of threads, default: 1 -sspace path to SSPACE First Part: Get mapped reads, deduplication and clean $ readCleaner4Scaffolding.sh -1 lib1.R1.fastq,lib2.R1.fastq -2 lib1.R2.fastq,lib2.R2.fastq -l lib1,lib2 -x bowtie.index -f reference.fasta-p out_prefix -d -t 5 Will do: 1. bowtie_dedup_mates_separately.sh for each library Mapping each mate (-1 or -2) to reference (-r / -x) Merge both mates picardrmdup remove duplicates Merge libaries 2. picardrmdup again 3. SAMtools depth calculate average depth and STDEV You need to define depth threshold to remove reads in repeat region Note: picardrmdup will deduplicate each mate separately, so, some reads may have two duplicates Second Part: Run with -e option: extract reads, remap, and deduplicate $ readCleaner4Scaffolding.sh -1 lib1.R1.fastq,lib2.R1.fastq -2 lib1.R2.fastq,lib2.R2.fastq -l lib1,lib2 -x bowtie.index -f reference.fasta-p out_prefix -d -t 5 -e 5 Will do 4. Remove high-depth reads 5. Extract non-high-depth reads 6. Bowtie mapping 7. Clean high depth again Keep both mate mapped reads 8. Extract reads again [1] Bowtie map as mate rmdup Get read names with FLAG 1024 Exclude these reads from [1] * For further analysis bowtie map as single again * For further analysis merge BAMs * For further analysis 9. Extract reads and reference seq for SSPACE 10. run SSPACE Additional scripts with build-in help message: bam_orientations_by_windowsize_and_numpairs.pl Estimate region size with orientation error in each sequences bam_improper_insert_size_by_windowsize_and_numpairs.pl Estimate region size with insert/deletion error in each sequences bam_breaks_by_windowsize_and_numpairs.pl Estimate region size with paring error in each sequences Author: Fu-Hao Lu Post-Doctoral Scientist in Micheal Bevan laboratory Cell and Developmental Department, John Innes Centre Norwich NR4 7UH, United Kingdom E-mail: Fu-Hao.Lu@jic.ac.uk Copyright (c) 2016-2018 Fu-Hao Lu
About
No description, website, or topics provided.
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published