Target region variant calling

This is pipeline script I use for variant calling in amplicon based sequencing data, mainly gene exon sequencing.

Importtant: I am a novice programmer and I bear no responsibility whatsoever if any of these scripts that I have written wipes your computer, destroys your data, crashes your car, or otherwise causes mayhem and destruction. USE AT YOUR OWN RISK.

Dependency

FASTQ preprocessor: fastp
Primers cutter: cutPrimers
reads aligner: minimap2
sam sort: samtools
coverage stats: bamstats04
variant caller: strelka2
Analyze, share, reproduce: R markdown
R packages: dplyr, stringi, Biostrings
variant annotation: SnpSift
script template: shell-scripts
DBSNP vcf annotation file, should be declared in the script sample_file
Latex packages:

  sudo apt install texlive-latex-extra texlive-fonts-recommended texlive-xetex
  wget http://mirrors.ctan.org/macros/latex/contrib/titling.zip
  unzip titling.zip     # (might need to sudo yum install unzip)
  cd titling
  latex titling.ins
  sudo mkdir -p /usr/share/texlive/texmf-dist/tex/latex/titling
  sudo cp titling.sty /usr/share/texlive/texmf-dist/tex/latex/titling/
  sudo texhash

Usage

Importtant currently the pipeline is design for pair end data only, single end parameters temporarily not working. pair reads should be longer than 70bp, so PE100, PE150, PE300 should be ok.

Input files

sample_file one sample id/name per aligner
primer_file interleaved 5p/3p primers of amplicon PCR design
gene_file one gene symbol per line, should be official gene symbol
gene_gtf ensembl gene GTF file downloaded in ensembl website
fastq_files stored in data_dir, named as [sample id]_R1.fastq.gz, [sample_id]_R2.fastq.gz
ref_fasta reference genome fasta file, must use a relative path format, also should have a minimap2 index file in the same location, with extension mmi. To create the minimap2 index use command minimap2 -d ref.mmi ref.fa

Parameters

target_region_variant_calling.sh [OPTION]... [FILE]...

This is pipeline script I use for variant calling in amplicon based sequencing data, mainly gene exon sequencing.

 Options:
  -s, --sample_file sample list file, one sample per line
  -p, --primer_file interleaved 5p/3p primers of amplicon PCR design
  -d, --data_dir    fastq data dir, should contain [sample]_R1/2.fastq.gz, default: data
  -a, --adapter1    the adapter for read1. For SE data, if not specified, the adapter will be auto-detected. For PE data, this is used if R1/R2 are found not overlapped. (string [=auto])
  -A, --adapter2    the adapter for read2 (PE data only). This is used if R1/R2 are found not overlapped. If not specified, it will be the same as <adapter1> (string [=])
  -g, --gene_file   one gene symbol per line, should be official gene symbol
  -f, --gene_gtf    ensembl gene GTF file downloaded in [ensembl website](http://asia.ensembl.org/info/data/ftp/index.html), ungziped file
  -r, --ref_fasta   alignment referece fasta file, must use a relative path format, like ../broad-references/hg38/v0/Homo_sapiens_assembly38.fasta
  -o, --out_dir     output dir name, default is a random name outputXXXX
  -t, --threads     threads to use, default: 4
  --force           Skip all user interaction.  Implied 'Yes' to all actions.
  -q, --quiet       Quiet (no output)
  -l, --log         Print log to file
  -s, --strict      Exit script with null variables.  i.e 'set -o nounset'
  -v, --verbose     Output more information. (Items echoed to 'verbose')
  -b, --debug       Runs script in BASH debug mode (set -x)
  -h, --help        Display this help and exit
      --version     Output version information and exit

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
LICENSE		LICENSE
command.txt		command.txt
configureStrelkaGermlineWorkflow.py.ini		configureStrelkaGermlineWorkflow.py.ini
fastp_report.R		fastp_report.R
forcedGT2report.R		forcedGT2report.R
piscesGT2report.R		piscesGT2report.R
pisces_exon+rs.R		pisces_exon+rs.R
pisces_geneGT2result.R		pisces_geneGT2result.R
pisces_germlineGT2result.R		pisces_germlineGT2result.R
pisces_posfile.R		pisces_posfile.R
pisces_vcf.R		pisces_vcf.R
pisces_xlsx.R		pisces_xlsx.R
primer_report.Rmd		primer_report.Rmd
qcreport.Rmd		qcreport.Rmd
readme.md		readme.md
reduced2report.R		reduced2report.R
result_report.Rmd		result_report.Rmd
target_region_variant_calling.sh		target_region_variant_calling.sh
txgr.RData		txgr.RData
基于扩增片段靶向测序变异分析.docx		基于扩增片段靶向测序变异分析.docx
基于扩增片段靶向测序变异分析.zip		基于扩增片段靶向测序变异分析.zip

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Target region variant calling

Dependency

Usage

Input files

Parameters

About

Releases

Packages

Languages

License

ray1919/target_region_variant_calling

Folders and files

Latest commit

History

Repository files navigation

Target region variant calling

Dependency

Usage

Input files

Parameters

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages