Analysis of MiSeq amplicon data from CRISPR experiments.
This pipeline uses BWA to map short paired-end amplicon reads to a reference genome. After post-processing with samtools, the alignments are analysed with the CrispRVariants R package. In addition to CrispRVariants, the following R packages need to be installed:
- rtracklayer
- Biostrings
- seqinr
- GenomicFeatures
- glue
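Before running the pipeline, it can help to confirm that R can load all of these packages. The sketch below assumes Rscript is on the PATH. The function name missing_r_packages and the variable REQUIRED_R_PKGS are hypothetical and are not part of crispr.sh. CrispRVariants, rtracklayer, Biostrings and GenomicFeatures are distributed through Bioconductor, so anything reported as missing can be installed with BiocManager, which also installs the CRAN packages seqinr and glue.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of crispr.sh): report R packages that cannot be loaded.
REQUIRED_R_PKGS=(CrispRVariants rtracklayer Biostrings seqinr GenomicFeatures glue)

missing_r_packages() {
  # Return 2 if R is not available at all.
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    return 2
  fi
  # Package names are passed as trailing arguments. Missing ones are printed,
  # and the exit status is 1 if any are missing, 0 otherwise.
  Rscript -e 'pkgs <- commandArgs(trailingOnly = TRUE)' \
          -e 'ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)' \
          -e 'if (any(!ok)) { writeLines(pkgs[!ok]); quit(status = 1) }' \
          "${REQUIRED_R_PKGS[@]}"
}

# To install missing packages (Bioconductor and CRAN alike):
#   Rscript -e 'install.packages("BiocManager", repos = "https://cloud.r-project.org")' \
#           -e 'BiocManager::install(c("CrispRVariants", "rtracklayer", "Biostrings", "seqinr", "GenomicFeatures", "glue"))'
```

Running missing_r_packages with no output and exit status 0 means every package listed above can be loaded.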
All sample information needs to be provided in a comma-separated file, info_file.csv. It must include the sample name, gene name, guide sequence (excluding the PAM), chromosome of the guide sequence, start and end positions of the guide sequence, strand of the guide sequence, and the number of base pairs up- and downstream of the guide sequence that should be analysed. An example of such a file is provided in info_file.csv.
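Errors in the guide coordinates are easy to make by hand. A quick consistency check is that the genomic interval given by the start and end positions has the same length as the guide sequence. The sketch below is a hypothetical helper, not part of crispr.sh. It assumes 1-based, inclusive coordinates and strand values written as + or -, neither of which is stated above, so adjust it to match the example info_file.csv.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of crispr.sh): sanity-check one guide entry
# from info_file.csv. ASSUMES 1-based, inclusive start/end coordinates and
# strand written as "+" or "-".
check_guide_entry() {
  local guide="$1" start="$2" end="$3" strand="$4"

  # The guide should contain nucleotides only (PAM excluded).
  if [[ ! "$guide" =~ ^[ACGTacgt]+$ ]]; then
    echo "invalid guide sequence: '$guide'" >&2
    return 1
  fi

  # Start and end must be non-negative integers.
  if [[ ! "$start" =~ ^[0-9]+$ || ! "$end" =~ ^[0-9]+$ ]]; then
    echo "non-numeric coordinates: '$start'-'$end'" >&2
    return 1
  fi

  # Interval length must equal guide length (also rejects start > end).
  # 10# forces base 10 so values with leading zeros are not read as octal.
  local len=$(( 10#$end - 10#$start + 1 ))
  if (( len != ${#guide} )); then
    echo "interval length $len does not match guide length ${#guide}" >&2
    return 1
  fi

  if [[ "$strand" != "+" && "$strand" != "-" ]]; then
    echo "strand must be + or -, got '$strand'" >&2
    return 1
  fi
}
```

With made-up values, check_guide_entry ACGTACGTACGTACGTACGT 1001 1020 + succeeds, while changing the end position to 1021 makes it fail, because the interval would then be 21 bp for a 20 bp guide.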
The directory containing genome.fasta must also contain the BWA index files and a FASTA index (.fai).
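A missing index file is a common cause of failed runs. The sketch below lists any expected index file that is absent or empty next to the genome FASTA. It assumes the standard file names produced by bwa index (five files: .amb, .ann, .bwt, .pac, .sa) and by samtools faidx (.fai) with default settings. The function name missing_genome_indices is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of crispr.sh): print index files that are
# missing or empty next to the genome FASTA; exit status 1 if any are missing.
# ASSUMES default index names from:
#   bwa index genome.fasta      -> .amb .ann .bwt .pac .sa
#   samtools faidx genome.fasta -> .fai
missing_genome_indices() {
  local genome="$1" ext missing=0
  for ext in amb ann bwt pac sa fai; do
    if [[ ! -s "$genome.$ext" ]]; then
      echo "$genome.$ext"
      missing=1
    fi
  done
  return "$missing"
}
```

If anything is reported, running the two commands shown in the comments on the genome FASTA should create the missing files in the same directory.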
It is crucial that the chromosome names in info_file.csv match those in genome.fasta (e.g. chrY and >chrY, rather than Y and >chrY or vice versa).
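Because the FAI index already lists every sequence name, it offers a quick way to check the chromosome names used in info_file.csv. In a samtools .fai file, the first tab-separated column is the FASTA header up to the first whitespace, without the leading >. The sketch below assumes that format. The function name chrom_in_genome is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of crispr.sh): succeed if a chromosome name
# from info_file.csv appears in the first column of the genome's .fai index.
# To see which naming convention the genome uses:
#   cut -f1 genome.fasta.fai | head
chrom_in_genome() {
  local fai="$1" chrom="$2"
  # Exact, whole-field match, so "chr" does not match "chrY".
  awk -F'\t' -v c="$chrom" '$1 == c { found = 1; exit } END { exit !found }' "$fai"
}
```

For example, chrom_in_genome genome.fasta.fai chrY fails on a genome whose headers are >Y, which is exactly the mismatch described above.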
The pipeline can be run using the following command:
./crispr.sh -s sample_name -i info_file.csv -m -g genome.fasta -f fastqDir -o outDir
-s is the sample name
-i is the sample information file, info_file.csv (see above)
-m is a flag that enables the mapping step (omit it to skip mapping)
-g is the full path to the genome FASTA file (the BWA and FAI index files must be in the same directory)
-f is the directory containing the FASTQ files
-o is the output directory for the results
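Since crispr.sh processes one sample per call, several samples can be run in a simple loop. The sketch below is a hypothetical wrapper that calls crispr.sh once per distinct sample name. It assumes that info_file.csv has a header row and holds the sample name in its first column. Check both assumptions against the example file. The names list_samples and run_all_samples are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the pipeline): run crispr.sh once per
# distinct sample in info_file.csv. ASSUMES a header row and the sample name
# in the first column.
list_samples() {
  # Skip the header, take column 1, strip Windows line endings,
  # drop empty lines and de-duplicate.
  tail -n +2 "$1" | cut -d, -f1 | tr -d '\r' | sed '/^$/d' | sort -u
}

run_all_samples() {
  local info="$1" genome="$2" fastq_dir="$3" out_dir="$4" sample
  while IFS= read -r sample; do
    # </dev/null stops crispr.sh from reading the remaining sample names from stdin.
    ./crispr.sh -s "$sample" -i "$info" -m -g "$genome" -f "$fastq_dir" -o "$out_dir" </dev/null \
      || echo "crispr.sh failed for sample $sample" >&2
  done < <(list_samples "$info")
}
```

Usage: run_all_samples info_file.csv /full/path/to/genome.fasta fastqDir outDir. To skip mapping, remove -m from the crispr.sh call inside the loop. A failed sample is reported and the loop moves on rather than aborting the whole batch.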