A tool for analyzing ecDNA structure using data from PFGE amplicon separation.
CRISPRCATCH has been tested on Ubuntu 20.04.
This tool uses pre-developed tools and the user must install them before using this tool. These tool are:
-
jluebeck/PrepareAA is mandatory and bashrc variable
$PreAA
should set to the directory you install PrepareAA. For adding this variable you can change your directory to where you install PrepareAA and run the following commands:git clone https://github.com/jluebeck/PrepareAA cd PrepareAA echo export PreAA=$PWD >> ~/.bashrc source ~/.bashrc
-
The jluebeck/AmpliconArchitect is mandatory and variable
$AA_SRC
should set in the bashrc. You can find more information about how to install AA and set$AA_SRC
on the AA Github page. -
The jluebeck/CycleViz is mandatory and the variable
$CV_SRC
should set in the bashrc. You can find more information about how install CycleViz and set$CV_SRC
on CycleViz Github page. -
samtools (CRISPRCATCH supports versions >= 1.0)
-
bwa mem (BWA MEM should be on the system path after installing so it can be called by PrepareAA)
-
picard. Variable
$PICARD
should be set in the bashrc file to the directory picard is installed. For doing that, change the directory to where you have picard jar file and run the following commands:cd path/to/picard/ echo export PICARD=$PWD >> ~/.bashrc source ~/.bashrc
-
intervaltree (
pip install intervaltree
)
After installing prerequisites, please clone the PFGE repo consider running the following to add a line to your .bashrc
file:
cd PFGE/
echo export PFGE=$PWD >> ~/.bashrc
source ~/.bashrc
To ensure that you installed all requiements correctly, you may download these two PFGE CRISPRCATCH test fastq files: fastq1 and fastq2. Then run the following command. If every things installed properly, after ~30 min the pipeline should generate the GBM39 amplicon structure.
python3 wrapper.py -f1 /path/to/fastq1 -f2 /path/to/fastq2 -b i -sname GBM39 -o /path/to/output/dir -t 1 -r hg19 -bulk test_data/GBM39_AA_graph.txt -lmax 1370 -lmin 1200 -chr chr7 -g_start 55256396 -g_end 55256397
PFGE takes as an input sequence data comes from PFGE experiment for a band. Sequence data in .fastq
format, expected length of band and outputed best candidates structure for band's structure.
-f1
Path to band_r1.fastq file-f2
Path to band_r2.fastq file-b
Analyzed band label name-sname
Analyzed sample name-o
Output folder directory for saving results-t
Number of threads-r
Reference genome version. It can be hg19 or hg38.-bulk
Path to the AA breakpoint graph file for bulk WGS (non-AA)-lmax
Maximum estimated length for structure in terms of kbp.-lmin
Minimum estimated length for structure in terms of kbp.-chr
Chromosome of target site. Please write 'chr' + chromosome number.-g_start
CRISPR target site starting position (in bp).-g_end
CRISPR target site ending position (in bp).-bed
path to bed file that spcifying amplicon region. It is not a required argument, PFGE can generate this from AA breakpoint graph file on bulk.-min_sup
specify the minimum number of reads for calling a breakpoint. Default value is 2. If you have high coverage data (e.g. Novaseq) please set it to 4.-sdv
specify insert_size standard deviation for filtering reads for calling breakpoints. Default value is 8.5. This should be raised if there are an excess of artifactual breakpoints detected by AA.
example command:
python3 wrapper.py -f1 SNU16_i_R1.fastq -f2 SNU16_i_R2.fastq -b i -sname SNU16 -o output/ -t 10 -r hg19 -bulk SNU16_AA_amplicon1_graph.txt -lmax 1810 -lmin 1660 -chr chr10 -g_start 123353331 -g_end 123353350 -sdv 8.5 -min_sup 2
If you want to run the pipeline for bunch of bands together (multiple fastq pairs), you can use 's_wrapper.py'.
-sname
Analyzed Cell-line name-o
Output folder directory for saving results-t
Number of threads-r
Reference genome version. It can be hg19 or hg38.-bulk
path to the AA breakpoint graph file for bulk cell-line-bed
path to bed file that spcifying amplicon region. It is not required field, PFGE can generate this from AA breakpoint graph file on bulk.-min_sup
specifying minimum number of reads for calling a breakpoint. Default value is 2. If you have high coverage data please set it to 4.-sdv
specifying insert_size sdv for filtering reads for calling breakpoints. Default value is 8.5.-csv
specifying a csv file containing bands information. Please take a look at 'example.csv' and fill the information as needed. Please use the absolute path for column read1 and read2. example command:
python3 s_wrapper.py -csv SNU16_i.csv -sname SNU16 -t 10 -r hg19 -o output/ -bulk SNU16_AA_amplicon1_graph.txt -sdv 8.5 -min_sup 2 -bed DNU16.bed