delfies
is a tool for the detection of DNA breakpoints with de-novo telomere addition.
It identifies genomic locations where double-strand breaks have occurred followed by telomere addition. It was initially designed and validated for studying the process of Programmed DNA Elimination in nematodes, but should work for other clades and applications too.
delfies
takes as input a genome fasta (gzipped supported) and an indexed SAM/BAM of
sequencing reads aligned to the genome.
delfies --help
samtools index <aligned_reads>.bam
delfies <genome>.fa.gz <aligned_reads>.bam <output_dir>
cat <output_dir>/breakpoint_locations.bed
Using pip
(or equivalent - poetry, etc.):
# Install latest release from PyPI
pip install delfies
# Or install a specific release from PyPI:
pip install delfies==0.6.0
# Or clone and install tip of main
git clone https://github.com/bricoletc/delfies/
pip install ./delfies
delfies --help
- Do use the
--threads
option if you have multiple cores/CPUs available. - [Breakpoints]
- There are two types of breakpoints: see detailed docs.
- Nearby breakpoints can be clustered together to account for variability in breakpoint location (
--clustering_threshold
).
- [Region selection]: You can select a specific region to focus on, specified as a string or as a BED file.
- [Telomeres]
- Specify the telomere sequence for your organism using
--telo_forward_seq
. If you're unsure, I recommend the tool telomeric-identifier for finding out.
- Specify the telomere sequence for your organism using
- [Aligned reads]
- To analyse confidently-aligned reads only, you can filter reads by MAPQ (
--min_mapq
) and by bitwise flag (--read_filter_flag
). - You can tolerate more or less mutations in the telomere sequences (and in the reads) using
--telo_max_edit_distance
and--telo_array_size
.
- To analyse confidently-aligned reads only, you can filter reads by MAPQ (
The two main outputs of delfies
are:
breakpoint_locations.bed
: a BED-formatted file containing the location of identified elimination breakpoints.breakpoint_sequences.fasta
: a FASTA-formatted file containing the sequences of identified elimination breakpoints
I highly recommend visualising your results! E.g., by loading your input
fasta and BAM and output delfies
' output breakpoint_locations.bed
in
IGV.
- The fasta output enables looking for sequence motifs that occur at breakpoints, e.g. using MEME.
- The BED output enables classifying a genome into retained and eliminated regions. The 'strand' of breakpoints is especially useful for this: see detailed docs.
- The BED output also enables assembling past somatic telomeres: for how to do this, see detailed docs.
For more details on delfies
, including outputs and applications, see detailed_docs.
Contributions always welcome!
Please see CONTRIBUTING.md for how (reporting issues, requesting features, contributing code).