xonq edited this page Jul 19, 2024 · 9 revisions

CLOCI

NOTE

Extensive alpha testing has been conducted, but this software remains in a beta state. Errors are expected; rerunning without changing parameters is often sufficient to resume appropriately. Kindly raise git issues for errors; if you can find the bug, even better! Documentation is currently in the works.


PURPOSE

The most common gene cluster detection algorithms focus on the canonical “core” biosynthetic functions that many gene clusters encode, while overlooking uncommon or unknown cluster classes. These overlooked clusters are a potential source of novel natural products and comprise an untold portion of overall gene cluster repertoires. Unbiased, function-agnostic detection algorithms therefore provide an opportunity to reveal novel classes of gene clusters and to define genome organization more broadly. CLOCI (Co-occurrence Locus and Orthologous Cluster Identifier) is an algorithm that identifies gene clusters using multiple proxies of selection for coordinated gene evolution. In the process, CLOCI circumscribes loci into homologous locus groups, which extend the concept of orthogroups to the locus level. Our approach generalizes gene cluster detection and gene cluster family circumscription, improves detection of multiple known functional classes, and unveils noncanonical gene clusters. CLOCI is suitable for genome-enabled specialized metabolite mining and presents an easily tunable approach for delineating gene cluster families and homologous loci.
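To make the idea of a homologous locus group more concrete, the toy sketch below groups loci whose orthogroup content overlaps. It is not CLOCI's implementation: the Jaccard similarity measure, the `min_similarity` threshold, the single-linkage grouping, and every name in the code are assumptions chosen purely for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two collections of orthogroup IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_loci(loci, min_similarity=0.5):
    """Toy single-linkage grouping of loci by shared orthogroup content.

    loci: dict mapping a (hypothetical) locus name to its orthogroup IDs.
    Returns a sorted list of sorted lists of locus names.
    """
    # Union-find structure: every locus starts in its own group.
    parent = {name: name for name in loci}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Merge any two loci whose orthogroup overlap meets the threshold.
    for a, b in combinations(loci, 2):
        if jaccard(loci[a], loci[b]) >= min_similarity:
            parent[find(a)] = find(b)

    groups = {}
    for name in loci:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical loci from three genomes, described by orthogroup IDs.
example_loci = {
    "genomeA_locus1": {"OG1", "OG2", "OG3"},
    "genomeB_locus7": {"OG1", "OG2", "OG3", "OG4"},
    "genomeC_locus2": {"OG8", "OG9"},
}

if __name__ == "__main__":
    print(group_loci(example_loci))
```

In this toy example the first two loci share three of four orthogroups and fall into one group, while the third shares none and stays on its own. Raising `min_similarity` splits groups more finely, which loosely mirrors the kind of tunability described above.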


INSTALL

conda create -n cloci cloci

CLOCI is hosted on Bioconda and PyPI.


CITING

Zachary Konkel, Laura Kubatko, Jason C Slot, CLOCI: unveiling cryptic fungal gene clusters with generalized detection, Nucleic Acids Research, 2024, gkae625, https://doi.org/10.1093/nar/gkae625


ON THE ALGORITHM

Pipeline

Recovery of 68 reference clusters

Boundary assessment of 33 reference clusters

Reference cluster characterization

TMD: total microsynteny distance; PDS: phylogenetic distribution sparsity; GCL: gene commitment to the locus; CSB: conservative substitution bias