Functional Families generation using embedding distance matrices

UCLOrengoGroup/eMMA

eMMA

Functional Families generation using embedding distance matrices

Overview

eMMA is a new version of the CATH GeMMA algorithm for generating trees of relationships between sequences and functions.
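To illustrate what a tree built from pairwise distances can look like, here is a minimal sketch of generic average-linkage agglomerative clustering. It is not eMMA's actual merging procedure: the function name `build_merge_tree`, the input layout, and the choice of average linkage are all assumptions made for illustration.

```python
from itertools import combinations


def build_merge_tree(names, dist):
    """Greedy average-linkage agglomerative clustering (illustrative only).

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b.
    Returns a list of merges (left_cluster, right_cluster, distance),
    where each cluster is a tuple of leaf labels.
    """
    clusters = [(n,) for n in names]
    merges = []

    def linkage(c1, c2):
        # Average distance over all leaf pairs spanning the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        left, right = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((left, right, linkage(left, right)))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges


# Toy distances between three hypothetical sequences.
names = ["seqA", "seqB", "seqC"]
dist = {
    frozenset(("seqA", "seqB")): 0.1,
    frozenset(("seqA", "seqC")): 0.8,
    frozenset(("seqB", "seqC")): 0.6,
}
merges = build_merge_tree(names, dist)
```

The ordered list of merges encodes the tree: the earliest merges join the most similar sequences, and later merges join progressively more distant clusters.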

The key change is that distances derived from protein language model (pLM) embeddings, or structural distances, can be used instead of HMM-vs-HMM comparisons.
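As a rough sketch of how pairwise distances could be derived from pLM embeddings, the example below computes Euclidean distances between per-sequence embedding vectors. The metric, the function name `embedding_distance_matrix`, and the dictionary layout are assumptions for illustration; the repository may use a different metric or storage format.

```python
import math
from itertools import combinations


def embedding_distance_matrix(embeddings):
    """Compute pairwise Euclidean distances between embeddings.

    embeddings: dict mapping sequence ID -> list of floats (equal lengths).
    Returns a dict mapping (id_a, id_b), with id_a < id_b, to the distance.
    """
    distances = {}
    for a, b in combinations(sorted(embeddings), 2):
        distances[(a, b)] = math.dist(embeddings[a], embeddings[b])
    return distances


# Toy two-dimensional "embeddings"; real pLM embeddings have far more dimensions.
toy = {"seqA": [0.0, 0.0], "seqB": [3.0, 4.0], "seqC": [0.0, 1.0]}
emb_distances = embedding_distance_matrix(toy)
```

Only the upper triangle is stored, since the distance is symmetric and the diagonal is zero.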

Main features

  • Revised protocol that uses MMseqs2 instead of CD-HIT.
  • Python CLI that generates SGE or local jobs.
  • Embedding distances, or 1/bitscore distances from Foldseek, as the data source for functional relationships.
  • Faster, with a lower memory footprint (e.g. for the HUPS superfamily (3.40.50.620), runtime drops from 22 hours to 6 hours).
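The 1/bitscore distance mentioned above can be sketched as follows. This assumes the Foldseek hits have already been parsed into `(query, target, bitscore)` tuples; the function name `bitscore_distances` and the choice to keep the best bitscore per unordered pair are illustrative assumptions, not the repository's confirmed behaviour.

```python
def bitscore_distances(hits):
    """Convert alignment bitscores into 1/bitscore distances.

    hits: iterable of (query, target, bitscore) tuples.
    Returns a dict mapping (id_a, id_b), with id_a < id_b, to 1/bitscore.
    Self hits and non-positive bitscores are skipped.
    """
    best = {}
    for query, target, bits in hits:
        if query == target or bits <= 0:
            continue
        # Treat A-vs-B and B-vs-A as the same pair and keep the higher score
        # (an assumption made here to obtain a symmetric distance).
        pair = tuple(sorted((query, target)))
        best[pair] = max(best.get(pair, 0.0), bits)
    return {pair: 1.0 / bits for pair, bits in best.items()}


# Hypothetical hits between three domains.
hits = [
    ("d1", "d2", 200.0),
    ("d2", "d1", 250.0),
    ("d1", "d1", 900.0),
    ("d1", "d3", 50.0),
]
fs_distances = bitscore_distances(hits)
```

Higher bitscores give smaller distances, so strongly matching structures end up close together. Pairs with no reported hit are simply absent from the result and would need a default (for example, a large distance) before clustering.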

This repository is part of the FunFams pipeline, where it runs as an intermediate step before FunFHMMER.

The eMMA version of FunFHMMER can be found at funfhmmer-emma.

See the GeMMA Wiki for documentation on GeMMA, and check out the step-by-step walkthrough here.
