KEGG databases description

MetQy relies mainly on four KEGG databases for analysing physiological functions: KEGG orthology, enzyme, module and genome. See below for brief descriptions.

Use of KEGG data

MetQy contains in-built KEGG data (downloaded 20/02/2018) which is hidden from the user, in compliance with the KEGG FTP licence. Users with FTP access can use the parsing functions to process the KEGG database files and to provide up-to-date information to the query functions.

MetQy includes the following data entries:

DATABASE	NUMBER OF ENTRIES	NOTES
KEGG orthology	21,800
KEGG genome	5,244	Genomes without annotations were removed. Genomes prn (T04692) and con (T04096) are not included due to limitations of the Windows OS folder naming convention.
KEGG enzyme	6,087
KEGG module	780	Modules M00611 to M00618 have been removed, as these have KEGG module definitions that involve other modules.

KEGG orthology

Modified from http://www.kegg.jp/kegg/ko.html

KEGG orthology contains information on individual genes and their functional orthologs, where individual orthologs are identified by a unique K number.

KEGG genome

Modified from http://www.kegg.jp/kegg/genome.html

KEGG genome is a repository of complete genomes, each identified by a unique T number and by a 3-4 letter code (Kanehisa et al. 2017). These genomes are annotated for their gene content using KEGG orthology (i.e. K numbers); 99.9% of the annotated genomes come from the RefSeq and GenBank databases.

KINGDOM	NUMBER OF GENOMES
Eukaryota	434
Bacteria	4,548
Archaea	262

Enzyme Commission (EC) numbers have been mapped to KEGG orthologs (KOs), so each KEGG genome has both K numbers and EC numbers associated with it.
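
As a small illustration of the identifier conventions described so far, the sketch below tells K numbers and T numbers apart. It is written in Python for brevity and is not part of MetQy; the function name `identifier_kind` is hypothetical, and the patterns assume the usual KEGG form of one letter followed by five digits.

```python
import re

# Assumed identifier shapes: a letter prefix followed by exactly five digits.
K_NUMBER = re.compile(r"K\d{5}")  # KEGG orthology entries, e.g. K00640
T_NUMBER = re.compile(r"T\d{5}")  # KEGG genome entries, e.g. T04692


def identifier_kind(identifier):
    """Return 'orthology', 'genome' or 'unknown' for a KEGG-style identifier."""
    if K_NUMBER.fullmatch(identifier):
        return "orthology"
    if T_NUMBER.fullmatch(identifier):
        return "genome"
    return "unknown"


print(identifier_kind("K00640"))  # orthology
print(identifier_kind("T04692"))  # genome
print(identifier_kind("prn"))     # unknown (a letter code, not a T number)
```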

EC numbers

The EC (Enzyme Commission) nomenclature consists of 4 numerical positions separated by periods (e.g. "1.10.3.9" or "6.5.1.3"). The first position refers to the enzyme class and can be one of 6:

EC 1 - Oxidoreductases
EC 2 - Transferases
EC 3 - Hydrolases
EC 4 - Lyases
EC 5 - Isomerases
EC 6 - Ligases

The remaining positions provide more information, depending on the enzyme class.

See http://www.enzyme-database.org/class.php to investigate the classes, subclasses and sub-subclasses.
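
To make the structure of an EC number concrete, here is a minimal Python sketch that splits an EC number into its four positions and looks up the enzyme class from the first one. The function name `ec_class` is hypothetical and the lookup table only covers the six classes listed above; this is an illustration, not MetQy code.

```python
# The six enzyme classes listed above, keyed by the first EC position.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_class(ec_number):
    """Return the enzyme class name for an EC number such as '1.10.3.9'."""
    positions = ec_number.split(".")
    if len(positions) != 4:
        raise ValueError(f"expected 4 positions, got {len(positions)}: {ec_number!r}")
    first = int(positions[0])  # only the first position decides the class
    if first not in EC_CLASSES:
        raise ValueError(f"unknown enzyme class {first} in {ec_number!r}")
    return EC_CLASSES[first]


print(ec_class("1.10.3.9"))  # Oxidoreductases
print(ec_class("6.5.1.3"))   # Ligases
```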

KEGG module

Modified from http://www.kegg.jp/kegg/module.html

Finally, KEGG module is an expert-curated database that groups K numbers into modules.

There are four types of modules:

pathway modules refer to functional units in KEGG metabolic pathway maps,
structural complexes refer to molecular machines or complexes,
functional sets describe other essential sets, and
signature modules are groups of genes associated with a phenotype.

Examples of modules are those for the TCA cycle, nitrogen assimilation or methane oxidation.

KEGG module definition

Each KEGG module is defined by a logical expression of the involved KEGG orthologs. For example, the cysteine biosynthesis module (M00021) has two blocks, each composed of the following genes:

K00640
K01738|K12339|K13034|K17069

Note that the pipe (|) denotes an OR operation. In other module definitions, the ampersand (&) denotes an AND operation.

The block-based definition of modules facilitates the evaluation of whether a genome contains a given module by assessing each module block. Here, we define the module completeness fraction (mcf) for each module, which is calculated as the number of fully complete blocks divided by the total number of blocks. A genome in which every block of a module is complete therefore has an mcf of 1 for that module.
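
The mcf calculation can be sketched as follows, using the two M00021 blocks shown above. This is a minimal Python illustration rather than MetQy's own implementation: the function names are hypothetical, parentheses in block definitions are not handled, and the code assumes that & binds more tightly than | within a block.

```python
def block_is_complete(block, genome_kos):
    """Evaluate one block against a genome's set of K numbers.

    '|' is treated as OR and '&' as AND. Assumption: '&' binds more tightly
    than '|', and the block contains no parentheses.
    """
    return any(
        all(ko.strip() in genome_kos for ko in alternative.split("&"))
        for alternative in block.split("|")
    )


def module_completeness_fraction(blocks, genome_kos):
    """Number of fully complete blocks divided by the total number of blocks."""
    if not blocks:
        raise ValueError("a module needs at least one block")
    complete = sum(block_is_complete(block, genome_kos) for block in blocks)
    return complete / len(blocks)


# The two blocks of the cysteine biosynthesis module (M00021).
m00021 = ["K00640", "K01738|K12339|K13034|K17069"]

print(module_completeness_fraction(m00021, {"K00640", "K12339"}))  # 1.0
print(module_completeness_fraction(m00021, {"K00640"}))            # 0.5
```

In the second call only the first block is satisfied, so one of two blocks is complete and the mcf is 0.5.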

REFERENCES

Kanehisa, M. et al., 2017. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Research, 45(D1), pp.D353–D361.
http://www.kegg.jp/kegg/ko.html
http://www.kegg.jp/kegg/genome.html
http://www.kegg.jp/kegg/module.html
