MetQy functions and usage examples – Parsing functions

Andrea Martinez Vernon edited this page Feb 28, 2018 · 4 revisions

The parsing functions allow users with current KEGG FTP access to increase the usability of the KEGG databases and to provide up-to-date data to the query family of functions. As per KEGG's license, these data are hidden from the user and can only be accessed by MetQy functions, allowing direct usage of MetQy as downloaded.

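MetQy features two generic parsing functions that deal with the two main KEGG file types:

  • parseKEGG_file (KEGG database entry files)
  • parseKEGG_file.list (.list files relating the entries of two KEGG databases)

MetQy also contains file-specific functions that build on these.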


parseKEGG_file

parseKEGG_file automatically detects the entry types of the KEGG data and transforms these into variables, stored in an R data frame.
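To make the idea of turning entry types into data frame variables concrete, here is a minimal base R sketch. It is not MetQy's implementation: the helper name parse_flat_entries is hypothetical, and the sketch assumes the usual KEGG flat-file layout (field name in the first 12 columns, continuation lines blank in those columns, each entry closed by "///"). The example lines are a toy input written for illustration.

```r
# Hypothetical helper, NOT part of MetQy: a minimal flat-file parsing sketch.
# Assumes field names sit in columns 1-12 and "///" closes each entry.
parse_flat_entries <- function(lines) {
  lines    <- lines[nzchar(lines)]
  is_end   <- lines == "///"
  # Every line after a "///" belongs to the next entry.
  entry_id <- cumsum(c(0, head(is_end, -1))) + 1
  lines    <- lines[!is_end]
  entry_id <- entry_id[!is_end]

  field <- trimws(substr(lines, 1, 12))
  value <- gsub("\\s+", " ", trimws(substr(lines, 13, nchar(lines))))

  # Carry the last seen field name forward onto continuation lines.
  idx   <- cummax(ifelse(nzchar(field), seq_along(field), 0))
  field <- field[idx]

  fields  <- unique(field)
  entries <- unique(entry_id)
  # One column per detected field, one row per entry.
  cols <- lapply(fields, function(f) {
    vapply(entries, function(e) {
      v <- value[entry_id == e & field == f]
      if (length(v) == 0) NA_character_ else paste(v, collapse = " ")
    }, character(1))
  })
  names(cols) <- fields
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Toy input in the assumed layout (illustrative only).
example_lines <- c(
  "ENTRY       C00001                      Compound",
  "NAME        H2O;",
  "            Water",
  "FORMULA     H2O",
  "///",
  "ENTRY       C00002                      Compound",
  "NAME        ATP;",
  "            Adenosine 5'-triphosphate",
  "FORMULA     C10H16N5O13P3",
  "///"
)

example_table <- parse_flat_entries(example_lines)
print(example_table)
```

Each distinct field name found in the input becomes a column, so the set of variables is discovered from the data rather than fixed in advance. The real function may treat multi-line fields and missing values differently.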

parseKEGG database specific functions

MetQy contains six KEGG database-specific functions. Each one uses parseKEGG_file to generate an R data frame and then applies a series of formatting steps specific to that KEGG database.

These functions are named parseKEGG_ followed by the name of the KEGG database:

  • parseKEGG_compound
  • parseKEGG_enzyme
  • parseKEGG_genome
  • parseKEGG_ko
  • parseKEGG_module
  • parseKEGG_reaction

There is also an umbrella function, parseKEGG_execute_all, that allows automatic execution of these individual parsing functions (and those described in the next section).

Usage example

> KEGG_path <- "~/KEGG" # MODIFY TO KEGG PARENT FOLDER!

# The parent folder should contain the following (KEGG FTP structure):    
#	brite/
#	genes/
#	ligand/
#	medicus/
#	module/
#	pathway/
#	README.kegg
#	RELEASE
#	xml/
> compound_reference_table <- parseKEGG_compound(KEGG_path)
# A .txt file is written to "output/" (relative to current working directory)
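Because the parsing functions expect the KEGG FTP folder structure listed above, it can help to check the parent folder before parsing. The following is a small base R sketch of such a check. The helper check_kegg_layout is hypothetical and not a MetQy function; it only tests for the folder and file names listed in the usage example.

```r
# Hypothetical helper, NOT part of MetQy: report which of the expected
# top-level KEGG FTP folders and files exist under the parent folder.
check_kegg_layout <- function(kegg_path) {
  expected <- c("brite", "genes", "ligand", "medicus", "module",
                "pathway", "README.kegg", "RELEASE", "xml")
  present <- file.exists(file.path(path.expand(kegg_path), expected))
  names(present) <- expected
  present
}

# Usage sketch: list anything that is missing before calling a parser.
layout  <- check_kegg_layout("~/KEGG") # MODIFY TO KEGG PARENT FOLDER!
missing <- names(layout)[!layout]
if (length(missing) > 0) {
  message("Missing from KEGG parent folder: ", paste(missing, collapse = ", "))
}
```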

parseKEGG_file.list

parseKEGG_file.list transforms a file that lists the relationships between the entries of two KEGG databases into a binary matrix, where 1 indicates a relationship between two entries and 0 indicates none. For example, the mapping between K numbers and EC numbers is contained in the ko_enzyme.list file and shows which K numbers correspond to which EC numbers.
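The conversion from a pair list to a binary matrix can be sketched in base R as follows. This is an illustration, not MetQy's code: the helper pairs_to_binary_matrix is hypothetical, the two-column tab-separated input format is an assumption about .list files, and the pairs are toy values rather than data taken from KEGG.

```r
# Hypothetical helper, NOT part of MetQy: turn a two-column table of
# related IDs into a binary matrix (rows = first column, cols = second).
pairs_to_binary_matrix <- function(pairs) {
  row_ids <- unique(pairs[[1]])
  col_ids <- unique(pairs[[2]])
  m <- matrix(0L, nrow = length(row_ids), ncol = length(col_ids),
              dimnames = list(row_ids, col_ids))
  # Mark every listed (row, column) pair with a 1.
  m[cbind(match(pairs[[1]], row_ids), match(pairs[[2]], col_ids))] <- 1L
  m
}

# Toy pair list in an assumed tab-separated layout (illustrative only).
toy_pairs <- read.delim(
  text = paste(
    "ko:K00001\tec:1.1.1.1",
    "ko:K00001\tec:1.1.1.2",
    "ko:K00002\tec:1.1.1.2",
    "ko:K00003\tec:1.1.1.3",
    sep = "\n"
  ),
  header = FALSE, colClasses = "character"
)

toy_matrix <- pairs_to_binary_matrix(toy_pairs)
print(toy_matrix)

# Look up the EC numbers linked to one K number.
linked_ec <- names(which(toy_matrix["ko:K00001", ] == 1))
```

A matrix of this shape makes many-to-many relationships easy to query by row or column, at the cost of storing a 0 for every unrelated pair.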

parseKEGG_file.list file-specific functions

MetQy contains two KEGG file-specific functions that use parseKEGG_file.list to generate R data frames.

These functions are named parseKEGG_ followed by the name of the KEGG file (without its extension):

  • parseKEGG_ko_enzyme (relationship between KEGG orthologs and EC numbers)
  • parseKEGG_ko_reaction (relationship between KEGG orthologs and KEGG reactions)

There is also an umbrella function, parseKEGG_execute_all, that allows automatic execution of these two parsing functions along with the ones mentioned before.

Usage example

> KEGG_path     <- "~/KEGG" # MODIFY TO KEGG PARENT FOLDER!
> ko_enzyme_map <- parseKEGG_ko_enzyme(KEGG_path)
# A .txt file (tab separated) is written to output/ (relative to current working directory)