MetQy functions and usage examples – Parsing functions
The parsing functions allow users with current KEGG FTP access to increase the usability of the KEGG databases and to provide up-to-date data to the query family of functions. As per KEGG's license, the KEGG-derived data included with MetQy are hidden from the user and can only be accessed through MetQy functions, which is what allows MetQy to be used directly as downloaded.
MetQy features two generic parsing functions that deal with the two main KEGG file types:
- parseKEGG_file - files without extension
- parseKEGG_file.list - '.list' files
MetQy also contains file-specific functions that use these.
parseKEGG_file automatically detects the entry types of the KEGG data and transforms these into variables, stored in an R data frame.
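The transformation described above can be illustrated with a minimal base R sketch. It assumes the usual KEGG flat-file layout: the field name sits in the first 12 characters of a line, continuation lines leave those characters blank, and each entry ends with a `///` line. `parse_flat_entries` and `kegg_line` are hypothetical helpers written for this illustration. They are not MetQy functions, and the sketch does not claim to reproduce what parseKEGG_file does internally.

```r
# Hypothetical helper (not part of MetQy): pad a field name to 12 characters,
# mimicking the assumed flat-file layout.
kegg_line <- function(field, value) sprintf("%-12s%s", field, value)

# Hypothetical helper (not part of MetQy): turn flat-file lines into a data
# frame with one row per entry and one column per detected field name.
parse_flat_entries <- function(lines) {
  # Number the entries using the "///" terminators, then drop those lines.
  is_end   <- grepl("^///", lines)
  entry_id <- cumsum(is_end) + 1L
  keep     <- !is_end & nzchar(trimws(lines))
  lines    <- lines[keep]
  entry_id <- entry_id[keep]

  # Split each line into field name (columns 1-12) and value (column 13 on).
  field <- trimws(substr(lines, 1, 12))
  value <- trimws(substring(lines, 13))

  # Continuation lines have no field name: carry the previous one forward.
  # Assumes the first line of the input carries a field name (e.g. ENTRY).
  has_name <- nzchar(field)
  field    <- field[has_name][cumsum(has_name)]

  # Collapse multi-line values and spread the fields into columns.
  wide <- tapply(value,
                 list(entry_id, factor(field, levels = unique(field))),
                 paste, collapse = " ")
  as.data.frame(wide, stringsAsFactors = FALSE)
}

# Toy input for illustration only.
example_lines <- c(
  kegg_line("ENTRY", "C00001"),
  kegg_line("NAME", "H2O;"),
  kegg_line("", "Water"),
  kegg_line("FORMULA", "H2O"),
  "///",
  kegg_line("ENTRY", "C00002"),
  kegg_line("NAME", "ATP"),
  kegg_line("FORMULA", "C10H16N5O13P3"),
  "///"
)

parsed <- parse_flat_entries(example_lines)
print(parsed)
```

The field names found in the file (here ENTRY, NAME and FORMULA) become the columns, so no list of expected fields has to be supplied in advance. Fields missing from an entry would appear as NA.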
MetQy contains six KEGG database-specific functions that use parseKEGG_file to generate R data frames followed by a series of data formatting steps specific to the KEGG database.
These functions are named parseKEGG_ followed by the specific KEGG database name:
- parseKEGG_compound
- parseKEGG_enzyme
- parseKEGG_genome
- parseKEGG_ko
- parseKEGG_module
- parseKEGG_reaction
There is also an umbrella function, parseKEGG_execute_all, that allows automatic execution of these individual parsing functions (and those described in the next section).
> KEGG_path <- "~/KEGG" # MODIFY TO KEGG PARENT FOLDER!
# The parent folder should contain the following (KEGG FTP structure):
# brite/
# genes/
# ligand/
# medicus/
# module/
# pathway/
# README.kegg
# RELEASE
# xml/
> compound_reference_table <- parseKEGG_compound(KEGG_path)
# A .txt file is written to "output/" (relative to current working directory)
parseKEGG_file.list transforms a file containing the relationships between the entries of two KEGG databases into a binary matrix, where 1 indicates a relationship between two entries and 0 indicates none. For example, the mapping between K numbers and EC numbers is contained in the ko_enzyme.list file, which shows which K numbers correspond to which EC numbers.
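As a rough illustration of the binary-matrix idea, the base R sketch below builds such a matrix from a handful of toy pairs. It assumes a two-column, tab-separated layout with one related pair per line. The identifiers are illustrative and not taken from a KEGG release, and the orientation (K numbers as rows, EC numbers as columns) is a choice made for this example rather than a statement about MetQy's output.

```r
# Toy stand-in for a '.list' file: one related pair per line (illustrative IDs).
list_text <- c(
  "ko:K00001\tec:1.1.1.1",
  "ko:K00002\tec:1.1.1.2",
  "ko:K00002\tec:1.1.1.19",
  "ko:K00003\tec:1.1.1.3"
)

# Read the pairs as they would be read from a tab-separated file.
pairs <- read.delim(text = list_text, header = FALSE,
                    col.names = c("ko", "ec"), stringsAsFactors = FALSE)

# Start from an all-zero matrix with one row per K number and one column per
# EC number, then set the listed pairs to 1.
kos <- unique(pairs$ko)
ecs <- unique(pairs$ec)
binary <- matrix(0L, nrow = length(kos), ncol = length(ecs),
                 dimnames = list(kos, ecs))
binary[cbind(pairs$ko, pairs$ec)] <- 1L

print(binary)

# Look up every EC number related to one K number.
ecs_for_K00002 <- colnames(binary)[binary["ko:K00002", ] == 1L]
```

Storing the mapping this way makes many-to-many relationships easy to query: a row gives all partners of one entry, and a column gives all partners of the other.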
MetQy contains two KEGG file-specific functions that use parseKEGG_file.list to generate R data frames.
These functions are named parseKEGG_ followed by the specific KEGG file name (without the extension):
- parseKEGG_ko_enzyme (relationship between KEGG orthologs and EC numbers)
- parseKEGG_ko_reaction (relationship between KEGG orthologs and KEGG reactions)
The umbrella function, parseKEGG_execute_all, runs these two parsing functions automatically, along with the database-specific ones listed earlier.
> KEGG_path <- "~/KEGG" # MODIFY TO KEGG PARENT FOLDER!
> ko_enzyme_map <- parseKEGG_ko_enzyme(KEGG_path)
# A .txt file (tab separated) is written to output/ (relative to current working directory)