Isotopic Profile DataBase (IPDB)
NOTE: IDSL.UFA v1.8 requires newly generated IPDBs that follow the new structure.
The annotation step in the IDSL.UFA software package depends on pre-calculated Isotopic Profile DataBases (IPDBs) to efficiently annotate chromatographic peaks with molecular formulas. IPDB libraries can be saved and reused in similar workflows, which is also necessary for consistency in population-scale studies. In general, IPDB objects are R lists consisting of eight primary objects, plus an optional retention time component:
logIPDB: Parameters used to create the IPDB object
AggregatedList: A list of rounded masses and their corresponding IDs
MassMAIso: A vector of masses of the most abundant isotopologues
MolecularFormula: A vector of molecular formula ions
IsotopicProfile: A list of theoretical isotopic profiles
R13C: A vector of theoretical R13C values
IndexMAIso: A vector of indices of the most abundant isotopologues within the isotopic profiles
IPsize: A vector of the number of isotopologues in each isotopic profile
Retention Time: An optional component storing retention times of the molecular formulas, used to restrict annotation to a retention time window
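A minimal R sketch of how these components fit together: a hypothetical toy object with some of the same component names holds three glucose ions, an AggregatedList-style index buckets them by rounded mass, and a lookup keeps the formula ions whose MassMAIso lies within a ppm tolerance of an observed m/z and, optionally, within a retention time window. The `RetentionTime` slot name, the 0.1 Da rounding, the tolerances, and the helpers `build_aggregated_list()` and `find_candidates()` are illustrative assumptions, not the internal format or annotation algorithm of IDSL.UFA.

```r
# Minimal sketch, NOT the IDSL.UFA implementation. Component names follow the
# IPDB description; the values, the "RetentionTime" slot name, the 0.1 Da
# rounding, and both helper functions are hypothetical.

# Toy IPDB for glucose ions; only the components used below are filled in.
toy_IPDB <- list(
  MolecularFormula = c("C6H13O6", "C6H12NaO6", "C6H11O5"),  # [M+H]+, [M+Na]+, [M-H2O+H]+
  MassMAIso        = c(181.07066, 203.05261, 163.06010),    # m/z of the monoisotopic (here most abundant) isotopologue
  RetentionTime    = c(2.1, 2.1, 2.1)                       # minutes (optional)
)

# Hypothetical AggregatedList: indices grouped by mass rounded to 0.1 Da,
# so a lookup scans a few buckets instead of every formula ion.
build_aggregated_list <- function(mass, digits = 1) {
  split(seq_along(mass), round(mass, digits))
}
toy_IPDB$AggregatedList <- build_aggregated_list(toy_IPDB$MassMAIso)

# Return formula ions within `ppm` of an observed m/z and, if `rt` is given,
# within `rt_tol` minutes of the stored retention time.
find_candidates <- function(ipdb, mz, ppm = 5, rt = NULL, rt_tol = 0.5, digits = 1) {
  step <- 10^(-digits)
  # Check the bucket of mz plus its neighbours to handle rounding edges
  keys <- as.character(round(mz + c(-step, 0, step), digits))
  idx <- unlist(ipdb$AggregatedList[keys], use.names = FALSE)
  if (length(idx) == 0) return(character(0))
  err_ppm <- abs(ipdb$MassMAIso[idx] - mz) / mz * 1e6
  idx <- idx[err_ppm <= ppm]
  if (!is.null(rt)) {
    idx <- idx[abs(ipdb$RetentionTime[idx] - rt) <= rt_tol]
  }
  ipdb$MolecularFormula[idx]
}

find_candidates(toy_IPDB, 181.0708)            # "C6H13O6"
find_candidates(toy_IPDB, 181.0708, rt = 5.0)  # character(0): outside the RT window
```

Bucketing by rounded mass means each lookup touches only a handful of entries rather than the whole mass vector, which is the kind of saving that matters when an IPDB holds millions of formula ions; the real AggregatedList may be organized differently.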
The IDSL.UFA workflow provides two approaches to generate IPDBs: enumerating a chemical space, or importing a list of molecular formulas from a formula source.
In many instances, the chemical space of an analysis can be predicted from the sample preparation and instrumental methods. When the boundaries of a chemical space are known, the chemical space can be generated using the enumerated_chemical_space tab in the UFA parameter spreadsheet to detect unknown molecular formulas. The vast enumerated chemical space can be narrowed down with five intelligent molecular formula prioritization rules and additional user-defined conditional rules. An IPDB can cover up to 10^8 molecular formulas from a chemical space. Before performing a complete chemical space enumeration, IDSL.UFA attempts to estimate the time required for the iteration loops to prevent memory overflow.
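To get a feel for why enumeration needs prioritization rules and a run-time check, the R sketch below computes a crude upper bound on the raw chemical space: the product of the number of allowed atom counts per element. The element ranges and the helper `raw_space_size()` are hypothetical, and the calculation ignores the prioritization and conditional rules, so it is not the estimate IDSL.UFA performs internally.

```r
# Hypothetical element ranges (min, max atom counts); in practice these would
# mirror the boundaries entered in the enumerated_chemical_space tab.
element_ranges <- list(
  C = c(1, 40),
  H = c(0, 80),
  N = c(0, 5),
  O = c(0, 10)
)

# Upper bound on raw element combinations before any rule is applied:
# the product of (max - min + 1) over all elements.
raw_space_size <- function(ranges) {
  prod(vapply(ranges, function(r) r[2] - r[1] + 1, numeric(1)))
}

n_raw <- raw_space_size(element_ranges)
n_raw          # 213840
n_raw <= 1e8   # TRUE: rough comparison with the 10^8-formula IPDB capacity
```

With only four elements and modest ranges the raw count already exceeds 200,000, and every additional element multiplies it by its own range width.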
Follow these steps to generate an IPDB from an enumerated chemical space:
- Select the chemical space boundaries and criteria in the enumerated_chemical_space tab of the UFA parameter spreadsheet.
- Select YES for PARAM0001 and PARAM0002 in the parameters tab of the UFA parameter spreadsheet.
- Run this command in the R or RStudio console, or in R from a terminal:
IDSL.UFA::UFA_workflow("Address of the UFA parameter spreadsheet")
Alternatively, IPDBs can be generated using the formula_source tab in the UFA parameter spreadsheet when a set of suspect molecular formulas is available. This approach allows including retention time values, so peaks can be screened with a retention time tolerance in addition to isotopic profile matching. Additionally, we generated IPDBs consistent with IDSL.UFA >= 1.8 for the molecular formulas of the following databases:
- Blood exposome
- EPA CompTox chemicals dashboard
- FDA substance registry
- IDSL.Exposome
- LIPID MAPS
- RefMet
- PubChem databases
These IPDB libraries can be accessed using this link for positive and negative modes. They were generated assuming the c("[M+H]+", "[M+Na]+", "[M-H2O+H]+") and c("[M-H]-", "[M-H2O-H]-") ionization pathways in positive and negative modes, respectively. Therefore, the number of molecular formula ions in each IPDB is approximately the number of intact molecular formulas multiplied by the number of ionization pathways. Non-carbon-containing compounds are excluded from the IPDBs because IDSL.IPA cannot detect non-carbon-containing compounds in the first place.
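The ion-count relationship and the carbon filter can be sketched in R as follows. The formula list is hypothetical, and the regular expression (a `C` not followed by a lowercase letter, so Cl, Ca, Cu and similar element symbols are not counted) is an assumed way to flag carbon-containing formulas, not necessarily the check IDSL.UFA uses.

```r
# Hypothetical list of intact (neutral) molecular formulas
formulas <- c("C6H12O6", "NaCl", "C2H6O", "H2O", "CaCO3")

# Flag any formula with a carbon atom: a "C" not followed by a lowercase
# letter (this skips Cl, Ca, Cu, Co, Cr, Cs, Cd, ...).
has_carbon <- grepl("C(?![a-z])", formulas, perl = TRUE)
carbon_formulas <- formulas[has_carbon]  # "C6H12O6" "C2H6O" "CaCO3"

pos_pathways <- c("[M+H]+", "[M+Na]+", "[M-H2O+H]+")
neg_pathways <- c("[M-H]-", "[M-H2O-H]-")

# Approximate number of formula ions: intact formulas x ionization pathways
length(carbon_formulas) * length(pos_pathways)  # 9
length(carbon_formulas) * length(neg_pathways)  # 6
```

Treat the product as an upper bound: not every pathway can apply to every formula (CaCO3, for example, has no hydrogen to lose in [M-H]-).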
Follow these steps to generate an IPDB from a list of known molecular formulas:
- Prepare the list of molecular formulas as a single column in a .csv, .xlsx, or .txt file. The .csv/.xlsx files may have a second column with retention times in minutes to also match peaks by retention time. Do not use headers in the .csv/.xlsx files.
- Select the parameters in the formula_source tab of the UFA parameter spreadsheet.
- Select YES for PARAM0001 and PARAM0003 in the parameters tab of the UFA parameter spreadsheet.
- Run this command in the R or RStudio console, or in R from a terminal:
IDSL.UFA::UFA_workflow("Address of the UFA parameter spreadsheet")
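The file-preparation step can be scripted in R. This sketch writes a hypothetical suspect list to a headerless two-column .csv (molecular formula, then retention time in minutes), following the file layout described in the first step; the file name, the tempdir() location, and the retention times are placeholders, and the resulting path is what you would then reference in the formula_source tab.

```r
# Hypothetical suspect list: molecular formulas with optional retention
# times in minutes (values are illustrative only).
suspects <- data.frame(
  formula = c("C6H12O6", "C16H32O2", "C27H46O"),
  rt_min  = c(1.8, 12.4, 15.2)
)

# Write a headerless .csv: column 1 = molecular formula,
# column 2 = retention time (minutes); no row names, no quotes.
out_file <- file.path(tempdir(), "suspect_formulas.csv")
write.table(suspects, out_file, sep = ",",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

readLines(out_file)
# "C6H12O6,1.8" "C16H32O2,12.4" "C27H46O,15.2"
```

For a formula-only list, write just the first column (for example `suspects["formula"]`) with the same options.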
More databases can be used to extract molecular formulas and create your own IPDBs tailored to your analyses. For example, we recommend the following known sources of molecular formulas for human specimens:
- MeSH Database: NLM MeSH ontology and linked compounds in the PubChem database.