sajfb/README.md


I'm an award-winning data scientist working at the intersection of cheminformatics and metabolomics, with a focus on small-molecule discovery and mass spectrometry data science (see my award news from the Metabolomics Association of North America (MANA) and my presentation details here).

I've built multiple computational pipelines for untargeted mass spectrometry data processing across research domains including metabolomics, lipidomics, exposomics, and environmental studies. My software development philosophy emphasizes maximal automation, high precision, multi-platform compatibility, and user-friendly interfaces to minimize lab-based experiments.

I am always driven to advance next-generation AI for chemistry and biological applications.

Developing AI-Powered Digital Twins for Bioreactors at Aropha

I am currently leading the development of digital twins for bioreactors at Aropha, using advanced AI models to simulate bioprocesses. By creating virtual replicas of our bioreactor systems, we aim to predict performance and scale up the company’s capacity effectively. This work integrates AI engines with bioprocess engineering.

Completed projects

Mass Spectrometry Data Processing Workflows at the Integrated Data Science Laboratory for Metabolomics and Exposomics

[Diagram: IDSL mass spectrometry data processing tools]

The tools shown in this diagram form a comprehensive pipeline for a full-scale untargeted metabolomics workflow that efficiently processes and annotates large-scale mass spectrometry data. The integration of peak detection, formula annotation, fragmentation analysis, and data parsing supports multi-omics and untargeted compound discovery projects.

  • IDSL_MINT (Mass INTerpretator) uses deep learning and cheminformatics to interpret MS/MS data.

  • IDSL.IPA (Intrinsic Peak Analysis) is a chromatographic peak-picking software capable of detecting low-intensity signals (S/N > 2), pairing isotopologues with a fixed mass distance (e.g. ΔC = ¹³C − ¹²C = 1.003354835336 Da), correcting retention time drifts, aligning peaks across large studies (N > 200), filling gaps, and visualizing extracted and total ion chromatograms.

  • IDSL.FSA (Fragmentation Spectra Analysis) rapidly annotates fragmentation data files (.msp and .mgf) using spectral entropy or cosine similarity, even without reliable precursor values, and can process bottom-up proteomics data.

  • IDSL.CSA (Composite Spectra Analysis) deconvolutes fragmentation spectra from acquisition methods such as DDA and DIA (SWATH-MS, MSᴱ, AIF).

  • IDSL.UFA (United Formula Annotation) and its exhaustive version IDSL.UFAx annotate chromatographic peaks with molecular formulas using isotopic profile matching. IDSL.UFA handles up to 10⁸ formulas efficiently, while IDSL.UFAx can screen 10²⁷ formulas using 15 elements, at a higher computational cost.

  • IDSL.SUFA simplifies isotopic profile and adduct formula calculations without dependencies on other R packages.

  • IDSL.NPA (Nominal Peak Analysis) processes nominal mass spectrometry data to create and annotate .msp files for untargeted MS/MS workflows.

  • IDSL.MXP (Mass Spectrometry Parser) is a lightweight and fast parser for mass spectrometry data files that can also read corrupted files.
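Two of the ideas mentioned above, pairing isotopologues at a fixed mass distance and scoring spectra by cosine similarity, can be illustrated with a minimal Python sketch. This is not the IDSL.IPA or IDSL.FSA implementation: the function names `pair_isotopologues` and `binned_cosine`, the tolerance, and the bin width are hypothetical choices made only for illustration. The only value taken from the text is the ¹³C − ¹²C mass difference.

```python
from math import sqrt

# 13C - 12C mass difference quoted in the text above (Da).
DELTA_C = 1.003354835336


def pair_isotopologues(mz_values, tolerance_da=0.005):
    """Return index pairs (i, j) whose m/z gap is DELTA_C within a tolerance.

    tolerance_da is an assumed illustrative default, not an IDSL.IPA setting.
    """
    order = sorted(range(len(mz_values)), key=lambda k: mz_values[k])
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            diff = mz_values[j] - mz_values[i]
            if diff > DELTA_C + tolerance_da:
                break  # peaks are sorted, so later ones are even farther away
            if abs(diff - DELTA_C) <= tolerance_da:
                pairs.append((i, j))
    return pairs


def binned_cosine(spec_a, spec_b, bin_width=0.01):
    """Cosine similarity of two spectra given as lists of (m/z, intensity).

    Peaks are grouped into fixed-width m/z bins (an assumed simplification)
    and the score is the normalized dot product of the binned intensities.
    """
    def to_bins(spectrum):
        bins = {}
        for mz, intensity in spectrum:
            key = round(mz / bin_width)
            bins[key] = bins.get(key, 0.0) + intensity
        return bins

    a, b = to_bins(spec_a), to_bins(spec_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    peaks = [180.0634, 181.0668, 250.0]
    print(pair_isotopologues(peaks))

    spectrum = [(85.03, 40.0), (127.04, 100.0)]
    print(binned_cosine(spectrum, spectrum))
```

Fixed-width binning keeps the sketch short, but peaks that fall on opposite sides of a bin edge will not match. A tolerance-based peak matching step would avoid that at the cost of more code.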

Computational mass spectrometry pipelines for environmental cheminformatics projects as part of my doctoral research

  • An IPDC (Isotopic Profile Deconvoluted Chromatogram) algorithm to screen biologically complex environmental matrices for unknown contaminants using chemometric methods. The IPDC algorithm was successfully employed in five different projects during my PhD.

Pinned

  1. idslme/IDSL_MINT

    A Deep Learning Framework to Interpret Raw Mass Spectrometry (m/z) Data

    Python · 19 stars · 1 fork

  2. idslme/IDSL.IPA

    Intrinsic Peak Analysis (IPA) pipeline for peak-picking in large-scale untargeted small molecule analysis including metabolomics, lipidomics, exposomics, and environmental studies.

    R · 13 stars · 1 fork

  3. idslme/IDSL.UFA

    United Formula Annotation (UFA) for LC-HRMS data

    R · 8 stars · 1 fork

  4. idslme/IDSL.CSA

    Composite Spectra Analysis

    R · 5 stars