Developer notes

This document describes the general structure of the package and points contributors to the relevant code and files. Preferably read the full document.

General info

What is this package good for?

  • The Spectra package (and its Spectra class) provides a powerful infrastructure for mass spectrometry (MS) data in R. See the SpectraTutorials for more information, in particular the Spectra-backends vignette for a description of the data structure.

  • Powerful MS data algorithms are also available in Python, e.g. provided by the matchms library.

  • Why re-implement what's already available?

  • This package translates an R Spectra object into the matchms Python Spectrum data structure and allows you to call functions of the matchms package and translate the results back into R data objects.

General package structure

Where to find what?

  • The R folder contains all R source files.

    • R/conversion.R contains functions to convert between R and Python data structures (e.g. between Spectra::Spectra and matchms.Spectrum). The conversion of the Python result into an R data type is handled by R's reticulate package, which can convert all basic data types between R and Python.

    • R/compareSpectriPy.R contains the mass spectral similarity calculation functions. The core function is the internal .compare_spectra_python() function, which manages the Anaconda environment, translates the data to Python data structures and calls the Python command using py_run_string(). The Python command itself is generated by the python_command() method called on a parameter object such as CosineGreedyParam. To support a new similarity calculation function or another Python functionality/algorithm, ideally implement a new param object with its own python_command() method that returns the Python command specific to that algorithm.

    • R/basilisk.R contains the Python environment definition and the required/used Python libraries (see below for more information).

  • The tests folder contains all unit tests. It holds a general testthat.R file that configures and sets up the tests and, within the testthat folder, a unit test file for each R source file (named test_.R).

  • The vignettes folder contains an R markdown document that explains the use of the SpectriPy package using examples. This is a good starting point to explore the package and its functionality.
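
The python_command() pattern described above (a parameter object that renders the Python code to run) can be sketched as follows. This is only an illustration written in Python, not the package's R implementation: the class name CosineGreedyParamSketch and the variable names py_x, py_y and res are hypothetical, and the generated text is an assumption about what such a command could look like. Only matchms.similarity.CosineGreedy and matchms.calculate_scores are taken from matchms itself.

```python
from dataclasses import dataclass


@dataclass
class CosineGreedyParamSketch:
    """Hypothetical Python stand-in for an R parameter object."""

    tolerance: float = 0.1
    mz_power: float = 0.0
    intensity_power: float = 1.0

    def python_command(self, ref_var: str = "py_x", query_var: str = "py_y") -> str:
        """Return Python source that scores query_var against ref_var.

        The variable names are assumptions; they stand for lists of
        matchms Spectrum objects already present in the Python session.
        """
        return "\n".join([
            "import matchms",
            "from matchms.similarity import CosineGreedy",
            f"sim = CosineGreedy(tolerance={self.tolerance!r}, "
            f"mz_power={self.mz_power!r}, "
            f"intensity_power={self.intensity_power!r})",
            f"res = matchms.calculate_scores({ref_var}, {query_var}, sim)",
        ])


command = CosineGreedyParamSketch(tolerance=0.05).python_command()
# Check that the generated text is valid Python without importing matchms.
compile(command, "<generated>", "exec")
print(command)
```

Keeping the algorithm-specific code inside the parameter object means the calling function stays generic: supporting another algorithm would only require another parameter class that renders a different command.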

Python setup and configuration

Where are python libraries defined?

  • SpectriPy uses the R reticulate package for conversion between (basic) R and Python data types and relies on Bioconductor's basilisk package to set up and manage the Python environment.

  • The Python environment and required libraries are defined in the R/basilisk.R file. Different environments can be defined in that file with the required libraries (including versions).

  • To execute Python code from a certain library, the basiliskRun() function is used. The environment providing this library is enabled and disabled with the basiliskStart() and basiliskStop() functions, respectively.

  • The reticulate r_to_py() and py_to_r() functions are used for conversion of basic data types between R and Python. To use these functions, either a Python environment with the matchms library must be available, or the environment defined in SpectriPy and managed by basilisk needs to be activated first using cl <- basiliskStart(SpectriPy:::matchms_env) (see the package vignette for an example).

Test data

What data could be used in tests?

  • The package does not contain any test data files. Test and example data are created manually by defining m/z and intensity values of MS peaks. Data files could be added (e.g. in MGF format) if needed and put into an inst/extdata folder.

  • Alternatively, example files in mzML format are available in Bioconductor's msdata package.

  • To test the package and newly created functionality, add the respective unit tests to the tests/testthat folder and evaluate them, e.g. by running rcmdcheck::rcmdcheck(args = "--no-manual") in an R session started within the package folder.
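
If a test data file in MGF format is wanted, the manual definition of peaks mentioned above can be turned into a file with a few lines of code. The following is a minimal sketch in Python under stated assumptions: the helper format_mgf_entry() is hypothetical, the peak values are made up for illustration, and the file is written to a temporary folder rather than to inst/extdata.

```python
import tempfile
from pathlib import Path


def format_mgf_entry(title, precursor_mz, mz, intensity):
    """Format one spectrum as a minimal MGF entry (hypothetical helper)."""
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")
    lines = ["BEGIN IONS", f"TITLE={title}", f"PEPMASS={precursor_mz}"]
    # One peak per line: m/z value, a space, then the intensity.
    lines += [f"{m} {i}" for m, i in zip(mz, intensity)]
    lines.append("END IONS")
    return "\n".join(lines) + "\n"


# Made-up peak values, for illustration only.
mz = [56.0494, 69.0447, 83.0603, 195.0877]
intensity = [0.459, 2.585, 2.446, 100.0]
entry = format_mgf_entry("example_spectrum", 195.0877, mz, intensity)

# Write to a temporary folder; in the package the target would be inst/extdata.
out_file = Path(tempfile.mkdtemp()) / "example.mgf"
out_file.write_text(entry)
print(out_file.read_text())
```

A small, hand-written file like this keeps the tests fast and makes the expected similarity scores easy to verify by hand.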

Potential contributions and extensions

What could be implemented?

  • Add some new similarity calculation functionality to SpectriPy. See also issue #19.

  • Integrate other Python libraries? More a discussion - see issue #24.

  • Integrate functionality for spectra processing, downstream analysis (e.g. cleaning), ... See also issue #20.

  • Ability to translate additional data structures. See also issue #18.

  • More efficient translation of data structures. Better handling of metadata. See also issue #17.

  • Improve documentation. See also issue #25.

  • Define a use case analysis (or ideally several): show how data can be analyzed with the SpectriPy package and contrast that with a "quarto" or "Jupyter Notebook" document that directly combines the R and Python code. Is there really a need for additional convenience functionality within an R package, or can the same, or more, be achieved with e.g. "quarto"? What are the benefits of bundling/wrapping Python functionality into R functions? See also issue #21.

  • Add more use cases and examples to the package vignette (vignettes/SpectriPy.Rmd). See also issue #26.

Contributing

How to contribute?