This document describes the general structure of the package and provides helpful references to code and files for contributors. Reading the full document is recommended.
What is this package good for?
- The Spectra package (and the `Spectra` class) provides a powerful infrastructure for mass spectrometry (MS) data in R (see the SpectraTutorials for more information, in particular the Spectra-backends vignette for a description of the data structure).
- Powerful MS data algorithms are also available in Python, e.g. provided by the matchms library.
- Why re-implement what's already available?
- This package translates an R `Spectra` object into the matchms Python `Spectrum` data structure and allows you to call functions of the matchms package and translate the results back into R data objects.
Where to find what?
- The R folder contains all R source files.
  - R/conversion.R contains functions to convert between R and Python data structures (e.g. between `Spectra::Spectra` and `matchms.Spectrum`). The conversion of the Python result into an R data type is handled by R's reticulate package, which can convert all basic data types between R and Python.
  - R/compareSpectriPy.R contains the mass spectral similarity calculation functions. The core function is the internal `.compare_spectra_python()` function, which manages the Anaconda environment, translates the data to Python data structures and calls the Python command using `py_run_string()`. The Python command itself is generated by the `python_command()` method called on the parameter object (e.g. `CosineGreedyParam`). To use a new similarity calculation function or a new Python functionality/algorithm, ideally implement a new param object with a `python_command()` method that returns the Python command specific to the new algorithm/Python functionality.
  - R/basilisk.R contains the Python environment definition and the required/used Python libraries (see below for more information).
- The tests folder contains all unit tests: a general testthat.R file that configures and sets up the tests and, within the testthat folder, a unit test file for each R source file (named test_.R).
- The vignettes folder contains an R markdown document that explains the use of the SpectriPy package using examples. This is a good starting point to explore the package and its functionality.
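
The param-object pattern described for R/compareSpectriPy.R can be sketched as follows. This is a minimal, self-contained illustration and not the package's actual code: the generic defined here is only a stand-in for the internal `python_command()`, the class `ExampleHungarianParam` is hypothetical, and the Python variable names `py_x` and `py_y` are assumptions about how the translated spectra might be named on the Python side.

```r
library(methods)

## Stand-in for the package's internal generic; the real definition lives
## in R/compareSpectriPy.R and may differ.
setGeneric("python_command", function(object, ...)
    standardGeneric("python_command"))

## Hypothetical parameter class for a new similarity algorithm.
setClass("ExampleHungarianParam",
         representation(tolerance = "numeric"),
         prototype(tolerance = 0.1))

## Hypothetical constructor for the parameter object.
ExampleHungarianParam <- function(tolerance = 0.1) {
    new("ExampleHungarianParam", tolerance = tolerance)
}

## The method returns the Python code (as a single string) that should be
## run for this algorithm. `py_x` and `py_y` are assumed variable names.
setMethod("python_command", "ExampleHungarianParam", function(object, ...) {
    paste0("import matchms\n",
           "from matchms.similarity import CosineHungarian\n",
           "res = matchms.calculate_scores(py_x, py_y, ",
           "CosineHungarian(tolerance=", object@tolerance, "))\n")
})

cmd <- python_command(ExampleHungarianParam(tolerance = 0.05))
cat(cmd)
```

Keeping the algorithm-specific Python code in a method on the param object means a new algorithm mainly needs a new class and method, while the code that sets up the environment and translates the data can stay unchanged.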
Where are Python libraries defined?
- SpectriPy uses the R reticulate package for conversion between (basic) R and Python data types and relies on Bioconductor's basilisk package to set up and manage the Python environment.
- The Python environment and required libraries are defined in the R/basilisk.R file. Different environments can be defined in that file with the required libraries (including versions).
- To execute Python code from a certain library, the `basiliskRun()` function is used, with the respective environment providing this library being enabled and disabled with the `basiliskStart()` and `basiliskStop()` functions.
- The reticulate `r_to_py()` and `py_to_r()` functions are used for conversion of basic data types between R and Python and vice versa. To use these functions, a Python environment with the matchms library must be used (or the one defined in SpectriPy and managed by basilisk needs to be activated first using `cl <- basiliskStart(SpectriPy:::matchms_env)`; see the package vignette for an example).
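
The start/run/stop cycle described above could look roughly like the sketch below. The helper `run_in_env()` is hypothetical and not part of SpectriPy; it assumes the basilisk and reticulate packages are installed and simply round-trips a numeric vector through Python inside the given environment.

```r
## Hypothetical helper (not part of SpectriPy): start a basilisk
## environment, run some code inside it and always stop it again.
run_in_env <- function(env) {
    cl <- basilisk::basiliskStart(env)
    ## Make sure the environment is shut down even if an error occurs.
    on.exit(basilisk::basiliskStop(cl), add = TRUE)
    ## The function passed to basiliskRun() should be self-contained,
    ## because it may be evaluated in a separate R process.
    basilisk::basiliskRun(cl, function() {
        x <- reticulate::r_to_py(c(1.5, 2.5, 3.5))  # R -> Python
        reticulate::py_to_r(x)                      # Python -> R
    })
}

## Possible usage with the environment named in the text above:
## res <- run_in_env(SpectriPy:::matchms_env)
```

Wrapping `basiliskStop()` in `on.exit()` is a design choice of this sketch; it avoids leaving an environment active if the Python call fails.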
What data could be used in tests?
- The package does not contain any test data files. Test and example data are created manually by defining m/z and intensity values of MS peaks. Data files could be added (e.g. in MGF format) if needed and put into an inst/extdata folder.
- Alternatively, example files in mzML format are available in Bioconductor's msdata package.
- To test the package and newly created functionality, add the respective unit tests to the tests/testthat folder and evaluate them, e.g. by running `rcmdcheck::rcmdcheck(args = "--no-manual")` in an R session started within the package folder.
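
Manually defined test data, as mentioned above, might be created along these lines. This is a sketch assuming the Spectra, S4Vectors and testthat packages are installed; the m/z and intensity values are arbitrary illustration values, not real measurements.

```r
library(Spectra)
library(testthat)

## Two small MS2 spectra defined by hand; all values are arbitrary.
spd <- S4Vectors::DataFrame(
    msLevel = c(2L, 2L),
    precursorMz = c(195.09, 138.07))
## m/z values have to be in increasing order within each spectrum.
spd$mz <- list(
    c(56.05, 110.07, 138.07, 195.09),
    c(53.04, 83.06, 110.07, 138.07))
spd$intensity <- list(
    c(10, 150, 870, 1000),
    c(20, 90, 400, 1000))
sps <- Spectra(spd)

## A unit test using this object could then look like this.
test_that("manually defined spectra are valid", {
    expect_s4_class(sps, "Spectra")
    expect_identical(length(sps), 2L)
    expect_true(all(lengths(sps) == 4L))
})
```

Defining the data inline keeps each test file self-contained and avoids adding data files to the package.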
What could be implemented?
- Add some new similarity calculation functionality to `SpectriPy`. See also issue #19.
- Integrate other Python libraries? More a discussion; see issue #24.
- Integrate functionality for spectra processing, downstream analysis (e.g. cleaning), ... See also issue #20.
- Ability to translate additional data structures. See also issue #18.
- More efficient translation of data structures. Better handling of metadata. See also issue #17.
- Improve documentation. See also issue #25.
- Define a use case analysis (or ideally several): show how data can be analyzed with the SpectriPy package and contrast that with a "quarto" or "Jupyter Notebook" document directly combining the R and Python code: is there really need for additional convenience functionality within an R package, or can the same, or more, be achieved with e.g. "quarto"? What are the benefits of bundling/wrapping Python functionality into R functions? See also issue #21.
- Add more use cases and examples to the package vignette (vignettes/SpectriPy.Rmd) file. See also issue #26.
How to contribute?
- Ideally, fork the GitHub repository, implement extensions and make a pull request to the main branch.
- Follow the coding style guidelines and adhere to the code of conduct.