The alabaster framework implements methods to save a variety of R/Bioconductor objects to on-disk representations. This is a more robust and portable alternative to the typical approach of saving objects in RDS files.
- By separating the on-disk representation from the in-memory object structure, we can more easily adapt to changes in S4 class definitions. This improves robustness to R environment updates, especially when updateObject() is not correctly configured.
- By using standard file formats like HDF5 and JSON, we ensure that Bioconductor objects can be easily read from other languages like Python and JavaScript. This improves interoperability between application ecosystems.
- By breaking up complex Bioconductor objects into their components, we enable modular reads and writes to the backing store. We can easily read or update part of an object without having to consider the other parts.
The alabaster.base package defines the base generics to read and write the file structures along with the associated metadata. Implementations of these methods for various Bioconductor classes can be found in the other alabaster packages like alabaster.se and alabaster.bumpy.
First, we'll install the alabaster.base package. This package is available from Bioconductor, so we can use the standard Bioconductor installation process:
# install.packages("BiocManager")
BiocManager::install("alabaster.base")
The simplest example involves saving a DataFrame inside a staging directory.
Let's mock up an object:
library(S4Vectors)
df <- DataFrame(X=1:10, Y=letters[1:10])
## DataFrame with 10 rows and 2 columns
##            X           Y
##    <integer> <character>
## 1          1           a
## 2          2           b
## 3          3           c
## 4          4           d
## 5          5           e
## 6          6           f
## 7          7           g
## 8          8           h
## 9          9           i
## 10        10           j
Then we can save it to the staging directory:
tmp <- tempfile()
library(alabaster.base)
saveObject(df, tmp)
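To see what was written, the staging directory can be inspected with base R. The following is a minimal sketch: the exact file names inside the directory depend on the object type and the alabaster.base version, so none are assumed here. It also calls validateObject() from alabaster.base, which checks the on-disk files and raises an error if they are not valid.

```r
library(S4Vectors)
library(alabaster.base)

# Recreate the example object and save it to a fresh staging directory.
df <- DataFrame(X = 1:10, Y = letters[1:10])
tmp <- tempfile()
saveObject(df, tmp)

# List everything that saveObject() wrote, relative to the staging directory.
staged_files <- list.files(tmp, recursive = TRUE)
print(staged_files)

# Check the on-disk representation; this throws an error if it is invalid.
validateObject(tmp)
```

Because the representation is built from ordinary files, the listing can also be examined with tools outside of R.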
We can copy the directory to another location, over a network, etc., and then easily load it back into a new R session:
readObject(tmp)
## DataFrame with 10 rows and 2 columns
##            X           Y
##    <integer> <character>
## 1          1           a
## 2          2           b
## 3          3           c
## 4          4           d
## 5          5           e
## 6          6           f
## 7          7           g
## 8          8           h
## 9          9           i
## 10        10           j
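As a rough illustration of the copy-then-load workflow, the sketch below copies the staging directory to a second location with base R's file.copy() and reads it back from there. The name new_parent is a hypothetical destination used only for this example; in practice the copy could be made with any file transfer tool.

```r
library(S4Vectors)
library(alabaster.base)

df <- DataFrame(X = 1:10, Y = letters[1:10])
tmp <- tempfile()
saveObject(df, tmp)

# Copy the whole staging directory into a different parent directory.
# 'new_parent' is a hypothetical location used only for illustration.
new_parent <- tempfile()
dir.create(new_parent)
file.copy(tmp, new_parent, recursive = TRUE)
copied <- file.path(new_parent, basename(tmp))

# Load the object from the copied directory.
restored <- readObject(copied)

# Confirm that the dimensions, column names and values survived the round trip.
stopifnot(
    identical(dim(restored), dim(df)),
    identical(colnames(restored), colnames(df)),
    isTRUE(all.equal(as.vector(restored$X), df$X)),
    identical(as.vector(restored$Y), df$Y)
)
```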
Check out the user's guide for more details.
The saving/reading process can be applied to a range of data structures, provided the appropriate alabaster package is installed.
Package | Object types |
---|---|
alabaster.base | list, factor, DataFrame, List |
alabaster.matrix | matrix, Matrix objects, DelayedArray |
alabaster.ranges | GRanges, GRangesList and related objects |
alabaster.se | SummarizedExperiment, RangedSummarizedExperiment |
alabaster.sce | SingleCellExperiment |
alabaster.mae | MultiAssayExperiment |
alabaster.string | XStringSet |
alabaster.spatial | SpatialExperiment |
alabaster.bumpy | BumpyMatrix objects |
alabaster.vcf | VCF objects |
alabaster.files | Common bioinformatics files, e.g., FASTQ, BAM |
All packages are available from Bioconductor and can be installed with the usual BiocManager::install() process.
Alternatively, to install all packages in one go, users can install the alabaster umbrella package.
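The same saveObject() and readObject() calls apply to the other supported classes once the relevant package is installed. The sketch below assumes that SummarizedExperiment and alabaster.se are installed, and uses a small made-up count matrix purely for illustration. Note that the reloaded assay may come back as a file-backed matrix rather than an ordinary one, so the check compares dimensions and assay names instead of testing for identical objects.

```r
library(SummarizedExperiment)
library(alabaster.base)
library(alabaster.se)  # saving/reading methods for SummarizedExperiment

# Mock up a small experiment; the values are arbitrary example data.
set.seed(1)
counts <- matrix(
    rpois(20, lambda = 5), nrow = 5,
    dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4))
)
se <- SummarizedExperiment(
    assays = list(counts = counts),
    colData = DataFrame(
        group = rep(c("a", "b"), each = 2),
        row.names = colnames(counts)
    )
)

# Save to a staging directory and read it back.
se_dir <- tempfile()
saveObject(se, se_dir)
restored_se <- readObject(se_dir)

stopifnot(
    identical(dim(restored_se), dim(se)),
    identical(assayNames(restored_se), assayNames(se))
)
```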
Developers can extend this framework to support more R/Bioconductor classes by creating their own alabaster package. Check out the extension section for more details.
Developers can also customize this framework for specific applications, most typically to add bespoke metadata in the staging directory. The metadata can then be indexed by database systems like SQLite and MongoDB to provide search capabilities. Check out the applications section for more details.