Step 1: Getting Started
BEANIE is a tool for comparing groups that share a subpopulation of cells. It is best used after fine-grained cell-type annotation has been done with Scanpy, Seurat or an equivalent single-cell analysis pipeline.
Before running BEANIE, it is necessary to subset the data to a particular cell subpopulation of interest using the subset() function in Seurat or the [ ] operator in scanpy. For example:
# subset seurat object (sobj) to particular cell subpopulation
sobj_subset = subset(sobj, idents=c("tumor_subpopulation1"))
# subset scanpy object (adata) to particular cell subpopulation
adata_subset = adata[adata.obs.cell_type=="tumor_subpopulation1",:]
BEANIE takes a (genes x cells) counts matrix as input; .csv, .tsv and .h5ad file formats are accepted. When creating a BEANIE object, use the parameter normalised to specify whether the input counts matrix is normalised.
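If you are unsure what to pass as normalised, a quick look at the values can help. The sketch below is a hypothetical helper (classify_matrix is not part of BEANIE) that assumes a .csv/.tsv file with cell IDs in the first row and gene names in the first column. It applies a rough heuristic: raw counts are non-negative integers, normalised values are non-negative with fractional parts, and scaled data contains negative values.

```python
import csv
import io


def classify_matrix(handle, delimiter=","):
    """Roughly classify a genes x cells matrix as "raw", "normalised" or "scaled".

    Hypothetical helper, not part of BEANIE. Assumes the first row holds cell
    IDs and the first column holds gene names.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader)  # skip the header row of cell IDs
    all_integer = True
    for row in reader:
        for field in row[1:]:  # row[0] is the gene name
            value = float(field)
            if value < 0:
                return "scaled"  # scaled (z-scored) data contains negatives
            if not value.is_integer():
                all_integer = False
    return "raw" if all_integer else "normalised"


if __name__ == "__main__":
    demo = io.StringIO(",cell1,cell2\nGAPDH,1.39,0\nCD3E,0,2.08\n")
    print(classify_matrix(demo))  # normalised
```

Under this heuristic, "raw" suggests normalised = False, "normalised" suggests normalised = True, and "scaled" means the matrix was exported from the wrong layer. It can misclassify normalised data that happens to contain only whole numbers, so treat it as a sanity check, not a rule.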
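For a Seurat object sobj_subset, export the counts matrix as a .csv file as follows:
# export as a .csv file (Seurat v3/v4 slots: @data holds NormalizeData() output, @counts holds raw counts)
# as.matrix() is needed because write.csv cannot coerce a sparse dgCMatrix
write.csv(as.matrix(sobj_subset@assays$RNA@data), "counts.csv", quote=F)
A matrix exported from the data slot is normalised if the standard Seurat workflow has been followed, so normalised = True must be set. To use raw counts instead, export @counts and set normalised = False.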
For faster run time, it is recommended to pass the AnnData object to BEANIE directly as an .h5ad file. In this case BEANIE uses the .raw attribute if present, and the default matrix (.X) otherwise. The data used must be normalised only, NOT scaled.
# export as .h5ad file
adata_subset.write_h5ad("adata_subset.h5ad")
Alternatively, a .csv or .tsv input file can be extracted as follows, though this slows the run time:
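# export as .csv format: raw.X is cells x genes, so transpose to genes x cells before writing
import pandas as pd
from scipy.sparse import issparse
X = adata_subset.raw.X.toarray() if issparse(adata_subset.raw.X) else adata_subset.raw.X
counts_df = pd.DataFrame(X, index=adata_subset.obs_names, columns=adata_subset.raw.var_names)
counts_df.T.to_csv("counts.csv")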
A counts matrix prepared this way is usually already normalised if the Scanpy workflow is followed, so normalised = True must be set.
The metadata file must contain two columns, sample_id and group_id, giving the sample and the group of each cell in the counts matrix. BEANIE currently supports comparisons between two groups. Both .csv and .tsv formats are accepted.
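Before running BEANIE it can be worth checking the metadata against the counts matrix. The hypothetical validate_metadata helper below (not part of BEANIE) assumes the first metadata column holds cell IDs, as written by write.csv() or DataFrame.to_csv(), and that cell_ids is taken from the counts matrix header. The check that each sample belongs to a single group is an added assumption about the study design, not a documented BEANIE requirement.

```python
import csv
import io


def validate_metadata(meta_handle, cell_ids, delimiter=","):
    """Check a metadata file against the counts matrix; return the two group labels.

    Hypothetical helper, not part of BEANIE. Assumes cell IDs are in the first
    column and that sample_id and group_id are column headers.
    """
    reader = csv.reader(meta_handle, delimiter=delimiter)
    header = next(reader)
    missing_cols = {"sample_id", "group_id"} - set(header)
    if missing_cols:
        raise ValueError(f"missing column(s): {sorted(missing_cols)}")
    s_idx, g_idx = header.index("sample_id"), header.index("group_id")

    seen_cells, groups, sample_to_group = set(), set(), {}
    for row in reader:
        cell, sample, group = row[0], row[s_idx], row[g_idx]
        seen_cells.add(cell)
        groups.add(group)
        # Assumption: each sample belongs to exactly one group.
        if sample_to_group.setdefault(sample, group) != group:
            raise ValueError(f"sample {sample!r} is assigned to more than one group")

    missing_cells = set(cell_ids) - seen_cells
    if missing_cells:
        raise ValueError(f"{len(missing_cells)} cell(s) have no metadata row")
    if len(groups) != 2:
        raise ValueError(f"expected 2 groups, found {len(groups)}: {sorted(groups)}")
    return sorted(groups)


if __name__ == "__main__":
    meta = io.StringIO(",sample_id,group_id\nc1,P1,responder\nc2,P2,non_responder\n")
    print(validate_metadata(meta, ["c1", "c2"]))  # ['non_responder', 'responder']
```

The cell IDs can be read from the first row of a counts .csv with next(csv.reader(f))[1:].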
For a Seurat object sobj_subset, export the metadata as a .csv file as follows:
write.csv(sobj_subset@meta.data, "metad.csv", quote=F)
For a Scanpy object adata_subset, export the metadata as a .csv file as follows:
adata_subset.obs.to_csv("metad.csv")
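If the exported metadata does not yet have sample_id and group_id columns, a minimal metadata file can be written from a per-cell sample assignment and a sample-to-group lookup. This is a hypothetical sketch: write_metadata, cell_to_sample and sample_to_group are illustrative names, and the sample and group labels in the demo are made up.

```python
import csv
import io


def write_metadata(handle, cell_to_sample, sample_to_group):
    """Write a metadata table with cell IDs in the first (unnamed) column.

    Hypothetical helper: cell_to_sample maps each cell ID to its sample and
    sample_to_group maps each sample to its group. Raises KeyError if a
    sample is missing from sample_to_group.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["", "sample_id", "group_id"])  # blank header above the cell IDs
    for cell, sample in cell_to_sample.items():
        writer.writerow([cell, sample, sample_to_group[sample]])


if __name__ == "__main__":
    buf = io.StringIO()
    write_metadata(buf, {"c1": "P1", "c2": "P2"}, {"P1": "pre", "P2": "post"})
    print(buf.getvalue(), end="")
```

To write to disk, open the file with open("metad.csv", "w", newline="") and pass the handle in place of the StringIO buffer. Deriving group_id from sample_id through a lookup keeps every cell of a sample in the same group.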
The test signatures file lists the genes of each gene signature to be tested. Example files can be found in the test_data folder. The test signatures file can be in one of the following formats:
- .gmt: each row contains the gene names of one signature, with the row name giving the signature's name.
- .csv/.tsv: each column contains the gene names of one signature, with the column name giving the signature's name.
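Both layouts can be read into a dictionary mapping each signature name to its genes, which is handy for checking a file before passing it to BEANIE. The sketch below is illustrative only: the function names are hypothetical, and the .gmt reader assumes the standard Broad Institute GMT layout (tab-separated name, description, then genes), so compare it with the example files in test_data before relying on it.

```python
import csv
import io


def read_signatures_columns(handle, delimiter=","):
    """Parse a column-wise .csv/.tsv signatures file into {name: [genes]}.

    Signatures may have different lengths; blank cells are skipped.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    names = next(reader)  # header row: one signature name per column
    signatures = {name: [] for name in names}
    for row in reader:
        for name, gene in zip(names, row):
            if gene.strip():
                signatures[name].append(gene.strip())
    return signatures


def read_signatures_gmt(handle):
    """Parse a .gmt file, assuming the layout name<TAB>description<TAB>genes..."""
    signatures = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:  # no genes on this line (or a blank line): skip it
            continue
        signatures[fields[0]] = [gene for gene in fields[2:] if gene]
    return signatures


if __name__ == "__main__":
    demo = io.StringIO("EMT,Hypoxia\nVIM,HIF1A\nCDH2,VEGFA\nZEB1,\n")
    print(read_signatures_columns(demo))
```

Skipping blank cells matters for the column-wise layout, because shorter signatures leave empty cells at the bottom of their columns.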