Step 1: Getting Started

Shreya Johri edited this page Apr 10, 2023 · 4 revisions

Prepare Input files

BEANIE is a tool for identifying differences between groups that share a subpopulation of cells. Run BEANIE after fine-grained cell-type annotation has been completed with scanpy, Seurat, or an equivalent single-cell data analysis pipeline.

Subsetting the data

Before running BEANIE, it is necessary to subset the data to a particular cell subpopulation of interest using the subset() function in Seurat or the [ ] operator in scanpy. For example:

# subset seurat object (sobj) to particular cell subpopulation
sobj_subset = subset(sobj, idents=c("tumor_subpopulation1"))
# subset scanpy object (adata) to particular cell subpopulation
adata_subset = adata[adata.obs.cell_type=="tumor_subpopulation1",:]

Counts Matrix

BEANIE takes a (genes x cells) counts matrix as input, in .csv, .tsv, or .h5ad format. Whether the input matrix is already normalised must also be specified through the normalised parameter when creating a BEANIE object.

Preparing Counts Matrix from a Seurat Object

For a Seurat object sobj_subset, export the counts matrix as a .csv file as follows:

# export as a .csv file; as.matrix() converts the sparse matrix to a dense one before writing
write.csv(as.matrix(sobj_subset@assays$RNA@counts), "counts.csv", quote=F)

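In the standard Seurat workflow, the counts slot exported above holds raw counts, while the normalised values are stored in the data slot (sobj_subset@assays$RNA@data). Set normalised = True only if the exported matrix is already normalised; otherwise set normalised = False.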

Preparing Counts Matrix from an AnnData Object

Passing the AnnData object directly to BEANIE is recommended, because it gives a faster run time. In this case, BEANIE uses the .raw attribute if present and otherwise uses the default layer. The data in this layer must be normalised only and NOT scaled.

# export as .h5ad file
adata_subset.write_h5ad("adata_subset.h5ad")
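
As a rough sanity check for the "normalised but not scaled" requirement, the sketch below relies on one general property: normalised or log-transformed expression values are non-negative, whereas z-scored (scaled) values include negatives. The helper name looks_scaled is hypothetical and not part of BEANIE; it is a heuristic only, and a matrix can pass it while still being unsuitable.

```python
def looks_scaled(values):
    """Heuristic: any negative value suggests the data were centred/scaled."""
    return any(v < 0 for v in values)

# Toy values standing in for one gene's expression across four cells.
normalised_values = [0.0, 1.2, 0.0, 2.3]    # log-normalised: never below zero
scaled_values = [-0.87, 0.33, -0.87, 1.41]  # z-scored: centred around zero

print(looks_scaled(normalised_values))
print(looks_scaled(scaled_values))
```

On real data, the same test can be applied to the flattened values of whichever matrix is handed to BEANIE.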

Alternatively, a .csv or .tsv input file can be exported as follows, although this slows the run time:

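import pandas as pd
# export as .csv format; .raw.X is (cells x genes), so rows are labelled with cells and columns with genes
X = adata_subset.raw.X.toarray() if hasattr(adata_subset.raw.X, "toarray") else adata_subset.raw.X  # densify if sparse
counts_df = pd.DataFrame(X, index=adata_subset.raw.obs_names, columns=adata_subset.raw.var_names)
counts_df.T.to_csv("counts.csv")  # transpose so the file is (genes x cells)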

A counts matrix prepared in this way is usually already normalised if the scanpy workflow was followed, in which case normalised = True must be set.

Metadata File

This file should contain one row for each cell present in the counts matrix, with two columns: sample_id and group_id. The method currently supports comparisons between two groups. .csv and .tsv formats are accepted.
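
The requirements above can be checked before running BEANIE. The following is a minimal sketch using only the Python standard library; the function name check_metadata is hypothetical and not part of BEANIE, and the toy data assumes that the first, unnamed column holds the cell IDs, as in a file written by write.csv or to_csv.

```python
import csv
import io

def check_metadata(handle, required=("sample_id", "group_id")):
    """Check that the required columns exist and that exactly two groups are present."""
    reader = csv.DictReader(handle)
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"metadata is missing column(s): {missing}")
    groups = {row["group_id"] for row in reader}
    if len(groups) != 2:
        raise ValueError(f"expected exactly 2 groups, found {sorted(groups)}")
    return groups

# Toy metadata standing in for a real file.
example = io.StringIO(
    ",sample_id,group_id\n"
    "cell1,patient1,treated\n"
    "cell2,patient1,treated\n"
    "cell3,patient2,untreated\n"
)
groups = check_metadata(example)
print(sorted(groups))
```

For a real file, pass an open handle instead, for example `with open("metad.csv", newline="") as fh: check_metadata(fh)`.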

Preparing metadata file from a Seurat object

For a Seurat object sobj_subset, export the metadata file as a .csv file as follows:

write.csv(sobj_subset@meta.data, "metad.csv", quote=F)

Preparing metadata file from an AnnData object

For an AnnData object adata_subset, export the metadata file as a .csv file as follows:

adata_subset.obs.to_csv("metad.csv")
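
Because the metadata must describe each cell in the counts matrix, it can help to confirm that the two exported files agree on cell IDs. The sketch below is an illustration under stated assumptions, not part of BEANIE: it assumes a (genes x cells) counts .csv whose header row lists the cell IDs, and a metadata .csv whose first column holds the cell IDs. The helper names are hypothetical.

```python
import csv
import io

def cell_ids_from_counts(handle):
    """Cell IDs are the column labels of a (genes x cells) matrix."""
    header = next(csv.reader(handle))
    return header[1:]  # the first field labels the gene column and is often empty

def cell_ids_from_metadata(handle):
    """Cell IDs are the row labels, i.e. the first field of each data row."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return [row[0] for row in reader if row]

# Toy files standing in for counts.csv and metad.csv.
counts = io.StringIO(",cell1,cell2,cell3\nGENE1,0,1.2,0\nGENE2,2.3,0,0.7\n")
metad = io.StringIO(",sample_id,group_id\ncell1,p1,A\ncell2,p1,A\ncell3,p2,B\n")

missing = set(cell_ids_from_counts(counts)) - set(cell_ids_from_metadata(metad))
print("cells without metadata:", sorted(missing))
```

An empty result means every cell in the counts matrix has a metadata row.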

Test Signatures File

The test signatures file lists the genes in each gene signature to be tested. Example files can be found in the test_data folder. The following formats are accepted:

  • .gmt - Each row holds one signature: the row name is the signature's name, followed by the gene names of that signature.
  • .csv/.tsv - Each column holds one signature: the column name is the signature's name, followed by the gene names of that signature.
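
To make the two layouts concrete, here is a minimal sketch that reads each into the same name-to-genes mapping. It is not BEANIE's own parser. The function names are hypothetical, and the .gmt reader assumes the common GMT layout (tab-separated: signature name, a description field, then genes); whether BEANIE's example files include the description field should be checked against the test_data folder.

```python
import csv
import io

def read_gmt(handle):
    """Parse GMT text, assuming: name, description, then gene names (tab-separated)."""
    signatures = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        signatures[fields[0]] = [g for g in fields[2:] if g]
    return signatures

def read_columnar(handle, delimiter=","):
    """Parse a .csv/.tsv in which each column is one signature."""
    rows = list(csv.reader(handle, delimiter=delimiter))
    header, body = rows[0], rows[1:]
    signatures = {name: [] for name in header}
    for row in body:
        for name, gene in zip(header, row):
            if gene:  # shorter signatures leave empty cells
                signatures[name].append(gene)
    return signatures

# The same two toy signatures written in both layouts.
gmt_text = io.StringIO("sigA\tna\tCD3E\tCD8A\nsigB\tna\tMKI67\n")
csv_text = io.StringIO("sigA,sigB\nCD3E,MKI67\nCD8A,\n")

from_gmt = read_gmt(gmt_text)
from_csv = read_columnar(csv_text)
print(from_gmt == from_csv)
```

For a .tsv signatures file, call read_columnar with delimiter="\t".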