A pipeline and GUI for determining which ROIs match features in a fluorescence image. It is a common challenge in biology to determine whether a particular ROI (i.e. a collection of weighted pixels representing an inferred structure in an image) overlaps with features of a fluorescence image co-registered to the ROI. For example, in neuroscience, we might use suite2p to extract ROIs indicating active cells using a functional fluorophore like GCaMP, but want to know if the cells associated with those ROIs contain a secondary fluorophore like tdTomato. This package helps you do just that!
The package itself is somewhat simple, but we spent lots of time thinking about how to do this in the most reliable way. The standard pipeline computes a set of standard features for each ROI in comparison to a reference image which are useful for determining whether an ROI maps onto fluorescence. We provide a GUI for viewing the ROIs, the reference images, a distribution of feature values for each ROI, and an interactive system for deciding where to draw cutoffs on each feature to choose the ROIs that contain fluorescence. There's also a system for manual annotation if the automated system doesn't quite get it all right.
This package is installable with pip from PyPI. It is a lightweight package with minimal
dependencies, so is probably compatible with other python environments you might use.
If you're starting from scratch, first make a python environment, activate it, and
install cellector
with pip. If you are using an existing environment, skip the first
two steps and just do pip install from within the environment.
conda create -n cellector
conda activate cellector
pip install cellector
The basic workflow of this package is as follows:
- Construct an
RoiProcessor
object. - Use the
SelectionGUI
. - Save the data.
- Repeat (or use scripting to speed up).
If you want to see the basic workflow in a notebook, look here. Otherwise, read the instructions below or look at the documentation which is very far from complete at the moment.
We've provided a few functions to make RoiProcessor
objects that work differently
depending on what kind of data you are starting with. For an exhaustive list, go
here. If you are working directly on the output of suite2p, use:
from cellector.io import create_from_suite2p
suite2p_dir = # define your suite2p path - the one with plane0, plane1, ... in it
roi_processor = create_from_suite2p(suite2p_dir)
Then, open the SelectionGUI
as follows:
from cellector.gui import SelectionGUI
gui = SelectionGUI(roi_processor)
Then, use the GUI and hit save! Instructions for the GUI are here.
The GUI works, but it can be a bit tedious to open it over and over again when you know you want the same settings for a group of sessions. To enable quick application of feature criteria settings to many sessions, we have included scripting tools.
from cellector.io import propgate_criteria
from cellector.manager import CellectorManager
# Copy criteria from suite2p_dir to all the other directories
other_directories = [Path(r"C:\Path\to\other\suite2p"), Path(r"C:\Path\to\another\suite2"), ...] # as many as you like
success, failure = propagate_criteria(suite2p_dir, *other_directories)
for directory in other_directories:
# Make an roi_processor for each session(directory), this will compute features and save the data
roi_processor = create_from_suite2p(directory) # or whichever method you used to create the roi_processor
# Make a manager instance
manager = CellectorManager.make_from_roi_processor(roi_processor)
# this will save the updated criteria and idx_selection to cellector directory
# it will also save empty manual label arrays if they don't exist
manager.save_all()
Several changes will prevent or complicate backwards compatibility. Here's what you need to know:
targetcell.npy
→idx_selection.npy
: the name convention of the main output has been changed from targetcell.npy to idx_selection.npy- Manual selection shape from
(num_rois, 2)
to(2, num_rois)
: manual selection is changed from a stack across labels and active_label - Feature files from
{feature_name}.npy
to{feature_name}_feature.npy
: feature files now have a suffix for automatic identification - Criteria files from
{feature_name}_criteria.npy
to{feature_name}_featurecriteria.npy
: criteria files suffix changed
To address these changes, version 1.0.0 includes migration utilities to fix existing data
structures. You can use identify_celector_folders
to get all folders that contain a
cellector directory. The other three functions operate on this list and fix the filenames
or data structure (transposing manual selection) on all cellector files. These functions
are in cellector/io/operations.
from pathlib import Path
from cellector.io import identify_cellector_folders
from cellector.io import update_feature_paths
from cellector.io import update_manual_selection_shape
from cellector.io import update_idx_selection_filenames
top_level_dir = Path(r"C:\some\path\that\has\all\the\cellector\directories\beneath\it")
root_dirs = identify_cellector_folders(top_level_dir)
update_idx_selection_filenames(root_dirs)
update_manual_selection_shape(root_dirs)
update_feature_paths(root_dirs)
Note that a few other things have changed in version 1.0.0, see the CHANGELOG for more detailed descriptions!
There are a few "hyperparameters" to the package, including filtering parameters, the eps value for phase correlation, and size parameters for centered stacks. We need to enable hyperparameter optimization for these, which a user can supervise themselves. Idea: The user could open a GUI that compares masks with reference images for some sample "true" data and in addition for any data they've loaded in. One idea: For a particular set of hyperparameters (filtering, for example), the user could get a histogram of feature values for all the features for all the masks. They could use cutoff lines to pick a range of feature values for that particular set of hyperparameters, and then scroll through mask matches that come from within that range. This way, they could determine how the hyperparameters affect the feature values at each part of the distribution and select hyperparameters that give good separation. In addition, there could be some automated tuning, for example, to pick the eps a user could just input the maximum size ROI, and then measuring the average power for higher spatial frequencies.
To help choose hyperparameters and see how it's working, I'm going to build some tools to visualize masks and the reference image under different conditions.
Feel free to contribute to this project by opening issues or submitting pull requests. It's already a collaborative project, so more minds are great if you have ideas or anything to contribute!
This project is licensed under the GNU License. If you use this repository as part of a publication, please cite us. There's no paper associated with the code at the moment, but you can cite our GitHub repository URL or email us for any updates about this issue.