This is some documentation for the analysis code of vrAnalysis. The main goal of this file is to remind me what I've written so I don't forget. There may be room for explaining how it works, but at the moment, I'll just focus on using it.
The analysis of data is organized into modules that define analysis objects which group together analyses that are about the same thing and usually depend on the same basic data. This is very useful, because not only does it streamline the data loading and analysis process, but it provides a good organization for me to remember what analyses I've performed.
The analysis library is in it's own directory which you can find here if you want to look at the code in detail.
Each analysis module defines an analysis object which is a child of the
standardAnalysis
class defined
here. The standard analysis code
is pretty simple - it's job is just to add a few methods that are commonly
used by most analyses.
Key methods:
analysisDirectory()
: gets whatever path is hardcoded in the fileManagement module.saveDirectory()
: appends the name of the analysis type and the name of the specific analysis to the main analysis directory and returns itsaveFigure()
: saves an open matplotlib figure in the saveDirectory()
Here is a (hopefully mostly complete) list of analysis modules. It explains the main analyses to do with each module and the standard usage for them.
All modules require you to import the analysis package:
from vrAnalysis import analysis
from vrAnalysis import database # this is also used
vrdb = database.vrDatabase() # open a database manager
Place Cell Multi Session -- link to module
Analyzing place cells across sessions requires a "tracker" object which
processes ROI tracking data from the ROICaT package. The tracker module is
here. Tracker objects operate for an entire mouse
so open one up for whatever mouse you want to analyze, then use that as input
to the placeCellMultiSession
analysis object (pcm
). Here, I'm using
autoload=False
because otherwise it'll analyze all the place cell data
from all the tracked sessions, which takes a while.
track = tracking.tracker('ATL022') # get tracker object for mouse
pcm = analysis.placeCellMultiSession(track, autoload=False) # open up place cell multi session analysis object (don't autoload!!!)
pcm.track.session_table() # this prints a dataframe of the sessions in the tracker!
The simplest analysis to do with the pcm
object is to plot snakes across
sessions using tracked cells that are reliable on a particular session (in a
particular environment). Here, I plot snakes from environment 4 in sessions 12
to 16, sorting and choosing ROIs by their activity on session 16. The other
inputs are similar to those in place cell single session, which is described
below. Note that this will load whatever sessions you request, which can take
a bit of time, as I mentioned. This method calls another method named
make_snake_data
which retrieves the data used by plot_snake
.
# choose range of sessions based on dataframe, pick environment, and plot snake
envnum = 4
idx_ses = range(12,17)
sortby = 16 # which session to sort by
pcm.plot_snake(envnum, idx_ses=idx_ses, sortby=sortby, cutoffs=(0.5, 0.8), method='max', normalize=10, rewzone=True, interpolation='none', withShow=True, withSave=False)
NOTE THAT THIS SECTION IS NOW OBSOLETE, BUT WILL BE REPLACED!!!
This module can also analyze how place fields change across sessions. The
plot_pfplasticity
method does so by making a histogram of the changes in
place field position (with 'max'
or 'com'
, see method for details) across
sessions. You can do it for all cells or split by control or red cells with
split_red
. You can plot absolute difference or plot actual difference
(including negative changes) with abs_val
. This calls measure_pfplasticity
to make the data.
pcm.plot_pfplasticity(self, envnum, idx_ses=idx_ses, cutoffs=(0.5, 0.8), method='max', absval=True, split_red=False, withShow=True, withSave=False):
This contains a program for measuring place field plasticity across sessions as well. This needs more explanation!!!! but for now I'll just show the main workflow:
track = tracking.tracker('CR_Hippocannula7') # get tracker object for mouse
pcm = analysis.placeCellMultiSession(track, autoload=False) # open up place cell multi session analysis object (don't autoload!!!)
for envnum in [1, 2]:
for present in ['r2', 'pc']:
pcm.hist_pfplasticity(envnum, idx_ses=None, cutoffs=(0.3, 0.6), present=present, split_red=True, withShow=False, withSave=True)
Place Cell Single Session -- link to module
This analysis module is used for analyzing place fields in single sessions
(with some support for cross-session analysis as methods). The pcss
object
loads behavioral and spiking data, remaps it to a spatial map (rather than the
x-axis being time, the x-axis is space and the data is warped from temporal to
spatial coordinates). It automatically divides data by which virtual
environment the mouse was in. After loading data, it has methods for analyzing
spatial reliability of spiking, creating cross-validated "snake plots" of
neural activity in different environments, and more.
The main uses of pcss
objects are to:
- Generate spatial maps of behavioral data (occupancy, speed, licking).
- Generate spatial maps of firing patterns for each neuron.
- Measure reliability for each ROI within each environment.
- Create cross-validated snake plots within and across environments.
- Compare data across sessions using extensions in the module.
To open a placeCellSingleSession analysis object (pcss
), do this:
import random
# choose a session randomly that has registered imaging data
vrexp = random.choice(vrdb.iterSessions(imaging=True, vrRegistration=True))
print(vrexp.sessionPrint()) # show which session you chose
# create place cell analysis object
pcss = analysis.placeCellSingleSession(vrexp)
There are several options for creating a pcss
object. Most of these options
are utilized in the functions
module, for more
explanation look there.
- keep_planes: a list of planes to keep, usually ignoring the flyback plane
- distStep: the spatial resolution to measure occupancy, speed, licks, and spiking maps with.
- speedThreshold: as is standard in the field, this sets a threshold at how fast the mouse is moving (in centimeters) for including data points.
- standardizeSpks: if True, will use a median/normalization (subtract by the median, divide by the standard deviation). It's a good idea to use this because deconvolved spikes don't have clear units.
To retrieve snake plots from a session, there are two methods. One makes train and test snakes for each environment independently. The ROIs used for the snakes are determined based on their reliability on training trials, and the sort of both train & test snakes is determined based on training trials.
# envnum: requested environments to make snake for, if None, uses all
# with_reliable: whether or not to use only reliable cells as defined by cutoffs
# cutoffs: [0]=relmse cutoff, [1]=relcor cutoff (see below)
# method: how to sort -- if 'max', uses max activity, if 'com', uses center of mass
train_snake, test_snake = pcss.make_snake(envnum=None, with_reliable=True, cutoffs=(0.5, 0.8), method='max')
Alternatively, you can retrieve cross-validated snakes comparing activity
across environments. This retrieves an (N x N) grid of snakes where the ROIs
and sort in each row are determined by the environment index of the row, and
each column uses trials from the environment index of the column. Train/test
splits are used for the diagonal, and full trial lists are used for
off-diagonal. For example, row i
column j
uses ROIs that are reliable in
environment i
, sorts them based on activity in environment i
, and plots
their activity on environment j
.
# kwargs same as above, automatically uses all environments in the session.
remap_snakes = pcss.make_remap_data(with_reliable=True, cutoffs=(0.5, 0.8), method='max')
To plot the data, there are convenience methods. These use the standard
withSave=False
and withShow=True
kwargs so you can choose whether to save
or show the figure (or both).
# normalize: determines what colormap limits to use (-norm, +norm), if 0 uses max activity
# rewzone: whether or not to plot the reward zone as a patch
# interpolation: how to make the imshow plot
# withShow: whether or not to show the figure
# withSave: whether or not to save the figure using the standard analysis method (see above)
# force_single_env: whether or not to make a remap_snake_figure even if there is only a single environment in the session
plot_snake(self, envnum=None, with_reliable=True, cutoffs=(0.5, 0.8), method='max', normalize=0, rewzone=True, interpolation='none', withShow=True, withSave=False)
plot_remap_snakes(self, with_reliable=True, cutoffs=(0.5, 0.8), method='max', normalize=0, rewzone=True, interpolation='none', force_single_env=False, withShow=True, withSave=False)
To get a list of which ROIs are reliable for each environment, pcss
uses two
measures of reliability, one based on mean-square error and one based on
correlation. For both methods, the trials are divided into training and
testing sets, and the average activity (across trials) is compared.
For MSE, a measure based on deviance is used:
For COR, the pearson correlation is used:
To retrieve indices of reliable cells in each environment, use this:
# envnum is a list of environment indices
# cutoffs is a two-element tuple of reliability cutoffs
idx_reliable = get_reliable(envnum, cutoffs=None)
There are additional tools for analyzing data across sessions that are in the
module. Generally, these
tools require pcss
objects for many sessions. This can take a bit of time
and memory, so I start by doing the following:
# this is a session iterable (a list of vrExperiment objects) that I want to
# analyze, in this case I'm ignoring mouse ATL023 and only getting sessions
# that used an experimentID > 0.
ises = vrdb.iterSessions(imaging=True, vrRegistration=True, mouseName=('ATL023','!='), experimentID=(0, '>'))
# then for each session, I make a `pcss` object.
ipcss = []
for ses in tqdm(ises):
ipcss.append(analysis.placeCellSingleSession(ses))
ses.clearBuffer() # clearing the buffer removes data loaded in each session object which is unnecessary and usually quite large
To plot the difference in the fraction of reliable cells between red cells (that express tdTomato and have a conditional KO) and control cells, use this:
# make sure to feed in the preloaded `ises` and `ipcss` objects.
# cutoffs determine the reliability cutoff (see above)
# include_manual determines whether to use manually annotated cells in the red idx.
# note: you can also request to only use the suite2p index -- see the module for details
analysis.plot_reliable_difference(ises=ises, ipcss=ipcss, cutoffs=(0.5, 0.8), include_manual=True, withSave=False, withShow=True)
To plot the difference in the fraction of reliable cells between red cells
(that express tdTomato and have a conditional KO) and control cells, use this.
This will make a separate plot for each mouse in ises
/ipcss
and
automatically make different curves for each environment.
# make sure to feed in the preloaded `ises` and `ipcss` objects.
# cutoffs determine the reliability cutoff (see above)
# include_manual determines whether to use manually annotated cells in the red idx.
# note: you can also request to only use the suite2p index -- see the module for details
analysis.plot_reliable_difference(ises=ises, ipcss=ipcss, cutoffs=(0.5, 0.8), include_manual=True, withSave=False, withShow=True)
To plot the fraction of red reliable cells and control reliable cells for each environment, use this:
# same input structure as above
analysis.plot_reliable_fraction(ises=ises, ipcss=ipcss, cutoffs=(0.5, 0.8), include_manual=True, withSave=False, withShow=True)
Session Stats -- link to module
This analysis module is used for analyzing basic session data using a session iterable. As of now, this entire module is based on non-class methods, which I hope to change in the near future but I needed them quickly and this was easiest.
The data are generally produced by the sessionStats
method as follows. It
measures various things including which environments were used in each
session, how many trials per environment, how many ROIs per session, and how
many of those ROIs are considered red, according to the standard red detection
method or by suite2p alone. It organizes the outputs by mouse and envrionment.
ises = vrdb.iterSessions(imaging=True, vrRegistration=True) # get iterable of all registered imaging sessions
stats = analysis.sessionStats(ises=ises, keep_planes=[1,2,3,4], include_manual=True, use_s2p=False, s2p_cutoff=0.65) # measure session stats
miceInSession, env_counter, trial_counter, total_cell_count, red_cell_count = stats # unpack tuple
To plot the stats about which environments were visited and in how many trials
by each mouse, use this. Additional input arguments are the same as
sessionStats
.
analysis.plot_environmentStats(ises=ises, keep_planes=[1,2,3,4])
To plot the stats about how many ROIs were imaged and how many of those are
red in each session, use this. Additional input arguments are the same as
sessionStats
.
analysis.plot_roiCountStats(ises=ises, keep_planes=[1,2,3,4])
Same Cell Candidates -- link to module
This analysis module attempts to identify imaging ROIs that are probably the same cell. This can happen for two reasons:
- Suite2p sometimes oversplits cells within a plane - either erroneously or because they are dendritic ROIs and far apart from each other.
- Cells have ROIs in multiple planes, either because the planes are too close to each other or the cell's dendrite extends and is captured by suite2p.
To open a sameCellCandidate analysis object, create a vrExperiment object and choose whatever filters you want.
import random
# choose a session randomly using the vrDatabase module
vrexp = random.choice(vrdb.iterSessions(imaging=True, vrRegistration=True))
print(vrexp.sessionPrint()) # show which session you chose
# create sameCellCandidates object
# keep_planes determines which planes to use, others will be considered the flyback and ignored in all further analyses
scc = analysis.sameCellCandidates(vrexp, keep_planes=[1,2,3,4])
To observe clusters of cells based on a variety of criteria, use this block. It opens up a mediocre matplotlib GUI to look through ROI clusters.
clusterExplorer = analysis.clusterExplorer(scc, corrCutoff=0.5, maxCutoff=None, distanceCutoff=30, minDistance=None, keep_planes=[1,2,3,4])
To look at how many pairs of ROIs are found across planes that are near each other and above a certain correlation, use this:
# withSave and withShow can be reversed if doing this programmatically to save figures for many sessions
scc.planePairHistograms(corrCutoff=[0.5, 0.6, 0.7, 0.8], distanceCutoff=50, withSave=False, withShow=True)
To look at a scatter plot of the distance between ROIs vs. their correlation coefficient, use this. Beware of the alpha value.
scc.scatterForThresholds(keep_planes=[1,2,3,4], distanceCutoff=250);