Skip to content

Latest commit

 

History

History
122 lines (71 loc) · 5.2 KB

scheme2.md

File metadata and controls

122 lines (71 loc) · 5.2 KB

Scheme 2

This scheme is generally preferred over scheme 1 because it can handle larger datasets that cannot be fully loaded into memory, and also has other advantages in terms of accurately detecting and labeling spikes.

In phase 1, the first scheme is used as a training step, performing unsupervised clustering on a subset of the dataset. Then in phase 2, a set of classifiers are trained based on the labels of the training step. Then in phase 3, the classifiers are used to label all spikes in the recording.

Usage

from tempfile import TemporaryDirectory
import numpy as np
import spikeinterface as si
import spikeinterface.preprocessing as spre
import mountainsort5 as ms5
from mountainsort5.util import create_cached_recording

recording = ... # load your recording using SpikeInterface

# Make sure the recording is preprocessed appropriately

# Note that if the recording traces are of float type, you may need to scale
# it to a reasonable voltage range in order for whitening to work properly
# recording = spre.scale(recording, gain=...)

# lazy preprocessing
recording_filtered = spre.bandpass_filter(recording, freq_min=300, freq_max=6000, dtype=np.float32)
recording_preprocessed: si.BaseRecording = spre.whiten(recording_filtered)

with TemporaryDirectory() as tmpdir:
    # cache the recording to a temporary directory for efficient reading
    recording_cached = create_cached_recording(recording_preprocessed, folder=tmpdir)

    # use scheme 2
    sorting = ms5.sorting_scheme2(
        recording=recording_cached,
        sorting_parameters=ms5.Scheme2SortingParameters(
            detect_sign=-1,
            phase1_detect_channel_radius=200,
            detect_channel_radius=50,
            # other parameters...
        )
    )

# Now you have a sorting object that you can save to disk or use for further analysis

Parameters

phase1_detect_channel_radius

The detect_channel_radius parameter for phase 1 (see scheme 1).

detect_channel_radius

This is just like the phase1_detect_channel_radius parameter, but it is used for phase 2.

phase1_detect_threshold

The detect_threshold parameter for phase 1 (see scheme 1).

detect_threshold

This is just like the phase1_detect_threshold parameter, but it is used for phase 2.

phase1_detect_time_radius_msec

The detect_time_radius_msec parameter for phase 1 (see scheme 1).

detect_time_radius_msec

This is just like the phase1_detect_time_radius_msec parameter, but it is used for phase 2.

phase1_npca_per_channel

The npca_per_channel parameter for phase 1 (see scheme 1).

phase1_npca_per_subdivision

The npca_per_subdivision parameter for phase 1 (see scheme 1).

detect_sign

Same as in scheme 1 (see scheme 1).

snippet_T1 and snippet_T2

Same as in scheme 1 (see scheme 1).

snippet_mask_radius

Same as in scheme 1 (see scheme 1).

max_num_snippets_per_training_batch

The maximum number of snippets to use for training the classifier in each batch. See below for more details on what constitutes a batch.

classifier_npca

The number of principal components to use for each neighborhood classifier. If None (the default), then the number of principal components will be automatically determined as max(12, M * 3) where M is the number of channels in the neighborhood.

training_duration_sec

The duration of the training data (in seconds). See also the training_recording_sampling_mode parameter.

training_recording_sampling_mode

How to sample the training data. If 'initial', then the first training_duration_sec of the recording will be used. If 'uniform', then the training data will be sampled uniformly in 10-second chunks from the recording.

classification_chunk_sec

Amount of data (in seconds) loaded into memory at a time during the extraction / classification step of scheme 2

Algorithm

Phase 1. First, a subset of the recording is extracted for training, according to the training_duration_sec and training_recording_sampling_mode parameters. Then, scheme 1 detection and clustering is performed on this training subset.

Phase 2. In this phase, a series of classifiers, one for each channel, are trained using the snippets and labels obtained from phase 1. The procedure used here is difficult to describe, so for now I will refer the reader to the documented source code in sorting_scheme2.py.

Phase 3. Once the classifiers have been fit, new events are detected throughout the entire recording using the detect_threshold and detect_channel_radius parameters (possibly different from phase1_detect_threshold and phase1_detect_channel_radius). In order to detect and sort more events that overlap in time, it is desirable to choose detect_channel_radius to be smaller than phase1_detect_channel_radius, so that events not included in phase 1 can be included in this final phase. The classifiers trained in phase 2 are then used to label each event. Again, for details we refer to the reader to the documented source code in sorting_scheme2.py.