Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how to correctly preprocess data in spikeinterface before running kilosort #3483

Open
chongtianyifa opened this issue Oct 17, 2024 · 5 comments
Assignees
Labels
question General question regarding SI

Comments

@chongtianyifa
Copy link

Hi,

I am confused about the preprocessing chain before running kilosort. In nearly all provided examples, 'highpass_filter', 'phase_shift' and 'common_reference' were run and sorters were called like:
sorting = si.run_sorter('kilosort2_5', rec, output_folder=base_folder / 'kilosort2.5_output', docker_image=True, verbose=True, **params_kilosort2_5)
Here 'car' and 'skip_kilosort_preprocessing' should be True and False by default, which means the filtering and common_reference were calculated again in kilosort. What is the meaning of the preprocessing in spikeinterface?

I am trying to do all preprocessing and drift correction using 'dreg' in spikeinterface before calling kilosort by:
sorting = si.run_sorter(sorter_name='kilosort2_5', recording=raw_rec, folder=r'F:\B12M2S10_g0\B12M2S10_g0_imec0\kilo25-3\sort', skip_kilosort_preprocessing=True, scaleproc=200, do_correction=False, docker_image=False, verbose=True)
The running failed, most likely due to the data not being whitened by setting skip_kilosort_preprocessing as True. Any suggestion about my pipeline?

@zm711
Copy link
Collaborator

zm711 commented Oct 17, 2024

Sorry I'm trying to understand this part of your statement:

Here 'car' and 'skip_kilosort_preprocessing' should be True and False by default, which means the filtering and common_reference were calculated again in kilosort. What is the meaning of the preprocessing in spikeinterface?

The default is to do the Kilosort preprocessing. This is true for all sorters. We want the default to just be what the sorter wants do to. Ie you just put in a raw_recording in order to let the sorter handle everything. We also give you the option to do your own preprocessing so you can fine-tune in which case you do the preprocessing yourself and then set the values to car=False, skip_kilosort_preprocessing=True.

As far as your second issue. You raise a great point that whitening is super important for many spike sorters these days. We have a whitening function you can read the params here and you could add that to kilosort workflow. Since I don't know the dredge workflow super well I'll tag @cwindolf to see if he has advice on plugging in his code into a kilosort workflow or whether we should tag someone else on the team.

@zm711 zm711 added the question General question regarding SI label Oct 17, 2024
@cwindolf
Copy link
Collaborator

Thanks @zm711 -- whitening is definitely important for KS, so adding a si.whiten() to your preprocessing chain would be important if you have skip_kilosort_preprocessing=True. I actually haven't used it before, so I'm not sure whether you need to do that with int_scale=200 -- maybe someone else can comment? Can't find an example of a workflow like this in the docs.

@JoeZiminski
Copy link
Collaborator

Hey @chongtianyifa could you link some of the examples that were followed? I agree with @zm711 this is the expected behaviour. It might be useful to update these examples so kilosort is called with car=False and freq_min=<some value lower than the low bound on the spikeinterface bandpass> (unfortunately you can't turn it off entirely in KS) so it is clear what is going on. Doing these two preprocessing steps again should have little effect (both filters are zero phase) but might as well skip it if you can.

I think @cwindolf is correct you should run si.whiten with int_scale=200 to replicate the kilosort behaviour. As this workflow is relatively new (all preprocessing in spikeinterface, sorting only with KS) I would suggest comparing the outputs of kilosort using the two workflows. Unfortunately it is not simple to get access to the KS preprocessed data (#2650 , though this may have changed in KS4) but you can get a rough idea by looking at the template shapes (in general, their order will unlikely correspond 1:1 across runs) and plotting spikes directly from the kilosort output with kilosort drift map. I'm currently porting this to python / spikeinterface, should be able to finalise the PR by today / tomorrow so will keep you updated if useful.

That being said, it depends what you want to do. If you want to leverage the spikeinterface postprocessing toolkit, you will want a recording object to pass to the sorting analyzer that matches as closely as possible the preprocessed recording. As an extreme example, it would be a mistake to load raw data into spikeinterface, run the sorting with all kilosort preprocessing steps, then load the sorting output and the raw (unpreprocessed) spikeinterface recording into a sorting analyzer.

However, if you don't want to use spikeinterface postprocessing, you can just pass the raw data to kilosort (using it's own preprocessing) and work with the kilosort output directly.

@chongtianyifa
Copy link
Author

chongtianyifa commented Oct 17, 2024

@JoeZiminski These two examples preprocessed recording and called kilosort allowing it to preprocess again:
https://github.com/SpikeInterface/spikeinterface/blob/main/examples/get_started/quickstart.py
https://github.com/SpikeInterface/spikeinterface/blob/main/examples/how_to/analyze_neuropixels.py.
According to your explanation, can I say that repeating preprocessing should be ok as CAR and highpass filtering should not affect preprocessed data too much?

By summary, there are three options about preprocessing. Please correct me @cwindolf @JoeZiminski @zm711 if my understanding is not right.

  1. Do not preprocess, just feed the raw recording into kilosort. Then you cannot use postprocessing toolkits.
  2. Do all preprocessing in spikeinterface, including 'highpass_filter', 'phase_shift', 'common_reference' and 'whiten'. Feed this rec_preprocess into kilosort by calling si.run_sorter(sorter_name='kilosort2_5', recording=rec_preprocess, folder=r'F:\B12M2S10_g0\B12M2S10_g0_imec0\kilo25-3\sort', car=False, skip_kilosort_preprocessing=True, scaleproc=200). Then you can perfectly combine rec_preprocess and kilosort output using sorting analyzer.
  3. Partially preprocess data in spikeinterface, including 'highpass_filter', 'phase_shift', 'common_reference' without 'whiten'. Feed this rec_preprocess_part into kilosort by calling si.run_sorter(sorter_name='kilosort2_5', recording=rec_preprocess_part , folder=r'F:\B12M2S10_g0\B12M2S10_g0_imec0\kilo25-3\sort', car=False, freq_min=<some value lower than the low bound on the spikeinterface bandpass>, scaleproc=200). This is in fact what most examples do. My question is how to combine rec_preprocess_part and kilosort output using sorting analyzer as data is further whitened by kilosort.

@JoeZiminski JoeZiminski self-assigned this Oct 22, 2024
@JoeZiminski
Copy link
Collaborator

Hi @chongtianyifa that looks correct to me. For 2. you may also want to run a drift-correction step prior to whitening.

For 3. this step will also be run by kilosort before whitening. The whitening step changes the waveforms significantly so it is usual to un-whiten the data before waveform extraction, so that's not a problem. However, the drift correction could be, depending on how much drift is in your data.

In this case, 3. spikeinterface will be performing postprocessing using the information from the sorting output, and on a recording that has not been drift corrected (while the sorting estimation is on corrected data). I'm not 100% sure of the practical implications of this, you may have noise propagated through the data. My understanding is, as the waveforms (across all channels) will be extracted based on the spike times from the sorting. Then these will be averaged for template generation and maximum loading channels estimated. If there is uncorrected drift, then the templates and position estimates will be noisier. @zm711 @alejoe91 @samuelgarcia will be able to confirm if this is correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question General question regarding SI
Projects
None yet
Development

No branches or pull requests

4 participants