-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spike holes: Spikes within the batching edges are not detected #594
Comments
This seems to be fixed in Kilosort4 which is live now. Please update and reopen this issue if you still have a problem. |
For reference, this is how we find such holes using the spike times, code from @oliche :
The 7ms is arbitrary. It may scale according to the batch size. For example, it would be interesting to try 3.5ms if the batch size is 1 second. Please note that the absence of such hole with this method isn’t a conclusive diagnostic, as there could be a few spikes detected nonetheless within the batch edge - this is depending on the neural signal strength (and thereby how it was processed). We would be curious to know how this is conclusively tested for in future Kilosort versions. |
I do the st%NT and histogram that. It shows clearly missing spikes for KS 2.5 and 3, but not 1, 2 and 4, though I haven't tested a lot of datasets yet. I found a possible place for the bug in the creation of the drift corrected file in 2.5 and 3. |
Sure, but I think the st%NT takes care of these confounds and gets more directly at the source of the problem. In the datasets I looked at, it was a lot more clear than when I tried your method. |
Hello, the explanation was in response to the message above, as we would not want one to think the effect is more present in pyKilosort based on the histogram graphs above. If anything, it is a good thing to see these horizontal lines only at the batch size. Using spike times modulo the batch size works also as a detection method for holes, but you have to be perfectly sure of your NT value so as to see a dip at 0. (please correct me if I misunderstand your approach !) |
Not sure I understand this argument because it is relatively indirect. It can't be a good thing to see the lines at the batch size, then you know for sure it's a problem. When you don't see the lines, you don't know if it's because we can't see the problem or there's no problem. Anyway, all the more reason to look at st%NT? Unless someone explicitly changed the batch size, it should be the same as the default in config384. |
Ah, thank you so much Adrian, I think this might solve the mystery. I was indeed able to replicate the problem in KS2.5 with non-default batch size, and it's not there with default batch size. I will continue to investigate this in different version of Kilosort. Correction: I replicated what Adrian is saying in Kilosort2, but not in Kilosort2.5, where I am still getting the holes with the default batch size. |
@GaelleChapuis can you please check whether the batch size was changed for the IBL runs? It should be 65,600 in Kilosort 2, 2.5 and 3. You mention 65,536 above. |
I found the main bug in Kilosort 2.5 and 3. The batch size was manually changed inside trackAndSort (another ntbuff was added). The patch0 releases fix this. |
Thanks, we 'll try this out ! |
If it works for you with the Matlab version of Kilosort2.5, you can see the two locations where I made changes in the trackAndSort script and adapt that for pykilosort. |
zm711 and I noticed that the bug fix branches (at least for KS2.5) is fixing the spike holes, but there seem to be a misalignment with the spike times, so that the extracted waveforms don't look right. See this example: the auto/cross correlograms (bottom right) look correct, but the waveforms are just noise. Any idea where the misalignment could take place? |
I've also tested with KS3 and see a similar set of noise waveforms in Phy. The same dataset analyzed pre-patch has nice waveforms. |
Can you please check if an offset of ntbuff samples realigns the spikes? |
Yeah it looks like if I shift by ntbuff samples the waveforms reappear in Phy. They are not exactly the same as pre-patch waveforms, but they look pretty close by eye. |
I have fixed both bugs that were causing the spike holes problem, one in a CUDA file and one in a Matlab file. I think the spike times are correct now, in versions 2.5.2 or 3.0.2 (new releases) or on the branches kilosort25 and kilosort3. One of the two bugs also affects Kilosort1 and 2. I will try to fix them there as well. |
@marius10p I'm still getting the spike holes with v2.5.2 (commit - f2e570c) Here are the spike counts histogram aligned with BATCH_SIZE: |
@alejoe91 thanks, did you change ops.NT to the new value? Also, ntbuff should be 64 and nt should be 61. |
I used default values:
|
The new default value for NT is 64 * 1024 - ntbuff. I can make that more clear in the release notes. |
let me try that |
Will test KS3 as well :) |
Here the results for v3.0.2: There seem to be a drop in spike counts in the same "spike hole" window. Is it possible? Note that this is the same recording as #594 (comment) |
I didn't see it in my own test and it shouldn't behave differently unless I forgot something, can you please run a longer/different recording to see if it replicates? NT has to be the same new value, and ntbuff nt should be the defaults. |
Sure I'll run on another longer session.
Yep, using new defaults! |
@marius10p I've run it on a different session (~30min long) and the drop in spike counts doesn't seem to be there, so all good! Thanks for the fix! |
Sorry for the delay on my end. We tested a 90min recording and also seemed fine. I've got a few people in my lab using KS2 for low channel count probes. Is the CUDA error in those fundamental in that they should update as soon as you update or is it super minor bug? |
@zm711 for 2.5 and 3 the holes are more extreme than for 2. But they are still there for 2.0, something like 30% reduced spikes in a window of ~4ms. I did updated the 2.0 as well, it's in the releases section. |
Thanks @marius10p, I'll let them know! |
Can I get some clarification on this? On the Kilosort 2 patch1 it states this issue has been fixed and I see that the CUDA files have been modified. However, configFile384.m still has ops.NT defined as Am I correct that this is wrong? That we need to subtract ops.ntbuff to avoid the 'holes' issue? And if so what is the correct formula for calculating acceptable NT sizes? Is |
Thanks for catching that, I now fixed ops.NT in configFile384. https://github.com/MouseLand/Kilosort/releases/tag/v2.0.2 The rule should be P.S. Next time, try: can I please get some clarification on this. |
Thanks for the quick reply, I appreciate it. |
No worries, let us know if anything else is unclear or if something does not work. We appreciate the feedback. |
I am going to keep this issue open so more people see it. |
Hi all, I just came across the warning on the main readme. Thanks for you attention to this. Can I ask a question? I've read the whole thread, and it's still ambiguous to me whether I need to rerun recordings that were sorted with KS2 if I used the default batch sizes. |
The problem was less severe in 2.0, but it was still there, and if you want to get rid of it, you would have to rerun with the new version in the releases. On the other hand, if you want better quality sorting, you might as well use Kilosort4, which never had this problem and which performs better in most cases. |
Thanks for the response! We haven't found KS4 to work better with our
setup, so we're sticking with 2 for now.
|
Feel free to open an issue describing your setup and @jacobpennington can take a look to see why KS4 may not be performing well. |
As raw electrophysiology data is heavy, the spike sorting algorithm truncates the data into smaller portions (called batches, of size ~2.18 seconds for the latest versions of Kilosort) to work with it.
To ensure information continuity, the edges of these batches are processed slightly differently than the middle portion of the batch. We found that at the edge of the spike sorting batches, ~7ms of spikes are going undetected. We call this effect "spike holes".
Below we show an example of raw electrophysiology data with such "spike hole" highlighted in blue.
The voltage traces are displayed as grey scale; the y-axis represents the electrode channels, and the x-axis a short snippet (~80ms) of time. Spikes detected by Kilosort are marked with a green or red dots for good or bad units respectively. What you can see on this example is that spikes are not detected within this blue window, that corresponds precisely to the edge of a spike sorting batch.
Specifically, every 65 536 samples (~2.18 seconds at 30 kHz) spike detection is less likely for ~7 ms - and in many cases there is no spike detection at all during this period. This represents a loss of 0.3% of the data.
One way to detect those occurrences working on the spike trains is to compute the derivative of the spike times. We retain the samples that have a derivative > 7ms (i.e. there are no spikes detected for at least 7ms, equivalent to a “spike hole”), and compute the time difference between this sample and the next sample with such a derivative > 7ms.
To visualize the batch effect, we then plot the time difference between samples that are marked as having a 7ms hole, but divided by the batch size (~2.18 seconds), as a histogram.
Below is an example of such a plot for one recording, in blue. The decline in the height of the bars as the number of batches increases to the left is expected, as it is more likely to find “spike holes” separated by 1 batch. Indeed, a bar on this graph at e.g. 5 batches indicates that for 4 consecutive batches the neural signals were strong enough for a few spikes to be detected nonetheless within the batch edge.
The problem is more visible when plotting the same graph as above but across several recordings.
Below is a plot of >600 recordings (IBL, run with pyKilosort), where each recording is a vertical line (x-axis: PID number, y-axis: batch). The batch effect is seen as horizontal lines at integer values (1, 2, 3, 4).
The same effect is also visible on IBL recordings spike sorted using the Matlab version of the algorithm (see below). Note that here the effect is seen at every ½ batch size.
We used 91 datasets from Steinmetz and performed the same analysis using the original spike sorting output (likely version: Kilosort 1.0 - Matlab). Indeed, “spike holes” are visible, as per the graph below. Note the effect at ½ batch size.
We also saw this effect using the Allen dataset (visual coding; likely version: Kilosort 2.0):
Note that when re-sorting the Steinmetz dataset using pyKilosort, the issue is visible at the batch size:
This issue is therefore present on the earliest versions of the code, thereby affecting a wide range of datasets.
We are working on identifying the core CUDA function that would underlie this error.
For a minimal reproducible example, see this repo
The “spike hole” effect is an effect that occurs regularly, i.e. every ~2.18 seconds, for a duration of ~7ms. This effect is unlikely to drastically affect scientific results published with the error.
The text was updated successfully, but these errors were encountered: