Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kilosort3 on dandiset recording #2073

Closed
magland opened this issue Oct 4, 2023 · 20 comments
Closed

kilosort3 on dandiset recording #2073

magland opened this issue Oct 4, 2023 · 20 comments
Labels
bug in sorter The bug is in sorter code itself, not in SI

Comments

@magland
Copy link
Member

magland commented Oct 4, 2023

I'm having a hard time getting kilosort3 to run successfully on various datasets from DANDI. This may or may not be a spikeinterface problem, but I'd like to open this issue here to see if we can track down the problem.

First I'll just paste the runtime trace and maybe someone will recognize where this is coming from @alejoe91 @samuelgarcia

{
    "sorter_name": "kilosort3",
    "sorter_version": "git-d55179f4bed4",
    "datetime": "2023-10-04T19:45:22.746712",
    "runtime_trace": [
        "",
        "< M A T L A B (R) >",
        "Copyright 1984-2022 The MathWorks, Inc.",
        "R2022b (9.13.0.2049777) 64-bit (glnxa64)",
        "August 24, 2022",
        "",
        "",
        "To get started, type doc.",
        "For product information, visit www.mathworks.com.",
        "",
        "Time   0s. Computing whitening matrix..",
        "Getting channel whitening matrix...",
        "Channel-whitening matrix computed.",
        "Time   5s. Loading raw data and applying filters...",
        "Time  23s. Finished preprocessing 28 batches.",
        "Drift correction ENABLED",
        "----------------------------------------Error using indexing",
        "Subscript indices must either be real positive integers or logicals."
    ],
    "error": true,
    "error_trace": "Traceback (most recent call last):\n  File \"/mnt/home/magland/miniconda3/envs/dev/lib/python3.10/site-packages/spikeinterface/sorters/basesorter.py\", line 234, in run_from_folder\n    SorterClass._run_from_folder(sorter_output_folder, sorter_params, verbose)\n  File \"/mnt/home/magland/miniconda3/envs/dev/lib/python3.10/site-packages/spikeinterface/sorters/external/kilosortbase.py\", line 215, in _run_from_folder\n    raise Exception(f\"{cls.sorter_name} returned a non-zero exit code\")\nException: kilosort3 returned a non-zero exit code\n",
    "run_time": null
}

This is how I was running the sorter

import spikeinterface.sorters as ss

import h5py
import pynwb
import remfile

from NwbRecording import NwbRecording

# Comes from DANDI: 000409/sub-CSH-ZAD-001/sub-CSH-ZAD-001_ses-3e7ae7c0-fe8b-487c-9354-036236fa1010-chunking-27307-192_behavior+ecephys.nwb
nwb_url = 'https://api.dandiarchive.org/api/assets/b97bc304-e1e1-4164-91c4-ce7df98dd78f/download/'
nwb_remf = remfile.File(nwb_url)
file = h5py.File(nwb_remf, 'r')

recording = NwbRecording(file=file, electrical_series_path='/acquisition/ElectricalSeriesAp')

recording = recording.frame_slice(start_frame=0, end_frame=int(recording.get_sampling_frequency() * 60))

sorting = ss.run_sorter('kilosort3', recording)

and this is NwbRecording.py

from typing import Union, List
import numpy as np
import h5py
import spikeinterface as si


class NwbRecording(si.BaseRecording):
    def __init__(self,
        file: h5py.File,
        electrical_series_path: str
    ) -> None:
        electrical_series = file[electrical_series_path]
        electrical_series_data = electrical_series['data']
        dtype = electrical_series_data.dtype

        # Get sampling frequency
        if 'starting_time' in electrical_series.keys():
            t_start = electrical_series['starting_time'][()]
            sampling_frequency = electrical_series['starting_time'].attrs['rate']
        elif 'timestamps' in electrical_series.keys():
            t_start = electrical_series['timestamps'][0]
            sampling_frequency = 1 / np.median(np.diff(electrical_series['timestamps'][:1000]))
        
        # Get channel ids
        electrode_indices = electrical_series['electrodes'][:]
        electrodes_table = file['/general/extracellular_ephys/electrodes']
        channel_ids = [electrodes_table['id'][i] for i in electrode_indices]
        
        si.BaseRecording.__init__(self, channel_ids=channel_ids, sampling_frequency=sampling_frequency, dtype=dtype)
        
        # Set electrode locations
        if 'x' in electrodes_table:
            channel_loc_x = [electrodes_table['x'][i] for i in electrode_indices]
            channel_loc_y = [electrodes_table['y'][i] for i in electrode_indices]
            if 'z' in electrodes_table:
                channel_loc_z = [electrodes_table['z'][i] for i in electrode_indices]
            else:
                channel_loc_z = None
        elif 'rel_x' in electrodes_table:
            channel_loc_x = [electrodes_table['rel_x'][i] for i in electrode_indices]
            channel_loc_y = [electrodes_table['rel_y'][i] for i in electrode_indices]
            if 'rel_z' in electrodes_table:
                channel_loc_z = [electrodes_table['rel_z'][i] for i in electrode_indices]
            else:
                channel_loc_z = None
        else:
            channel_loc_x = None
            channel_loc_y = None
            channel_loc_z = None
        if channel_loc_x is not None:
            ndim = 2 if channel_loc_z is None else 3
            locations = np.zeros((len(electrode_indices), ndim), dtype=float)
            for i, electrode_index in enumerate(electrode_indices):
                locations[i, 0] = channel_loc_x[electrode_index]
                locations[i, 1] = channel_loc_y[electrode_index]
                if channel_loc_z is not None:
                    locations[i, 2] = channel_loc_z[electrode_index]
            self.set_dummy_probe_from_locations(locations)

        recording_segment = NwbRecordingSegment(
            electrical_series_data=electrical_series_data,
            sampling_frequency=sampling_frequency
        )
        self.add_recording_segment(recording_segment)

class NwbRecordingSegment(si.BaseRecordingSegment):
    def __init__(self, electrical_series_data: h5py.Dataset, sampling_frequency: float) -> None:
        self._electrical_series_data = electrical_series_data
        si.BaseRecordingSegment.__init__(self, sampling_frequency=sampling_frequency)

    def get_num_samples(self) -> int:
        return self._electrical_series_data.shape[0]

    def get_traces(self, start_frame: int, end_frame: int, channel_indices: Union[List[int], None]=None) -> np.ndarray:
        if channel_indices is None:
            return self._electrical_series_data[start_frame:end_frame, :]
        else:
            return self._electrical_series_data[start_frame:end_frame, channel_indices]

For info on remfile, see https://github.com/magland/remfile

tagging @bendichter

@samuelgarcia
Copy link
Member

Hi Jeremy.
we have a bug (I need to fix quickly) when sorting with a Nwb file which is streamed (like you do) because the file_path is an url and the internal machinery try to make a relative to the folder path.

What is strange in your case is that you were able to start the matlab! The bug I am mentioning should be triggered before. I have the intuition that your bug is from KS itself.
If you think the bug is on KS side the best is to run manually the "kilosort_master.m" file inside matlab and start a funny matlab debugging session! Good luck. I feel your pain.

@alejoe91
Copy link
Member

alejoe91 commented Oct 5, 2023

@magland is there a reason why you're not using the built-in neb extractor? It also supports streaming!

@samuelgarcia
Copy link
Member

@alejoe91 : I think Jeremy make a remfile package to replace the fsspecwhich is too slow.
I confirm this is too slow for real use.

@magland : do you think you could pacth our implementation https://github.com/SpikeInterface/spikeinterface/blob/main/src/spikeinterface/extractors/nwbextractors.py#L109
and add a new stream_mode == "remfile" ? This would be super super usefull.

Also a good way to test the sorting with remote streamed file is to run it with "spykingcircus2" which normally do not make any local copy.

@alejoe91 alejoe91 added the bug in sorter The bug is in sorter code itself, not in SI label Oct 5, 2023
@alejoe91
Copy link
Member

alejoe91 commented Oct 5, 2023

@magland can you check if the problem persists if you have more than 60 seconds? Can you try 10 minutes?

@magland
Copy link
Member Author

magland commented Oct 5, 2023

Thanks @alejoe91 and @samuelgarcia !

Yeah the error seems to be ocurring internally with kilosort, that's difficult to troubleshoot! I am getting the same error with ks 2.5, so it must be something specific to the dataset. Do you know if it's okay to have negative numbers in the data?

@magland is there a reason why you're not using the built-in neb extractor? It also supports streaming!

I wanted something simpler where I could control exactly what was being loaded and how. And I didn't want to use NWBHDF5IO, but rather use h5py directly as I didn't want the overhead.

@magland can you check if the problem persists if you have more than 60 seconds? Can you try 10 minutes?

I tried 10 minutes and got the same error.

Also a good way to test the sorting with remote streamed file is to run it with "spykingcircus2" which normally do not make any local copy.

Thanks I'll try that.

I feel your pain.

Thanks!

@magland
Copy link
Member Author

magland commented Oct 5, 2023

It seems you don't have unit tests for the matlab kilosorts. Is that correct? Is this working for anyone? I tried on toy_example(...) and got a different error.

EDIT: ks 2.5 ran successfully on toy example with 64 channels, but crashed with 32. This is difficult!

@alejoe91
Copy link
Member

alejoe91 commented Oct 5, 2023

Yes we have tests, but they're manually triggered. They are broken now because we need to update was credentials. Will fix soon!

@samuelgarcia
Copy link
Member

@magland : not that we have now generate_ground_truth_recording() which give better control on the generated recording/sorting pair than toy_example().
We plan to move all tests using generate_ground_truth_recording() but keep toy_example() (which is a wrapper on top of it in the example and documentation.

@magland
Copy link
Member Author

magland commented Oct 5, 2023

I'm working through the source code for ks3, hoping to find the culprit, and maybe there will be a parameter we can adjust to make this more stable.

Tagging: @luiztauffer

@magland
Copy link
Member Author

magland commented Oct 5, 2023

I found one of the issues which I have submited to the ks repo

MouseLand/Kilosort#566

This crash occurs when no isolated spikes are found within a ks batch.

@alejoe91
Copy link
Member

alejoe91 commented Oct 5, 2023

I don't think that will ever be fixed since ks4 is on the rise. You can try to increase the batch size!

@magland
Copy link
Member Author

magland commented Oct 5, 2023

I don't think that will ever be fixed since ks4 is on the rise. You can try to increase the batch size!

We could fork it though and recreate the docker image

@magland
Copy link
Member Author

magland commented Oct 5, 2023

Here's another issue I found:

MouseLand/Kilosort#567

This one is causing a crash for an IBL dataset, because as far as I can tell the electrode location is duplicated -- unless I am reading the data incorrectly. I need to double-check that.

@magland
Copy link
Member Author

magland commented Oct 5, 2023

Seems that the problem was that I was reading the x, y, z fields in the NWB field rather than rel_x, rel_y. The SpikeInterface extractor reads the rel_x, rel_y.

I'm curious about what the difference is between x/y/z and rel_x/rel_y (and ibl_x/ibl_y/ibl_z for that matter), and why the x/y/z includes duplicate locations. @bendichter ?

@alejoe91
Copy link
Member

alejoe91 commented Oct 5, 2023

Yeah, x y and z are supposed to be stereotaxic coordinates, while rel_x, rel_y, and rel_z are relative to the probe (that's why I would recommend using the built-in Nwb wrapper! It's heavily tested and developed by Nwb core devs ;) )

@magland
Copy link
Member Author

magland commented Oct 5, 2023

Yeah, x y and z are supposed to be stereotaxic coordinates, while rel_x, rel_y, and rel_z are relative to the probe (that's why I would recommend using the built-in Nwb wrapper! It's heavily tested and developed by Nwb core devs ;) )

Fair point. But right now I need more control over the implementation / options. We'll need to merge later on.

@alejoe91
Copy link
Member

alejoe91 commented Oct 5, 2023

Then copy paste what you need from our implementation ;)

@magland magland closed this as completed Oct 5, 2023
@samuelgarcia
Copy link
Member

@magland do you have somewhere benchmark of streaming speed remfile vs fsspec ?

@magland
Copy link
Member Author

magland commented Oct 5, 2023

@samuelgarcia The only thing I have right now is this N=1 example

example1.py - took ~6 seconds on my machine

example1_compare_fsspec.py - took ~32 seconds on my machine

Of course it's going to depend on a lot of factors. I'd be interested to see what timings you get if you run those.

They are completely stand-alone scripts! Just pip install remfile

EDIT: This last time I just ran it I got ~10 seconds and 104 seconds.

@samuelgarcia
Copy link
Member

very impressive. congrats.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug in sorter The bug is in sorter code itself, not in SI
Projects
None yet
Development

No branches or pull requests

3 participants