Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion Error when using Phy2 #284

Closed
Idavr opened this issue Nov 18, 2020 · 28 comments · Fixed by #595
Closed

Assertion Error when using Phy2 #284

Idavr opened this issue Nov 18, 2020 · 28 comments · Fixed by #595

Comments

@Idavr
Copy link

Idavr commented Nov 18, 2020

Hi,

Recently after upgrading to Kilosort2.5 I am having trouble opening the spike sorted outputs to phy (standard and development version). I ran "--debug" on the command and I get this output + the error:

10:04:20.400 [D] init:68 Start capturing exceptions.
10:04:20.494 [W] model:591 Unreferenced clusters found in templates (generally not a problem)
10:04:20.547 [D] model:607 Loading spike clusters.
10:04:20.607 [W] model:613 Unreferenced clusters found in spike_clusters (generally not a problem)
10:04:20.663 [D] model:557 No channel shank file found.
10:04:20.664 [D] model:680 Loading templates.
10:04:20.666 [D] model:709 Templates are dense.
10:04:20.666 [W] model:655 Skipping spike waveforms that do not exist, they will be extracted on the fly from the raw data as needed.
10:04:20.667 [D] model:715 Loading the whitening matrix.
10:04:20.669 [D] model:722 Loading the inverse of the whitening matrix.
10:04:20.676 [W] model:55 4908/1507984 values are nan in C:\Users\idavalik\Data\Raw_SpikeGLX\Social_track_g0\Social_track_g0_imec0\similar_templates.npy, replacing by zero.
10:04:20.680 [D] model:751 Loading features.
10:04:20.682 [D] model:766 Features are sparse.
10:04:20.683 [D] model:788 Loading template features.
10:04:20.685 [D] model:798 Template features are sparse.
10:04:20.686 [D] model:492 Load cluster_Amplitude.tsv.
10:04:20.690 [D] model:492 Load cluster_ContamPct.tsv.
10:04:20.694 [D] model:492 Load cluster_group.tsv.
10:04:20.699 [D] model:492 Load cluster_KSLabel.tsv.
10:04:20.996 [D] context:100 Initialize joblib cache dir at C:\Users\idavalik\Data\Raw_SpikeGLX\Social_track_g0\Social_track_g0_imec0\.phy.
10:04:20.998 [D] context:101 Reducing the size of the cache if needed.
10:04:21.001 [D] base:102 Add filter high_pass.
10:04:21.002 [D] config:31 Load config file C:\Users\idavalik\.phy\phy_config.py.
10:04:21.003 [D] plugin:145 Loading 0 plugins.
10:04:21.006 [D] context:209 The file C:\Users\idavalik\Data\Raw_SpikeGLX\Social_track_g0\Social_track_g0_imec0\.phy\new_cluster_id.pkl doesn't exist.
10:04:21.089 [D] context:185 Save data to C:\Users\idavalik\Data\Raw_SpikeGLX\Social_track_g0\Social_track_g0_imec0\.phy\spikes_per_cluster.pkl.
10:04:21.227 [D] gui:463 Creating GUI.
10:04:21.229 [D] state:46 Load C:\Users\idavalik.phy\TemplateGUI\state.json for GUIState.
10:04:21.230 [D] state:46 Load C:\Users\idavalik\Data\Raw_SpikeGLX\Social_track_g0\Social_track_g0_imec0.phy\state.json for GUIState.
10:04:21.595 [E] init:62 An error has occurred (AssertionError):
Traceback (most recent call last):
File "C:\Users\idavalik\Anaconda3\envs\phy2dev\Scripts\phy-script.py", line 33, in
sys.exit(load_entry_point('phy', 'console_scripts', 'phy')())
File "C:\Users\idavalik\Anaconda3\envs\phy2dev\lib\site-packages\click\core.py", line 829, in call
return self.main(*args, **kwargs)
File "C:\Users\idavalik\Anaconda3\envs\phy2dev\lib\site-packages\click\core.py", line 782, in main
rv = self.invoke(ctx)
File "C:\Users\idavalik\Anaconda3\envs\phy2dev\lib\site-packages\click\core.py", line 1259, in invoke
return process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\idavalik\Anaconda3\envs\phy2dev\lib\site-packages\click\core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\idavalik\Anaconda3\envs\phy2dev\lib\site-packages\click\core.py", line 610, in invoke
return callback(*args, **kwargs)
File "C:\Users\idavalik\Anaconda3\envs\phy2dev\lib\site-packages\click\decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "c:\users\idavalik\phy\phy\apps_init
.py", line 159, in cli_template_gui
template_gui(params_path, **kwargs)
File "c:\users\idavalik\phy\phy\apps\template\gui.py", line 218, in template_gui
gui = controller.create_gui()
File "c:\users\idavalik\phy\phy\apps\base.py", line 1697, in create_gui
self.supervisor.attach(gui)
File "c:\users\idavalik\phy\phy\cluster\supervisor.py", line 946, in attach
self._create_views(
File "c:\users\idavalik\phy\phy\cluster\supervisor.py", line 761, in _create_views
gui, data=self.cluster_info, columns=self.columns, sort=sort)
File "c:\users\idavalik\phy\phy\cluster\supervisor.py", line 921, in cluster_info
return [self.get_cluster_info(cluster_id) for cluster_id in self.clustering.cluster_ids]
File "c:\users\idavalik\phy\phy\cluster\supervisor.py", line 921, in
return [self.get_cluster_info(cluster_id) for cluster_id in self.clustering.cluster_ids]
File "c:\users\idavalik\phy\phy\cluster\supervisor.py", line 746, in get_cluster_info
out[key] = func(cluster_id)
File "c:\users\idavalik\phy\phy\apps\base.py", line 1194, in get_best_channel_label
return self._get_channel_labels([self.get_best_channel(cluster_id)])[0]
File "c:\users\idavalik\phy\phy\utils\context.py", line 154, in memcached
out = f(*args, **kwargs)
File "c:\users\idavalik\phy\phy\apps\base.py", line 1188, in get_best_channel
channel_ids = self.get_best_channels(cluster_id)
File "c:\users\idavalik\phy\phy\utils\context.py", line 154, in memcached
out = f(*args, **kwargs)
File "c:\users\idavalik\phy\phy\apps\template\gui.py", line 151, in get_best_channels
template = self.model.get_template(template_id)
File "c:\users\idavalik\phylib\phylib\io\model.py", line 928, in get_template
return self._get_template_dense(
File "c:\users\idavalik\phylib\phylib\io\model.py", line 867, in get_template_dense
channel_ids
, amplitude, best_channel = self._find_best_channels(
File "c:\users\idavalik\phylib\phylib\io\model.py", line 844, in _find_best_channels
assert best_channel in channel_ids
AssertionError

I have no problems opening up earlier files that were processed with KS2, so from the help I have been getting trying to solve this it seems like there is a mismatch between something in phy and something in the way KS2.5 saves the output, but I am having trouble figuring out exactly where the error is coming from. If you have any insight into this I would greatly appreciate it!

@LaurenzMuessig
Copy link

channel_map.npy is a row vector in KS2.5 whereas it's a column vector in KS2 - that might be the issue (I had a similar issue when trying to use part of the Allen ecephys repo after upgrading to KS2.5).

Cheers

Laurenz

@Idavr
Copy link
Author

Idavr commented Nov 18, 2020

Ah, I see! May I ask how you fixed it? I am still quite new to program-language and e-phys...

@LaurenzMuessig
Copy link

In Matlab you can readNPY('YourChannelMap.npy'); then make the variable you loaded a column vector by e.g. transposing someVariable'; then overwrite the channel_map by writeNPY(someVariable,'YourChannelMap.npy'); (readNPY and writeNPY are functions from the npy_matlab repo which KS uses, so you should have them).
Not sure this is really your problem btw. I can load my data into phy without doing that, but your error message reminded me of the issue I had with the Allen repo. I hope that helps

Laurenz

@Idavr
Copy link
Author

Idavr commented Nov 19, 2020

Hmm... Thanks for the suggestion, but looking over the channel_map.npy files of two different files I found that they were similar, but one files is possible to open with phy, while the second is not... I got someone else to look at the KS2.5 outputs and they had not problems opening the files so it isn't (luckily) anything wrong with the recordings themselves, but this new error that has popped up is very, very confusing as - for some unknown reason - I am unable to open any of my more recent files with phy. Whatever more suggestions to which files or avenues to look into to troubleshoot this is greatly appreciated!

@LaurenzMuessig
Copy link

Sorry to hear you still can't load your data into phy. But if you are saying someone else can load your data with no issue on a different machine (did I understand that correctly?), I would just delete phy on your machine and do a clean install.

@Idavr
Copy link
Author

Idavr commented Nov 19, 2020

You did understand correctly, and this is where it get's peculiar. Not only have I done a clean install by deleting phy (normal and developers version) as well as the virtual environments on the computer, but I have a second computer that has the exact same problem with the exact same files...

@LaurenzMuessig
Copy link

Okay. but you can open the data on someone elses machine who is running a recent version of phy?

@Idavr
Copy link
Author

Idavr commented Nov 19, 2020

Correct. I updated to the most recent versions as well and I still can not open them.

@LaurenzMuessig
Copy link

Have you deleted the .phy folder from your data directory before trying to reopen in phy after the clean install?

@Idavr
Copy link
Author

Idavr commented Nov 19, 2020

Yes.

@LaurenzMuessig
Copy link

Okay, The last thing I can suggest is to go and 'hack' the python script where the error occurs. You can e.g. temporarily add some print commands around the line where the error occurs, to print the variable contents of the things that cause the issue (best_channel and channel_ids judging from the error message you posted above) to your terminal. That might give you a better idea what's going wrong.

@Idavr
Copy link
Author

Idavr commented Nov 19, 2020

Thanks for the assistance, I'll look into it. Troubleshooting since yesterday morning indicates that there is definitely something wonky with my personal phy set-up, but exactly what and how to fix it is vexing. Appreciate you taking the time to answer!

@LaurenzMuessig
Copy link

No worries. I hope you find the issue quickly. Good luck

@Idavr
Copy link
Author

Idavr commented Nov 21, 2020

Update: It does seem like there is some kind of issue with the KS2.5 output, as even if I specify the correct channels the params.py file gets saved with 383 channels (instead of 385) and the dat-path is set to be the temp_wh.dat file instead of the raw .bin file. Running the same files through KS2 I had no trouble accessing them with phy and params.py was saved correctly.

@LaurenzMuessig
Copy link

I think it's supposed to be like that in KS 2.5. The output binary is the drift corrected raw data, so you should use that for the curation in phy (although tbh I don't see a big difference to the uncorrected raw data but I guess that depends how big the drift is in the dataset). That data will only contain the channels that are contained in the channel map, so it's missing at least the reference and the sync channel from a 1.0 probe.

@Idavr
Copy link
Author

Idavr commented Nov 21, 2020

Ah, ok. The error does seem to be related to the channels then, and there is nothing wrong with the files themselves (like I said, I can open them without problems when run through with KS2), but could this have something to do with some warnings that were thrown up when doing MexGPUall.m regarding spikedetector3.cu? It finished with Mex Compiled Successfully but noticing that all of the warnings referenced channel-related variables (like NchanMax and NchanUp) it made me curious.

@marius10p
Copy link
Contributor

marius10p commented Nov 22, 2020

@ldavr, the mex compilation warnings have nothing to do with it. I'm trying to reproduce the problem on my end, will keep you posted.

@Idavr
Copy link
Author

Idavr commented Nov 23, 2020

Thank you very much @marius10p ! Really appreciate it. This showed up in the terminal right before the Assertion Error:

4908/1507984 values are nan in C:\Users\idavalik\Data\Neuropixels\Social_track_g0\Social_track_g0_imec0\similar_templates.npy, replacing by zero.�[0m
�[31m12:24:36.108 [E] init:62

In case it might help.

@Idavr
Copy link
Author

Idavr commented Nov 24, 2020

I tried messing around with this more as well and the issue is indeed inconsistent. For this particular data set I have posted about here I found one bad channel that I tried disconnecting, but that still gave the same Assertion Error. However, when I connected channel 192 and also included the bad channel in the analysis (thus 0 channels being disconnected) I was able to open this specific file with phy.

@torbenott
Copy link

We see the same issue, here are more details.

Our error occurs on a slightly different line number (but it seems to be the same stack trace) reproduced here anyway

(phy2) D:\Neurodata\TQ01\20201222_165239\20201222_165239.kilosort>phy template-gui params.py ←[33m18:50:20.914 [W] model:545 Unreferenced clusters found in templates (generally not a problem)←[0m ←[33m18:50:20.945 [W] model:567 Unreferenced clusters found in spike_clusters (generally not a problem)←[0m ←[33m18:50:20.955 [W] model:54 212668/565504 values are nan in D:\Neurodata\TQ01\20201222_165239\20201222_165239.kilosort\similar_templates.npy, replacing by zero.←[0m ←[31m18:50:21.234 [E] __init__:62 An error has occurred (AssertionError): Traceback (most recent call last): File "c:\users\adam\anaconda3\envs\phy2\lib\runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "c:\users\adam\anaconda3\envs\phy2\lib\runpy.py", line 85, in _run_code exec(code, run_globals) File "C:\Users\Adam\anaconda3\envs\phy2\Scripts\phy.exe\__main__.py", line 7, in <module> sys.exit(phycli()) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\click\core.py", line 829, in __call__ return self.main(*args, **kwargs) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\click\core.py", line 782, in main rv = self.invoke(ctx) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\click\core.py", line 1259, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\click\core.py", line 1066, in invoke return ctx.invoke(self.callback, **ctx.params) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\click\core.py", line 610, in invoke return callback(*args, **kwargs) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\click\decorators.py", line 21, in new_func return f(get_current_context(), *args, **kwargs) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\apps\__init__.py", line 159, in cli_template_gui template_gui(params_path, **kwargs) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\apps\template\gui.py", line 199, in template_gui gui = controller.create_gui() File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\apps\base.py", line 1638, in create_gui self.supervisor.attach(gui) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\cluster\supervisor.py", line 942, in attach gui=gui, sort=gui.state.get('ClusterView', {}).get('current_sort', None)) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\cluster\supervisor.py", line 760, in _create_views gui, data=self.cluster_info, columns=self.columns, sort=sort) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\cluster\supervisor.py", line 916, in cluster_info return [self.get_cluster_info(cluster_id) for cluster_id in self.clustering.cluster_ids] File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\cluster\supervisor.py", line 916, in <listcomp> return [self.get_cluster_info(cluster_id) for cluster_id in self.clustering.cluster_ids] File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\cluster\supervisor.py", line 745, in get_cluster_info out[key] = func(cluster_id) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\apps\base.py", line 1150, in get_best_channel_label return self._get_channel_labels([self.get_best_channel(cluster_id)])[0] File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\utils\context.py", line 154, in memcached out = f(*args, **kwargs) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\apps\base.py", line 1144, in get_best_channel channel_ids = self.get_best_channels(cluster_id) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\utils\context.py", line 154, in memcached out = f(*args, **kwargs) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phy\apps\template\gui.py", line 149, in get_best_channels template = self.model.get_template(template_id) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phylib\io\model.py", line 855, in get_template return self._get_template_dense(template_id, channel_ids=channel_ids) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phylib\io\model.py", line 798, in _get_template_dense channel_ids_, amplitude, best_channel = self._find_best_channels(template) File "c:\users\adam\anaconda3\envs\phy2\lib\site-packages\phylib\io\model.py", line 787, in _find_best_channels assert best_channel in channel_ids AssertionError ←[0m

Following the advice in the Github issue on the Phy page, we have tried to debug this error a number of ways.

  1. We changed the parameters nblocks = 0 and minFr = 1/600, this change fixed our issue for recordings from one rat, however it does not fix our issue for recordings for a second rat.
  2. We tried downgrading to KS2.0. This always works (for both rats) however the clustering results are worse for the rat we are able to run in KS2.5, and we would really like to check KS2.5 in the other rat. In particular we see what looks like instability and noisy events in good clusters in KS2.0, even though to our eyes the recording looks fine, and even is stable across days. A short recording from the rat in question is here https://drive.google.com/file/d/1_RazxXIi_Zx0szf7ndJ1drZT-iL2zum1/view?usp=sharing, and all the associated KS files can be found here: https://drive.google.com/drive/folders/1JsMntEZVkOgcIqvT7dMnDyASxnYsyFOA?usp=sharing
  3. We have tried to do some debugging of the issue ourselves. It seems that there are some templates where all the channels are completely nan over all time, causing the assertion error in phy in _find_best_channels. In particular, this means that max_amplitude will be nan, which causes obvious problems. We've tried to hack around this a little, but keep running into other problems - we think this is a kilosort issue, as it's the saved output of KS2.5 (in particular templates.npy, although we haven't checked exhaustively that more files don't contain nans) that contains a bunch of nans. To reiterate, there are some templates - even templates that are referenced in spike_templates - that exclusively contain nans! This seems to create an issue for Phy, not sure if it's a change in this version of KS.

@torbenott
Copy link

We looked into the KS2.5 code and found that in trackAndSort (called by runTemplates which is called by learnAndSolve8b) the number of spikes per template nsp is 0 for many templates. This will create NaNs in rez.dWU in line 226 (dividing 0 by 0!), which later will be inherited to W and therefore to templates.

@achristensen56
Copy link

just wanted to link this issue: cortex-lab/phy#1060 -- I believe it's the same problem as we (@torbenott & I) see with our data.

@rossant
Copy link
Contributor

rossant commented Jan 7, 2021

@brykko
Copy link

brykko commented Jan 16, 2021

I tried to locate the origin of this problem and independently came to the same conclusion as @torbenott. I found a simple workaround which solved the Phy problem for me –

Here's line 226 from "trackAndSort.m":

rez.dWU = dWU1 ./ reshape(nsp, [1,1,Nfilt]);

I replaced it with the version below, which only performs the division for templates with at least one spike:

valid = nsp > 0;
rez.dWU(:, :, valid) = dWU1(:, :, valid) ./ reshape(nsp(valid), [1,1,sum(valid)]);

@Hankslab-ABG
Copy link

@brykko

This seemed to work for me as well, though I'm a bit puzzled as it yielded rather different results when I compared KS2.5 with this workaround to KS2.5 as is with the same dataset. Have you had a similar experience?

@Hankslab-ABG
Copy link

Actually never mind on that front. I had my test dataset mislabeled as KS2.5 when really it was KS2. Your fix seems to produce identical results, at least in that dataset.

@marius10p
Copy link
Contributor

Thanks for working this out everyone. In principle there shouldn't be any spikes with nsp = 0, but I guess it sometimes happens. I pushed a fixed adding a small constant to the normalization in line 236 of trackAndSort. Reopen this issue if there are still problems.

@Hankslab-ABG
Copy link

Hankslab-ABG commented Mar 16, 2021

Sorry to comment on a closed issue, but the fix @marius10p implemented isn't working for me using KS 2.5. Oddly, that fix results in a CUDA error, but changing it as follows seems to work fine:

`% rez.dWU = dWU1 ./ (1e-10 + single(reshape(nsp, [1,1,Nfilt])));

rez.dWU = dWU1 ./ (1e-10 + reshape(nsp, [1,1,Nfilt]));`

So apparently making this not single precision alleviates the CUDA error. Any idea why this might be the case? The error is a CUDA illegal address in reference to the following line from splitAllClusters.m (line 189)

[WtW, iList] = getMeWtW(single(rez.W), single(rez.U), Nnearest); % we re-compute similarity scores between templates

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants