Choose which GPU to run Kilosort #570

Closed
AntonioST opened this issue Oct 18, 2023 · 0 comments · Fixed by #595

Dear all,

Our organization has installed Kilosort2 and Kilosort3 on our clusters to provide a spike sorting service. Each node has multiple GPU cards, so we expected to be able to run several sorting jobs on the same node. However, with my Neuropixels recordings (for example, a 15-minute recording of about 20 GB), if I run four sorting jobs at the same time, all four jobs frequently fail with CUDA_ERROR_ILLEGAL_ADDRESS in the logs. Kilosort3 fails more often than Kilosort2. My workaround is to reduce NT to 1/4 of its default value, which lets all sorting jobs finish without errors. However, this increases the total running time by a factor of 3 to 4. We suspect that all jobs are running on the same GPU device.

If our guess is correct, would assigning a specific GPU to each job help? How can we do that?
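One common way to pin a process to a single GPU (not confirmed in this thread for Kilosort specifically) is the CUDA_VISIBLE_DEVICES environment variable, which limits the devices the CUDA runtime exposes to that process. A minimal Python sketch, assuming each sorting job can be started as a separate process; the helper names gpu_env and launch_jobs are hypothetical:

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def gpu_env(job_index: int, n_gpus: int,
            base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and expose exactly one GPU to the child process."""
    env = dict(os.environ if base_env is None else base_env)
    # Round-robin assignment: job 0 -> GPU 0, job 1 -> GPU 1, ..., job 4 -> GPU 0.
    env["CUDA_VISIBLE_DEVICES"] = str(job_index % n_gpus)
    return env


def launch_jobs(commands: List[List[str]], n_gpus: int) -> List[subprocess.Popen]:
    """Start each command as its own process, each pinned to one GPU."""
    return [subprocess.Popen(cmd, env=gpu_env(i, n_gpus))
            for i, cmd in enumerate(commands)]
```

For example, `launch_jobs([["python", "run_sorting.py", path] for path in recordings], n_gpus=4)`, where run_sorting.py is a hypothetical per-recording script. Inside each child process, the assigned GPU appears as device 0, so code that selects the "first" GPU would use the pinned card.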

Our setup

OS: Ubuntu Linux

  • node02: 4 physical GPUs (NVIDIA GeForce RTX 3060 Ti), 8 GB of memory each
  • node03, node04, node05: 4 physical GPUs (NVIDIA GeForce GTX 980 Ti), 6 GB of memory each

Each node allows four sorting jobs to run simultaneously. 6 to 8 GB of GPU memory is enough for one sorting job.
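To check whether all four jobs really land on one GPU, per-GPU memory use can be inspected while the jobs are running. A minimal sketch that parses the output of `nvidia-smi --query-gpu=index,memory.total,memory.used --format=csv,noheader,nounits` (values in MiB); the function names and the threshold in the example are hypothetical:

```python
import subprocess
from typing import List, Tuple

QUERY = ["nvidia-smi", "--query-gpu=index,memory.total,memory.used",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int, int]]:
    """Parse 'index, total, used' lines (MiB) into integer tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        idx, total, used = (int(field.strip()) for field in line.split(","))
        rows.append((idx, total, used))
    return rows


def gpus_with_free_memory(rows: List[Tuple[int, int, int]],
                          required_mib: int) -> List[int]:
    """Return the indices of GPUs with at least required_mib MiB free."""
    return [idx for idx, total, used in rows if total - used >= required_mib]


def query_gpus() -> List[Tuple[int, int, int]]:
    """Run nvidia-smi on the current node and parse its output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)
```

If one GPU shows nearly all of its memory used while the other three are idle, that would support the same-device guess.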

My parameters

We use spikeinterface as a wrapper to run Kilosort, and pass the parameters to Kilosort through spikeinterface in a JSON file:

{
  "detect_threshold": 6,
  "projection_threshold": [9, 9],
  "preclust_threshold": 8,
  "car": false,
  "minFR": 0.1,
  "minfr_goodchannels": 0.0,
  "freq_min": 300,
  "sigmaMask": 30,
  "nPCs": 3,
  "ntbuff": 64,
  "nfilt_factor": 4,
  "NT": 16448,
  "keep_good_only": false
}

(NT = 16448 is the reduced workaround value.)
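The value 16448 is consistent with quartering the batch length in Kilosort2's sample configuration, which sets `ops.NT = 64*1024 + ops.ntbuff` (a multiple of 32 samples plus the overlap buffer). Assuming the default used here follows that same rule, the arithmetic can be sketched as follows; batch_nt is a hypothetical helper:

```python
def batch_nt(core_samples: int, ntbuff: int = 64) -> int:
    """Return an NT value: a core batch of samples plus the ntbuff overlap.

    Hypothetical helper; assumes the "multiple of 32 plus ntbuff" rule
    from Kilosort2's sample config file.
    """
    if core_samples % 32 != 0:
        raise ValueError("core batch length must be a multiple of 32 samples")
    return core_samples + ntbuff


default_nt = batch_nt(64 * 1024)       # 65536 + 64 = 65600
reduced_nt = batch_nt(64 * 1024 // 4)  # 16384 + 64 = 16448
print(default_nt, reduced_nt)
```

At a 30 kHz sampling rate (typical for Neuropixels, assumed here), the core batch shrinks from about 2.2 s to about 0.55 s of data, so each job processes roughly four times as many batches per recording.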

Error log

Example log file: kilosort3.log
