Our organization installed Kilosort2 and Kilosort3 on our clusters to provide a spike sorting service. Each node has multiple GPU cards, so it should be able to run several sorting jobs at once. However, taking my Neuropixels recordings as an example (15-minute recordings, around 20 GB), if I run four sorting jobs at the same time, all four frequently fail with the message CUDA_ERROR_ILLEGAL_ADDRESS in the logs. Kilosort3 fails more often than Kilosort2. My workaround is to reduce the NT value to 1/4 of the default, which lets all sorting jobs finish without error but increases the total running time 3 to 4 times. We suspect that all jobs are using the same GPU device.
If our guess is correct, would assigning a specific GPU to each job help? How can we do that?
Our setup
Ubuntu/Linux
node02: 4 physical GPUs (NVIDIA GeForce RTX 3060 Ti). Each GPU has 8GB of memory
node03, node04, node05: 4 physical GPUs (NVIDIA GeForce GTX 980 Ti). Each GPU has 6GB of memory
Each node allows four sorting jobs to run simultaneously. 6-8 GB of memory is enough for one sorting job.
My parameters
We use spikeinterface as a wrapper to run Kilosort, and pass the parameters to Kilosort through spikeinterface in a JSON file.
Error log
One example log file. kilosort3.log
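One way the same-GPU guess could be tested is to pin each job to its own card through the standard CUDA environment variables `CUDA_VISIBLE_DEVICES` and `CUDA_DEVICE_ORDER`. The following is only a minimal sketch under assumptions: the helper names `gpu_env_for_job` and `launch_jobs` are made up for illustration, and it assumes each sorting job is started as a separate process whose child MATLAB/Kilosort process inherits the environment. Inside such a job, the single visible card would then typically appear as the first device.

```python
import os
import subprocess


def gpu_env_for_job(job_index, n_gpus, base_env=None):
    """Return a copy of the environment that exposes exactly one GPU to a job."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    # Number devices in PCI bus order so the indices match nvidia-smi.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Round-robin assignment: on a 4-GPU node, jobs 0..3 get GPUs 0..3.
    env["CUDA_VISIBLE_DEVICES"] = str(job_index % n_gpus)
    return env


def launch_jobs(commands, n_gpus):
    """Start every command as its own process, each pinned to one GPU.

    `commands` is a list of argument lists (hypothetical job commands).
    Returns the exit code of each job, in order.
    """
    procs = [
        subprocess.Popen(cmd, env=gpu_env_for_job(i, n_gpus))
        for i, cmd in enumerate(commands)
    ]
    return [p.wait() for p in procs]
```

For example, `launch_jobs([["python", "run_sorting.py", f"params_{i}.json"] for i in range(4)], n_gpus=4)` would start four jobs on four different cards, where `run_sorting.py` and the `params_*.json` files are placeholders for whatever script calls spikeinterface with our JSON parameters. If the CUDA_ERROR_ILLEGAL_ADDRESS failures disappear with the default NT value, that would support the guess that the jobs were sharing one device.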