Recently, I have heard from several folks using the clusters at JHU about the following issue. Consider a situation where you are fine-tuning one of the SSL models (e.g. HuBERT) on your own data. Suppose the input data is represented as cuts of long recordings: for example, the recordings are ~30 min long and the cuts may be ~10 s. Since the models work with raw audio, we don't precompute any features, but just load the audio on-the-fly in the dataloader. The problem seems to be that repeatedly fetching the full recording and extracting the cut segment from it creates an I/O bottleneck, since our clusters have a slow inter-node network. This leads to low GPU utilization. What would be the best strategy to overcome this? I was wondering whether Lhotse Shar archives might help here. Since some folks are new to Lhotse, it would be great to have a small example of using Lhotse Shar.
-
Tagging @m-wiesner and @efrathason.
-
Cuts are already implemented this way, i.e. they load only the relevant subset of audio data from disk, not the full recording. But that's often not nearly enough on slow clusters with magnetic disks and slow interconnects. Usually you end up bottlenecked by random-access reads, which can be as much as 100x slower than sequential reads, because the recording and other data are fragmented all over a magnetic disk and it takes quite a while to physically find them.

Lhotse Shar is definitely an answer to that, but I currently can't find a spare moment to write up the tutorial. In many cases it will be sufficient to use WebDataset, which offers pretty much the same I/O speed-up advantages. Please check out the Lhotse+WebDataset tutorial to get started; it will definitely help with reading speeds on the CLSP cluster (note: the WebDataset export will be slow, but it's a one-time cost).
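To make the random-access vs. sequential-read point concrete, here is a minimal standard-library sketch of the general idea behind tar-based sharding: extract the short segments once, pack them into a few tar shards, and then stream each shard front to back during training. This is only an illustration of the principle, not Lhotse's or WebDataset's actual code; `make_wav_bytes`, `write_shards`, and `read_shard` are hypothetical helper names, and the audio is assumed to be 16-bit mono PCM.

```python
import io
import tarfile
import wave
from pathlib import Path


def make_wav_bytes(frames: bytes, sample_rate: int = 16000) -> bytes:
    """Encode raw 16-bit mono PCM frames as an in-memory WAV file (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(frames)
    return buf.getvalue()


def write_shards(cuts, out_dir, shard_size):
    """Pack (cut_id, wav_bytes) pairs into numbered tar shards.

    This is the slow, one-time export step: each short segment is stored
    contiguously inside a shard instead of inside a long recording.
    """
    out_dir = Path(out_dir)
    paths = []
    tar = None
    for i, (cut_id, wav_bytes) in enumerate(cuts):
        if i % shard_size == 0:
            # Start a new shard every `shard_size` cuts.
            if tar is not None:
                tar.close()
            path = out_dir / f"shard-{i // shard_size:06d}.tar"
            tar = tarfile.open(path, "w")
            paths.append(path)
        info = tarfile.TarInfo(name=f"{cut_id}.wav")
        info.size = len(wav_bytes)
        tar.addfile(info, io.BytesIO(wav_bytes))
    if tar is not None:
        tar.close()
    return paths


def read_shard(path):
    """Yield (cut_id, pcm_frames) by streaming one shard front to back."""
    # Mode "r|" reads the tar as a forward-only stream, so the disk or
    # network file system only ever sees one sequential read per shard.
    with tarfile.open(path, "r|") as tar:
        for member in tar:
            data = tar.extractfile(member).read()
            with wave.open(io.BytesIO(data), "rb") as w:
                yield Path(member.name).stem, w.readframes(w.getnframes())
```

In a training loop you would shuffle the list of shard paths and call `read_shard` on each one inside the dataloader workers. In Lhotse itself, the corresponding export and read steps are exposed as `CutSet.to_shar` and `CutSet.from_shar`; please check the Lhotse documentation for the exact arguments, since I am not reproducing them here.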