Possible workaround for HDF5 growing memory issue #527
Conversation
lhotse/dataset/speech_recognition.py
Outdated
@@ -92,6 +93,10 @@ def __getitem__(self, cuts: CutSet) -> Dict[str, Union[torch.Tensor, List[str]]]
         """
         validate_for_asr(cuts)

+        if self.batch_counter > 0 and self.batch_counter % 100 == 0:
+            close_cached_file_handles()
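For readers without the surrounding diff context, here is a minimal, self-contained sketch of the pattern this change implements: closing a cache of HDF5 file handles every N batches. Only close_cached_file_handles and the batch_counter check appear in the diff; everything else (the cache dict, lookup_cache_or_open, PeriodicCloseMixin, reset_interval) is illustrative and not the actual lhotse implementation.

```python
from typing import Dict

import h5py

# Hypothetical module-level cache of open HDF5 handles, standing in for the
# caching that the feature readers do internally; names not in the diff are
# illustrative, not the actual lhotse API.
_file_handle_cache: Dict[str, h5py.File] = {}


def lookup_cache_or_open(path: str) -> h5py.File:
    """Return a cached read-only handle for `path`, opening it on first use."""
    if path not in _file_handle_cache:
        _file_handle_cache[path] = h5py.File(path, "r")
    return _file_handle_cache[path]


def close_cached_file_handles() -> None:
    """Close all cached handles and empty the cache, releasing HDF5 buffers."""
    for handle in _file_handle_cache.values():
        handle.close()
    _file_handle_cache.clear()


class PeriodicCloseMixin:
    """Sketch of the counter-based close pattern shown in the diff above."""

    def __init__(self, reset_interval: int = 100) -> None:
        self.reset_interval = reset_interval
        self.batch_counter = 0

    def maybe_close_handles(self) -> None:
        # Every `reset_interval` batches, drop the cached HDF5 handles so
        # their internal buffers can be released, then count this batch.
        if self.batch_counter > 0 and self.batch_counter % self.reset_interval == 0:
            close_cached_file_handles()
        self.batch_counter += 1
```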
Shall we reset self.batch_counter in case it overflows if it runs for long enough?
Python integers are arbitrary-precision, so they can never overflow, I believe.
I cleaned up the code and added some docs. I'd appreciate it if somebody could test it out on a large-scale training run and confirm that it does help; I was only able to test it on small-to-medium-sized problems so far.
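For reference, Python's built-in int grows without bound rather than wrapping at a fixed word size, so a monotonically increasing counter cannot overflow:

```python
# Python ints grow without bound instead of wrapping at a fixed width.
counter = 2**63 - 1   # the largest signed 64-bit value
counter += 1
print(counter)        # 9223372036854775808 -- no overflow, just a larger int
```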
In case anybody reads this: this workaround doesn't completely solve the issue; the memory keeps growing, just more slowly. There also seem to be random spikes in memory usage around the time the HDF5 files are opened/closed. I generally don't advise using LilcomHdf5Writer for data larger than ~1k hours, as it will likely blow up the memory at some point.
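For context on what "storage type" means here: the writer class is selected via the storage_type argument when features are computed, so that argument is the knob the thread is discussing. A hedged sketch; paths and extractor settings below are placeholders, and the current lhotse API should be checked before copying:

```python
from lhotse import Fbank, LilcomHdf5Writer, load_manifest

# Placeholder paths; the point is only where the storage backend is chosen.
cuts = load_manifest("data/cuts.jsonl.gz")
cuts = cuts.compute_and_store_features(
    extractor=Fbank(),
    storage_path="data/feats",
    # Swapping this class changes the "storage type"; LilcomHdf5Writer is the
    # HDF5-backed writer this thread advises against for very large corpora.
    storage_type=LilcomHdf5Writer,
    num_jobs=4,
)
```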
@pzelasko what is the recommended storage type for large data (5-10k hours)?
The default one (
And what about cloud storage?
... alternatively, there is also
Intended as a workaround for growing memory issues when using HDF5 with larger datasets.