Possible workaround for HDF5 growing memory issue #527

Merged: 5 commits merged into master on Jan 10, 2022

Conversation

@pzelasko (Collaborator) commented Jan 9, 2022

Intended as a workaround for growing memory issues when using HDF5 with larger datasets.

@@ -92,6 +93,10 @@ def __getitem__(self, cuts: CutSet) -> Dict[str, Union[torch.Tensor, List[str]]]
         """
         validate_for_asr(cuts)
 
+        if self.batch_counter > 0 and self.batch_counter % 100 == 0:
+            close_cached_file_handles()
Contributor (review comment on the added lines):

Shall we reset self.batch_counter in case it overflows if training runs long enough?

Collaborator (reply):

Python uses arbitrary-precision integers that can never overflow, I believe.

@pzelasko changed the title from "[do-not-merge] Example showing how to close file handles every N batches" to "Possible workaround for HDF5 growing memory issue" on Jan 10, 2022
@pzelasko (Collaborator, Author) commented:

I cleaned up the code and added some docs. I'd appreciate it if somebody could test it out on a large-scale training run and confirm that it does help; I was only able to test on small-to-medium-sized problems so far.
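
For anyone who wants to see the full shape of the workaround without opening the diff, here is a minimal, self-contained sketch of the pattern. The dataset class, the close_every_n_batches parameter, where the counter is incremented, and the import path for close_cached_file_handles are my assumptions for illustration, not code copied from this PR:

    from typing import Dict, List, Union

    import torch
    from lhotse import CutSet
    # Assumed import path; check where close_cached_file_handles actually
    # lives in your Lhotse version.
    from lhotse.caching import close_cached_file_handles


    class IllustrativeAsrDataset(torch.utils.data.Dataset):
        """Hypothetical dataset showing the periodic handle-closing workaround."""

        def __init__(self, close_every_n_batches: int = 100):
            self.batch_counter = 0
            self.close_every_n_batches = close_every_n_batches

        def __getitem__(self, cuts: CutSet) -> Dict[str, Union[torch.Tensor, List[str]]]:
            # Every N batches, close the cached HDF5 file handles so that the
            # memory held by their internal caches can be released.
            if self.batch_counter > 0 and self.batch_counter % self.close_every_n_batches == 0:
                close_cached_file_handles()
            self.batch_counter += 1
            ...  # load features/audio for `cuts` and assemble the batch as usual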

@pzelasko merged commit 79867a5 into master on Jan 10, 2022
@pzelasko added this to the v1.0 milestone on Jan 14, 2022
@pzelasko (Collaborator, Author) commented:

In case anybody reads this: this workaround doesn't completely solve the issue; the memory keeps growing, just more slowly. There also seem to be random spikes in memory usage around the time the HDF5 files are opened and closed.

I generally don't advise using LilcomHdf5Writer for data larger than ~1k hours, as it will likely blow up the memory at some point.

@desh2608 (Collaborator) commented:

@pzelasko what is the recommended storage type for large data (5-10k hours)?

@pzelasko (Collaborator, Author) commented:

The default one (LilcomChunkyWriter) should work as well as HDF5 but without memory leaks.
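
As a rough sketch of what switching to it can look like when extracting features (the paths and the Fbank extractor are placeholders, and the keyword arguments should be double-checked against your Lhotse version):

    from lhotse import CutSet, Fbank, LilcomChunkyWriter

    cuts = CutSet.from_file("data/cuts.jsonl.gz")  # placeholder manifest path
    cuts = cuts.compute_and_store_features(
        extractor=Fbank(),
        storage_path="data/fbank",        # features stored as chunked lilcom archives
        storage_type=LilcomChunkyWriter,  # the default writer; no HDF5 involved
        num_jobs=4,
    )
    cuts.to_file("data/cuts_with_feats.jsonl.gz")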

@cyrta commented Feb 25, 2022:

And what about cloud storage?

@pzelasko (Collaborator, Author) commented:

The recent PRs with WebDataset integration work really well with that; please refer to #582, #599, and #602. You'll need to read the documentation in the code because we don't have a high-level tutorial for these workflows yet.
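
A rough sketch of that workflow, assuming the API introduced in those PRs looks roughly like export_to_webdataset plus CutSet.from_webdataset (both names and their arguments are assumptions here; verify against the PRs and the in-code documentation):

    from lhotse import CutSet
    # Assumed import path and signature; see #582, #599, and #602 for the actual API.
    from lhotse.dataset.webdataset import export_to_webdataset

    cuts = CutSet.from_file("data/cuts_with_feats.jsonl.gz")

    # Export the cuts (with their features/audio) into sharded tar archives,
    # which can be uploaded to object storage and streamed sequentially.
    export_to_webdataset(cuts, output_path="shards/cuts-%06d.tar", shard_size=1000)

    # Later, read the shards back lazily (from local disk or URLs).
    cuts = CutSet.from_webdataset("shards/cuts-%06d.tar")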

@pzelasko (Collaborator, Author) commented:

... Alternatively, there is also LilcomUrlWriter for features and AudioSource(type="url", ...) for audio, which you can use with any cloud storage supported by the smart_open library. It will be much less I/O-efficient because it spawns a new connection for each item, but for some types of use cases it might be viable. A WebDataset-based flow will be roughly 50-100x faster, though.
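
A minimal sketch of that URL-based route (the writer's name is taken from the comment above, but its exact spelling, import path, and arguments are assumptions to verify against your Lhotse version; the S3 bucket is a placeholder):

    from lhotse import CutSet, Fbank
    # Assumed import path/spelling for the URL-backed writer mentioned above.
    from lhotse.features.io import LilcomUrlWriter

    cuts = CutSet.from_file("data/cuts.jsonl.gz")
    cuts = cuts.compute_and_store_features(
        extractor=Fbank(),
        storage_path="s3://my-bucket/features",  # any URL scheme that smart_open supports
        storage_type=LilcomUrlWriter,
    )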
