
Pickling support for LazyDict + minor fixes #292

Merged (3 commits) on May 3, 2021

Conversation

pzelasko
Collaborator

@pzelasko pzelasko commented May 3, 2021

This solves an issue that is hard to notice unless one uses very large manifests: when the memory-mapped file gets pickled for transfer to the dataloader's worker processes, Python reads the whole file and tries to pickle its contents, which blows up RAM. I changed the pickling behaviour of that class so that only the path to the file is transferred, and the file is re-opened in the new process. After this change, training takes approximately 5-6 GB of CPU RAM per GPU, and I'm able to run snowfall training on the full MLS with 4 GPUs and on-the-fly feature extraction, with no I/O or memory issues (with shuffle=False).
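The pattern described above can be sketched with Python's `__getstate__`/`__setstate__` hooks. This is a minimal, hypothetical stand-in (the class name, file format, and helpers here are illustrative, not lhotse's actual `LazyDict` implementation): instead of letting pickle serialize the file-backed data, we serialize only the path and re-open the file on the receiving side, e.g. in a DataLoader worker process.

```python
import pickle


class LazyFileDict:
    """Sketch of a lazily-opened, file-backed mapping (hypothetical class;
    the real lhotse LazyDict memory-maps an Arrow file instead)."""

    def __init__(self, path):
        self.path = path
        self._open()

    def _open(self):
        # Stand-in for mmapping: read "key value" pairs from a text file.
        self._data = {}
        with open(self.path) as f:
            for line in f:
                key, value = line.split()
                self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __getstate__(self):
        # Pickle only the path, never the (potentially huge) file contents.
        return {"path": self.path}

    def __setstate__(self, state):
        # Re-open the file in the new process instead of transferring data.
        self.path = state["path"]
        self._open()
```

With this in place, `pickle.dumps(lazy_dict)` stays a few bytes regardless of how large the backing file is, which is what keeps per-worker RAM usage flat.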

@pzelasko pzelasko added this to the v0.7 milestone May 3, 2021
@danpovey
Collaborator

danpovey commented May 3, 2021 via email

@pzelasko pzelasko merged commit d5f23e8 into master May 3, 2021
@pzelasko pzelasko deleted the feature/improvements-large-data-training branch July 1, 2021 01:24