[DeepSpeed-Chat] Fix OOM issue in dataloader #841
Currently, DeepSpeed-Chat saves tokenized tensors directly to disk, which can consume hundreds of GB of storage. Each prompt string is converted into `input_ids` and `attention_mask` tensors of length `max_seq_len`, stored as int32 or int64.
At roughly 2-3 characters per token, a few bytes of raw text expand into hundreds of bytes of padded tensor storage on average. This is very problematic: when the prompt dataset grows larger (say, to 1 GB), the on-disk dataset can reach hundreds of GB.
What's worse, DeepSpeed-Chat then loads all of this data into memory, which can require hundreds of GB of RAM.
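As a rough back-of-the-envelope estimate (the `max_seq_len = 512` and int64 values below are just illustrative, not taken from the code):

```python
# Illustrative estimate of the per-sample blow-up, not the PR's actual code.
max_seq_len = 512          # padded sequence length
bytes_per_elem = 8         # int64
tensors_per_sample = 2     # input_ids + attention_mask

# Every sample costs this much once padded, regardless of prompt length:
bytes_per_sample = max_seq_len * bytes_per_elem * tensors_per_sample
print(bytes_per_sample)        # 8192 bytes = 8 KiB per sample

# A 100-byte prompt therefore expands ~80x on disk and in memory:
print(bytes_per_sample / 100)  # ~82x amplification
```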
In my personal experience, a 1.1 GB prompt dataset caused an OOM on a 512 GB machine, even with `max_seq_len` of only 512. With `max_seq_len` of 2048, memory usage would quadruple to roughly 2 TB :(
This PR saves only the raw strings and tokenizes them on the fly. The saved data is about the same size as the input dataset.
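A minimal sketch of the idea (not the PR's actual diff; the class name `RawPromptDataset` and its fields are hypothetical):

```python
from torch.utils.data import Dataset

class RawPromptDataset(Dataset):
    """Hypothetical sketch: store raw strings, tokenize lazily in __getitem__."""

    def __init__(self, prompts, tokenizer, max_seq_len):
        self.prompts = prompts          # raw strings, ~same size as the input data
        self.tokenizer = tokenizer
        self.max_seq_len = max_seq_len

    def __len__(self):
        return len(self.prompts)

    def __getitem__(self, idx):
        # Tokenize on the fly: only the current batch's padded tensors
        # ever exist in memory, instead of the whole dataset's.
        enc = self.tokenizer(
            self.prompts[idx],
            max_length=self.max_seq_len,
            padding="max_length",
            truncation=True,
            return_tensors="pt",
        )
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
        }
```

The DataLoader's worker processes then perform tokenization per batch, so peak memory is bounded by batch size rather than dataset size.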