
Progress bar missing with litdata.StreamingDataset and wrong number of steps in an epoch #112

Closed · yhl48 opened this issue Apr 25, 2024 · 4 comments · Fixed by #122
Labels: bug, help wanted

yhl48 (Contributor) commented Apr 25, 2024

🐛 Bug

There are two separate issues here.

When training a model using litdata.StreamingDataset, the tqdm progress bar shows {steps}/? and the estimated time is missing.

Moreover, the total number of steps in an epoch seems to be independent of the number of GPUs. Instead of total_steps = num_samples / (num_gpus * batch_size), the log reports total_steps = num_samples / batch_size.
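
For reference, a minimal reproduction sketch (the dataset path, model, and item shape are hypothetical placeholders; any optimized streaming dataset trained through the Trainer shows the same behaviour):

```python
import lightning as L
import litdata as ld
import torch


class BoringModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        return self.layer(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


# Placeholder location: any dataset previously prepared with litdata.optimize.
dataset = ld.StreamingDataset("s3://my-bucket/optimized-dataset")
dataloader = ld.StreamingDataLoader(dataset, batch_size=8)

trainer = L.Trainer(max_epochs=1, accelerator="gpu", devices=4, strategy="ddp")
trainer.fit(BoringModel(), dataloader)
# Observed: the tqdm bar renders "{steps}/?" with no time estimate, and the
# reported epoch length is num_samples / batch_size rather than
# num_samples / (num_gpus * batch_size).
```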

Expected behavior

The progress bar should show the estimated time and the fraction of steps that have been completed.

total_steps = num_samples / (num_gpus * batch_size)
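
For example, with hypothetical numbers (10,000 samples, 4 GPUs, batch size 8) an epoch should be 312 steps, not the 1,250 currently reported:

```python
num_samples, num_gpus, batch_size = 10_000, 4, 8
total_steps = num_samples // (num_gpus * batch_size)  # 312 (expected)
wrong_steps = num_samples // batch_size               # 1250 (currently logged)
```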

cc @tchaton

yhl48 added the bug and help wanted labels on Apr 25, 2024
Hi! Thanks for your contribution, great first issue!

yhl48 (Contributor, Author) commented Apr 25, 2024

I might be stating the obvious here, but the issue seems to originate from this line, where self.trainer.num_training_batches == inf.
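
As an illustration of the symptom only (a sketch, not the actual Lightning progress-bar code): with an infinite num_training_batches the bar has no total to work with, so it can show neither a fraction nor an ETA.

```python
import math


def progress_total(num_training_batches: float):
    # Hypothetical helper mirroring how an infinite batch count leaves the
    # progress bar without a total: inf is mapped to "unknown".
    return None if math.isinf(num_training_batches) else int(num_training_batches)


print(progress_total(float("inf")))  # None  -> bar renders "{steps}/?", no ETA
print(progress_total(1250.0))        # 1250  -> bar renders "123/1250 [... <ETA ...]"
```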

yhl48 (Contributor, Author) commented Apr 25, 2024

Related issue: Lightning-AI/pytorch-lightning#15734

yhl48 changed the title from "Progress bar missing with litdata.StreamingDataset" to "Progress bar missing with litdata.StreamingDataset and wrong number of steps in an epoch" on Apr 25, 2024
yhl48 (Contributor, Author) commented May 6, 2024

With regard to the data distribution across GPUs, I believe this line

self.distributed_env = _DistributedEnv.detect()

should be called again in __iter__: the DDP process group is initialised by the Trainer, which runs after StreamingDataset is initialised, so this check

if torch.distributed.is_available() and torch.distributed.is_initialized():

always evaluates to False.
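
A rough sketch of the suggested re-detection (the import path and the surrounding class body are assumptions, not the actual litdata source; only the two quoted lines come from the repository):

```python
import torch.distributed
from litdata.utilities.env import _DistributedEnv  # import path assumed


class StreamingDatasetSketch:
    def __init__(self):
        # Runs before the Trainer initialises the DDP process group, so
        # torch.distributed.is_initialized() is still False here and the
        # detected world size is 1, however many GPUs will be used.
        self.distributed_env = _DistributedEnv.detect()

    def __iter__(self):
        # Suggested change: detect again once iteration starts. By then the
        # Trainer has set up distributed training, so world_size and rank are
        # correct and each rank gets num_samples / num_gpus samples, which
        # also fixes the reported epoch length.
        if torch.distributed.is_available() and torch.distributed.is_initialized():
            self.distributed_env = _DistributedEnv.detect()
        ...  # rest of the iteration logic unchanged
```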

@tchaton
