LightningDataModule with TensorDataset and GPU resident data #14616

vimalthilak · 2022-09-09T01:30:36Z

Hello,

Is it possible to implement a LightningDataModule in which the dataset resides in GPU memory for the whole run? My dataset and models are fairly small, so the host-to-device transfer costs more than the actual training steps. In raw PyTorch, I move the data to the GPU once and iterate over it with a TensorDataset. How can I do something similar with PL?
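For reference, the raw PyTorch pattern described above might look like the following minimal sketch. The tensor shapes, batch size, and random data are assumptions for illustration only, and the device falls back to CPU so the snippet also runs without a GPU.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Fall back to CPU when no GPU is present, so the sketch stays runnable.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative random data; moved to the device once, up front.
x = torch.randn(100, 8).to(device)
y = torch.randint(0, 2, (100,)).to(device)

dataset = TensorDataset(x, y)
# num_workers stays at its default of 0, so indexing happens in the main process.
loader = DataLoader(dataset, batch_size=16, shuffle=True)

for xb, yb in loader:
    # Batches come out on the same device as the source tensors; no per-step copy.
    assert xb.device == x.device and yb.device == y.device
```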

krshrimali · 2022-09-09T08:13:16Z

Hi, @vimalthilak

Thanks for the question! If I understand you correctly (please correct me if I'm wrong), you can do something like:

from torch.utils.data import DataLoader, TensorDataset
from torchvision.datasets import MNIST
import lightning as L


class MNISTDataModule(L.LightningDataModule):
    def __init__(self, data_dir: str = "path/to/dir", batch_size: int = 32):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size

    def setup(self, stage=None):
        # Lightning calls setup(stage=...), so the hook must accept a stage argument.
        mnist = MNIST(self.data_dir, train=True, download=True)
        # Move the raw tensors to the GPU once.
        data = mnist.data.to("cuda")
        targets = mnist.targets.to("cuda")
        # Wrap them in a TensorDataset: MNIST.__getitem__ converts each image
        # to a PIL image via .numpy(), which fails on CUDA tensors.
        self.mnist_train = TensorDataset(data, targets)

    def train_dataloader(self):
        # num_workers is left at its default of 0, so indexing stays in the main process.
        return DataLoader(self.mnist_train, batch_size=self.batch_size)


dataset_mnist = MNISTDataModule(data_dir="dir")
dataset_mnist.setup()
assert dataset_mnist.train_dataloader().dataset.tensors[0].device.type == "cuda"

The idea is that you can transfer the dataset to the GPU inside the data module if you want. The example above is specific to the MNIST dataset, which exposes .data and .targets. In other cases, you can iterate through the dataset, convert each tensor to a CUDA tensor, and return these from the __getitem__ method of your dataset class.
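For the general case, a minimal sketch could look like this. DeviceResidentDataset is a hypothetical name, each sample is assumed to be a tuple of tensors, and the base dataset here is a plain list of random tensors used only for illustration.

```python
import torch
from torch.utils.data import Dataset


class DeviceResidentDataset(Dataset):
    """Hypothetical wrapper: copies every sample of `base` to `device` once, up front."""

    def __init__(self, base, device):
        # Assumption: each sample is a tuple of tensors.
        self.samples = [tuple(t.to(device) for t in sample) for sample in base]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        # Samples are already on the target device, so no transfer happens here.
        return self.samples[index]


# Fall back to CPU when no GPU is present, so the sketch stays runnable.
device = "cuda" if torch.cuda.is_available() else "cpu"
base = [(torch.randn(4), torch.tensor(i % 2)) for i in range(6)]
resident = DeviceResidentDataset(base, device)
```

Copying sample by sample is simple but only suits datasets that fit in GPU memory; for large tensor-only datasets, moving whole tensors at once is usually cheaper.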

In LightningDataModule, you have some hooks that you can use, more info here: https://pytorch-lightning.readthedocs.io/en/latest/data/datamodule.html#transfer-batch-to-device.

Sorry if this doesn't solve your problem, please feel free to ask counter questions, and we'll be happy to respond! cc: @awaelchli - If he has anything to add here :)

arose13 · 2024-03-07T19:21:02Z

This solution causes the following error whenever num_workers is set.

Please note that without num_workers set, the training loop is 4x slower.

E.g.:

Base Lightning (internal batch transfers & num_workers=4) = 4 min per epoch
Lightning with tensors pre-transferred to GPU (this solution & num_workers=1) = 16 min per epoch
Lightning with tensors pre-transferred to GPU and num_workers=4 = FAIL

RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataset.py", line 208, in __getitem__
    return tuple(tensor[index] for tensor in self.tensors)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataset.py", line 208, in <genexpr>
    return tuple(tensor[index] for tensor in self.tensors)
RuntimeError: CUDA error: initialization error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
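
A likely cause of the traceback above: with num_workers > 0, indexing happens inside DataLoader worker processes, and CUDA generally cannot be initialized in a forked subprocess, so the workers cannot read GPU-resident tensors. One possible workaround, not taken from this thread, is to skip the DataLoader for GPU-resident data and slice whole batches directly on the device, which also avoids per-sample indexing and collation. The sketch below assumes all tensors share the same first dimension. GPUBatchLoader is a hypothetical name, and whether Lightning accepts such a plain iterable from train_dataloader depends on the installed version.

```python
import torch


class GPUBatchLoader:
    """Hypothetical helper: yields batches by slicing tensors that already live on one device."""

    def __init__(self, *tensors, batch_size=32, shuffle=False):
        assert tensors, "need at least one tensor"
        assert all(t.shape[0] == tensors[0].shape[0] for t in tensors)
        self.tensors = tensors
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __len__(self):
        # Number of batches, counting a final partial batch.
        n = self.tensors[0].shape[0]
        return (n + self.batch_size - 1) // self.batch_size

    def __iter__(self):
        n = self.tensors[0].shape[0]
        device = self.tensors[0].device
        # Build the index order on the same device as the data.
        if self.shuffle:
            order = torch.randperm(n, device=device)
        else:
            order = torch.arange(n, device=device)
        for start in range(0, n, self.batch_size):
            idx = order[start:start + self.batch_size]
            # One indexing op per tensor per batch, instead of one per sample.
            yield tuple(t[idx] for t in self.tensors)


# Fall back to CPU when no GPU is present, so the sketch stays runnable.
device = "cuda" if torch.cuda.is_available() else "cpu"
features = torch.randn(10, 3, device=device)
labels = torch.arange(10, device=device)
loader = GPUBatchLoader(features, labels, batch_size=4)
batches = list(loader)
```

No timings are claimed for this sketch; it would need to be measured against the numbers reported above.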
