keep pil data in loading? #406

Closed Answered by deependujha
rxqy asked this question in Q&A
When you call the optimize function, each item is serialized before it is written to the chunks. The writer (src/litdata/streaming/writer.py) checks which of the serializers defined in serializers.py can handle the data and uses that one.
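
To make the selection step concrete, here is a minimal, self-contained sketch of the general idea: try each registered serializer in order and use the first one that accepts the item. It does not import litdata, and every name in it (ToySerializer, TOY_SERIALIZERS, pick_and_serialize) is hypothetical. It also assumes each serializer exposes a check along the lines of can_serialize, so treat it as an illustration rather than the library's actual code.

```python
from typing import Any, Dict, Tuple


class ToySerializer:
    """Hypothetical base class for illustration; not litdata's Serializer."""

    def can_serialize(self, item: Any) -> bool:
        raise NotImplementedError

    def serialize(self, item: Any) -> bytes:
        raise NotImplementedError


class ToyIntSerializer(ToySerializer):
    def can_serialize(self, item: Any) -> bool:
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(item, int) and not isinstance(item, bool)

    def serialize(self, item: Any) -> bytes:
        return item.to_bytes(8, "little", signed=True)


class ToyStrSerializer(ToySerializer):
    def can_serialize(self, item: Any) -> bool:
        return isinstance(item, str)

    def serialize(self, item: Any) -> bytes:
        return item.encode("utf-8")


# Assumed registry: insertion order decides which serializer is tried first.
TOY_SERIALIZERS: Dict[str, ToySerializer] = {
    "int": ToyIntSerializer(),
    "str": ToyStrSerializer(),
}


def pick_and_serialize(item: Any) -> Tuple[str, bytes]:
    """Return (serializer name, payload) from the first serializer that accepts item."""
    for name, serializer in TOY_SERIALIZERS.items():
        if serializer.can_serialize(item):
            return name, serializer.serialize(item)
    raise ValueError(f"no serializer found for {type(item).__name__}")
```

The returned name matters as much as the bytes: the reader needs to know which serializer wrote an item in order to pick the matching deserialize at load time.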

In your case, JPEGSerializer is probably the best fit. Here is the beginning of its deserialize method:

def deserialize(self, data: bytes) -> Union["JpegImageFile", torch.Tensor]:
        if _TORCH_VISION_AVAILABLE:
            from torchvision.io import decode_jpeg
            from torchvision.transforms.functional import pil_to_tensor

            array = torch.frombuffer(data, dtype=torch.uint8)
            # Note: Some datasets like Imagenet contains some PNG images with JPEG extension, so we fallback to PIL
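
The comment in the excerpt mentions files that are really PNG despite a JPEG extension. As a side illustration only, the sketch below shows one way to tell the two formats apart from the leading bytes of the payload, using the standard JPEG and PNG file signatures. This is a swapped-in technique, not what JPEGSerializer does (according to the comment, it falls back to PIL), and sniff_image_format is a hypothetical helper name.

```python
# Standard file signatures (magic bytes) at the start of each format.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_image_format(data: bytes) -> str:
    """Guess the image format from the first bytes, ignoring any file extension."""
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(PNG_MAGIC):
        return "png"
    return "unknown"
```

A check like this could be used before decoding to decide whether a JPEG-only decoder is safe to call or whether a general-purpose decoder such as PIL is the better choice.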

Answer selected by rxqy