From the documentation, in both the API reference and [user guide](https://albumentations.ai/docs/getting_started/mask_augmentation/) sections, it's not straightforward to understand which mask formats are supported and, more importantly, whether different mask formats can lead to different transformation outputs due to internal implementation details.
Take, for example, a semantic segmentation task with 3 classes A, B, and C, where each class has an associated mask (Ma, Mb, Mc) stored as a separate file. Besides RLE and similar sparse encodings, the most basic ways to encode a dense mask and augment a sample are:
1. Read Ma, Mb, and Mc as NumPy arrays and store them in a Python list, e.g. `masks`. The transform API allows calling `transformed = transform(image=image, masks=masks)` to get the augmented image and masks.
2. Read Ma, Mb, and Mc as NumPy arrays and stack them into a single `mask` array of shape (H, W, C), where C=3 and each element is True or False. Let's refer to this as one-hot boolean encoding. The transform API allows calling `transformed = transform(image=image, mask=mask)` to get the augmented image and mask.
3. Read Ma, Mb, and Mc as NumPy arrays and encode them in a `mask` array of shape (H, W), where each element is the class index (0, 1, or 2). Let's refer to this as integer (index) encoding. Then I can call `transformed = transform(image=image, mask=mask)` and get the augmented image and mask.
Now, my questions are:
1. Does Albumentations support all 3 types of encodings for every transform?
2. Does the encoding type affect the output of a given transformation?
3. Is one approach better than another in terms of performance?
Albumentations supports all 3. Performance is similar, and the results will be the same: the same transforms are used under the hood. The only difference is the back-and-forth format conversion.
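The back-and-forth conversion mentioned above is lossless, which is why the encodings yield equivalent results. A minimal NumPy-only sketch of the round trip between the index map, one-hot, and list encodings (not Albumentations internals, just the idea):

```python
import numpy as np

# Integer class-index map with classes 0..2
index_map = np.array([[0, 1],
                      [2, 1]], dtype=np.uint8)
num_classes = 3

# index map -> one-hot boolean stack of shape (H, W, C)
onehot = index_map[..., None] == np.arange(num_classes)

# one-hot -> Python list of per-class binary masks
masks = [onehot[..., c].astype(np.uint8) for c in range(num_classes)]

# round trip: one-hot back to the index map, with nothing lost
recovered = np.argmax(onehot, axis=-1).astype(np.uint8)
```

Because each pixel belongs to exactly one class, `argmax` over the one-hot axis recovers the original index map exactly.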