Mosaic Transform #6534
base: main
Conversation
@abhi-glitchhg Just checking in to see if you got stuck anywhere. :) Let me know if you face any issues.
Hey @datumbox, thanks for checking in! 🤗 I was a bit busy for some time. I have gone through the mosaic implementation and understood it, and I have a basic implementation locally. Hopefully by this weekend I will clean it up and update this PR.
Still WIP.
First of all, I apologize for the inactivity on this PR; I'll be more regular from now on. I have used the PennFudan Pedestrian dataset to check the implementation. Download the dataset, then run:

```python
import os

import numpy as np
import torch
from PIL import Image

from torchvision import utils
from torchvision.prototype import transforms, datapoints
from torchvision.prototype.transforms import functional as F
from references.detection.transforms import Mosaic


class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        img = F.pil_to_tensor(img)
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        # convert the PIL Image into a numpy array
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]
        # split the color-encoded mask into a set of binary masks
        masks = mask == obj_ids[:, None, None]
        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])
        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)

        img = datapoints.Image(img)
        boxes = datapoints.BoundingBox(
            boxes, format=datapoints.BoundingBoxFormat.XYXY, spatial_size=F.get_spatial_size(img)
        )
        labels = datapoints.Label(labels)
        if self.transforms is not None:
            img, boxes, labels = self.transforms(img, boxes, labels)
        return img, boxes, labels

    def __len__(self):
        return len(self.imgs)


def collate_fn(batch):
    return tuple(zip(*batch))


# change the root parameter according to your directory structure
dataset = PennFudanDataset(root="./../PennFudanPed", transforms=transforms.Resize((350, 324)))
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=4, shuffle=True, num_workers=1, collate_fn=collate_fn
)

B = 16  # number of batches to collect
counter = 0
batched_images = []
batched_boxes = []
batched_labels = []
for images, boxes, labels in data_loader:
    batched_images.append(torch.stack(images))
    batched_boxes.append(list(boxes))
    # flatten the per-image labels of the batch into a single list
    batched_labels.append([label for sample in labels for label in sample])
    counter += 1
    if counter >= B:
        break
batched_images = torch.stack(batched_images)

mosaic = Mosaic()
output = mosaic(batched_images, batched_boxes, batched_labels)
for i in range(B):
    viz = utils.draw_bounding_boxes(F.to_image_tensor(output[0][i]), boxes=output[1][i])
    F.to_pil_image(viz).show()
```
```python
super().__init__()
self.min_frac = min_frac
self.max_frac = max_frac
```
Here we need to check that the min_frac and max_frac arguments are between 0 and 1.
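A minimal sketch of the suggested validation, assuming the assignments shown in the diff above; the `nn.Module` base class and the default values here are placeholders for illustration, not the actual signature from this PR:

```python
import torch.nn as nn


class Mosaic(nn.Module):
    # hypothetical skeleton; only the argument validation is the point here
    def __init__(self, min_frac: float = 0.25, max_frac: float = 0.75) -> None:
        super().__init__()
        # reject fractions outside [0, 1], and an inverted range, up front
        if not (0.0 <= min_frac <= max_frac <= 1.0):
            raise ValueError(
                f"Expected 0 <= min_frac <= max_frac <= 1, "
                f"got min_frac={min_frac} and max_frac={max_frac}"
            )
        self.min_frac = min_frac
        self.max_frac = max_frac
```

Failing fast in the constructor surfaces a bad configuration immediately, rather than as a confusing error during the first forward pass.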
Aah, we need to review this. Well, I will try my best to find time to review it 😄 as well as understand how it works. :)
Yeah, sure! Let me know if something is not clear.
Gentle ping for any updates.
Part of #6323