BUG: Validation breaks cached image sizes #1934

Open
feffy380 opened this issue Feb 14, 2025 · 3 comments

@feffy380
Contributor

The validation split helper function does not adjust the image sizes list, so images get stretched due to being assigned the wrong size.

sizes = [None] * len(img_paths)

# We want to create a training and validation split. This should be improved in the future
# to allow a clearer distinction between training and validation. This can be seen as a
# short-term solution to limit what is necessary to implement validation datasets
#
# We split the dataset for the subset based on if we are doing a validation split
# The self.is_training_dataset defines the type of dataset, training or validation
# if self.is_training_dataset is True -> training dataset
# if self.is_training_dataset is False -> validation dataset
if self.validation_split > 0.0:
    # For regularization images we do not want to split this dataset.
    if subset.is_reg is True:
        # Skip any validation dataset for regularization images
        if self.is_training_dataset is False:
            img_paths = []
        # Otherwise the img_paths remain as original img_paths and no split
        # required for training images dataset of regularization images
    else:
        img_paths = split_train_val(
            img_paths,
            self.is_training_dataset,
            self.validation_split,
            self.validation_seed
        )

@rockerBOO
Contributor

I see the sizes are set to None, but I'm not following how that stretches the images. On further reading, the code picks up on the None and tries to get the image size later in the flow.

@feffy380
Contributor Author

feffy380 commented Feb 15, 2025

The validation split function shuffles and truncates the list of image paths but not the corresponding list of sizes. So when we access sizes[i] later, we get the size of an unrelated image from the dataset, which gets passed to the image transforms and causes the stretching.
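A minimal sketch of the misalignment, for illustration only. `shuffle_and_split` is a hypothetical stand-in for the split helper (assumed to shuffle, then slice), and the paths and sizes are made-up data, not taken from the repository:

```python
import random


def shuffle_and_split(items, is_training, val_fraction, seed):
    """Hypothetical stand-in for the split helper: shuffle a copy, then slice."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    n_train = len(shuffled) - n_val
    return shuffled[:n_train] if is_training else shuffled[n_train:]


# Made-up dataset: every image has a distinct cached size.
true_size = {f"img_{i}.png": (512 + 64 * i, 512) for i in range(10)}
img_paths = list(true_size)
sizes = [true_size[p] for p in img_paths]

# Only the paths are shuffled and truncated; `sizes` keeps its original order.
split_paths = shuffle_and_split(img_paths, True, 0.2, seed=0)

# sizes[i] still describes the i-th image of the ORIGINAL list, so it no
# longer corresponds to split_paths[i].
mismatched = [p for i, p in enumerate(split_paths) if sizes[i] != true_size[p]]
print(f"{len(mismatched)} of {len(split_paths)} images would get the wrong size")
```

Any image in `mismatched` would be resized using another image's dimensions, which matches the stretching described above.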

@feffy380
Contributor Author

The easiest fix is probably to zip the paths and sizes before doing the validation split and then separate them afterwards.
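A rough sketch of that idea, not the actual patch. Both `shuffle_and_split` (a stand-in for the split helper, assumed to shuffle then slice) and `split_paths_and_sizes` are hypothetical names, and the sample data is made up:

```python
import random


def shuffle_and_split(items, is_training, val_fraction, seed):
    """Hypothetical stand-in for the split helper: shuffle a copy, then slice."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    n_train = len(shuffled) - n_val
    return shuffled[:n_train] if is_training else shuffled[n_train:]


def split_paths_and_sizes(img_paths, sizes, is_training, val_fraction, seed):
    """Split (path, size) pairs together so each path keeps its own size."""
    paired = list(zip(img_paths, sizes))
    kept = shuffle_and_split(paired, is_training, val_fraction, seed)
    if not kept:
        # zip(*[]) cannot be unpacked into two names, so handle the empty case.
        return [], []
    kept_paths, kept_sizes = zip(*kept)
    return list(kept_paths), list(kept_sizes)


# Made-up dataset: every image has a distinct cached size.
true_size = {f"img_{i}.png": (512 + 64 * i, 512) for i in range(10)}
img_paths = list(true_size)
sizes = [true_size[p] for p in img_paths]

train_paths, train_sizes = split_paths_and_sizes(img_paths, sizes, True, 0.2, seed=0)
val_paths, val_sizes = split_paths_and_sizes(img_paths, sizes, False, 0.2, seed=0)
```

Because the pairs travel through the shuffle and truncation together, `train_sizes[i]` always belongs to `train_paths[i]`, and the same holds for the validation lists. A list of `None` placeholders passes through unchanged.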
