This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Adding pre-commit hooks and black formatting #560

Merged on Sep 30, 2021 (11 commits)
35 changes: 35 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,35 @@
+# See https://pre-commit.com for more information
+# See https://pre-commit.com/hooks.html for more hooks
+# See https://github.com/pre-commit/pre-commit-hooks/blob/master/.pre-commit-config.yaml for an example with more hooks
+
+exclude: '^excluded_files_regex$'
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.0.1
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-yaml
+      - id: check-added-large-files
+      - id: check-ast
+      - id: check-merge-conflict
+      - id: debug-statements
+      - id: mixed-line-ending
+        args: [--fix=lf]
+
+  - repo: https://github.com/PyCQA/flake8
+    rev: 3.9.2
+    hooks:
+      - id: flake8
+        additional_dependencies: [flake8-typing-imports==1.7.0]
+
+  - repo: https://github.com/pre-commit/mirrors-autopep8
+    rev: v1.5.7
+    hooks:
+      - id: autopep8
+
+  - repo: https://github.com/ambv/black
+    rev: 21.9b0
+    hooks:
+      - id: black
+        language_version: python3.7
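
The top-level `exclude` value in the config above reads like a placeholder. Here is a minimal sketch of its likely effect, assuming (as the pre-commit documentation describes) that `exclude` is a Python regular expression applied with `re.search` to each file path relative to the repository root. The `is_excluded` helper and the candidate list are hypothetical and only serve the illustration.

```python
import re

# Pattern copied from the config above; it is anchored at both ends.
EXCLUDE_PATTERN = re.compile(r"^excluded_files_regex$")


def is_excluded(path: str) -> bool:
    """Hypothetical helper: True if `path` matches the exclude pattern."""
    return EXCLUDE_PATTERN.search(path) is not None


# Illustrative paths: the first two are files touched by this PR,
# the last one is made up to show the only kind of path that matches.
candidates = [
    "CHANGELOG.md",
    "InnerEye/ML/SSL/encoders.py",
    "excluded_files_regex",
]
excluded = [path for path in candidates if is_excluded(path)]
print(excluded)
```

Under that assumption, only a file literally named `excluded_files_regex` at the repository root would be skipped, so the hooks run on effectively every file.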
31 changes: 16 additions & 15 deletions CHANGELOG.md
@@ -22,6 +22,7 @@ jobs that run in AzureML.
 ensemble) using the parameter `model_id`.
 - ([#554](https://github.com/microsoft/InnerEye-DeepLearning/pull/554)) Added a parameter `pretraining_dataset_id` to
 `NIH_COVID_BYOL` to specify the name of the SSL training dataset.
+- ([#560](https://github.com/microsoft/InnerEye-DeepLearning/pull/560)) Added pre-commit hooks.
 - ([#559](https://github.com/microsoft/InnerEye-DeepLearning/pull/559)) Adding the accompanying code for the ["Active label cleaning: Improving dataset quality under resource constraints"](https://arxiv.org/abs/2109.00574) paper. The code can be found in the [InnerEye-DataQuality](InnerEye-DataQuality/README.md) subfolder. It provides tools for training noise robust models, running label cleaning simulation and loading our label cleaning benchmark datasets.

 ### Changed
@@ -51,7 +52,7 @@ gets uploaded to AzureML, by skipping all test folders.
 - ([#546](https://github.com/microsoft/InnerEye-DeepLearning/pull/546)) Environment and hello_world_model documentation updated
 - ([#525](https://github.com/microsoft/InnerEye-DeepLearning/pull/525)) Enable --store_dataset_sample
 - ([#495](https://github.com/microsoft/InnerEye-DeepLearning/pull/495)) Fix model comparison.
-- ([#547](https://github.com/microsoft/InnerEye-DeepLearning/pull/547)) The parameter pl_find_unused_parameters was no longer used
+- ([#547](https://github.com/microsoft/InnerEye-DeepLearning/pull/547)) The parameter pl_find_unused_parameters was no longer used
 to initialize the DDP Plugin.
 - ([#482](https://github.com/microsoft/InnerEye-DeepLearning/pull/482)) Check bool parameter is either true or false.
 - ([#475](https://github.com/microsoft/InnerEye-DeepLearning/pull/475)) Bug in AML SDK meant that we could not train
@@ -95,8 +96,8 @@ in inference-only runs when using lightning containers.
 - ([#454](https://github.com/microsoft/InnerEye-DeepLearning/pull/454)) Checking that labels are mutually exclusive.
 - ([#447](https://github.com/microsoft/InnerEye-DeepLearning/pull/447/)) Added a sanity check to ensure there are no
 missing channels, nor missing files. If missing channels in the csv file or filenames associated with channels are
-incorrect, pipeline exits with error report before running training or inference.
-- ([#446](https://github.com/microsoft/InnerEye-DeepLearning/pull/446)) Guarding `save_outlier` so that it works when
+incorrect, pipeline exits with error report before running training or inference.
+- ([#446](https://github.com/microsoft/InnerEye-DeepLearning/pull/446)) Guarding `save_outlier` so that it works when
 institution id and series id columns are missing.
 - ([#441](https://github.com/microsoft/InnerEye-DeepLearning/pull/441)) Add script to move models from one AzureML workspace to another: `python InnerEye/Scripts/move_model.py`
 - ([#417](https://github.com/microsoft/InnerEye-DeepLearning/pull/417)) Added a generic way of adding PyTorch Lightning
@@ -147,8 +148,8 @@ with the FastMRI challenge datasets.
 console for easier diagnostics.
 - ([#445](https://github.com/microsoft/InnerEye-DeepLearning/pull/445)) Adding test coverage for the `HelloContainer`
 model with multiple GPUs
-- ([#450](https://github.com/microsoft/InnerEye-DeepLearning/pull/450)) Adds the metric "Accuracy at threshold 0.5" to the classification report (`classification_crossval_report.ipynb`).
-- ([#451](https://github.com/microsoft/InnerEye-DeepLearning/pull/451)) Write a file `model_outputs.csv` with columns
+- ([#450](https://github.com/microsoft/InnerEye-DeepLearning/pull/450)) Adds the metric "Accuracy at threshold 0.5" to the classification report (`classification_crossval_report.ipynb`).
+- ([#451](https://github.com/microsoft/InnerEye-DeepLearning/pull/451)) Write a file `model_outputs.csv` with columns
 `subject`, `prediction_target`, `label`, `model_output` and `cross_validation_split_index`. This file is not written out for sequence models.
 - ([#440](https://github.com/microsoft/InnerEye-DeepLearning/pull/440)) Added support for training of self-supervised
 models (BYOL and SimCLR) based on the bring-your-own-model framework. Providing examples configurations for training
@@ -182,22 +183,22 @@ console for easier diagnostics.
 - ([#437](https://github.com/microsoft/InnerEye-DeepLearning/pull/437)) Upgrade to PyTorch-Lightning 1.2.8.
 - ([#439](https://github.com/microsoft/InnerEye-DeepLearning/pull/439)) Recovery checkpoints are now
 named `recovery_epoch=x.ckpt` instead of `recovery.ckpt` or `recovery-v0.ckpt`.
-- ([#451](https://github.com/microsoft/InnerEye-DeepLearning/pull/451)) Change the signature for function `generate_custom_report`
+- ([#451](https://github.com/microsoft/InnerEye-DeepLearning/pull/451)) Change the signature for function `generate_custom_report`
 in `ModelConfigBase` to take only the path to the reports folder and a `ModelProcessing` object.
 - ([#444](https://github.com/microsoft/InnerEye-DeepLearning/pull/444)) The method `before_training_on_rank_zero` of
 the `LightningContainer` class has been renamed to `before_training_on_global_rank_zero`. The order in which the
 hooks are called has been changed.
-- ([#458](https://github.com/microsoft/InnerEye-DeepLearning/pull/458)) Simplifying and generalizing the way we handle
-data augmentations for classification models. The pipelining logic is now taken care of by a ImageTransformPipeline
+- ([#458](https://github.com/microsoft/InnerEye-DeepLearning/pull/458)) Simplifying and generalizing the way we handle
+data augmentations for classification models. The pipelining logic is now taken care of by a ImageTransformPipeline
 class that takes as input a list of transforms to chain together. This pipeline takes of applying transforms on 3D or
-2D images. The user can choose to apply the same transformation for all channels (RGB example) or whether to apply
-different transformation for each channel (if each channel represents a different
-modality / time point for example). The pipeline can now work directly with out-of-the box torchvision transform
-(as long as they support [..., C, H, W] inputs). This allows to get rid of nearly all of our custom augmentations
-functions. The conversion from pipeline of image transformation to ScalarItemAugmentation is now taken care of under
+2D images. The user can choose to apply the same transformation for all channels (RGB example) or whether to apply
+different transformation for each channel (if each channel represents a different
+modality / time point for example). The pipeline can now work directly with out-of-the box torchvision transform
+(as long as they support [..., C, H, W] inputs). This allows to get rid of nearly all of our custom augmentations
+functions. The conversion from pipeline of image transformation to ScalarItemAugmentation is now taken care of under
 the hood, the user does not need to call this wrapper for each config class. In models derived from ScalarModelConfig
 to change which augmentations are applied to the images inputs (resp. segmentations inputs), users can override
-`get_image_transform` (resp. `get_segmentation_transform`). These two functions replace the old
+`get_image_transform` (resp. `get_segmentation_transform`). These two functions replace the old
 `get_image_sample_transforms` method. See `docs/building_models.md` for more information on augmentations.

 ### Fixed
@@ -219,7 +220,7 @@ console for easier diagnostics.
 - ([#450](https://github.com/microsoft/InnerEye-DeepLearning/pull/450)) Delete unused `classification_report.ipynb`.
 - ([#455](https://github.com/microsoft/InnerEye-DeepLearning/pull/455)) Removed the AzureRunner conda environment.
 The full InnerEye conda environment is needed to submit a training job to AzureML.
-- ([#458](https://github.com/microsoft/InnerEye-DeepLearning/pull/458)) Getting rid of all the unused code for
+- ([#458](https://github.com/microsoft/InnerEye-DeepLearning/pull/458)) Getting rid of all the unused code for
 RandAugment & Co. The user has now instead complete freedom to specify the set of augmentations to use.
 - ([#468](https://github.com/microsoft/InnerEye-DeepLearning/pull/468)) Removed the `KneeSinglecoil` example model

26 changes: 18 additions & 8 deletions InnerEye/ML/SSL/encoders.py
@@ -44,8 +44,10 @@ def __init__(self, encoder_name: str, use_7x7_first_conv_in_resnet: bool = True)
         """

         super().__init__()
-        self.cnn_model = create_ssl_encoder(encoder_name=encoder_name,
-                                            use_7x7_first_conv_in_resnet=use_7x7_first_conv_in_resnet)
+        self.cnn_model = create_ssl_encoder(
+            encoder_name=encoder_name,
+            use_7x7_first_conv_in_resnet=use_7x7_first_conv_in_resnet,
+        )

     def forward(self, x: T) -> T:
         x = self.cnn_model(x)
@@ -55,21 +57,29 @@ def get_output_feature_dim(self) -> int:
         return get_encoder_output_dim(self)


-def get_encoder_output_dim(pl_module: Union[pl.LightningModule, torch.nn.Module],
-                           dm: Optional[pl.LightningDataModule] = None) -> int:
+def get_encoder_output_dim(
+    pl_module: Union[pl.LightningModule, torch.nn.Module],
+    dm: Optional[pl.LightningDataModule] = None,
+) -> int:
     """
     Calculates the output dimension of ssl encoder by making a single forward pass.
     :param pl_module: pl encoder module
     :param dm: pl datamodule
     """
     # Target device
-    device = pl_module.device if isinstance(pl_module, pl.LightningDataModule) else \
-        next(pl_module.parameters()).device  # type: ignore
-    assert (isinstance(device, torch.device))
+    device = (
+        pl_module.device
+        if isinstance(pl_module, pl.LightningDataModule)
+        else next(pl_module.parameters()).device
+    )  # type: ignore
+    assert isinstance(device, torch.device)

     # Create a dummy input image
     if dm is not None:
-        from InnerEye.ML.SSL.lightning_modules.ssl_online_evaluator import SSLOnlineEvaluatorInnerEye
+        from InnerEye.ML.SSL.lightning_modules.ssl_online_evaluator import (
+            SSLOnlineEvaluatorInnerEye,
+        )
+
         dataloader = dm.train_dataloader()
         dataloader = dataloader[SSLDataModuleType.LINEAR_HEAD] if isinstance(dataloader, dict) else dataloader  # type: ignore
         batch = iter(dataloader).next()  # type: ignore
114 changes: 76 additions & 38 deletions Tests/ML/augmentations/test_transform_pipeline.py
@@ -7,45 +7,64 @@
 import PIL
 import pytest
 import torch
-from torchvision.transforms import (CenterCrop, ColorJitter, RandomAffine, RandomErasing, RandomHorizontalFlip,
-                                    RandomResizedCrop, Resize, ToTensor)
+from torchvision.transforms import (
+    CenterCrop,
+    ColorJitter,
+    RandomAffine,
+    RandomErasing,
+    RandomHorizontalFlip,
+    RandomResizedCrop,
+    Resize,
+    ToTensor,
+)
 from torchvision.transforms.functional import to_tensor

-from InnerEye.ML.augmentations.image_transforms import (AddGaussianNoise, ElasticTransform,
-                                                        ExpandChannels, RandomGamma)
-from InnerEye.ML.augmentations.transform_pipeline import ImageTransformationPipeline, \
-    create_transforms_from_config
+from InnerEye.ML.augmentations.image_transforms import (
+    AddGaussianNoise,
+    ElasticTransform,
+    ExpandChannels,
+    RandomGamma,
+)
+from InnerEye.ML.augmentations.transform_pipeline import (
+    ImageTransformationPipeline,
+    create_transforms_from_config,
+)

 from Tests.SSL.test_data_modules import cxr_augmentation_config

 import numpy as np
+
 image_size = (32, 32)
 crop_size = 24
-test_image_as_array = np.ones(list(image_size)) * 255.
+test_image_as_array = np.ones(list(image_size)) * 255.0
 test_image_as_array[10:15, 10:20] = 1
 test_image_as_pil = PIL.Image.fromarray(test_image_as_array).convert("L")
 test_2d_image_as_CHW_tensor = to_tensor(test_image_as_array)

 test_2d_image_as_ZCHW_tensor = test_2d_image_as_CHW_tensor.unsqueeze(0)

-test_4d_scan_as_tensor = torch.ones([5, 4, *image_size]) * 255.
+test_4d_scan_as_tensor = torch.ones([5, 4, *image_size]) * 255.0
 test_4d_scan_as_tensor[..., 10:15, 10:20] = 1


 @pytest.mark.parametrize("use_different_transformation_per_channel", [True, False])
-def test_torchvision_on_various_input(use_different_transformation_per_channel: bool) -> None:
+def test_torchvision_on_various_input(
+    use_different_transformation_per_channel: bool,
+) -> None:
     """
     This tests that we can run transformation pipeline with out of the box torchvision transforms on various types
     of input: PIL image, 3D tensor, 4D tensors. Tests that use_different_transformation_per_channel has the correct
     behavior.
     """

     transform = ImageTransformationPipeline(
-        [CenterCrop(crop_size),
-         RandomErasing(),
-         RandomAffine(degrees=(10, 12), shear=15, translate=(0.1, 0.3))
-         ],
-        use_different_transformation_per_channel)
+        [
+            CenterCrop(crop_size),
+            RandomErasing(),
+            RandomAffine(degrees=(10, 12), shear=15, translate=(0.1, 0.3)),
+        ],
+        use_different_transformation_per_channel,
+    )

     # Test PIL image input
     transformed = transform(test_image_as_pil)
@@ -68,22 +87,29 @@ def test_torchvision_on_various_input(use_different_transformation_per_channel:
     assert transformed.shape == torch.Size([5, 4, crop_size, crop_size])

     # Same transformation should be applied to all slices and channels.
-    assert torch.isclose(transformed[0, 0], transformed[1, 1]).all() != use_different_transformation_per_channel
+    assert (
+        torch.isclose(transformed[0, 0], transformed[1, 1]).all()
+        != use_different_transformation_per_channel
+    )


 @pytest.mark.parametrize("use_different_transformation_per_channel", [True, False])
-def test_custom_tf_on_various_input(use_different_transformation_per_channel: bool) -> None:
+def test_custom_tf_on_various_input(
+    use_different_transformation_per_channel: bool,
+) -> None:
     """
     This tests that we can run transformation pipeline with our custom transforms on various types
     of input: PIL image, 3D tensor, 4D tensors. Tests that use_different_transformation_per_channel has the correct
     behavior. The transforms are test individually in test_image_transforms.py
     """
     pipeline = ImageTransformationPipeline(
-        [ElasticTransform(sigma=4, alpha=34, p_apply=1),
-         AddGaussianNoise(p_apply=1, std=0.05),
-         RandomGamma(scale=(0.3, 3))
-         ],
-        use_different_transformation_per_channel)
+        [
+            ElasticTransform(sigma=4, alpha=34, p_apply=1),
+            AddGaussianNoise(p_apply=1, std=0.05),
+            RandomGamma(scale=(0.3, 3)),
+        ],
+        use_different_transformation_per_channel,
+    )

     # Test PIL image input
     transformed = pipeline(test_image_as_pil)
@@ -104,28 +130,35 @@ def test_custom_tf_on_various_input(use_different_transformation_per_channel: bo
     assert transformed.shape == test_4d_scan_as_tensor.shape

     # Same transformation should be applied to all slices and channels.
-    assert torch.isclose(transformed[0, 0], transformed[1, 1]).all() != use_different_transformation_per_channel
+    assert (
+        torch.isclose(transformed[0, 0], transformed[1, 1]).all()
+        != use_different_transformation_per_channel
+    )


 @pytest.mark.parametrize("expand_channels", [True, False])
 def test_create_transform_pipeline_from_config(expand_channels: bool) -> None:
     """
     Tests that the pipeline returned by create_transform_pipeline_from_config returns the expected transformation.
     """
-    transformation_pipeline = create_transforms_from_config(cxr_augmentation_config, apply_augmentations=True,
-                                                            expand_channels=expand_channels)
-    fake_cxr_as_array = np.ones([256, 256]) * 255.
+    transformation_pipeline = create_transforms_from_config(
+        cxr_augmentation_config,
+        apply_augmentations=True,
+        expand_channels=expand_channels,
+    )
+    fake_cxr_as_array = np.ones([256, 256]) * 255.0
     fake_cxr_as_array[100:150, 100:200] = 1
-    all_transforms = [RandomAffine(degrees=180, translate=(0, 0), shear=40),
-                      RandomResizedCrop(scale=(0.4, 1.0), size=256),
-                      RandomHorizontalFlip(p=0.5),
-                      RandomGamma(scale=(0.5, 1.5)),
-                      ColorJitter(saturation=0, brightness=0.2, contrast=0.2),
-                      ElasticTransform(sigma=4, alpha=34, p_apply=0.4),
-                      CenterCrop(size=224),
-                      RandomErasing(scale=(0.15, 0.4), ratio=(0.33, 3)),
-                      AddGaussianNoise(std=0.05, p_apply=0.5)
-                      ]
+    all_transforms = [
+        RandomAffine(degrees=180, translate=(0, 0), shear=40),
+        RandomResizedCrop(scale=(0.4, 1.0), size=256),
+        RandomHorizontalFlip(p=0.5),
+        RandomGamma(scale=(0.5, 1.5)),
+        ColorJitter(saturation=0, brightness=0.2, contrast=0.2),
+        ElasticTransform(sigma=4, alpha=34, p_apply=0.4),
+        CenterCrop(size=224),
+        RandomErasing(scale=(0.15, 0.4), ratio=(0.33, 3)),
+        AddGaussianNoise(std=0.05, p_apply=0.5),
+    ]

     if expand_channels:
         all_transforms.insert(0, ExpandChannels())
@@ -134,7 +167,9 @@ def test_create_transform_pipeline_from_config(expand_channels: bool) -> None:
         # In the pipeline the image is converted to tensor before applying the transformations. Do the same here.
         image = ToTensor()(fake_image).reshape([1, 1, 256, 256])
     else:
-        fake_3d_array = np.dstack([fake_cxr_as_array, fake_cxr_as_array, fake_cxr_as_array])
+        fake_3d_array = np.dstack(
+            [fake_cxr_as_array, fake_cxr_as_array, fake_cxr_as_array]
+        )
         fake_image = PIL.Image.fromarray(fake_3d_array.astype(np.uint8)).convert("RGB")
         # In the pipeline the image is converted to tensor before applying the transformations. Do the same here.
         image = ToTensor()(fake_image).reshape([1, 3, 256, 256])
@@ -158,8 +193,11 @@ def test_create_transform_pipeline_from_config(expand_channels: bool) -> None:
     assert torch.isclose(expected_transformed, transformed_image).all()

     # Test the evaluation pipeline
-    transformation_pipeline = create_transforms_from_config(cxr_augmentation_config, apply_augmentations=False,
-                                                            expand_channels=expand_channels)
+    transformation_pipeline = create_transforms_from_config(
+        cxr_augmentation_config,
+        apply_augmentations=False,
+        expand_channels=expand_channels,
+    )
     transformed_image = transformation_pipeline(image)
     assert isinstance(transformed_image, torch.Tensor)
     all_transforms = [Resize(size=256), CenterCrop(size=224)]
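
The tests above assert that two channels end up identical exactly when `use_different_transformation_per_channel` is `False`. The following is a rough, standard-library-only sketch of one way such a switch could behave: replaying the same random draws for every channel, or letting each channel draw its own. `SketchPipeline` and `random_offset` are hypothetical names for illustration and are not the InnerEye `ImageTransformationPipeline` implementation, which operates on torch tensors.

```python
import random
from typing import Callable, List

Channel = List[float]
# A transform takes one channel plus a random generator and returns a new channel.
Transform = Callable[[Channel, random.Random], Channel]


def random_offset(channel: Channel, rng: random.Random) -> Channel:
    """Hypothetical random transform: adds one random offset to every pixel."""
    offset = rng.uniform(-1.0, 1.0)
    return [pixel + offset for pixel in channel]


class SketchPipeline:
    """Chains transforms over channels (illustrative only, not InnerEye code)."""

    def __init__(
        self,
        transforms: List[Transform],
        use_different_transformation_per_channel: bool = False,
        seed: int = 0,
    ) -> None:
        self.transforms = transforms
        self.per_channel = use_different_transformation_per_channel
        self.rng = random.Random(seed)

    def __call__(self, channels: List[Channel]) -> List[Channel]:
        # One seed per call: re-seeding with it replays the same random draws.
        shared_seed = self.rng.getrandbits(32)
        transformed = []
        for channel in channels:
            # Shared mode: every channel starts from an identical generator state.
            # Per-channel mode: channels keep drawing from the running generator.
            rng = self.rng if self.per_channel else random.Random(shared_seed)
            for transform in self.transforms:
                channel = transform(channel, rng)
            transformed.append(channel)
        return transformed


image = [[255.0, 1.0, 255.0], [255.0, 1.0, 255.0]]  # two identical channels
shared = SketchPipeline([random_offset], use_different_transformation_per_channel=False)(image)
separate = SketchPipeline([random_offset], use_different_transformation_per_channel=True)(image)
print(shared[0] == shared[1], separate[0] == separate[1])
```

Re-seeding per channel is only one possible design. An alternative would be to sample the transform parameters once and apply them to all channels in a single batched call.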