microsoft · vale-salvatelli · Sep 30, 2021 · Sep 14, 2021
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -0,0 +1,35 @@
+# See https://pre-commit.com for more information
+# See https://pre-commit.com/hooks.html for more hooks
+# See https://github.com/pre-commit/pre-commit-hooks/blob/master/.pre-commit-config.yaml for an example with more hooks
+
+exclude: '^excluded_files_regex$'
+repos:
+-   repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.0.1
+    hooks:
+    - id: trailing-whitespace
+    - id: end-of-file-fixer
+    - id: check-yaml
+    - id: check-added-large-files
+    - id: check-ast
+    - id: check-merge-conflict
+    - id: debug-statements
+    - id: mixed-line-ending
+      args: [--fix=lf]
+
+-   repo: https://github.com/PyCQA/flake8
+    rev: 3.9.2
+    hooks:
+    -   id: flake8
+        additional_dependencies: [flake8-typing-imports==1.7.0]
+
+-   repo: https://github.com/pre-commit/mirrors-autopep8
+    rev: v1.5.7
+    hooks:
+    - id: autopep8
+
+-   repo: https://github.com/ambv/black
+    rev: 21.9b0
+    hooks:
+    - id: black
+      language_version: python3.7
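
The configuration above enables several text-normalizing hooks, which account for the whitespace-only changes in the rest of this PR. As a rough illustration (not the hooks' actual implementation), the sketch below approximates what `mixed-line-ending --fix=lf`, `trailing-whitespace` and `end-of-file-fixer` do to a file, and shows how the placeholder `exclude` pattern behaves. The helper names `is_excluded` and `normalize_text` are hypothetical, and the sketch assumes pre-commit matches `exclude` with `re.search` against the repository-relative path. The real hooks handle more edge cases, such as binary files.

```python
import re

# Placeholder pattern copied from the config above.
EXCLUDE = re.compile(r"^excluded_files_regex$")


def is_excluded(path: str) -> bool:
    """Assumption: `exclude` is applied with re.search to the relative path."""
    return EXCLUDE.search(path) is not None


def normalize_text(text: str) -> str:
    """Hypothetical helper approximating three of the configured fixers."""
    # mixed-line-ending --fix=lf: convert CRLF and lone CR to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # trailing-whitespace: strip spaces and tabs at the end of every line.
    text = "\n".join(line.rstrip(" \t") for line in text.split("\n"))
    # end-of-file-fixer: end a non-empty file with exactly one newline.
    stripped = text.rstrip("\n")
    return stripped + "\n" if stripped else ""


if __name__ == "__main__":
    print(is_excluded("CHANGELOG.md"))
    print(repr(normalize_text("was no longer used \r\nto initialize the DDP Plugin.\n\n")))
```

Because the pattern is anchored at both ends, it only excludes a file literally named `excluded_files_regex` at the repository root, so in practice every tracked file is checked until the placeholder is replaced.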
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -22,6 +22,7 @@ jobs that run in AzureML.
   ensemble) using the parameter `model_id`.
 - ([#554](https://github.com/microsoft/InnerEye-DeepLearning/pull/554)) Added a parameter `pretraining_dataset_id` to
   `NIH_COVID_BYOL` to specify the name of the SSL training dataset.
+- ([#560](https://github.com/microsoft/InnerEye-DeepLearning/pull/560)) Added pre-commit hooks.
 - ([#559](https://github.com/microsoft/InnerEye-DeepLearning/pull/559)) Adding the accompanying code for the ["Active label cleaning: Improving dataset quality under resource constraints"](https://arxiv.org/abs/2109.00574) paper. The code can be found in the [InnerEye-DataQuality](InnerEye-DataQuality/README.md) subfolder. It provides tools for training noise robust models, running label cleaning simulation and loading our label cleaning benchmark datasets.
 
 ### Changed
@@ -51,7 +52,7 @@ gets uploaded to AzureML, by skipping all test folders.
 - ([#546](https://github.com/microsoft/InnerEye-DeepLearning/pull/546)) Environment and hello_world_model documentation updated
 - ([#525](https://github.com/microsoft/InnerEye-DeepLearning/pull/525)) Enable --store_dataset_sample
 - ([#495](https://github.com/microsoft/InnerEye-DeepLearning/pull/495)) Fix model comparison.
-- ([#547](https://github.com/microsoft/InnerEye-DeepLearning/pull/547)) The parameter pl_find_unused_parameters was no longer used 
+- ([#547](https://github.com/microsoft/InnerEye-DeepLearning/pull/547)) The parameter pl_find_unused_parameters was no longer used
 to initialize the DDP Plugin.
 - ([#482](https://github.com/microsoft/InnerEye-DeepLearning/pull/482)) Check bool parameter is either true or false.
 - ([#475](https://github.com/microsoft/InnerEye-DeepLearning/pull/475)) Bug in AML SDK meant that we could not train
@@ -95,8 +96,8 @@ in inference-only runs when using lightning containers.
 - ([#454](https://github.com/microsoft/InnerEye-DeepLearning/pull/454)) Checking that labels are mutually exclusive.
 - ([#447](https://github.com/microsoft/InnerEye-DeepLearning/pull/447/)) Added a sanity check to ensure there are no
   missing channels, nor missing files. If missing channels in the csv file or filenames associated with channels are
-  incorrect, pipeline exits with error report before running training or inference. 
-- ([#446](https://github.com/microsoft/InnerEye-DeepLearning/pull/446)) Guarding `save_outlier` so that it works when 
+  incorrect, pipeline exits with error report before running training or inference.
+- ([#446](https://github.com/microsoft/InnerEye-DeepLearning/pull/446)) Guarding `save_outlier` so that it works when
 institution id and series id columns are missing.
 - ([#441](https://github.com/microsoft/InnerEye-DeepLearning/pull/441)) Add script to move models from one AzureML workspace to another: `python InnerEye/Scripts/move_model.py`
 - ([#417](https://github.com/microsoft/InnerEye-DeepLearning/pull/417)) Added a generic way of adding PyTorch Lightning
@@ -147,8 +148,8 @@ with the FastMRI challenge datasets.
 console for easier diagnostics.
 - ([#445](https://github.com/microsoft/InnerEye-DeepLearning/pull/445)) Adding test coverage for the `HelloContainer`
   model with multiple GPUs
-- ([#450](https://github.com/microsoft/InnerEye-DeepLearning/pull/450)) Adds the metric "Accuracy at threshold 0.5" to the classification report (`classification_crossval_report.ipynb`). 
-- ([#451](https://github.com/microsoft/InnerEye-DeepLearning/pull/451)) Write a file `model_outputs.csv` with columns 
+- ([#450](https://github.com/microsoft/InnerEye-DeepLearning/pull/450)) Adds the metric "Accuracy at threshold 0.5" to the classification report (`classification_crossval_report.ipynb`).
+- ([#451](https://github.com/microsoft/InnerEye-DeepLearning/pull/451)) Write a file `model_outputs.csv` with columns
   `subject`, `prediction_target`, `label`, `model_output` and `cross_validation_split_index`. This file is not written out for sequence models.
 - ([#440](https://github.com/microsoft/InnerEye-DeepLearning/pull/440)) Added support for training of self-supervised
   models (BYOL and SimCLR) based on the bring-your-own-model framework. Providing examples configurations for training
@@ -182,22 +183,22 @@ console for easier diagnostics.
 - ([#437](https://github.com/microsoft/InnerEye-DeepLearning/pull/437)) Upgrade to PyTorch-Lightning 1.2.8.
 - ([#439](https://github.com/microsoft/InnerEye-DeepLearning/pull/439)) Recovery checkpoints are now
   named `recovery_epoch=x.ckpt` instead of `recovery.ckpt` or `recovery-v0.ckpt`.
-- ([#451](https://github.com/microsoft/InnerEye-DeepLearning/pull/451)) Change the signature for function `generate_custom_report` 
+- ([#451](https://github.com/microsoft/InnerEye-DeepLearning/pull/451)) Change the signature for function `generate_custom_report`
   in `ModelConfigBase` to take only the path to the reports folder and a `ModelProcessing` object.
 - ([#444](https://github.com/microsoft/InnerEye-DeepLearning/pull/444)) The method `before_training_on_rank_zero` of
  the `LightningContainer` class has been renamed to `before_training_on_global_rank_zero`. The order in which the
  hooks are called has been changed.
-- ([#458](https://github.com/microsoft/InnerEye-DeepLearning/pull/458)) Simplifying and generalizing the way we handle 
-  data augmentations for classification models. The pipelining logic is now taken care of by a ImageTransformPipeline 
+- ([#458](https://github.com/microsoft/InnerEye-DeepLearning/pull/458)) Simplifying and generalizing the way we handle
+  data augmentations for classification models. The pipelining logic is now taken care of by a ImageTransformPipeline
   class that takes as input a list of transforms to chain together. This pipeline takes of applying transforms on 3D or
-  2D images. The user can choose to apply the same transformation for all channels (RGB example) or whether to apply 
-  different transformation for each channel (if each channel represents a different 
-  modality / time point for example). The pipeline can now work directly with out-of-the box torchvision transform 
-  (as long as they support [..., C, H, W] inputs). This allows to get rid of nearly all of our custom augmentations 
-  functions. The conversion from pipeline of image transformation to ScalarItemAugmentation is now taken care of under 
+  2D images. The user can choose to apply the same transformation for all channels (RGB example) or whether to apply
+  different transformation for each channel (if each channel represents a different
+  modality / time point for example). The pipeline can now work directly with out-of-the box torchvision transform
+  (as long as they support [..., C, H, W] inputs). This allows to get rid of nearly all of our custom augmentations
+  functions. The conversion from pipeline of image transformation to ScalarItemAugmentation is now taken care of under
   the hood, the user does not need to call this wrapper for each config class. In models derived from ScalarModelConfig
   to change which augmentations are applied to the images inputs (resp. segmentations inputs), users can override
-  `get_image_transform` (resp. `get_segmentation_transform`). These two functions replace the old 
+  `get_image_transform` (resp. `get_segmentation_transform`). These two functions replace the old
   `get_image_sample_transforms` method. See `docs/building_models.md` for more information on augmentations.
 
 ### Fixed
@@ -219,7 +220,7 @@ console for easier diagnostics.
 - ([#450](https://github.com/microsoft/InnerEye-DeepLearning/pull/450)) Delete unused `classification_report.ipynb`.
 - ([#455](https://github.com/microsoft/InnerEye-DeepLearning/pull/455)) Removed the AzureRunner conda environment.
   The full InnerEye conda environment is needed to submit a training job to AzureML.
-- ([#458](https://github.com/microsoft/InnerEye-DeepLearning/pull/458)) Getting rid of all the unused code for 
+- ([#458](https://github.com/microsoft/InnerEye-DeepLearning/pull/458)) Getting rid of all the unused code for
    RandAugment & Co. The user has now instead complete freedom to specify the set of augmentations to use.
 - ([#468](https://github.com/microsoft/InnerEye-DeepLearning/pull/468)) Removed the `KneeSinglecoil` example model
 

diff --git a/InnerEye/ML/SSL/encoders.py b/InnerEye/ML/SSL/encoders.py
@@ -44,8 +44,10 @@ def __init__(self, encoder_name: str, use_7x7_first_conv_in_resnet: bool = True)
         """
 
         super().__init__()
-        self.cnn_model = create_ssl_encoder(encoder_name=encoder_name,
-                                            use_7x7_first_conv_in_resnet=use_7x7_first_conv_in_resnet)
+        self.cnn_model = create_ssl_encoder(
+            encoder_name=encoder_name,
+            use_7x7_first_conv_in_resnet=use_7x7_first_conv_in_resnet,
+        )
 
     def forward(self, x: T) -> T:
         x = self.cnn_model(x)
@@ -55,21 +57,29 @@ def get_output_feature_dim(self) -> int:
         return get_encoder_output_dim(self)
 
 
-def get_encoder_output_dim(pl_module: Union[pl.LightningModule, torch.nn.Module],
-                           dm: Optional[pl.LightningDataModule] = None) -> int:
+def get_encoder_output_dim(
+    pl_module: Union[pl.LightningModule, torch.nn.Module],
+    dm: Optional[pl.LightningDataModule] = None,
+) -> int:
     """
     Calculates the output dimension of ssl encoder by making a single forward pass.
     :param pl_module: pl encoder module
     :param dm: pl datamodule
     """
     # Target device
-    device = pl_module.device if isinstance(pl_module, pl.LightningDataModule) else \
-        next(pl_module.parameters()).device  # type: ignore
-    assert (isinstance(device, torch.device))
+    device = (
+        pl_module.device
+        if isinstance(pl_module, pl.LightningDataModule)
+        else next(pl_module.parameters()).device
+    )  # type: ignore
+    assert isinstance(device, torch.device)
 
     # Create a dummy input image
     if dm is not None:
-        from InnerEye.ML.SSL.lightning_modules.ssl_online_evaluator import SSLOnlineEvaluatorInnerEye
+        from InnerEye.ML.SSL.lightning_modules.ssl_online_evaluator import (
+            SSLOnlineEvaluatorInnerEye,
+        )
+
         dataloader = dm.train_dataloader()
         dataloader = dataloader[SSLDataModuleType.LINEAR_HEAD] if isinstance(dataloader, dict) else dataloader  # type: ignore
         batch = iter(dataloader).next()  # type: ignore
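
The docstring of `get_encoder_output_dim` describes computing the encoder's output dimension from a single forward pass. The remainder of the function is not visible in this hunk, so the following is only a minimal sketch of that general technique under stated assumptions: the helper name `infer_output_dim`, the default dummy input shape and the toy encoder are all hypothetical and are not taken from the InnerEye code.

```python
import torch


def infer_output_dim(encoder: torch.nn.Module, input_shape=(1, 3, 32, 32)) -> int:
    """Hypothetical helper: run one dummy batch and read off the feature size."""
    # Place the dummy input on the same device as the encoder's parameters.
    device = next(encoder.parameters()).device
    dummy = torch.zeros(input_shape, device=device)

    # Use eval mode and no_grad so the probe does not update batch-norm
    # statistics or build an autograd graph; restore the mode afterwards.
    was_training = encoder.training
    encoder.eval()
    with torch.no_grad():
        output = encoder(dummy)
    encoder.train(was_training)

    # Flatten everything except the batch dimension.
    return int(output.flatten(start_dim=1).shape[1])


if __name__ == "__main__":
    toy_encoder = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
        torch.nn.AdaptiveAvgPool2d(1),
        torch.nn.Flatten(),
    )
    print(infer_output_dim(toy_encoder))
```

Probing with a dummy batch avoids hard-coding the feature size for each encoder architecture, at the cost of one extra forward pass at construction time.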

diff --git a/Tests/ML/augmentations/test_transform_pipeline.py b/Tests/ML/augmentations/test_transform_pipeline.py
@@ -7,45 +7,64 @@
 import PIL
 import pytest
 import torch
-from torchvision.transforms import (CenterCrop, ColorJitter, RandomAffine, RandomErasing, RandomHorizontalFlip,
-                                    RandomResizedCrop, Resize, ToTensor)
+from torchvision.transforms import (
+    CenterCrop,
+    ColorJitter,
+    RandomAffine,
+    RandomErasing,
+    RandomHorizontalFlip,
+    RandomResizedCrop,
+    Resize,
+    ToTensor,
+)
 from torchvision.transforms.functional import to_tensor
 
-from InnerEye.ML.augmentations.image_transforms import (AddGaussianNoise, ElasticTransform,
-                                                        ExpandChannels, RandomGamma)
-from InnerEye.ML.augmentations.transform_pipeline import ImageTransformationPipeline, \
-    create_transforms_from_config
+from InnerEye.ML.augmentations.image_transforms import (
+    AddGaussianNoise,
+    ElasticTransform,
+    ExpandChannels,
+    RandomGamma,
+)
+from InnerEye.ML.augmentations.transform_pipeline import (
+    ImageTransformationPipeline,
+    create_transforms_from_config,
+)
 
 from Tests.SSL.test_data_modules import cxr_augmentation_config
 
 import numpy as np
 
 image_size = (32, 32)
 crop_size = 24
-test_image_as_array = np.ones(list(image_size)) * 255.
+test_image_as_array = np.ones(list(image_size)) * 255.0
 test_image_as_array[10:15, 10:20] = 1
 test_image_as_pil = PIL.Image.fromarray(test_image_as_array).convert("L")
 test_2d_image_as_CHW_tensor = to_tensor(test_image_as_array)
 
 test_2d_image_as_ZCHW_tensor = test_2d_image_as_CHW_tensor.unsqueeze(0)
 
-test_4d_scan_as_tensor = torch.ones([5, 4, *image_size]) * 255.
+test_4d_scan_as_tensor = torch.ones([5, 4, *image_size]) * 255.0
 test_4d_scan_as_tensor[..., 10:15, 10:20] = 1
 
+
 @pytest.mark.parametrize("use_different_transformation_per_channel", [True, False])
-def test_torchvision_on_various_input(use_different_transformation_per_channel: bool) -> None:
+def test_torchvision_on_various_input(
+    use_different_transformation_per_channel: bool,
+) -> None:
     """
     This tests that we can run transformation pipeline with out of the box torchvision transforms on various types
     of input: PIL image, 3D tensor, 4D tensors. Tests that use_different_transformation_per_channel has the correct
     behavior.
     """
 
     transform = ImageTransformationPipeline(
-        [CenterCrop(crop_size),
-         RandomErasing(),
-         RandomAffine(degrees=(10, 12), shear=15, translate=(0.1, 0.3))
-         ],
-        use_different_transformation_per_channel)
+        [
+            CenterCrop(crop_size),
+            RandomErasing(),
+            RandomAffine(degrees=(10, 12), shear=15, translate=(0.1, 0.3)),
+        ],
+        use_different_transformation_per_channel,
+    )
 
     # Test PIL image input
     transformed = transform(test_image_as_pil)
@@ -68,22 +87,29 @@ def test_torchvision_on_various_input(use_different_transformation_per_channel:
     assert transformed.shape == torch.Size([5, 4, crop_size, crop_size])
 
     # Same transformation should be applied to all slices and channels.
-    assert torch.isclose(transformed[0, 0], transformed[1, 1]).all() != use_different_transformation_per_channel
+    assert (
+        torch.isclose(transformed[0, 0], transformed[1, 1]).all()
+        != use_different_transformation_per_channel
+    )
 
 
 @pytest.mark.parametrize("use_different_transformation_per_channel", [True, False])
-def test_custom_tf_on_various_input(use_different_transformation_per_channel: bool) -> None:
+def test_custom_tf_on_various_input(
+    use_different_transformation_per_channel: bool,
+) -> None:
     """
     This tests that we can run transformation pipeline with our custom transforms on various types
     of input: PIL image, 3D tensor, 4D tensors. Tests that use_different_transformation_per_channel has the correct
     behavior. The transforms are test individually in test_image_transforms.py
     """
     pipeline = ImageTransformationPipeline(
-        [ElasticTransform(sigma=4, alpha=34, p_apply=1),
-         AddGaussianNoise(p_apply=1, std=0.05),
-         RandomGamma(scale=(0.3, 3))
-         ],
-        use_different_transformation_per_channel)
+        [
+            ElasticTransform(sigma=4, alpha=34, p_apply=1),
+            AddGaussianNoise(p_apply=1, std=0.05),
+            RandomGamma(scale=(0.3, 3)),
+        ],
+        use_different_transformation_per_channel,
+    )
 
     # Test PIL image input
     transformed = pipeline(test_image_as_pil)
@@ -104,28 +130,35 @@ def test_custom_tf_on_various_input(use_different_transformation_per_channel: bo
     assert transformed.shape == test_4d_scan_as_tensor.shape
 
     # Same transformation should be applied to all slices and channels.
-    assert torch.isclose(transformed[0, 0], transformed[1, 1]).all() != use_different_transformation_per_channel
+    assert (
+        torch.isclose(transformed[0, 0], transformed[1, 1]).all()
+        != use_different_transformation_per_channel
+    )
 
 
 @pytest.mark.parametrize("expand_channels", [True, False])
 def test_create_transform_pipeline_from_config(expand_channels: bool) -> None:
     """
     Tests that the pipeline returned by create_transform_pipeline_from_config returns the expected transformation.
     """
-    transformation_pipeline = create_transforms_from_config(cxr_augmentation_config, apply_augmentations=True,
-                                                            expand_channels=expand_channels)
-    fake_cxr_as_array = np.ones([256, 256]) * 255.
+    transformation_pipeline = create_transforms_from_config(
+        cxr_augmentation_config,
+        apply_augmentations=True,
+        expand_channels=expand_channels,
+    )
+    fake_cxr_as_array = np.ones([256, 256]) * 255.0
     fake_cxr_as_array[100:150, 100:200] = 1
-    all_transforms = [RandomAffine(degrees=180, translate=(0, 0), shear=40),
-                      RandomResizedCrop(scale=(0.4, 1.0), size=256),
-                      RandomHorizontalFlip(p=0.5),
-                      RandomGamma(scale=(0.5, 1.5)),
-                      ColorJitter(saturation=0, brightness=0.2, contrast=0.2),
-                      ElasticTransform(sigma=4, alpha=34, p_apply=0.4),
-                      CenterCrop(size=224),
-                      RandomErasing(scale=(0.15, 0.4), ratio=(0.33, 3)),
-                      AddGaussianNoise(std=0.05, p_apply=0.5)
-                      ]
+    all_transforms = [
+        RandomAffine(degrees=180, translate=(0, 0), shear=40),
+        RandomResizedCrop(scale=(0.4, 1.0), size=256),
+        RandomHorizontalFlip(p=0.5),
+        RandomGamma(scale=(0.5, 1.5)),
+        ColorJitter(saturation=0, brightness=0.2, contrast=0.2),
+        ElasticTransform(sigma=4, alpha=34, p_apply=0.4),
+        CenterCrop(size=224),
+        RandomErasing(scale=(0.15, 0.4), ratio=(0.33, 3)),
+        AddGaussianNoise(std=0.05, p_apply=0.5),
+    ]
 
     if expand_channels:
         all_transforms.insert(0, ExpandChannels())
@@ -134,7 +167,9 @@ def test_create_transform_pipeline_from_config(expand_channels: bool) -> None:
         # In the pipeline the image is converted to tensor before applying the transformations. Do the same here.
         image = ToTensor()(fake_image).reshape([1, 1, 256, 256])
     else:
-        fake_3d_array = np.dstack([fake_cxr_as_array, fake_cxr_as_array, fake_cxr_as_array])
+        fake_3d_array = np.dstack(
+            [fake_cxr_as_array, fake_cxr_as_array, fake_cxr_as_array]
+        )
         fake_image = PIL.Image.fromarray(fake_3d_array.astype(np.uint8)).convert("RGB")
         # In the pipeline the image is converted to tensor before applying the transformations. Do the same here.
         image = ToTensor()(fake_image).reshape([1, 3, 256, 256])
@@ -158,8 +193,11 @@ def test_create_transform_pipeline_from_config(expand_channels: bool) -> None:
     assert torch.isclose(expected_transformed, transformed_image).all()
 
     # Test the evaluation pipeline
-    transformation_pipeline = create_transforms_from_config(cxr_augmentation_config, apply_augmentations=False,
-                                                            expand_channels=expand_channels)
+    transformation_pipeline = create_transforms_from_config(
+        cxr_augmentation_config,
+        apply_augmentations=False,
+        expand_channels=expand_channels,
+    )
     transformed_image = transformation_pipeline(image)
     assert isinstance(transformed_image, torch.Tensor)
     all_transforms = [Resize(size=256), CenterCrop(size=224)]
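
The tests above assert that two identical channels come out identical only when `use_different_transformation_per_channel` is `False`. The toy sketch below illustrates that idea with the standard library alone. It is not the `ImageTransformationPipeline` implementation: `random_brightness`, `apply_pipeline` and the list-based image type are hypothetical stand-ins chosen to keep the example self-contained.

```python
import random
from typing import Callable, List

Channel = List[float]
Image = List[Channel]  # hypothetical stand-in for a [C, N] tensor


def random_brightness(rng: random.Random) -> Callable[[Channel], Channel]:
    """Draw one random offset and return a transform that applies it."""
    offset = rng.uniform(-1.0, 1.0)
    return lambda channel: [value + offset for value in channel]


def apply_pipeline(image: Image, different_per_channel: bool, seed: int = 0) -> Image:
    rng = random.Random(seed)
    if different_per_channel:
        # Fresh random parameters are drawn for every channel.
        return [random_brightness(rng)(channel) for channel in image]
    # One set of random parameters is drawn once and shared by all channels.
    transform = random_brightness(rng)
    return [transform(channel) for channel in image]


if __name__ == "__main__":
    image = [[255.0] * 4, [255.0] * 4]
    shared = apply_pipeline(image, different_per_channel=False)
    per_channel = apply_pipeline(image, different_per_channel=True)
    print(shared[0] == shared[1], per_channel[0] == per_channel[1])
```

Sharing parameters suits RGB-like inputs where channels must stay aligned, while per-channel sampling suits inputs whose channels represent different modalities or time points, as the CHANGELOG entry for #458 describes.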