Integrate half precision in sklearn based task #100

Merged
merged 10 commits into from
Jan 26, 2024
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,13 @@
# Changelog
All notable changes to this project will be documented in this file.

### [1.5.6]

#### Added

- Add support for half precision training and inference for sklearn based tasks
- Add gradcam export for sklearn training

### [1.5.5]

#### Fixed
3 changes: 3 additions & 0 deletions docs/tutorials/examples/sklearn_classification.md
@@ -147,6 +147,8 @@ datamodule:

task:
device: cuda:0
half_precision: false
gradcam: false
automatic_batch_size:
starting_batch_size: 1024
disable: true
@@ -160,6 +162,7 @@ task:

This will train a logistic regression classifier using a resnet18 backbone, resizing the images to 224x224 and using 5-fold cross-validation. The `class_to_idx` parameter maps class names to indexes, which are then used to train the classifier. The `output` parameter specifies the output folder and the type of output to save. The `export.types` parameter selects the formats in which the model is exported; at the moment `torchscript`, `onnx` and `pytorch` are supported.
The backbone (in torchscript and pytorch format) will be saved along with the classifier. `test_full_data` specifies whether a final test should be performed on all the data (after training on the training and validation datasets).
Half precision training can be enabled by setting `half_precision` to `true`, and gradcam results can be exported by setting `gradcam` to `true`. Half precision is not supported on CPU, and gradcam is currently not supported with half precision: if both are enabled, gradcam is disabled with a warning.
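
How `half_precision`, `gradcam` and the device interact can be summarised in a small sketch. The helper name `resolve_sklearn_flags` is hypothetical and not part of quadra; it only mirrors the checks this PR adds to the backbone setter of `SklearnClassification`.

```python
import logging
from typing import Tuple

log = logging.getLogger(__name__)


def resolve_sklearn_flags(device: str, half_precision: bool, gradcam: bool) -> Tuple[bool, bool]:
    """Return the effective (half_precision, gradcam) pair.

    Hypothetical helper, not part of quadra: it only mirrors the checks
    applied when the backbone is configured.
    """
    if half_precision:
        if device == "cpu":
            # Half precision is rejected outright on CPU
            raise ValueError("Half precision is not supported on CPU")
        if gradcam:
            # Gradcam is switched off rather than raising an error
            log.warning("Gradcam is currently not supported with half precision, it will be disabled")
            gradcam = False
    return half_precision, gradcam


print(resolve_sklearn_flags("cuda:0", half_precision=True, gradcam=True))  # (True, False)
print(resolve_sklearn_flags("cpu", half_precision=False, gradcam=True))    # (False, True)
```

In other words, requesting both options on a GPU silently degrades to half precision without gradcam, while requesting half precision on `cpu` fails immediately.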

Optionally, the automatic batch size finder can be enabled by setting `automatic_batch_size.disable` to `false`. It will try to find the maximum batch size that can be used on the given device without running out of memory. The `starting_batch_size` parameter specifies the batch size at which the search starts; the algorithm halves it until the batch fits in memory.
Finally, the `save_model_summary` parameter can be used to save the backbone information in a text file called `model_summary.txt`, located in the root of the output folder.
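
The halving search described for the automatic batch size finder can be sketched as follows. This is a minimal illustration, not quadra's implementation (which this PR does not show): `find_batch_size` and `fits_in_memory` are hypothetical names, and the memory check is simulated instead of running a real forward pass.

```python
from typing import Callable


def find_batch_size(starting_batch_size: int, fits_in_memory: Callable[[int], bool]) -> int:
    """Halve the batch size until `fits_in_memory` accepts it (hypothetical helper)."""
    batch_size = starting_batch_size
    while batch_size >= 1:
        if fits_in_memory(batch_size):
            return batch_size
        # The batch did not fit: retry with half the size
        batch_size //= 2
    raise RuntimeError("Even a batch size of 1 does not fit in memory")


# Stand-in for a real forward pass: pretend the device can hold 300 samples.
print(find_batch_size(1024, lambda bs: bs <= 300))  # 256
```

Because the search only visits `starting_batch_size` divided by powers of two, the result is the largest such value that fits, not necessarily the largest batch size overall.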
2 changes: 2 additions & 0 deletions docs/tutorials/examples/sklearn_patch_classification.md
@@ -223,6 +223,7 @@ datamodule:

task:
device: cuda:2
half_precision: false
automatic_batch_size:
starting_batch_size: 1024
disable: true
@@ -235,6 +236,7 @@ task:

This will train a resnet18 model on the given dataset, using a batch size of 256 and skipping the background class during training.
The experiment results will be saved under the `classification_patch_experiment` folder. The deployment model will be generated, but only the classifier will be saved (in joblib format). To reconstruct patches for evaluation, the `major_voting` method will be used.
Features can be extracted in half precision by setting `half_precision` to `true`. The `automatic_batch_size` parameter can be used to automatically adjust the batch size to fit the memory of the device: `starting_batch_size` specifies the starting batch size, which the algorithm decreases until a batch fits into memory, and `disable` turns the automatic adjustment off.
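
To get a feel for what half precision means numerically, the sketch below round-trips a few values through the IEEE 754 binary16 format using only the standard library `struct` module. It does not use quadra or torch; that the extracted features end up in exactly this format is an assumption, and `to_half` is a hypothetical helper.

```python
import struct


def to_half(value: float) -> float:
    """Round-trip a Python float through IEEE 754 binary16 (half precision)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


print(to_half(0.1))     # 0.0999755859375
print(to_half(1000.0))  # 1000.0
try:
    to_half(70000.0)    # larger than the biggest finite half value (65504)
except OverflowError as err:
    print("overflow:", err)
```

Half precision keeps roughly three significant decimal digits and a much smaller range than single precision, so it is worth comparing classifier metrics with and without the option.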

### Run

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "quadra"
-version = "1.5.5"
+version = "1.5.6"
description = "Deep Learning experiment orchestration library"
authors = [
"Federico Belotti <federico.belotti@orobix.com>",
2 changes: 1 addition & 1 deletion quadra/__init__.py
@@ -1,4 +1,4 @@
-__version__ = "1.5.5"
+__version__ = "1.5.6"


def get_version():
2 changes: 2 additions & 0 deletions quadra/configs/task/sklearn_classification.yaml
@@ -1,5 +1,7 @@
_target_: quadra.tasks.SklearnClassification
device: cuda:0
half_precision: false
gradcam: false
automatic_batch_size:
starting_batch_size: 1024
disable: true
1 change: 1 addition & 0 deletions quadra/configs/task/sklearn_classification_patch.yaml
@@ -1,5 +1,6 @@
_target_: quadra.tasks.PatchSklearnClassification
device: cuda:0
half_precision: false
automatic_batch_size:
starting_batch_size: 1024
disable: true
44 changes: 37 additions & 7 deletions quadra/tasks/classification.py
@@ -332,6 +332,11 @@ def generate_report(self) -> None:
if not self.run_test or self.config.trainer.get("fast_dev_run"):
self.datamodule.setup(stage="test")

if "16" in self.trainer.precision:
log.warning("Gradcam is currently not supported with half precision, it will be disabled")
self.module.gradcam = False
self.gradcam = False

predictions_outputs = self.trainer.predict(
model=self.module, datamodule=self.datamodule, ckpt_path=self.best_model_path
)
@@ -473,6 +478,8 @@ class SklearnClassification(Generic[SklearnClassificationDataModuleT], Task[Skle
output: Dictionary defining which kind of outputs to generate. Defaults to None.
automatic_batch_size: Whether to automatically find the largest batch size that fits in memory.
save_model_summary: Whether to save a model_summary.txt file containing the model summary.
half_precision: Whether to use half precision during training.
gradcam: Whether to compute gradcams for test results.
"""

def __init__(
@@ -482,6 +489,8 @@ def __init__(
device: str,
automatic_batch_size: DictConfig,
save_model_summary: bool = False,
half_precision: bool = False,
gradcam: bool = False,
):
super().__init__(config=config)

@@ -495,13 +504,16 @@ def __init__(
"test_accuracy": [],
"test_results": [],
"test_labels": [],
"cams": [],
}
self.export_folder = "deployment_model"
self.deploy_info_file = "model.json"
self.train_dataloader_list: List[torch.utils.data.DataLoader] = []
self.test_dataloader_list: List[torch.utils.data.DataLoader] = []
self.automatic_batch_size = automatic_batch_size
self.save_model_summary = save_model_summary
self.half_precision = half_precision
self.gradcam = gradcam

@property
def device(self) -> str:
@@ -558,6 +570,14 @@ def backbone(self, backbone_config):

self._backbone = ModelSignatureWrapper(self._backbone)
self._backbone.eval()
if self.half_precision:
    if self.device == "cpu":
        raise ValueError("Half precision is not supported on CPU")
    self._backbone.half()

    if self.gradcam:
        log.warning("Gradcam is currently not supported with half precision, it will be disabled")
        self.gradcam = False
self._backbone.to(self.device)

@property
@@ -622,21 +642,23 @@ def train(self) -> None:
train_features=all_features_sorted[0:train_len], train_labels=all_labels_sorted[0:train_len]
)

-_, pd_cm, accuracy, res, _ = self.trainer.test(
+_, pd_cm, accuracy, res, cams = self.trainer.test(
test_dataloader=test_dataloader,
test_features=all_features_sorted[train_len:],
test_labels=all_labels_sorted[train_len:],
class_to_keep=class_to_keep,
idx_to_class=train_dataloader.dataset.idx_to_class,
predict_proba=True,
gradcam=self.gradcam,
)
else:
self.trainer.fit(train_dataloader=train_dataloader)
-_, pd_cm, accuracy, res, _ = self.trainer.test(
+_, pd_cm, accuracy, res, cams = self.trainer.test(
test_dataloader=test_dataloader,
class_to_keep=class_to_keep,
idx_to_class=train_dataloader.dataset.idx_to_class,
predict_proba=True,
gradcam=self.gradcam,
)

# save results
@@ -649,6 +671,7 @@ def train(self) -> None:
for i in res["real_label"].unique().tolist()
]
)
self.metadata["cams"].append(cams)

def extract_model_summary(
self, feature_extractor: torch.nn.Module | BaseEvaluationModel, dl: torch.utils.data.DataLoader
@@ -670,7 +693,8 @@ def extract_model_summary(

if hasattr(feature_extractor, "parameters"):
# Move input to the correct device
-x1 = x1.to(next(feature_extractor.parameters()).device)
+parameter = next(feature_extractor.parameters())
+x1 = x1.to(parameter.device).to(parameter.dtype)
x1 = x1[0].unsqueeze(0) # Keep only the first sample, as a batch of size 1

model_info = None
@@ -725,8 +749,8 @@ def test_full_data(self) -> None:

# Put backbone on the correct device as it may be moved after export
self.backbone.to(self.device)
-_, pd_cm, accuracy, res, _ = self.trainer.test(
-test_dataloader=test_dataloader, idx_to_class=idx_to_class, predict_proba=True
+_, pd_cm, accuracy, res, cams = self.trainer.test(
+test_dataloader=test_dataloader, idx_to_class=idx_to_class, predict_proba=True, gradcam=self.gradcam
)

output_folder_test = "test"
@@ -741,6 +765,7 @@ def test_full_data(self) -> None:
test_dataloader=test_dataloader,
config=self.config,
output=self.output,
grayscale_cams=cams,
)

def export(self) -> None:
@@ -757,7 +782,7 @@ def export(self) -> None:
config=self.config,
model=self.backbone,
export_folder=self.export_folder,
-half_precision=False,
+half_precision=self.half_precision,
input_shapes=input_shapes,
idx_to_class=idx_to_class,
pytorch_model_type="backbone",
@@ -789,6 +814,7 @@ def generate_report(self) -> None:
test_dataloader=self.test_dataloader_list[count],
config=self.config,
output=self.output,
grayscale_cams=self.metadata["cams"][count],
)
final_confusion_matrix = sum(cm_list)

@@ -1104,7 +1130,11 @@ def prepare_gradcam(self) -> None:
return

if isinstance(self.deployment_model.model.features_extractor, timm.models.resnet.ResNet):
-target_layers = [cast(BaseNetworkBuilder, self.deployment_model.model).features_extractor.layer4[-1]]
+target_layers = [
+    cast(BaseNetworkBuilder, self.deployment_model.model).features_extractor.layer4[
+        -1
+    ]  # type: ignore[index]
+]
self.cam = GradCAM(
model=self.deployment_model.model,
target_layers=target_layers,
9 changes: 8 additions & 1 deletion quadra/tasks/patch.py
@@ -33,6 +33,7 @@ class PatchSklearnClassification(Task[PatchSklearnClassificationDataModule]):
device: The device to use
output: Dictionary defining which kind of outputs to generate. Defaults to None.
automatic_batch_size: Whether to automatically find the largest batch size that fits in memory.
half_precision: Whether to use half precision.
"""

def __init__(
@@ -41,6 +42,7 @@ def __init__(
output: DictConfig,
device: str,
automatic_batch_size: DictConfig,
half_precision: bool = False,
):
super().__init__(config=config)
self.device: str = device
@@ -58,6 +60,7 @@ def __init__(
}
self.export_folder: str = "deployment_model"
self.automatic_batch_size = automatic_batch_size
self.half_precision = half_precision

@property
def model(self) -> ClassifierMixin:
@@ -87,6 +90,10 @@ def backbone(self, backbone_config):

self._backbone = ModelSignatureWrapper(self._backbone)
self._backbone.eval()
if self.half_precision:
    if self.device == "cpu":
        raise ValueError("Half precision is not supported on CPU")
    self._backbone.half()
self._backbone = self._backbone.to(self.device)

def prepare(self) -> None:
@@ -222,7 +229,7 @@ def export(self) -> None:
config=self.config,
model=self.backbone,
export_folder=self.export_folder,
-half_precision=False,
+half_precision=self.half_precision,
input_shapes=input_shapes,
idx_to_class=idx_to_class,
pytorch_model_type="backbone",
16 changes: 12 additions & 4 deletions quadra/utils/models.py
@@ -14,7 +14,12 @@
from timm.models.vision_transformer import Mlp
from torch import nn

-from quadra.models.evaluation import BaseEvaluationModel, TorchEvaluationModel, TorchscriptEvaluationModel
+from quadra.models.evaluation import (
+    BaseEvaluationModel,
+    ONNXEvaluationModel,
+    TorchEvaluationModel,
+    TorchscriptEvaluationModel,
+)
from quadra.utils import utils
from quadra.utils.vit_explainability import VitAttentionGradRollout

@@ -109,7 +114,7 @@ def get_feature(
if isinstance(feature_extractor, (TorchEvaluationModel, TorchscriptEvaluationModel)):
# If we are working with torch based evaluation models we need to extract the model
feature_extractor = feature_extractor.model
-else:
+elif isinstance(feature_extractor, ONNXEvaluationModel):
gradcam = False

feature_extractor.eval()
@@ -146,8 +151,11 @@ def get_feature(
x1, y1 = b

if hasattr(feature_extractor, "parameters"):
-    # Move input to the correct device
-    x1 = x1.to(next(feature_extractor.parameters()).device)
+    # Move input to the correct device and dtype
+    parameter = next(feature_extractor.parameters())
+    x1 = x1.to(parameter.device).to(parameter.dtype)
+elif isinstance(feature_extractor, BaseEvaluationModel):
+    x1 = x1.to(feature_extractor.device).to(feature_extractor.model_dtype)

if gradcam:
y_hat = cast(