
[Fix Serious BUG] Using Albu will crash the trainning #918

Merged
merged 8 commits into open-mmlab:dev on Jul 28, 2022

Conversation

PeterH0323
Contributor

@PeterH0323 PeterH0323 commented Jul 19, 2022

Motivation

When I use Albu for dataset augmentation, it causes an error:
(screenshot of the error traceback)

Modification

I debugged and found that Albu changes the label from array(x) to array([x]), which crashes the training.

I also found that no config file in mmcls uses the field gt_labels, so I think this key was inherited from mmdet.

Because of that, I corrected the key in mmcls to gt_label and added some code to change the label back to array(x) after Albu turns it into the incorrect form array([x]).

Thanks to @Ezra-Yu, the relevant Albumentations code can be found here:

https://github.com/albumentations-team/albumentations/blob/a8dc46ee29f9573c8569213f0d243254980b02f1/albumentations/core/composition.py#L276-L282
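
For clarity (this snippet is not part of the PR), here is a minimal sketch of why the wrapped label breaks training: a 0-d label and a 1-d label collate to different batch shapes, and a standard classification loss only accepts the former. The stacking below is only an assumption that mimics default batch collation; the exact failure point inside mmcls may differ.

import numpy as np
import torch
import torch.nn.functional as F

# mmcls stores a classification label as a 0-d array, e.g. np.array(3).
label_before = np.array(3)    # shape ()
label_after = np.array([3])   # shape (1,), what comes back from Albu

# Stacking mimics what batch collation does to the label field.
batch_ok = torch.as_tensor(np.stack([label_before] * 4))   # shape (4,)
batch_bad = torch.as_tensor(np.stack([label_after] * 4))   # shape (4, 1)

logits = torch.randn(4, 10)
F.cross_entropy(logits, batch_ok)     # works: target shape (4,)
# F.cross_entropy(logits, batch_bad)  # fails: target shape (4, 1) is rejected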


Note

To test this PR before it is merged into the mmcls master branch, reinstall mmcls after modifying this file:

pip uninstall mmcls -y && pip install -e .

Config

train = dict(
    type='ImageNet',
    pipeline=[
        dict(type='LoadImageFromFile'),
        dict(
            type='Albu',
            transforms=[
                dict(
                    type='ShiftScaleRotate',
                    shift_limit=0.0625,
                    scale_limit=0.0,
                    rotate_limit=0,
                    interpolation=1,
                    p=0.5),
                dict(
                    type='RandomBrightnessContrast',
                    brightness_limit=[0.1, 0.3],
                    contrast_limit=[0.1, 0.3],
                    p=0.2),
                dict(type='ChannelShuffle', p=0.1),
                dict(
                    type='OneOf',
                    transforms=[
                        dict(type='Blur', blur_limit=3, p=1.0),
                        dict(type='MedianBlur', blur_limit=3, p=1.0)
                    ],
                    p=0.1),
            ]),
        dict(type='ImageToTensor', keys=['img']),
        dict(type='ToTensor', keys=['gt_label']),
        dict(type='Collect', keys=['img', 'gt_label'])
    ]
)

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit test to ensure the correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects, like MMDet or MMSeg.
  • CLA has been signed and all committers have signed the CLA in this PR.

@PeterH0323 PeterH0323 changed the title [Fix erious BUG] Using Albu will crash the trainning [Fix Serious BUG] Using Albu will crash the trainning Jul 19, 2022
@PeterH0323
Contributor Author

Hi @mzr1996 @Ezra-Yu
Please take a look, I think this fix is very important for mmcls 😄

@Ezra-Yu
Collaborator

Ezra-Yu commented Jul 20, 2022

Thank you for your report and PR.

I debugged and found that Albu changes the label from array(x) to array([x])

So this is caused by some transforms in 'Albu' changing the gt_label field in the results dict, which is necessary for detection and segmentation tasks but useless for the classification task.

@PeterH0323
Contributor Author

PeterH0323 commented Jul 20, 2022

Thank you for your report and PR.

I debugged and found that Albu changes the label from array(x) to array([x])

So this is caused by some transforms in 'Albu' changing the gt_label field in the results dict, which is necessary for detection and segmentation tasks but useless for the classification task.

Hi @Ezra-Yu

I tested it; it is not some transforms but all transforms in Albu that change the gt_label shape.

When I train my model, gt_label is necessary for calculating the loss.
https://github.com/open-mmlab/mmclassification/blob/11df205e399fb6d6367b3040e1b261563c8a13e5/mmcls/models/heads/cls_head.py#L48-L49

And if I don't set gt_label in my config, training won't proceed and fails with an error: missing gt_label.

@@ -1122,10 +1122,9 @@ def __call__(self, results):

results = self.aug(**results)

if 'gt_labels' in results:
Collaborator

Just modifying the key 'gt_labels' to 'gt_label' in the original code will solve the bug, and your comment will help people understand the code.

        if 'gt_label' in results:
            if isinstance(results['gt_label'], list):
                results['gt_label'] = np.array(results['gt_label'])
            results['gt_label'] = results['gt_label'].astype(np.int64)

Your code may cause a potential bug in multi-label tasks. The Albu transforms would not change 'gt_label' in multi-label tasks, so results['gt_label'] doesn't need to be modified there. But because results['gt_label'].shape != () in that case, a multi-label target would be changed to a single label.
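
A small illustration of this concern (hypothetical values, not code from the PR): the astype cast alone does not restore the shape in the single-label case, while in the multi-label case the 1-d shape is legitimate and must not be squeezed.

import numpy as np

# Single-label case: Albu hands back the 0-d label wrapped into a 1-d array.
single = np.array([3])                 # was np.array(3) before Albu
print(single.astype(np.int64).shape)   # (1,): astype alone does not fix the shape

# Multi-label case: gt_label is legitimately 1-d (a multi-hot vector).
multi = np.array([1, 0, 1])            # classes 0 and 2 are both present
# Squeezing every label whose shape != () would silently turn this
# multi-hot target into a single label, which is the potential bug noted above.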

Contributor Author

In my case, this doesn't solve the problem; the shape is still not equal after results['gt_label'].astype(np.int64).

In my latest commit, I use copy.deepcopy to back up the original gt_label and write it back into the results dict after the augmentation. I think this way we can make sure it stays the same.
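
A minimal sketch of the backup-and-restore idea described here, using a stand-in augmentation function; the helper name and the stand-in are illustrative only, and the merged code may differ.

import copy
import numpy as np

def albu_call_sketch(aug, results):
    """Sketch only: `aug` plays the role of self.aug in the Albu wrapper."""
    # Back up the classification label before Albumentations runs, because
    # its post-processing may wrap array(x) into array([x]).
    gt_label_backup = copy.deepcopy(results.get('gt_label'))

    results = aug(**results)

    if gt_label_backup is not None:
        # Write the untouched label back so its shape and dtype stay exactly
        # as the rest of the mmcls pipeline expects.
        results['gt_label'] = gt_label_backup
    return results

# Stand-in for an Albu pipeline that wraps the label, as described above.
def fake_aug(**results):
    results['gt_label'] = np.array([int(results['gt_label'])])
    return results

out = albu_call_sketch(fake_aug, dict(img=np.zeros((8, 8, 3)), gt_label=np.array(3)))
assert out['gt_label'].shape == ()   # shape preserved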

@codecov

codecov bot commented Jul 26, 2022

Codecov Report

Merging #918 (f8afc67) into dev (812f3d4) will decrease coverage by 0.85%.
The diff coverage is 0.00%.

❗ Current head f8afc67 differs from pull request most recent head 57517b5. Consider uploading reports for the commit 57517b5 to get more accurate results

@@            Coverage Diff             @@
##              dev     #918      +/-   ##
==========================================
- Coverage   85.45%   84.60%   -0.86%     
==========================================
  Files         132      132              
  Lines        8750     8747       -3     
  Branches     1513     1512       -1     
==========================================
- Hits         7477     7400      -77     
- Misses       1050     1114      +64     
- Partials      223      233      +10     
Flag Coverage Δ
unittests 84.60% <0.00%> (-0.79%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
mmcls/datasets/pipelines/transforms.py 88.63% <0.00%> (+0.36%) ⬆️
mmcls/models/backbones/convnext.py 60.00% <0.00%> (-34.79%) ⬇️
mmcls/utils/setup_env.py 72.72% <0.00%> (-22.73%) ⬇️
mmcls/models/utils/helpers.py 79.16% <0.00%> (-20.84%) ⬇️
mmcls/datasets/custom.py 88.52% <0.00%> (-11.48%) ⬇️
mmcls/models/backbones/convmixer.py 88.52% <0.00%> (-9.84%) ⬇️
mmcls/datasets/builder.py 78.37% <0.00%> (-9.46%) ⬇️
mmcls/models/backbones/cspnet.py 89.54% <0.00%> (-3.27%) ⬇️
mmcls/core/visualization/image.py 86.20% <0.00%> (-0.98%) ⬇️
mmcls/datasets/dataset_wrappers.py 71.83% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 11df205...57517b5.

@Ezra-Yu Ezra-Yu self-requested a review July 26, 2022 09:39
Collaborator

@Ezra-Yu Ezra-Yu left a comment

Please add a case to the unit test to check the gt_label, for example with ShiftScaleRotate:

dict(
        type='Albu',
        transforms=[
            dict(
                type='ShiftScaleRotate',
                shift_limit=0.0625,
                scale_limit=0.0,
                rotate_limit=0,
                interpolation=1,
                p=1)
        ]),

@PeterH0323
Contributor Author

Please add a case to the unit test to check the gt_label, for example with ShiftScaleRotate:

dict(
        type='Albu',
        transforms=[
            dict(
                type='ShiftScaleRotate',
                shift_limit=0.0625,
                scale_limit=0.0,
                rotate_limit=0,
                interpolation=1,
                p=1)
        ]),

Done, I improved the function test_albu_transform in tests/test_data/test_pipelines/test_transform.py.
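
For reference, a rough sketch of the kind of case this adds; the import path and the exact field names here are assumptions, so see the test file above for the merged version.

import numpy as np
from mmcls.datasets.pipelines import Albu  # assumed import path

def test_albu_keeps_gt_label_shape():
    # The 0-d gt_label must survive the Albu transform with shape and dtype intact.
    transform = Albu(transforms=[
        dict(
            type='ShiftScaleRotate',
            shift_limit=0.0625,
            scale_limit=0.0,
            rotate_limit=0,
            interpolation=1,
            p=1),
    ])
    results = dict(
        img=np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8),
        gt_label=np.array(1, dtype=np.int64))

    results = transform(results)

    assert results['gt_label'].shape == ()
    assert results['gt_label'].dtype == np.int64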

Collaborator

@Ezra-Yu Ezra-Yu left a comment

LGTM.

@mzr1996 mzr1996 merged commit 00f0e0d into open-mmlab:dev Jul 28, 2022
Ezra-Yu pushed a commit to Ezra-Yu/mmclassification that referenced this pull request Sep 6, 2022
* Fix albu BUG: using albu will cause the label from array(x) to array([x]) and crash the trainning

* Fix common

* Using copy incase potential bug in multi-label tasks

* Improve coding

* Improve code logic

* Add unit test

* Fix typo

* Fix yapf
mzr1996 added a commit that referenced this pull request Dec 30, 2022
* unit test for multi_task_head

* [Feature] MultiTaskHead (#628, #481)

* [Fix] lint for multi_task_head

* [Feature] Add `MultiTaskDataset` to support multi-task training.

* Update MultiTaskClsHead

* Update docs

* [CI] Add test mim CI. (#879)

* [Fix] Remove duplicated wide-resnet metafile.

* [Feature] Support MPS device. (#894)

* [Feature] Support MPS device.

* Add `auto_select_device`

* Add unit tests

* [Fix] Fix Albu crash bug. (#918)

* Fix albu BUG: using albu will cause the label from array(x) to array([x]) and crash the trainning

* Fix common

* Using copy incase potential bug in multi-label tasks

* Improve coding

* Improve code logic

* Add unit test

* Fix typo

* Fix yapf

* Bump version to 0.23.2. (#937)

* [Improve] Use `forward_dummy` to calculate FLOPS. (#953)

* Update README

* [Docs] Fix typo for wrong reference. (#1036)

* [Doc] Fix typo in tutorial 2 (#1043)

* [Docs] Fix a typo in ImageClassifier (#1050)

* add mask to loss

* add another pipeline

* adpat the pipeline if there is no mask

* switch mask and task

* first version of multi data smaple

* fix problem with attribut by getattr

* rm img_label suffix, fix 'LabelData' object has no attribute 'gt_label'

* training  without evaluation

* first version work

* add others metrics

* delete evaluation from dataset

* fix linter

* fix linter

* multi metrics

* first version of test

* change evaluate metric

* Update tests/test_models/test_heads.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update tests/test_models/test_heads.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* add tests

* add test for multidatasample

* create a generic test

* create a generic test

* create a generic test

* change multi data sample

* correct test

* test

* add new test

* add test for dataset

* correct test

* correct test

* correct test

* correct test

* fix : #5

* run yapf

* fix linter

* fix linter

* fix linter

* fix isort

* fix isort

* fix docformmater

* fix docformmater

* fix linter

* fix linter

* fix data sample

* Update mmcls/structures/multi_task_data_sample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/structures/multi_task_data_sample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/structures/multi_task_data_sample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/structures/multi_task_data_sample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/structures/multi_task_data_sample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/structures/multi_task_data_sample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update tests/test_structures/test_datasample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/structures/multi_task_data_sample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update tests/test_structures/test_datasample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update tests/test_structures/test_datasample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* update data sample

* update head

* update head

* update multi data sample

* fix linter

* fix linter

* fix linter

* fix linter

* fix linter

* fix linter

* update head

* fix problem we don't  set pred or  gt

* fix problem we don't  set pred or  gt

* fix problem we don't  set pred or  gt

* fix linter

* fix : #2

* fix : linter

* update multi head

* fix linter

* fix linter

* update data sample

* update data sample

* fix ; linter

* update test

* test pipeline

* update pipeline

* update test

* update dataset

* update dataset

* fix linter

* fix linter

* update formatting

* add test for multi-task-eval

* update formatting

* fix linter

* update test

* update

* add test

* update metrics

* update metrics

* add doc for functions

* fix linter

* training for multitask 1.x

* fix linter

* run flake8

* run linter

* update test

* add mask in evaluation

* update metric doc

* update metric doc

* Update mmcls/evaluation/metrics/multi_task.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/evaluation/metrics/multi_task.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/evaluation/metrics/multi_task.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/evaluation/metrics/multi_task.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/evaluation/metrics/multi_task.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/evaluation/metrics/multi_task.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* update metric doc

* update metric doc

* Fix cannot import name MultiTaskDataSample

* fix test_datasets

* fix test_datasets

* fix linter

* add an example of multitask

* change name of configs dataset

* Refactor the multi-task support

* correct test and metric

* add test to multidatasample

* add test to multidatasample

* correct test

* correct metrics and clshead

* Update mmcls/models/heads/cls_head.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* update cls_head.py documentation

* lint

* lint

* fix: lint

* fix linter

* add eval mask

* fix documentation

* fix: single_label.py back to 1.x

* Update mmcls/models/heads/multi_task_head.py

Co-authored-by: Ma Zerun <mzr1996@163.com>

* Remove multi-task configs.

Co-authored-by: mzr1996 <mzr1996@163.com>
Co-authored-by: HinGwenWoong <peterhuang0323@qq.com>
Co-authored-by: Ming-Hsuan-Tu <alec.tu@acer.com>
Co-authored-by: Lei Lei <18294546+Crescent-Saturn@users.noreply.github.com>
Co-authored-by: WRH <12756472+wangruohui@users.noreply.github.com>
Co-authored-by: marouaneamz <maroineamil99@gmail.com>
Co-authored-by: marouane amzil <53240092+marouaneamz@users.noreply.github.com>