
[Fix Serious BUG] Using Albu will crash the trainning #918

Merged
merged 8 commits into open-mmlab:dev on Jul 28, 2022

Conversation

PeterH0323
Contributor

@PeterH0323 PeterH0323 commented Jul 19, 2022

Motivation

When I use Albu for dataset augmentation, it causes an error:
(screenshot of the error traceback)

Modification

I debugged and found that Albu changes the label from array(x) to array([x]), which crashes the training.

I also found that no config file in mmcls uses the field gt_labels, so I think this key was inherited from mmdet.

Because of that, I corrected the key in mmcls to gt_label and added some code to change the label back to array(x) after Albu turns it into the incorrect form array([x]).

Thanks to @Ezra-Yu, the relevant Albumentations code can be found here:

https://github.com/albumentations-team/albumentations/blob/a8dc46ee29f9573c8569213f0d243254980b02f1/albumentations/core/composition.py#L276-L282
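
For clarity (this snippet is not part of the PR), here is a minimal sketch of why the wrapped label breaks training: a 0-d label and a 1-d label collate to different batch shapes, and a standard classification loss only accepts the former. The stacking below is only an assumption that mimics default batch collation; the exact failure point inside mmcls may differ.

import numpy as np
import torch
import torch.nn.functional as F

# mmcls stores a classification label as a 0-d array, e.g. np.array(3).
label_before = np.array(3)    # shape ()
label_after = np.array([3])   # shape (1,), what comes back from Albu

# Stacking mimics what batch collation does to the label field.
batch_ok = torch.as_tensor(np.stack([label_before] * 4))   # shape (4,)
batch_bad = torch.as_tensor(np.stack([label_after] * 4))   # shape (4, 1)

logits = torch.randn(4, 10)
F.cross_entropy(logits, batch_ok)     # works: target shape (4,)
# F.cross_entropy(logits, batch_bad)  # fails: target shape (4, 1) is rejected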


Note

To test this PR before it is merged into the mmcls master branch, reinstall mmcls after modifying this file:

pip uninstall mmcls -y && pip install -e .

Config

train = dict(
    type='ImageNet',
    pipeline=[
        dict(type='LoadImageFromFile'),
        dict(
            type='Albu',
            transforms=[
                dict(
                    type='ShiftScaleRotate',
                    shift_limit=0.0625,
                    scale_limit=0.0,
                    rotate_limit=0,
                    interpolation=1,
                    p=0.5),
                dict(
                    type='RandomBrightnessContrast',
                    brightness_limit=[0.1, 0.3],
                    contrast_limit=[0.1, 0.3],
                    p=0.2),
                dict(type='ChannelShuffle', p=0.1),
                dict(
                    type='OneOf',
                    transforms=[
                        dict(type='Blur', blur_limit=3, p=1.0),
                        dict(type='MedianBlur', blur_limit=3, p=1.0)
                    ],
                    p=0.1),
            ]),
        dict(type='ImageToTensor', keys=['img']),
        dict(type='ToTensor', keys=['gt_label']),
        dict(type='Collect', keys=['img', 'gt_label'])
    ]
)

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • Bug fixes are fully covered by unit tests, the case that causes the bug should be added in the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit test to ensure the correctness.
  • The documentation has been modified accordingly, like docstring or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects, like MMDet or MMSeg.
  • CLA has been signed and all committers have signed the CLA in this PR.

@PeterH0323 PeterH0323 changed the title [Fix erious BUG] Using Albu will crash the trainning [Fix Serious BUG] Using Albu will crash the trainning Jul 19, 2022
@PeterH0323
Contributor Author

Hi @mzr1996 @Ezra-Yu
Please take a look, I think this fix is very important for mmcls 😄

@Ezra-Yu
Collaborator

Ezra-Yu commented Jul 20, 2022

Thank you for your report and PR.

I debugged and found that Albu changes the label from array(x) to array([x])

So this is caused by some transforms in 'Albu' changing the gt_label field in the results dict, which is necessary for detection and segmentation tasks but useless for the classification task.

@PeterH0323
Contributor Author

PeterH0323 commented Jul 20, 2022

Thank you for your report and PR.

I debugged and found that Albu changes the label from array(x) to array([x])

So this is caused by some transforms in 'Albu' changing the gt_label field in the results dict, which is necessary for detection and segmentation tasks but useless for the classification task.

Hi @Ezra-Yu

I tested it; it is not some transforms but all transforms in Albu that change the gt_label shape.

When I train my model, gt_label is necessary for calculating the loss.
https://github.com/open-mmlab/mmclassification/blob/11df205e399fb6d6367b3040e1b261563c8a13e5/mmcls/models/heads/cls_head.py#L48-L49

And if I don't set gt_label in my config, training won't proceed and fails with an error: missing gt_label.

@@ -1122,10 +1122,9 @@ def __call__(self, results):

results = self.aug(**results)

if 'gt_labels' in results:
Collaborator

Just modifying the key 'gt_labels' to 'gt_label' in the original code will solve the bug, and your comment will help people understand the code.

        if 'gt_label' in results:
            if isinstance(results['gt_label'], list):
                results['gt_label'] = np.array(results['gt_label'])
            results['gt_label'] = results['gt_label'].astype(np.int64)

Your code may cause a potential bug in multi-label tasks. The Albu transforms would not change 'gt_label' in multi-label tasks, so results['gt_label'] doesn't need to be modified there. But because results['gt_label'].shape != () in that case, a multi-label target would be changed to a single label.
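
A small illustration of this concern (hypothetical values, not code from the PR): the astype cast alone does not restore the shape in the single-label case, while in the multi-label case the 1-d shape is legitimate and must not be squeezed.

import numpy as np

# Single-label case: Albu hands back the 0-d label wrapped into a 1-d array.
single = np.array([3])                 # was np.array(3) before Albu
print(single.astype(np.int64).shape)   # (1,): astype alone does not fix the shape

# Multi-label case: gt_label is legitimately 1-d (a multi-hot vector).
multi = np.array([1, 0, 1])            # classes 0 and 2 are both present
# Squeezing every label whose shape != () would silently turn this
# multi-hot target into a single label, which is the potential bug noted above.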

Contributor Author

In my case, this doesn't solve the problem; the shape is still not equal after results['gt_label'].astype(np.int64).

In my latest commit, I use copy.deepcopy to back up the original gt_label and write it back into the results dict after the augmentation. I think this way we can make sure it stays the same.
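
A minimal sketch of the backup-and-restore idea described here, using a stand-in augmentation function; the helper name and the stand-in are illustrative only, and the merged code may differ.

import copy
import numpy as np

def albu_call_sketch(aug, results):
    """Sketch only: `aug` plays the role of self.aug in the Albu wrapper."""
    # Back up the classification label before Albumentations runs, because
    # its post-processing may wrap array(x) into array([x]).
    gt_label_backup = copy.deepcopy(results.get('gt_label'))

    results = aug(**results)

    if gt_label_backup is not None:
        # Write the untouched label back so its shape and dtype stay exactly
        # as the rest of the mmcls pipeline expects.
        results['gt_label'] = gt_label_backup
    return results

# Stand-in for an Albu pipeline that wraps the label, as described above.
def fake_aug(**results):
    results['gt_label'] = np.array([int(results['gt_label'])])
    return results

out = albu_call_sketch(fake_aug, dict(img=np.zeros((8, 8, 3)), gt_label=np.array(3)))
assert out['gt_label'].shape == ()   # shape preserved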

@codecov

codecov bot commented Jul 26, 2022

Codecov Report

Merging #918 (f8afc67) into dev (812f3d4) will decrease coverage by 0.85%.
The diff coverage is 0.00%.

❗ Current head f8afc67 differs from pull request most recent head 57517b5. Consider uploading reports for the commit 57517b5 to get more accurate results

@@            Coverage Diff             @@
##              dev     #918      +/-   ##
==========================================
- Coverage   85.45%   84.60%   -0.86%     
==========================================
  Files         132      132              
  Lines        8750     8747       -3     
  Branches     1513     1512       -1     
==========================================
- Hits         7477     7400      -77     
- Misses       1050     1114      +64     
- Partials      223      233      +10     
Flag Coverage Δ
unittests 84.60% <0.00%> (-0.79%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
mmcls/datasets/pipelines/transforms.py 88.63% <0.00%> (+0.36%) ⬆️
mmcls/models/backbones/convnext.py 60.00% <0.00%> (-34.79%) ⬇️
mmcls/utils/setup_env.py 72.72% <0.00%> (-22.73%) ⬇️
mmcls/models/utils/helpers.py 79.16% <0.00%> (-20.84%) ⬇️
mmcls/datasets/custom.py 88.52% <0.00%> (-11.48%) ⬇️
mmcls/models/backbones/convmixer.py 88.52% <0.00%> (-9.84%) ⬇️
mmcls/datasets/builder.py 78.37% <0.00%> (-9.46%) ⬇️
mmcls/models/backbones/cspnet.py 89.54% <0.00%> (-3.27%) ⬇️
mmcls/core/visualization/image.py 86.20% <0.00%> (-0.98%) ⬇️
mmcls/datasets/dataset_wrappers.py 71.83% <0.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 11df205...57517b5.

@Ezra-Yu Ezra-Yu self-requested a review July 26, 2022 09:39
Collaborator

@Ezra-Yu Ezra-Yu left a comment

Please add a case to the unit test to check the gt_label, for example with ShiftScaleRotate:

dict(
        type='Albu',
        transforms=[
            dict(
                type='ShiftScaleRotate',
                shift_limit=0.0625,
                scale_limit=0.0,
                rotate_limit=0,
                interpolation=1,
                p=1)
        ]),

@PeterH0323
Contributor Author

Please add a case to the unit test to check the gt_label, for example with ShiftScaleRotate:

dict(
        type='Albu',
        transforms=[
            dict(
                type='ShiftScaleRotate',
                shift_limit=0.0625,
                scale_limit=0.0,
                rotate_limit=0,
                interpolation=1,
                p=1)
        ]),

Done, I improved the function test_albu_transform in tests/test_data/test_pipelines/test_transform.py.
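
For reference, a rough sketch of the kind of case this adds; the import path and the exact field names here are assumptions, so see the test file above for the merged version.

import numpy as np
from mmcls.datasets.pipelines import Albu  # assumed import path

def test_albu_keeps_gt_label_shape():
    # The 0-d gt_label must survive the Albu transform with shape and dtype intact.
    transform = Albu(transforms=[
        dict(
            type='ShiftScaleRotate',
            shift_limit=0.0625,
            scale_limit=0.0,
            rotate_limit=0,
            interpolation=1,
            p=1),
    ])
    results = dict(
        img=np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8),
        gt_label=np.array(1, dtype=np.int64))

    results = transform(results)

    assert results['gt_label'].shape == ()
    assert results['gt_label'].dtype == np.int64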

Collaborator

@Ezra-Yu Ezra-Yu left a comment

LGTM.

@mzr1996 mzr1996 merged commit 00f0e0d into open-mmlab:dev Jul 28, 2022
Ezra-Yu pushed a commit to Ezra-Yu/mmclassification that referenced this pull request Sep 6, 2022
* Fix albu BUG: using albu will cause the label from array(x) to array([x]) and crash the trainning

* Fix common

* Using copy incase potential bug in multi-label tasks

* Improve coding

* Improve code logic

* Add unit test

* Fix typo

* Fix yapf
mzr1996 added a commit that referenced this pull request Dec 30, 2022
* unit test for multi_task_head

* [Feature] MultiTaskHead (#628, #481)

* [Fix] lint for multi_task_head

* [Feature] Add `MultiTaskDataset` to support multi-task training.

* Update MultiTaskClsHead

* Update docs

* [CI] Add test mim CI. (#879)

* [Fix] Remove duplicated wide-resnet metafile.

* [Feature] Support MPS device. (#894)

* [Feature] Support MPS device.

* Add `auto_select_device`

* Add unit tests

* [Fix] Fix Albu crash bug. (#918)

* Fix albu BUG: using albu will cause the label from array(x) to array([x]) and crash the trainning

* Fix common

* Using copy incase potential bug in multi-label tasks

* Improve coding

* Improve code logic

* Add unit test

* Fix typo

* Fix yapf

* Bump version to 0.23.2. (#937)

* [Improve] Use `forward_dummy` to calculate FLOPS. (#953)

* Update README

* [Docs] Fix typo for wrong reference. (#1036)

* [Doc] Fix typo in tutorial 2 (#1043)

* [Docs] Fix a typo in ImageClassifier (#1050)

* add mask to loss

* add another pipeline

* adpat the pipeline if there is no mask

* switch mask and task

* first version of multi data smaple

* fix problem with attribut by getattr

* rm img_label suffix, fix 'LabelData' object has no attribute 'gt_label'

* training  without evaluation

* first version work

* add others metrics

* delete evaluation from dataset

* fix linter

* fix linter

* multi metrics

* first version of test

* change evaluate metric

* Update tests/test_models/test_heads.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update tests/test_models/test_heads.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* add tests

* add test for multidatasample

* create a generic test

* create a generic test

* create a generic test

* change multi data sample

* correct test

* test

* add new test

* add test for dataset

* correct test

* correct test

* correct test

* correct test

* fix : #5

* run yapf

* fix linter

* fix linter

* fix linter

* fix isort

* fix isort

* fix docformmater

* fix docformmater

* fix linter

* fix linter

* fix data sample

* Update mmcls/structures/multi_task_data_sample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/structures/multi_task_data_sample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/structures/multi_task_data_sample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/structures/multi_task_data_sample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/structures/multi_task_data_sample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/structures/multi_task_data_sample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update tests/test_structures/test_datasample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/structures/multi_task_data_sample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update tests/test_structures/test_datasample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update tests/test_structures/test_datasample.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* update data sample

* update head

* update head

* update multi data sample

* fix linter

* fix linter

* fix linter

* fix linter

* fix linter

* fix linter

* update head

* fix problem we don't  set pred or  gt

* fix problem we don't  set pred or  gt

* fix problem we don't  set pred or  gt

* fix linter

* fix : #2

* fix : linter

* update multi head

* fix linter

* fix linter

* update data sample

* update data sample

* fix ; linter

* update test

* test pipeline

* update pipeline

* update test

* update dataset

* update dataset

* fix linter

* fix linter

* update formatting

* add test for multi-task-eval

* update formatting

* fix linter

* update test

* update

* add test

* update metrics

* update metrics

* add doc for functions

* fix linter

* training for multitask 1.x

* fix linter

* run flake8

* run linter

* update test

* add mask in evaluation

* update metric doc

* update metric doc

* Update mmcls/evaluation/metrics/multi_task.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/evaluation/metrics/multi_task.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/evaluation/metrics/multi_task.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/evaluation/metrics/multi_task.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/evaluation/metrics/multi_task.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* Update mmcls/evaluation/metrics/multi_task.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* update metric doc

* update metric doc

* Fix cannot import name MultiTaskDataSample

* fix test_datasets

* fix test_datasets

* fix linter

* add an example of multitask

* change name of configs dataset

* Refactor the multi-task support

* correct test and metric

* add test to multidatasample

* add test to multidatasample

* correct test

* correct metrics and clshead

* Update mmcls/models/heads/cls_head.py

Co-authored-by: Colle <piercus@users.noreply.github.com>

* update cls_head.py documentation

* lint

* lint

* fix: lint

* fix linter

* add eval mask

* fix documentation

* fix: single_label.py back to 1.x

* Update mmcls/models/heads/multi_task_head.py

Co-authored-by: Ma Zerun <mzr1996@163.com>

* Remove multi-task configs.

Co-authored-by: mzr1996 <mzr1996@163.com>
Co-authored-by: HinGwenWoong <peterhuang0323@qq.com>
Co-authored-by: Ming-Hsuan-Tu <alec.tu@acer.com>
Co-authored-by: Lei Lei <18294546+Crescent-Saturn@users.noreply.github.com>
Co-authored-by: WRH <12756472+wangruohui@users.noreply.github.com>
Co-authored-by: marouaneamz <maroineamil99@gmail.com>
Co-authored-by: marouane amzil <53240092+marouaneamz@users.noreply.github.com>