
How to make pruner to support FPN like structure? #79

Closed
twmht opened this issue Feb 11, 2022 · 23 comments · Fixed by #126
Assignees
Labels
bug Something isn't working

Comments

@twmht
Contributor

twmht commented Feb 11, 2022

I am trying to prune a model from mmdet (https://github.com/open-mmlab/mmdetection/blob/master/configs/atss/atss_r50_fpn_1x_coco.py).

But it throws an exception when forwarding through the FPN.

[screenshot of the exception traceback]

Any idea?

By the way, I think it would be better to let users configure a whole block (such as the neck and bbox_head) as a group that shares one mask, since these blocks are always complicated and the parsers are hard to modify to handle such cases.

@pppppM pppppM self-assigned this Feb 16, 2022
@pppppM
Collaborator

pppppM commented Feb 16, 2022

Could you upload the pruner config?

@pppppM pppppM added the bug Something isn't working label Feb 17, 2022
@pppppM
Collaborator

pppppM commented Feb 17, 2022

I'm very sorry for the inconvenience.
There is a bug in the pruner's trace mechanism: the shared head only traces its first parent module (FPN 0), and the other parent modules (FPN 1, FPN 2, ...) are not traced.
I will fix it as soon as possible.
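
For reference, a minimal sketch (toy modules, not mmdet code) of the structure that trips the tracer, i.e. one head shared by several FPN levels:

import torch
import torch.nn as nn

class TinyFPNModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Three FPN-level convs feeding one shared head, as in RetinaNet/ATSS.
        self.fpn_convs = nn.ModuleList(nn.Conv2d(16, 8, 1) for _ in range(3))
        self.head = nn.Conv2d(8, 4, 3, padding=1)  # shared across all levels

    def forward(self, feats):
        # The shared head runs once per level; a tracer that records only the
        # first call (FPN 0) never links FPN 1 and FPN 2 to the head's input
        # channels, so their pruning masks go out of sync.
        return [self.head(conv(f)) for conv, f in zip(self.fpn_convs, feats)]

model = TinyFPNModel()
outs = model([torch.randn(1, 16, 32, 32) for _ in range(3)])
print([tuple(o.shape) for o in outs])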

@twmht
Contributor Author

twmht commented Feb 17, 2022

@pppppM

This is what I was concerned about.

By the way, I think it would be better to let users configure a whole block (such as the neck and bbox_head) as a group that shares one mask, since these blocks are always complicated and the parsers are hard to modify to handle such cases.

I have done this by passing a prebuilt channel space (in txt format) to my reimplemented AutoSlim.

It is hard for a parser to handle every network architecture; the same problem exists in NNI (https://nni.readthedocs.io/en/stable/Compression/ModelSpeedup.html#limitations).

The channel space can be generated by NNI or MMRazor, saved as a text file, and then modified by users if the channel dependencies are not built correctly.
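
For illustration, a minimal sketch (hypothetical file format and layer names) of such an editable channel space, with one shared-mask group per line:

from pathlib import Path

def save_channel_space(groups, path):
    # One group per line; all layers in a group must share one pruning mask.
    Path(path).write_text('\n'.join(','.join(g) for g in groups) + '\n')

def load_channel_space(path):
    return [line.split(',') for line in Path(path).read_text().splitlines() if line]

groups = [
    ['neck.lateral_convs.0.conv', 'bbox_head.cls_convs.0.conv'],
    ['backbone.layer1.0.conv1'],
]
save_channel_space(groups, 'channel_space.txt')
# Users can edit the file by hand to merge or split groups.
assert load_channel_space('channel_space.txt') == groups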

What is your opinion?

@pppppM
Collaborator

pppppM commented Feb 17, 2022

Sounds great!
Are you interested in making a PR? We can discuss further.

@twmht
Contributor Author

twmht commented Feb 17, 2022

@pppppM

Sure.

@pppppM
Collaborator

pppppM commented Feb 17, 2022

Before the open-source release, most popular models could be handled correctly, such as ResNet, MobileNet, RetinaNet, YOLOX, etc. Probably something went wrong when we refactored the code.
We do need a configurable mechanism to handle the models that cannot be parsed correctly.
I'm very excited to develop this feature with you. Looking forward to your PR.

@pppppM pppppM linked a pull request Apr 1, 2022 that will close this issue
@pppppM pppppM removed a link to a pull request Apr 1, 2022
@pppppM pppppM linked a pull request Apr 1, 2022 that will close this issue
@HIT-cwh
Collaborator

HIT-cwh commented Apr 15, 2022

Hi! This bug has been fixed in PR #126.

@Bing1002

it's the same as https://github.com/open-mmlab/mmrazor/blob/master/configs/pruning/autoslim/autoslim_mbv2_supernet_8xb256_in1k.py#L41, except the model is changed to https://github.com/open-mmlab/mmdetection/blob/master/configs/atss/atss_r50_fpn_1x_coco.py

Hi, can you please upload the prune config file? I used the approach you referred to but still got errors. Did you successfully run AutoSlim on an object detection task? Thanks.

@twmht
Contributor Author

twmht commented Apr 16, 2022

@Bing1002

I have not tried the latest mmrazor. Did you?

@Bing1002

Bing1002 commented Apr 16, 2022 via email

@HIT-cwh
Collaborator

HIT-cwh commented Apr 16, 2022

I'm very sorry for the inconvenience.
Pruning models with GroupNorm is not supported at present, and GroupNorm is the default normalization in ATSSHead. We will fix this as soon as possible.
Models such as RetinaNet and YOLOX can be pruned with our code. The following code can be used:

# Imports needed for this snippet to run standalone: ConfigDict from mmcv,
# and build_algorithm, assumed here to come from mmrazor's model builder.
from mmcv import ConfigDict
from mmrazor.models import build_algorithm

model = dict(
    type='mmdet.RetinaNet',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_input',
        num_outs=5),
    bbox_head=dict(
        type='RetinaHead',
        num_classes=80,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=4,
            scales_per_octave=3,
            ratios=[0.5, 1.0, 2.0],
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    # model training and testing settings
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.4,
            min_pos_iou=0,
            ignore_iof_thr=-1),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=100))

algorithm_cfg = ConfigDict(
    type='AutoSlim',
    architecture=dict(type='MMDetArchitecture', model=model),
    pruner=dict(
        type='RatioPruner',
        ratios=(2 / 12, 3 / 12, 4 / 12, 5 / 12, 6 / 12, 7 / 12, 8 / 12, 9 / 12,
                10 / 12, 11 / 12, 1.0)),
    retraining=False,
    bn_training_mode=True,
    input_shape=None)

algorithm = build_algorithm(algorithm_cfg)
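
As an aside, a possible workaround for the GroupNorm limitation (an untested sketch; ATSSHead defaults to GN with 32 groups) is to override the head's normalization in a config that inherits the ATSS base:

_base_ = ['./atss_r50_fpn_1x_coco.py']
# Untested sketch: swap ATSSHead's default GroupNorm for BatchNorm, since the
# pruner does not support GN yet. The nested dict is merged over the base config.
model = dict(bbox_head=dict(norm_cfg=dict(type='BN', requires_grad=True)))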

@Bing1002

Hi, thanks for your reply. I tried this config but it still failed.

Here is the config:

model = dict(
    type='mmdet.RetinaNet',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_input',
        num_outs=5),
    bbox_head=dict(
        type='RetinaHead',
        num_classes=80,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=4,
            scales_per_octave=3,
            ratios=[0.5, 1.0, 2.0],
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    # model training and testing settings
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.4,
            min_pos_iou=0,
            ignore_iof_thr=-1),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=100))

dataset_type = 'CocoDataset'
data_root = '/mnt/data/coco_demo/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=2,
    train=dict(
        type='CocoDataset',
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ]),
    val=dict(
        type='CocoDataset',
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='CocoDataset',
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=1, metric='bbox')
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
work_dir = './work_dirs/retinanet_r50_fpn_1x_coco'
auto_resume = False
gpu_ids = range(0, 1)


algorithm = dict(
    type='AutoSlim',
    architecture=dict(type='MMDetArchitecture', model=model),
    pruner=dict(
        type='RatioPruner',
        ratios=(2 / 12, 3 / 12, 4 / 12, 5 / 12, 6 / 12, 7 / 12, 8 / 12, 9 / 12,
                10 / 12, 11 / 12, 1.0)),
    retraining=False,
    bn_training_mode=True,
    input_shape=None)

And the error is the same as before:

2022-04-16 09:49:55,736 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
2022-04-16 09:49:55,736 - mmdet - INFO - Checkpoints will be saved to /home/local/york.lan/bing.zha/code/mmrazor/work_dirs/retinanet_r50_fpn_1x_coco by HardDiskBackend.
Traceback (most recent call last):
  File "/home/local/york.lan/bing.zha/code/mmrazor/tools/mmdet/train_mmdet.py", line 210, in <module>
    main()
  File "/home/local/york.lan/bing.zha/code/mmrazor/tools/mmdet/train_mmdet.py", line 199, in main
    train_mmdet_model(
  File "/home/local/york.lan/bing.zha/code/mmrazor/mmrazor/apis/mmdet/train.py", line 206, in train_mmdet_model
    runner.run(data_loader, cfg.workflow)
  File "/home/local/york.lan/bing.zha/code/mmcv_1.4.6/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/local/york.lan/bing.zha/code/mmcv_1.4.6/mmcv/runner/epoch_based_runner.py", line 51, in train
    self.call_hook('after_train_iter')
  File "/home/local/york.lan/bing.zha/code/mmcv_1.4.6/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/local/york.lan/bing.zha/code/mmcv_1.4.6/mmcv/runner/hooks/optimizer.py", line 56, in after_train_iter
    runner.outputs['loss'].backward()
  File "/home/local/york.lan/bing.zha/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/local/york.lan/bing.zha/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

Looking forward to your response. Thanks.

@twmht
Contributor Author

twmht commented Apr 16, 2022

You may try to set optimizer_config to None.

@Bing1002

You may try to set optimizer_config to None.

After changing that part, now I can run pruning. Could you please explain why that setting matters?

@twmht
Contributor Author

twmht commented Apr 16, 2022

AutoSlim calls optimizer.step() itself, not through the mmcv hook. Setting optimizer_config to None means the mmcv OptimizerHook (which also calls loss.backward(), as shown in your traceback) is not registered, so backward and step are not run twice.
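
Concretely, the change in the config above is:

optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
# Do not register mmcv's OptimizerHook; AutoSlim's train_step already runs
# loss.backward() and optimizer.step() itself.
optimizer_config = None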

@Bing1002

Thanks. But then the returned loss becomes NaN.

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 15.2 task/s, elapsed: 329s, ETA:     0s2022-04-16 14:02:58,786 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
2022-04-16 14:02:58,787 - mmdet - ERROR - The testing results of the whole dataset is empty.
2022-04-16 14:02:58,816 - mmdet - INFO - Exp name: autoslim_retinanet.py
2022-04-16 14:02:58,841 - mmdet - INFO - Epoch(val) [4][5000]
2022-04-16 14:04:45,274 - mmdet - INFO - Epoch [5][50/1239]     lr: 1.000e-02, eta: 5:31:29, time: 2.128, data_time: 0.051, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:06:28,567 - mmdet - INFO - Epoch [5][100/1239]    lr: 1.000e-02, eta: 5:29:53, time: 2.066, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:08:13,967 - mmdet - INFO - Epoch [5][150/1239]    lr: 1.000e-02, eta: 5:28:21, time: 2.108, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:09:58,547 - mmdet - INFO - Epoch [5][200/1239]    lr: 1.000e-02, eta: 5:26:47, time: 2.092, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:11:42,629 - mmdet - INFO - Epoch [5][250/1239]    lr: 1.000e-02, eta: 5:25:11, time: 2.082, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:13:25,492 - mmdet - INFO - Epoch [5][300/1239]    lr: 1.000e-02, eta: 5:23:34, time: 2.057, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan
2022-04-16 14:15:09,866 - mmdet - INFO - Epoch [5][350/1239]    lr: 1.000e-02, eta: 5:21:59, time: 2.087, data_time: 0.007, memory: 9424, max_model.loss_cls: nan, max_model.loss_bbox: nan, min_model.loss_cls: nan, min_model.loss_bbox: nan, prune_model1.loss_cls: nan, prune_model1.loss_bbox: nan, prune_model2.loss_cls: nan, prune_model2.loss_bbox: nan, loss: nan

Do you have any idea about it? Thanks a lot!

@HIT-cwh
Collaborator

HIT-cwh commented Apr 17, 2022

We have not verified whether AutoSlim works on object detection. Maybe you can try pruning MobileNet V2 first to check whether the problem is in the code or in AutoSlim itself.

@HIT-cwh
Collaborator

HIT-cwh commented Apr 19, 2022

Thanks. But then the returned loss becomes NaN. [NaN training log quoted above] Do you have any idea about it? Thanks a lot!

Do you detach the teacher's output in the loss function, such as here?
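
For reference, a minimal sketch (illustrative names, not mmrazor's exact code) of detaching the teacher's output in a soft-target loss:

import torch.nn.functional as F

def soft_target_loss(student_logits, teacher_logits, tau=1.0):
    # detach() stops gradients from flowing back into the teacher's graph, so
    # the student's loss cannot backpropagate through the teacher a second time.
    soft_targets = F.softmax(teacher_logits.detach() / tau, dim=1)
    log_probs = F.log_softmax(student_logits / tau, dim=1)
    return F.kl_div(log_probs, soft_targets, reduction='batchmean') * tau ** 2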

@twmht
Contributor Author

twmht commented Apr 19, 2022

@HIT-cwh

He did not use distillation.

@HIT-cwh
Collaborator

HIT-cwh commented Apr 19, 2022

@HIT-cwh

He did not use distillation.

My bad.
Due to a lack of manpower, progress on transferring AutoSlim to other tasks has not been satisfactory, and I'm very sorry for the inconvenience.
We are reproducing BigNAS; if it goes well, we will release a BigNAS example on semantic segmentation.

@twmht
Contributor Author

twmht commented Apr 19, 2022

@HIT-cwh

In fact, I have implemented my own AutoSlim. It is quite different from mmrazor, and its memory usage is much more efficient.

I use gradient clipping in object detection. Without distillation the results are satisfactory, but when applying distillation such as CWD, the results are bad. You may try gradient clipping if you get NaN at the beginning of training, as in the sketch below.

By the way, most anytime networks (like BigNAS) do not explain how they use distillation in object detection. I am exploring this, and I am looking forward to your experiments on it.
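
A minimal sketch of that gradient-clipping step (max_norm=35 mirrors a common mmdet setting; the model and values here are only illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import clip_grad_norm_

model = nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x, target = torch.randn(4, 8), torch.randn(4, 2)
loss = F.mse_loss(model(x), target)
loss.backward()
# Clip gradients before the optimizer step to avoid NaN early in training.
clip_grad_norm_(model.parameters(), max_norm=35, norm_type=2)
optimizer.step()
optimizer.zero_grad()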

@HIT-cwh
Collaborator

HIT-cwh commented Apr 19, 2022

In fact, I have implemented my own AutoSlim. It is quite different from mmrazor, and its memory usage is much more efficient. [full comment quoted above]

I would appreciate it if you could share how you save memory in your implementation, and we will improve our code based on that.

@pppppM pppppM closed this as completed Jul 1, 2022
humu789 pushed a commit to humu789/mmrazor that referenced this issue Feb 13, 2023