[Enhancement]Support broadcast_object_list in multi-machines & support Searcher running in single GPU #153

Merged: 2 commits, May 4, 2022

@humu789 humu789 commented Apr 27, 2022

Motivation

Fix the initialization bug when running on a single GPU (#42).
Fix broadcast_object_list, which could not be executed across multiple machines.

Modification

Refactor broadcast_object_list to be consistent with PyTorch.
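
For readers unfamiliar with the PyTorch approach, here is a minimal single-process sketch of the idea, not the code in this PR. It assumes the PyTorch-style scheme: each object is pickled, the per-object sizes are sent first, and then one concatenated buffer is sent and sliced back apart on the receiving ranks. The helper names `pack_objects` and `unpack_into` are hypothetical, and plain `bytes` stand in for the tensors that would actually be broadcast.

```python
import pickle
from typing import Any, List, Tuple


def pack_objects(data: List[Any]) -> Tuple[List[int], bytes]:
    """Pickle each object; return per-object sizes and one joined buffer."""
    blobs = [pickle.dumps(obj) for obj in data]
    sizes = [len(blob) for blob in blobs]
    return sizes, b"".join(blobs)


def unpack_into(data: List[Any], sizes: List[int], buffer: bytes) -> None:
    """Overwrite `data` in place using the received sizes and buffer."""
    offset = 0
    for i, size in enumerate(sizes):
        data[i] = pickle.loads(buffer[offset:offset + size])
        offset += size


# Source rank side (simulated): serialize the list.
sizes, payload = pack_objects(["foo", 12, {1: 2}])

# Receiving rank side (simulated): placeholders are filled in place.
received = [None, None, None]
unpack_into(received, sizes, payload)
```

Sending the sizes first lets every receiving rank allocate a buffer of the right length before the payload arrives, which is why the receiving list only needs placeholders of the correct count.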

BC-breaking (Optional)

  1. The new broadcast_object_list has no return value; it updates the list in place.
  2. The parameter of broadcast_object_list was renamed: object_list -> data.
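
A rough migration sketch for callers follows. It assumes the previous version returned the broadcast list, as item 1 implies. The functions `old_style_broadcast` and `new_style_broadcast` are hypothetical single-process stand-ins, not the mmrazor functions themselves.

```python
from typing import Any, List


def old_style_broadcast(object_list: List[Any]) -> List[Any]:
    """Hypothetical stand-in for the previous API: returns the result."""
    return list(object_list)


def new_style_broadcast(data: List[Any]) -> None:
    """Hypothetical stand-in for the new API: updates `data` in place."""
    # In a real multi-rank run, entries on non-source ranks would be
    # overwritten here; in this single-process stand-in they are unchanged.
    data[:] = list(data)


# Before: the result came from the return value, keyword `object_list`.
before = old_style_broadcast(object_list=["foo", 12, {1: 2}])

# After: pass the list as `data` and read it back after the call.
after = ["foo", 12, {1: 2}]
result = new_style_broadcast(data=after)
```

Code that assigned the return value, for example `x = broadcast_object_list(...)`, would end up with `None` under the new behaviour, so such call sites need to read the list they passed in instead.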

Use cases (Optional)

Examples:
>>> import torch
>>> import mmrazor.core.utils as dist
>>> # non-distributed environment
>>> data = ['foo', 12, {1: 2}]
>>> dist.broadcast_object_list(data)
>>> data
['foo', 12, {1: 2}]
>>> # distributed environment
>>> # We have 2 process groups, 2 ranks.
>>> if dist.get_rank() == 0:
...     data = ['foo', 12, {1: 2}]  # any picklable object
... else:
...     data = [None, None, None]
>>> dist.broadcast_object_list(data)
>>> data
['foo', 12, {1: 2}]  # Rank 0
['foo', 12, {1: 2}]  # Rank 1

@humu789 humu789 added the bug Something isn't working label Apr 27, 2022
@humu789 humu789 requested review from HIT-cwh and pppppM April 27, 2022 15:35

codecov bot commented Apr 27, 2022

Codecov Report

Merging #153 (9a4dbd6) into dev_v0.4.0 (7de8a07) will decrease coverage by 0.65%.
The diff coverage is 24.09%.

@@              Coverage Diff               @@
##           dev_v0.4.0     #153      +/-   ##
==============================================
- Coverage       66.17%   65.51%   -0.66%     
==============================================
  Files              92       93       +1     
  Lines            3376     3428      +52     
  Branches          615      630      +15     
==============================================
+ Hits             2234     2246      +12     
- Misses           1040     1080      +40     
  Partials          102      102              
Flag Coverage Δ
unittests 65.49% <24.09%> (-0.66%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
mmrazor/core/searcher/evolution_search.py 9.84% <0.00%> (+0.07%) ⬆️
mmrazor/core/searcher/greedy_search.py 12.63% <0.00%> (ø)
mmrazor/core/utils/broadcast.py 21.81% <21.15%> (-7.82%) ⬇️
mmrazor/core/utils/utils.py 29.16% <29.16%> (ø)
mmrazor/core/utils/__init__.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@pppppM pppppM changed the title 【Fix】Support broadcast_object_list in multi-machines & support Searcher running in single GPU [Enhancement]Support broadcast_object_list in multi-machines & support Searcher running in single GPU Apr 29, 2022

pppppM commented Apr 29, 2022

Modifying broadcast_object_list is a BC-breaking change.
Please add the BC-breaking and Use cases sections to the PR message.

object_list[i] = _tensor_to_object(obj_view, obj_size)


def broadcast_object_list(data: List[Any],

Collaborator

A warning needs to be added here

Collaborator Author

done

@pppppM pppppM merged commit c2492d5 into open-mmlab:dev_v0.4.0 May 4, 2022
pppppM added a commit that referenced this pull request May 4, 2022
* [Enhance] Add extra dataloader settings in configs (#141)

* [Docs] fix md link failure in docs (#142)

* [Docs] update Cream readme

* delete 'readme.md' in model_zoo.md

* fix md link failure in docs

* [Docs] add myst_parser to extensions  in conf.py

* [Docs] delete the deprecated recommonmark

* [Docs] delete recommandmark from conf.py

* [Docs] fix md link failure and lint failture

* [Fix] Fix seed error in mmseg/train_seg.py and typos in train.md (#152)

* [Docs] update Cream readme

* delete 'readme.md' in model_zoo.md

* fix cwd docs and fix seed in #151

* delete readme of cream

* [Enhancement]Support broadcast_object_list in multi-machines & support Searcher running in single GPU (#153)

* broadcast_object_list support multi-machines

* add userwarning

* [Fix] Fix configs (#149)

* fix configs

* fix spos configs

* fix readme

* replace the official mutable_cfg with the mutable_cfg searched by ourselves

* update https prefix

Co-authored-by: pppppM <gjf_mail@126.com>

* [BUG]Support to prune models containing GroupNorm or InstanceNorm. (#144)

* suport GN and IN

* test pruner

* limit pytorch version

* fix pytest

* throw an error when tracing groupnorm with torch version under 1.6.0

Co-authored-by: caoweihan <caoweihan@sensetime.com>

* Bump version to 0.3.1

Co-authored-by: qiufeng <44188071+wutongshenqiu@users.noreply.github.com>
Co-authored-by: PJDong <1115957667@qq.com>
Co-authored-by: humu789 <88702197+humu789@users.noreply.github.com>
Co-authored-by: whcao <41630003+HIT-cwh@users.noreply.github.com>
Co-authored-by: caoweihan <caoweihan@sensetime.com>
pppppM added a commit to pppppM/mmrazor that referenced this pull request Jul 15, 2022
humu789 pushed a commit to humu789/mmrazor that referenced this pull request Feb 13, 2023
* Add doc

* Remove spaces

* sovle comments

* Resolve comments