[Feature] Add multi machine dist_train #114
Merged
Codecov Report
@@ Coverage Diff @@
## dev_v0.3.0 #114 +/- ##
==============================================
- Coverage 64.81% 63.23% -1.59%
==============================================
Files 91 91
Lines 3223 3272 +49
Branches 597 600 +3
==============================================
- Hits 2089 2069 -20
- Misses 1035 1095 +60
- Partials 99 108 +9
pppppM added a commit to humu789/mmrazor that referenced this pull request on Mar 27, 2022
* support multi nodes
* update training doc
* fix lints
* remove fixed seed
pppppM added a commit that referenced this pull request on Apr 2, 2022
* [Feature] Add function to meet mmdeploy support (#102)
  * add init_model function for mmdeploy
  * fix lint
  * add unittest for init_xxx_model
  * fix lint
  * mv test_inference.py to test_apis directory
* [Feature] Add function to meet mmdeploy support (#102)
  * add init_model function for mmdeploy
  * fix lint
  * add unittest for init_xxx_model
  * fix lint
  * mv test_inference.py to test_apis directory
* [Refactor] Delete redundant `set_random_seed` function (#104)
  * refactor set_random_seed
  * add unittests
  * fix unittests error
  * fix lint
  * avoid bc breaking
* [Feature] Add diff seeds to diff ranks and set torch seed in worker_init_fn (#113)
  * add init_random_seed
  * Set diff seed to diff workers
* [Feature] Add multi machine dist_train (#114)
  * support multi nodes
  * update training doc
  * fix lints
  * remove fixed seed
* fix ddp wrapper registry (#128)
* [Docs] Add brief installation steps in README(_zh-CN).md (#121)
  * Add brief installation
  * add brief installtion ref to mmediting pr#816
  Co-authored-by: caoweihan <caoweihan@sensetime.com>
* [BUG]Fix bugs in pruner (#126)
  * fix bugs in pruner when pruning models with shared modules
  * pruner can trace models with dilation conv2d
  * fix deploy_subnet
  * fix add_pruning_attrs
  * fix bugs in modify_forward
  * fix lint
  * fix StructurePruner
  * test tracing models with shared modules
  Co-authored-by: caoweihan <caoweihan@sensetime.com>
* [Docs]Add some more details to docs (#133)
  * add docs for dataset
  * add cfg-options for distillation
  * fix docs
  Co-authored-by: caoweihan <caoweihan@sensetime.com>
* reset norm running status after prepare_from_supernet (#81)
* [Improvement]Sync train api (#115)
  Co-authored-by: caoweihan <caoweihan@sensetime.com>
* [Feature]Support Relational Knowledge Distillation (#127)
  * add rkd
  * add rkd pytest
  * add rkd configs
  * fix readme
  * fix rkd
  * split rkd loss to distance-wise and angle-wise losses
  * rename rkd losses
  * add rkd metaflie
  * add rkd related links
  * rename rkd metafile and add to model index
  * delete cifar100
  Co-authored-by: caoweihan <caoweihan@sensetime.com>
Co-authored-by: pppppM <gjf_mail@126.com>
Co-authored-by: qiufeng <44188071+wutongshenqiu@users.noreply.github.com>
Co-authored-by: wutongshenqiu <690364065@qq.com>
Co-authored-by: whcao <41630003+HIT-cwh@users.noreply.github.com>
Co-authored-by: caoweihan <caoweihan@sensetime.com>
pppppM added a commit to pppppM/mmrazor that referenced this pull request on Jul 15, 2022
* [Feature] Add function to meet mmdeploy support (open-mmlab#102)
  * add init_model function for mmdeploy
  * fix lint
  * add unittest for init_xxx_model
  * fix lint
  * mv test_inference.py to test_apis directory
* [Feature] Add function to meet mmdeploy support (open-mmlab#102)
  * add init_model function for mmdeploy
  * fix lint
  * add unittest for init_xxx_model
  * fix lint
  * mv test_inference.py to test_apis directory
* [Refactor] Delete redundant `set_random_seed` function (open-mmlab#104)
  * refactor set_random_seed
  * add unittests
  * fix unittests error
  * fix lint
  * avoid bc breaking
* [Feature] Add diff seeds to diff ranks and set torch seed in worker_init_fn (open-mmlab#113)
  * add init_random_seed
  * Set diff seed to diff workers
* [Feature] Add multi machine dist_train (open-mmlab#114)
  * support multi nodes
  * update training doc
  * fix lints
  * remove fixed seed
* fix ddp wrapper registry (open-mmlab#128)
* [Docs] Add brief installation steps in README(_zh-CN).md (open-mmlab#121)
  * Add brief installation
  * add brief installtion ref to mmediting pr#816
  Co-authored-by: caoweihan <caoweihan@sensetime.com>
* [BUG]Fix bugs in pruner (open-mmlab#126)
  * fix bugs in pruner when pruning models with shared modules
  * pruner can trace models with dilation conv2d
  * fix deploy_subnet
  * fix add_pruning_attrs
  * fix bugs in modify_forward
  * fix lint
  * fix StructurePruner
  * test tracing models with shared modules
  Co-authored-by: caoweihan <caoweihan@sensetime.com>
* [Docs]Add some more details to docs (open-mmlab#133)
  * add docs for dataset
  * add cfg-options for distillation
  * fix docs
  Co-authored-by: caoweihan <caoweihan@sensetime.com>
* reset norm running status after prepare_from_supernet (open-mmlab#81)
* [Improvement]Sync train api (open-mmlab#115)
  Co-authored-by: caoweihan <caoweihan@sensetime.com>
* [Feature]Support Relational Knowledge Distillation (open-mmlab#127)
  * add rkd
  * add rkd pytest
  * add rkd configs
  * fix readme
  * fix rkd
  * split rkd loss to distance-wise and angle-wise losses
  * rename rkd losses
  * add rkd metaflie
  * add rkd related links
  * rename rkd metafile and add to model index
  * delete cifar100
  Co-authored-by: caoweihan <caoweihan@sensetime.com>
Co-authored-by: pppppM <gjf_mail@126.com>
Co-authored-by: qiufeng <44188071+wutongshenqiu@users.noreply.github.com>
Co-authored-by: wutongshenqiu <690364065@qq.com>
Co-authored-by: whcao <41630003+HIT-cwh@users.noreply.github.com>
Co-authored-by: caoweihan <caoweihan@sensetime.com>
humu789 pushed a commit to humu789/mmrazor that referenced this pull request on Feb 13, 2023
* add shape constantofshape unittest for ncnn
* fix lint
* standarize import
* fix lint
* reply for code review
* reply for code review
* fix lint
* remove some hardcode
* fix lint
* reply for code review
* test gather and fix gather cpp code
* fix yapf
* fix clang-format
* reply for code review
* reply for code review
* fix lint
Motivation
Add training startup documentation
Support training with multi nodes
ref: open-mmlab/mmselfsup#232
Modification
Add training startup documentation
Update tools/xxx/dist_train.sh and tools/xxx/dist_test.sh
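For readers unfamiliar with multi-node launching, the following is a minimal sketch of what a multi-node-aware launcher script can look like. It is not the script changed in this PR. The environment variable names (`NNODES`, `NODE_RANK`, `MASTER_ADDR`, `PORT`), their defaults, the `train.py` entry point under the elided `tools/xxx/` path, and the `--launcher pytorch` argument are all assumptions made for illustration. The `torch.distributed.launch` flags themselves are standard PyTorch options. The function only prints the command it would run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a multi-node launcher; not the script from this PR.
# NNODES, NODE_RANK, MASTER_ADDR and PORT are assumed environment variable
# names, and tools/xxx/train.py is a placeholder path.

build_launch_cmd() {
    local config=$1   # path to the training config
    local gpus=$2     # number of processes (GPUs) on this node
    shift 2           # remaining arguments are passed through unchanged

    local nnodes=${NNODES:-1}                    # total number of machines
    local node_rank=${NODE_RANK:-0}              # index of this machine
    local master_addr=${MASTER_ADDR:-127.0.0.1}  # address of the rank-0 machine
    local port=${PORT:-29500}                    # rendezvous port on the master

    # Print the command instead of running it, so the sketch has no side effects.
    echo python -m torch.distributed.launch \
        --nnodes="$nnodes" \
        --node_rank="$node_rank" \
        --master_addr="$master_addr" \
        --master_port="$port" \
        --nproc_per_node="$gpus" \
        tools/xxx/train.py "$config" --launcher pytorch "$@"
}

# Example: the command for node 0 of a 2-node job with 8 GPUs per node.
( export NNODES=2 NODE_RANK=0 MASTER_ADDR=10.1.1.1; build_launch_cmd configs/example.py 8 )
```

With every variable left unset, the sketch falls back to a single-node launch on localhost, so one script can serve both cases. On the second machine of the example job, the same call would be made with `NODE_RANK=1` and the same `MASTER_ADDR`.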