NNI 2021 August~September Iteration Planning #3986

QuanluZhang · 2021-07-28T02:43:28Z

This is the plan for the August~September iteration, a 6-week iteration.

Release plan for the stability release and use-case support

Release manager: @acured
Feature freeze date: ~~Sept 9~~ Sep 16
Code freeze and demo date: ~~Sept 16~~ ~~Sep 23~~ ~~Sep 28~~ Oct 9
Branch cut and next release planning date: ~~Sept 23~~ ~~Sep 30~~ Oct 13
Bugbash - Oct 22
Release date: ~~Sept 27~~ ~~Oct 8~~ ~~Oct 9~~ ~~Oct 18~~ Oct 25

NAS

Model Compression

HPO

Refactor Hyperopt-based algorithms - random tuner. @liuzhe-lz Refactor Hyperopt Tuners (Stage 1) - random tuner #4118

nnictl & nnimanager & training service @SparkSnail @liuzhe-lz @acured

WebUI

bug fix - Improve trial intermediate result graph. @Lijiaoa Improve trial intermediate result graph: trials have not intermediate default key #4171
bug fix - webui prompts error message. @Lijiaoa [fix issue#4015] webui prompts error message if user set maxExperimentDuration < execDuration #4164

regression test needed, or bugs reported during the iteration by external users (need test and close the issues)

delayed

acured · 2021-09-10T06:22:09Z

Stretch goals

NAS

Model Compression

P0 - support iterative pruning @J-shang 6.18 review meeting (done) / 6.23 sample / 6.25 7.14
- P0 - implement model compression 2.0, include 8 maskers (basic pruners) 7.9 Compression v2 Stage 1 #3917
- P0 - improve pruners to more faithfully reproduce the papers @J-shang
- P2 - Refactor compress() in IterativePruner, unify support for AGP and ADMM. (iterative pruner follow-up #3669)
- P2 - Add more criteria (strategy, i.e., accuracy drop tolerance) (discussion needed)
P1 - MixedMaskerPruner @J-shang
P2 - Refactor of model graph generation (as an independent component) @zheng-ningxin

Training service & NNI manager & nnictl

WebUI

P0 - Adaptive UI (height) for high resolution; the current window needs improvement at high resolution @JunzhuZhang
- overview page: blank area at the bottom
- alignment issue on the overview page
- support searching customized string in the search box

Hyper-parameter tuning

P1 - Improve HPO benchmarks, add more concrete conclusions
- add one benchmark and one search space 7.14 HPO Benchmark Fixes and New Features #3925
P1 - support nested search space @kvartet 6.28 discussion meeting
P1 - Improve hyperband, PBT with shared storage (dependency on shared storage implementation) @acured
P1 - upgrade cuda, config_space, docker image @liuzhe-lz 7.21 Bump Python dependencies #3976
P1 - grid search does not support _name for nested search space (update doc) @kvartet
P1 - evolution tuner does not mutate sub search space for nested search space @kvartet
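
For context on the nested search space items above: in NNI's JSON-style search space format, each option of a `choice` can be a dict with a `_name`, and that dict can carry its own sub search space. The sketch below is illustrative only. The layer and parameter names are made up, and `is_spec` and `sample` are hypothetical helpers (not NNI APIs) that show what it means for a tuner to descend into the sub search space of the chosen branch.

```python
import random

# Hypothetical nested search space in NNI's JSON-style format.
# Each option of a "choice" is a dict with a "_name"; any other key
# holding a {"_type", "_value"} dict is a sub search space.
search_space = {
    "layer0": {
        "_type": "choice",
        "_value": [
            {"_name": "Empty"},
            {
                "_name": "Conv",
                "kernel_size": {"_type": "choice", "_value": [1, 3, 5]},
            },
            {
                "_name": "Pool",
                "pooling_size": {"_type": "choice", "_value": [2, 3]},
            },
        ],
    }
}


def is_spec(value):
    """True if `value` looks like a {"_type": ..., "_value": ...} spec."""
    return isinstance(value, dict) and "_type" in value and "_value" in value


def sample(spec, rng):
    """Sample one "choice" spec, recursing into nested sub search spaces."""
    if spec["_type"] != "choice":
        raise NotImplementedError("this sketch only handles 'choice'")
    option = rng.choice(spec["_value"])
    if not isinstance(option, dict):
        return option  # plain value, nothing nested
    # Keep "_name" as-is and sample every nested spec of the chosen option.
    return {k: sample(v, rng) if is_spec(v) else v for k, v in option.items()}


rng = random.Random(0)
params = {name: sample(spec, rng) for name, spec in search_space.items()}
print(params)
```

A tuner that ignores `_name`, or that never re-samples the keys inside the chosen option, would show the kinds of problems listed for grid search and the evolution tuner.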

jupyter

When Jupyter restarts, the node process is not stopped.
Jupyter does not show the error message of the node process.
Build jupyterlab-extension as an option.

Pipeline

P0 - support python 3.9 in pipeline (wait for pytorch upgrade from 1.6 to 1.9) @ultmaster Use python 3.9 on pipeline #3881
- P2 - write ipc test
P0 - fix and improve pipeline
- nnimanager UT failed on nni-ci-gpu-03, possibly due to this machine's environment; it cannot be executed through the daemon and needs a reimage -> create a new VM

Doc

Chinese (zh) doc translation.
P2 - update NAS API doc to add more supported evaluators @ultmaster
doc and demo improvement
- end2end, based on comparison of different tuners (insights from benchmark, provide reasonable result)
- demo how to share data, how to use shared storage, mainly refactor doc (view experiment from shared storage)

QuanluZhang · 2021-10-11T02:52:47Z

Bugbash manager @liuzhe-lz (check other prs)

NAS

Model Compression

support compression v2 @J-shang Compression v2 Stage 1 #3917 [Model Compression] update config list key #4074 [Model Compression] Pruning Scheduler #4089 [Model Compression] admm pruner #4116 [Model Compression] Add more Task Generator #4178 [Model Compression] Add Unit Test #4125 tester: @Fiascolsy
- tests
- (bug bash)doc tutorial
support per-channel quantization @linbinskn support dtype&scheme customization for QAT quantizer #4137 tester: @J-shang
end2end examples for image classification, object detection, audio, transformer (show reasonably good performance) tester: @J-shang
- image classification (MobileNet) @xiaowu0162 [End-to-end demo] MobileNetV2 Compression #4102

HPO

Refactor Hyperopt-based algorithms - random tuner. @liuzhe-lz Refactor Hyperopt Tuners (Stage 1) - random tuner #4118 tester: @ultmaster

nnictl & nnimanager & training service @SparkSnail @liuzhe-lz @acured

experiment config doc update @J-shang Update Experiment Config doc #4131 tester: @linbinskn
improve doc: clarify the difference between training service storage and shared storage @J-shang Update shared storage doc #4217 tester: @linbinskn

WebUI

bug fix - Improve trial intermediate result graph. @Lijiaoa Improve trial intermediate result graph: trials have not intermediate default key #4171 tester: @SparkSnail
bug fix - webui prompts error message. @Lijiaoa [fix issue#4015] webui prompts error message if user set maxExperimentDuration < execDuration #4164 tester: @SparkSnail

regression test needed, or bugs reported during the iteration by external users (need test and close the issues)

liuzhe-lz · 2021-10-15T07:32:17Z

BUG BASH (Target date Friday 10/22)

PRs that seemingly need testing:

model compression

chenbohua3
Fiascolsy
- @J-shang Support 'op_partial_names' in config_list #4184 Add examples for 'op_partial_names' and use more proper words. #4225
- @J-shang Fix v2 level pruner default config bug #4245
J-shang
- @Fiascolsy Compression v2 Stage 1 #3917 [Model Compression] update config list key #4074 [Model Compression] Pruning Scheduler #4089 [Model Compression] admm pruner #4116 [Model Compression] Add more Task Generator #4178
- @Fiascolsy [Model Compression] add scheduler high level api #4236
linbinskn
xiaowu0162
- @J-shang [End-to-end demo] MobileNetV2 Compression #4102
zheng-ningxin

nas

other

contributors

alibaba
- (no environment) HPO: Alibaba DSW+DLC support #4055 @weidankong need you to test, verify and confirm, thanks!
- @liuzhe-lz jupyter_ext_dev: pull request for jupyter extension 2.x and 3.x recap… #4167
- @liuzhe-lz jupyter_ext: Added reminding information when there isn't nni kicked off #4188
individuals
- @Fiascolsy added config_windows.yml for examples in /examples/trials #4121
  - @liuzhe-lz and @Fiascolsy follow up on problems found during tests.
- @ultmaster Pass ConfigSpace definition file directly to BOHB #4153 @QuanluZhang
- (no need to test) Update amc_pruner.py #4161
- (no need to test) Update ConfigSpace to 0.4.11 #4208

liuzhe-lz · 2021-10-22T08:51:30Z

To fix:

~~Tutorial for Python experiment API training service @liuzhe-lz~~ next release
lightning trainer doc @ultmaster
experiment config code default value @J-shang
HPO benchmark
Add "experimental" to CGO doc

QuanluZhang · 2021-10-24T06:45:37Z

double check gpuIndices, trialGpuNumber, @linbinskn @liuzhe-lz

cruiseliu · 2021-10-28T23:24:33Z

Model Compression

New major version of pruning framework (doc)
- Iterative pruning is more automated; users need less code to implement it.
- Support exporting intermediate models in the iterative pruning process.
- The implementations of the pruning algorithms are closer to the papers.
- Users can easily customize their own iterative pruning by using PruningScheduler.
- Optimized the basic pruners' underlying mask-generation logic, making it easier to extend with new functions.
- Optimized the memory usage of the pruners.
MobileNetV2 end-to-end example (notebook)
Improved QAT quantizer
- support dtype and scheme customization
- support dp multi-gpu training
- support load_calibration_config
Model speed-up now supports directly loading the mask
- work item: add ModelSpeedup API reference doc
Support speeding up depth-wise convolution
Support bn-folding for LSQ quantizer
Support resuming QAT and LSQ from PTQ
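
To make the iterative pruning idea above concrete, here is a plain-Python sketch: the sparsity target is raised step by step and the smallest-magnitude weights are masked at each step. It assumes the cubic AGP schedule (Zhu & Gupta, 2017) as the way targets grow. It does not use NNI's PruningScheduler or any NNI API; `agp_sparsity`, `magnitude_mask`, and the toy weight list are hypothetical names and data chosen only for illustration.

```python
def agp_sparsity(step, total_steps, final_sparsity, initial_sparsity=0.0):
    """Cubic AGP schedule: sparsity target after `step` of `total_steps`."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the smallest-magnitude weights."""
    n_prune = int(round(len(weights) * sparsity))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


# Toy "layer" of ten weights (made-up values).
weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08, 0.6, -0.15]
total_steps = 4
history = []
for step in range(1, total_steps + 1):
    target = agp_sparsity(step, total_steps, final_sparsity=0.8)
    mask = magnitude_mask(weights, target)
    weights = [w * m for w, m in zip(weights, mask)]
    # A real pipeline would fine-tune the remaining weights here and could
    # export the intermediate model before moving to the next step.
    history.append((step, round(target, 4), sum(mask)))

for row in history:
    print(row)  # (step, sparsity target, number of weights kept)
```

Because already-pruned weights have zero magnitude, they stay pruned in later steps, so each iteration only removes additional weights.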

Neural Architecture Search

NAS benchmark (doc)
- Support benchmark table lookup in experiments
- New data preparation approach
Improved quick start doc
Experimental CGO execution engine (doc)

Hyper-Parameter Optimization

New training platform: Alibaba DSW+DLC (doc)
Support passing ConfigSpace definition directly to BOHB (doc) (thanks to khituras)
Reformatted experiment config doc
Added example config files for Windows (thanks to politecat314)
FrameworkController now supports reuse mode

Fixed Bugs

Experiment cannot start due to platform timestamp format (issues ERROR: Restful server start failed! Failed to create log dir: RangeError: Invalid time value #4077, Restful server started failed #4083)
Cannot use 1e-5 in search space (issue Exponential Notation is not allowed anymore #4080)
Dependency version conflict caused by ConfigSpace (issue NNI installation issue with ConfigSpace-0.4.7 #3909) (thanks to jexxers)
Hardware-aware SPOS example does not work (issue SPOS hardware-aware nas multi_trial.py don't work #4198)
Web UI shows the wrong remaining time when the duration exceeds the limit (issue The remaining time is error when the spent time has exceeded the duration time. #4015)
cudnn.deterministic is always set in AMC pruner (issue cudnn.deterministic been set while import, suggestion to avoid this kind of behavior #4117) (thanks to mstczuo)

And...

New emoticons!

QuanluZhang changed the title ~~NNI 2021 August~September Iteration Planning~~ NNI 2021 August Iteration Planning Jul 28, 2021

QuanluZhang changed the title ~~NNI 2021 August Iteration Planning~~ NNI 2021 August~September Iteration Planning Jul 28, 2021

QuanluZhang pinned this issue Jul 31, 2021

scarlett2018 assigned acured Aug 17, 2021

scarlett2018 added the iteration-plan label Aug 17, 2021

This was referenced Sep 26, 2021

NNI 2021 Oct~Nov Iteration Plan #4211

Closed

Runtime Error during L1FilterPruner (and other one-shot pruners) #3944

Open

liuzhe-lz closed this as completed Nov 4, 2021

liuzhe-lz unpinned this issue Nov 4, 2021
