This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

NNI 2021 August~September Iteration Planning #3986

Closed
39 of 78 tasks
QuanluZhang opened this issue Jul 28, 2021 · 6 comments

@QuanluZhang
Contributor

QuanluZhang commented Jul 28, 2021

This is the plan for the August~September iteration, which is a 6-week iteration.

Release plan for the stability release and use case support

  • Release manager: @acured
  • Feature freeze date: Sept 9 → Sep 16
  • Code freeze and demo date: Sept 16 → Sep 23 → Sep 28 → Oct 9
  • Branch cut and next release planning date: Sept 23 → Sep 30 → Oct 13
  • Bugbash: Oct 22
  • Release date: Sept 27 → Oct 8 → Oct 9 → Oct 18 → Oct 25

NAS

Model Compression

HPO

nnictl & nnimanager & training service @SparkSnail @liuzhe-lz @acured

WebUI

regression test needed, or bugs reported during the iteration by external users (need test and close the issues)

delayed

  • classification @jonsnows Aug 26
    • investigate search space and common models and choose one to start.
    • review meeting for the investigation. Aug 30
    • implement NAS (run on openpai ne cluster). Sep 8 Sep 10
    • review meeting for result (demo). Sep 15 Sep 22 Sep 29
    • improvement and PR (code and doc)
  • object detection @QuanluZhang Aug 30 Sep 6 Sep 8 Sep 9 Sep 15
    • investigate search space and common models and choose one to start.
    • review meeting for the investigation. Sep 16
    • implement NAS, baseline model result
    • review meeting for result (demo).
    • improvement and PR (code and doc)
@QuanluZhang QuanluZhang changed the title NNI 2021 August~September Iteration Planning NNI 2021 August Iteration Planning Jul 28, 2021
@QuanluZhang QuanluZhang changed the title NNI 2021 August Iteration Planning NNI 2021 August~September Iteration Planning Jul 28, 2021
@QuanluZhang QuanluZhang pinned this issue Jul 31, 2021
@acured
Contributor

acured commented Sep 10, 2021

Stretch goals

NAS

  • P2 - refactor nas examples @ultmaster
  • P2 - support debug mode for NAS experiment @ultmaster
  • P0 - Add examples for NAS-Bench-series @ultmaster
    • P2 - 301
  • P2 - Evaluate existing strategies (multi-trial) on NAS benchmarks @ultmaster
  • P0 - Support one system-level optimization: CSE optimization to deduplicate computation @hzhua (PR: [Retiarii] cross-graph optimization: device placement and input deduplication #3202)
  • P0 - Improve training service to support CSE optimization @hzhua extend reusable training service to support placement constraint #3897
  • P2 - TensorFlow support: graph converter and support inline mutation API @liuzhe-lz
    • add raw framework
    • add example 7.12
  • P0 - Migrate FBNet from NAS v1.0 to Retiarii framework @ali
  • P2 - AutoDropout @skeletondyh 7.16
  • P1 - Evaluate one-shot strategies on NAS benchmarks @ultmaster
  • P1 - Further improve the logic of graph generation and code generation (Further integrated with TorchScript) @QuanluZhang
  • P1 - Test classic nas tuners on NNI NAS benchmark and report evaluation results @ultmaster
  • P2 - Migrate Cream from NAS v1.0 to Retiarii framework @ yuhangchen
  • P2 - refactor of Retiarii execution engine @ultmaster
    • support local debug mode for pure-python execution engine
    • support weight transfer from supernet to submodel
    • support export of top model for pure-python execution engine
    • P2 - refactor of "self._cur_samples" in mutators
    • P2 - support validation for other strategies
  • P1 - Review strategy/experiment/rest stop condition @QuanluZhang
  • P1 - support graph-based NAS algorithms @ultmaster @skeletondyh
  • P1 - Deal with strategy failure
  • NAS runtime improvement Sep 28
    • provide option to keep webui when experiment is done (add arg in run()) @QuanluZhang Sep 15 Sep 24
    • support hyper-parameter figure on webui for retiarii experiment @QuanluZhang Sep 15 Sep 24
    • support tuner_no_more_trial for retiarii experiment @QuanluZhang Sep 15 Sep 24
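
For the CSE item above, the idea of deduplicating computation across candidate models can be illustrated with a minimal sketch. This is not the Retiarii or CGO implementation; the node representation, the `deduplicate` helper, and the op names are all hypothetical, and the sketch assumes every op is deterministic so that identical (op, inputs) pairs can safely be shared.

```python
from typing import Dict, List, Tuple

# Hypothetical op description: (op_name, tuple_of_input_node_indices).
Node = Tuple[str, Tuple[int, ...]]


def deduplicate(graphs: List[List[Node]]) -> Tuple[List[Node], List[List[int]]]:
    """Merge several small graphs into one, sharing identical sub-computations.

    Each graph is a topologically ordered list of nodes; inputs refer to
    earlier positions in the same list. Returns the merged node list and,
    per graph, the mapping from local node index to merged node index.
    """
    merged: List[Node] = []
    seen: Dict[Node, int] = {}  # canonical node -> index in merged graph
    mappings: List[List[int]] = []

    for graph in graphs:
        local_to_merged: List[int] = []
        for op, inputs in graph:
            # Rewrite inputs into merged indices so equal computations compare equal.
            key = (op, tuple(local_to_merged[i] for i in inputs))
            if key not in seen:
                seen[key] = len(merged)
                merged.append(key)
            local_to_merged.append(seen[key])
        mappings.append(local_to_merged)
    return merged, mappings


# Two candidate models that share data loading and preprocessing.
model_a = [("load_data", ()), ("normalize", (0,)), ("conv3x3", (1,))]
model_b = [("load_data", ()), ("normalize", (0,)), ("conv5x5", (1,))]

merged, mappings = deduplicate([model_a, model_b])
print(merged)
print(mappings)
```

Here the shared `load_data` and `normalize` nodes appear once in the merged graph, and only the two convolution nodes differ. Ops with randomness (for example data augmentation) would break the determinism assumption and would need to be excluded from sharing.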

Model Compression

  • P0 - support iterative pruning @J-shang 6.18 review meeting (done) / 6.23 sample / 6.25 7.14
    • P0 - implement model compression 2.0, include 8 maskers (basic pruners) 7.9 Compression v2 Stage 1 #3917
    • P0 - improve pruners to more faithfully reproduce the papers @J-shang
    • P2 - Refactor compress() in IterativePruner, unify support for AGP, ADMM. (iterative pruner follow-up #3669)
    • P2 - Add more criteria (strategy, i.e., acc drop tolerance) (discussion needed)
  • P1 - MixedMaskerPruner @J-shang
  • P2 - Refactor of model graph generation (as an independent component) @zheng-ningxin

Training service & NNI manager & nnictl

WebUI

  • P0 - Adaptive UI (height) for high resolution; the current window layout at high resolution needs improvement @JunzhuZhang
    • overview page: blank at the bottom
    • align issue in overview page
    • support searching customized string in the search box

Hyper-parameter tuning

  • P1 - Improve HPO benchmarks, add more concrete conclusions
  • P1 - support nested search space @kvartet 6.28 discussion meeting
  • P1 - Improve hyperband, PBT with shared storage (dependency on shared storage implementation) @acured
  • P1 - upgrade cuda, config_space, docker image @liuzhe-lz 7.21 Bump Python dependencies #3976
  • P1 - grid search does not support _name for nested search space (update doc) @kvartet
  • P1 - evolution tuner does not mutate sub search space for nested search space @kvartet
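
To make the nested search space items above concrete, here is a minimal sketch of a nested search space in NNI's JSON-style format, where each option of a `choice` is a dict identified by `_name` and may carry its own sub-parameters. The `sample` helper is a hypothetical random sampler written only for illustration; it is not how NNI's tuners are implemented, and the parameter names `layer0` and `lr` are made up.

```python
import random
from typing import Any, Dict

# Nested search space: the "Conv" option has its own sub search space.
search_space: Dict[str, Any] = {
    "layer0": {
        "_type": "choice",
        "_value": [
            {"_name": "Empty"},
            {
                "_name": "Conv",
                "kernel_size": {"_type": "choice", "_value": [1, 2, 3, 5]},
            },
        ],
    },
    "lr": {"_type": "uniform", "_value": [0.001, 0.1]},
}


def sample(spec: Dict[str, Any], rng: random.Random) -> Any:
    """Draw one value from a single parameter spec, recursing into sub-spaces."""
    kind, values = spec["_type"], spec["_value"]
    if kind == "uniform":
        low, high = values
        return rng.uniform(low, high)
    if kind == "choice":
        picked = rng.choice(values)
        if isinstance(picked, dict) and "_name" in picked:
            # Nested option: keep its name and sample its own sub-parameters.
            return {
                key: (val if key == "_name" else sample(val, rng))
                for key, val in picked.items()
            }
        return picked
    raise ValueError(f"unsupported _type: {kind}")


rng = random.Random(0)
config = {name: sample(spec, rng) for name, spec in search_space.items()}
print(config)
```

The two HPO bugs listed above map onto this structure: a tuner has to both recognise options by `_name` and descend into the chosen option's sub search space (here `kernel_size`) when generating or mutating a configuration.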

jupyter

  • when jupyter restarts, the node process is not stopped.
  • jupyter does not show the error message of the node process.
  • build jupyterlab-extension as an option.

Pipeline

  • P0 - support python 3.9 in pipeline (wait for pytorch upgrade from 1.6 to 1.9) @ultmaster Use python 3.9 on pipeline #3881
    • P2 - write ipc test
  • P0 - fix and improve pipeline
    • nnimanager UT on nni-ci-gpu-03 failed, possibly due to this machine's environment; it cannot be executed through the daemon and needs a reimage -> create a new VM

Doc

  • Translate the docs to Chinese (zh).
  • P2 - update NAS API doc to add more supported evaluators @ultmaster
  • doc and demo improvement
    • end2end, based on comparison of different tuners (insights from benchmark, provide reasonable result)
    • demo how to share data, how to use shared storage, mainly refactor doc (view experiment from shared storage)

@QuanluZhang
Contributor Author

QuanluZhang commented Oct 11, 2021

Bugbash manager @liuzhe-lz (check other prs)

NAS

Model Compression

HPO

nnictl & nnimanager & training service @SparkSnail @liuzhe-lz @acured

WebUI

regression test needed, or bugs reported during the iteration by external users (need test and close the issues)

@liuzhe-lz
Contributor

liuzhe-lz commented Oct 15, 2021

BUG BASH (Target date Friday 10/22)

PRs that seemingly need testing:

model compression

nas

other

contributors

@liuzhe-lz
Contributor

liuzhe-lz commented Oct 22, 2021

To fix:

  • Tutorial for Python experiment API training service @liuzhe-lz next release
  • lightning trainer doc @ultmaster
  • experiment config code default value @J-shang
  • HPO benchmark
  • Add "experimental" to CGO doc

@QuanluZhang
Contributor Author

QuanluZhang commented Oct 24, 2021

@cruiseliu
Contributor

cruiseliu commented Oct 28, 2021

Model Compression

  • New major version of pruning framework (doc)
    • Iterative pruning is more automated; users can implement it with less code.
    • Support exporting intermediate models in the iterative pruning process.
    • The implementation of the pruning algorithm is closer to the paper.
    • Users can easily customize their own iterative pruning by using PruningScheduler.
    • Optimized the basic pruners' underlying mask-generation logic, making it easier to extend with new functions.
    • Optimized the memory usage of the pruners.
  • MobileNetV2 end-to-end example (notebook)
  • Improved QAT quantizer
    • support dtype and scheme customization
    • support dp multi-gpu training
    • support load_calibration_config
  • Model speed-up now supports directly loading the mask
    • work item: add ModelSpeedup API reference doc
  • Support speed-up depth-wise convolution
  • Support bn-folding for LSQ quantizer
  • Support QAT and LSQ resume from PTQ
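
As a rough illustration of the iterative pruning loop mentioned in the list above, the sketch below combines a cubic sparsity schedule in the style of AGP with magnitude-based masks, keeping the mask from each iteration (loosely analogous to exporting intermediate models). It does not use NNI's pruning API or `PruningScheduler`; the function names, the toy weight list, and the step count are assumptions made for this example.

```python
from typing import List


def agp_sparsity(step: int, total_steps: int, final_sparsity: float,
                 initial_sparsity: float = 0.0) -> float:
    """Cubic sparsity schedule in the style of AGP (automated gradual pruning)."""
    progress = step / total_steps
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_mask(weights: List[float], sparsity: float) -> List[int]:
    """Return a 0/1 mask zeroing the `sparsity` fraction of smallest-magnitude weights."""
    num_pruned = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_pruned])
    return [0 if i in pruned else 1 for i in range(len(weights))]


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.3, 0.08]
total_steps = 4
intermediate_masks = []

for step in range(1, total_steps + 1):
    sparsity = agp_sparsity(step, total_steps, final_sparsity=0.5)
    mask = magnitude_mask(weights, sparsity)
    weights = [w * m for w, m in zip(weights, mask)]
    # A real pipeline would fine-tune the surviving weights here.
    intermediate_masks.append(mask)

print(intermediate_masks[-1])
```

The schedule prunes aggressively in early iterations and slows down as it approaches the target sparsity, which is the usual motivation for a gradual schedule over one-shot pruning.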

Neural Architecture Search

  • NAS benchmark (doc)
    • Support benchmark table lookup in experiments
    • New data preparation approach
  • Improved quick start doc
  • Experimental CGO execution engine (doc)

Hyper-Parameter Optimization

  • New training platform: Alibaba DSW+DLC (doc)
  • Support passing ConfigSpace definition directly to BOHB (doc) (thanks to khituras)
  • Reformatted experiment config doc
  • Added example config files for Windows (thanks to politecat314)
  • FrameworkController now supports reuse mode

Fixed Bugs

And...

@liuzhe-lz liuzhe-lz unpinned this issue Nov 4, 2021