This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

[Retiarii] refactor of NAS doc #3785

Merged
merged 19 commits into from
Jun 13, 2021

Contributor

@QuanluZhang QuanluZhang commented Jun 9, 2021

  • promote Retiarii framework as default NAS framework on NNI
  • remove documents of NNI NAS 1.0 (added reference to NNI NAS 1.0 of v2.2)
  • make pure-python execution engine as default execution engine for Retiarii

2. Add ``config.execution_engine = 'py'`` to ``RetiariiExeConfig``.
3. If you need to export top models, formatter needs to be set to ``dict``. Exporting ``code`` won't work with this engine.

.. note:: You should always use ``super().__init__()` instead of ``super(MyNetwork, self).__init__()`` in the PyTorch model, because the latter one has issues with model wrapper.
Contributor

@ultmaster ultmaster Jun 10, 2021

Missing ` at ``super().__init__()`

Contributor Author

fixed
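
For context on the note above, here is a minimal sketch of one plausible way the explicit `super(MyNetwork, self)` form can misbehave. It assumes the wrapper is a class decorator that returns a subclass and rebinds the class name. `model_wrapper_sketch`, `GoodNet`, and `BadNet` are hypothetical names, and this is not NNI's actual `model_wrapper` implementation.

```python
def model_wrapper_sketch(cls):
    """Hypothetical stand-in for a class decorator that returns a subclass.

    After decoration, the original class name refers to this subclass.
    """
    class Wrapped(cls):
        pass

    Wrapped.__name__ = cls.__name__
    return Wrapped


class Base:
    def __init__(self):
        self.initialized = True


@model_wrapper_sketch
class GoodNet(Base):
    def __init__(self):
        # Zero-argument super() is bound to the class where the method was
        # defined, so rebinding the name `GoodNet` does not affect it.
        super().__init__()


@model_wrapper_sketch
class BadNet(Base):
    def __init__(self):
        # The name `BadNet` now refers to the wrapper subclass, so this call
        # resolves back to this same __init__ and recurses without end.
        super(BadNet, self).__init__()


def try_build(cls):
    """Return 'ok' if construction succeeds, else the exception class name."""
    try:
        cls()
        return "ok"
    except RecursionError as exc:
        return type(exc).__name__
```

Under these assumptions, `try_build(GoodNet)` succeeds while `try_build(BadNet)` hits a `RecursionError`, which is one reason to prefer the zero-argument form inside decorated classes.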

* - `TPE Strategy <./ApiReference.rst#nni.retiarii.strategy.TPEStrategy>`__
- Sampling new model(s) from user defined model space using `TPE algorithm <https://papers.nips.cc/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf>`__ . (``nni.retiarii.strategy.TPEStrategy``)
* - `RL Strategy <./ApiReference.rst#nni.retiarii.strategy.PolicyBasedRL>`__
- It uses `PPO algorithm <https://arxiv.org/abs/1707.06347>`__ to sample new model(s) from user defined model space. (``nni.retiarii.strategy.PolicyBasedRL``)
Contributor

It's not fair to say so. It supports a series of policy-based RL policies based on tianshou. PPO is used by default.


* `nn.Repeat <./ApiReference.rst#nni.retiarii.nn.pytorch.Repeat>`__. Repeat a block by a variable number of times.

* `nn.Cell <./ApiReference.rst#nni.retiarii.nn.pytorch.Cell>`__. `This cell structure is popularly used in NAS literature <https://arxiv.org/abs/1611.01578>`__. A cell consists of multiple "nodes". Each node is a sum of multiple operators. Each operator is chosen from ``op_candidates``, and takes one input from previous nodes and predecessors. Predecessor means the input of cell. The output of cell is the concatenation of some of the nodes in the cell (currently all the nodes).
Contributor

We should cite the high-level idea from papers and leave the rest of the technical details to API documentation.

Contributor Author

I changed some words. we can leave the refactor to the next pr
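
The cell structure described in the `nn.Cell` snippet above can be illustrated with a small pure-Python sketch. It uses floats in place of tensors and a list in place of concatenation. The operator names, `sample_cell`, and `run_cell` are made up for illustration and are not part of the NNI API.

```python
import random

# Toy candidate operators acting on floats (stand-ins for neural network ops).
OP_CANDIDATES = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "negate": lambda x: -x,
}


def sample_cell(num_nodes, num_ops_per_node, num_predecessors, rng):
    """Sample one cell: each operator gets an op name and the index of its input."""
    cell = []
    for node_idx in range(num_nodes):
        # A node may read from any predecessor or from any earlier node.
        num_inputs = num_predecessors + node_idx
        ops = [
            (rng.choice(sorted(OP_CANDIDATES)), rng.randrange(num_inputs))
            for _ in range(num_ops_per_node)
        ]
        cell.append(ops)
    return cell


def run_cell(cell, predecessors):
    """Evaluate a cell; each node is the sum of its operators' outputs."""
    states = list(predecessors)
    for ops in cell:
        states.append(sum(OP_CANDIDATES[name](states[src]) for name, src in ops))
    # The cell output "concatenates" all node outputs (here: a plain list).
    return states[len(predecessors):]
```

For example, `run_cell(sample_cell(3, 2, 2, random.Random(0)), [1.0, 2.0])` returns one value per node, mirroring the "concatenation of all the nodes" behaviour the snippet describes.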

Contributor

kvartet commented Jun 10, 2021

need to update toctree in the docs/en_US/nas.rst, README.md, and docs/en_US/_templates/index.html.

minor inline changings
minor words inline changes
Member

@scarlett2018 scarlett2018 left a comment

first batch of review comments

docs/en_US/NAS/NasVisualization.rst (outdated, resolved)
docs/en_US/NAS/Overview.rst (outdated, resolved)
docs/en_US/NAS/Overview.rst (resolved)
docs/en_US/NAS/Overview.rst (outdated, resolved)

With this motivation, our ambition is to provide a unified architecture in NNI, accelerate innovations on NAS, and apply state-of-the-art algorithms to real-world problems faster.
Please refer to `Quick Start <./QuickStart.rst>`__ for how to run a NAS experiment on NNI.
Member

Maybe move this to the end as a reference link? It feels odd for a Quick Start to suddenly pop up in the overview.

Contributor Author

removed

docs/en_US/NAS/Overview.rst (outdated, resolved)
docs/en_US/NAS/Overview.rst (resolved)
docs/en_US/NAS/OneshotTrainer.rst (resolved)
One-shot NAS
============

Before reading this tutorial, we highly recommend you to first go through the tutorial of how to `define a model space <./QuickStart.rst#define-your-model-space>`__.
Member

same comment as above.


Before reading this tutorial, we highly recommend you to first go through the tutorial of how to `define a model space <./QuickStart.rst#define-your-model-space>`__.

Model Search with One-shot Trainer
Member

We might want to keep the terminology in the doc consistent. Sometimes we use "model search" and sometimes "model explore". Given that the paper positions Retiarii as a model exploration framework, we might want to always use "model exploration".

Member

One more: in the overview, we call "one-shot" an approach for exploration, and here we call it Trainer...

trainer.fit()
final_architecture = trainer.export()

**Format of the exported architecture.** TBD.
Member

Is this not implemented yet, or is the doc just not written yet?

docs/en_US/NAS/legacy/TextNAS.rst (outdated, resolved)
docs/en_US/NAS/legacy/SearchSpaceZoo.rst (outdated, resolved)
@@ -0,0 +1,11 @@
#####################
Member

Which file will refer to this rst?

@@ -0,0 +1,96 @@
Customize A New Model Evaluator
Member

Is trainer the same as evaluator?

Customize A New Model Evaluator
===============================

Model Evaluator is necessary to evaluate the performance of new explored models. A model evaluator usually includes training, validating and testing of a single model. We provide two ways for users to write a new model evaluator, which will be demonstrated below respectively.
Member

I'm confused about quite a few points here; we need a call to discuss.

Execution Engines
=================

Execution engine is for running NAS experiment. NNI supports three execution engines, each of them has their own characteristics. **Pure-python execution engine** does not have special requirement on user model, it supports the model space expressed with `inline mutation APIs <./MutationPrimitives.rst>`__. **Graph-based execution engine** requires that user's model should be able to be parsed by `TorchScript <https://pytorch.org/docs/stable/jit.html>`__, it supports model space expressed with both `inline mutation APIs <./MutationPrimitives.rst>`__ and `mutators <./Mutators.rst>`__. **CGO execution engine** has the same requirement and ability to graph-based execution engine, it further enables cross-model optimizations, which makes model space exploration faster.
Member

NAS experiment --> Retiarii Experiment (proper noun)

Member

I made some inline changes and committed the change.

Member

Question: if CGO execution engine are all the same but faster, why don't we only support CGO engine? (when it is ready)

Contributor Author

Question: if CGO execution engine are all the same but faster, why don't we only support CGO engine? (when it is ready)

In the short term, no. Users may not want such optimization and may just want to use mutators for debugging purposes. In that case, the graph-based execution engine is a good choice.


1. Add ``@nni.retiarii.model_wrapper`` decorator outside the whole PyTorch model.
2. If you need to export top models, formatter needs to be set to ``dict``. Exporting ``code`` won't work with this engine.

Member

I don't get this part.... formatter?? exporting code????

Contributor Author

removed this point, as pure-python execution engine has become the default one. no need to set formatter any more.
To answer your question: top models can be exported in different formats. We can directly export the source code of a top model, which is the code format. We can also export a dict that records all the chosen values, which is the dict format.

docs/en_US/NAS/ExecutionEngines.rst (resolved)
CGO Execution Engine
--------------------

CGO execution engine does cross-model optimizations based on the graph-based execution engine. This execution engine will be released in v2.4.
Member

add an issue for this work.

Contributor Author

added

docs/en_US/NAS/ExplorationStrategies.rst (resolved)
docs/en_US/NAS/ExplorationStrategies.rst (resolved)
@@ -1,6 +1,8 @@
FBNet
======

.. note:: This one-shot NAS is still implemented under NNI NAS 1.0, and will be migrated to Retiarii framework in v2.4.
Member

add an issue for this migration.

Contributor Author

added

docs/en_US/nas.rst (resolved)
to accelerate innovations on NAS, and apply state-of-art algorithms on real world problems faster.
For details, please refer to the following tutorials:
Thus, we design `Retiarii <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__. It is a deep learning framework that supports the exploratory training on a neural network model space, rather than on a single neural network model.
Exploratory training with Retiarii allows user to express various search space for *Neural Architecture Search* and *Hyper-Parameter Tuning* with high flexibility.
Contributor

various search space -> various search spaces

Contributor Author

updated

@@ -197,9 +177,17 @@ After all the above are prepared, it is time to start an experiment to do the mo

The complete code of a simple MNIST example can be found :githublink:`here <examples/nas/multi-trial/mnist/search.py>`.
Contributor

shall we mention that the Retiarii can run on different training platforms, not just in the local mode?

Contributor Author

good suggestion, update


To make users easily express a model space within their PyTorch/TensorFlow model, NNI provides some inline mutation APIs as shown below.

* `nn.LayerChoice <./ApiReference.rst#nni.retiarii.nn.pytorch.LayerChoice>`__. It allows users to put several candidate operations (e.g., PyTorch modules), one of them is chosen in each explored model. Note that if the candidate is a user-defined module, it should be decorated as a `basic unit <./Advanced.rst>`__ with ``@basic_unit``. In the following example, ``ops.PoolBN`` and ``ops.SepConv`` should be decorated.
Contributor

basic unit is a broken link

Contributor Author

good catch! removed this sentence
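
To illustrate what "one of them is chosen in each explored model" means in the `nn.LayerChoice` snippet above, here is a small pure-Python sketch that enumerates the models implied by a set of choice points. The labels, candidate names, and `enumerate_models` are hypothetical and do not use the NNI API.

```python
import itertools


def enumerate_models(layer_choices):
    """Yield every concrete model: exactly one candidate per choice point."""
    labels = sorted(layer_choices)
    candidate_lists = (layer_choices[label] for label in labels)
    for combo in itertools.product(*candidate_lists):
        yield dict(zip(labels, combo))


# A toy model space with two choice points (names are illustrative only).
space = {
    "layer1": ["conv3x3", "conv5x5"],
    "layer2": ["maxpool", "avgpool", "identity"],
}
models = list(enumerate_models(space))
```

With two candidates at one choice point and three at the other, the space contains six concrete models, which is why model spaces grow quickly as choice points are added.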

stride=stride
)

``label`` is used by mutator to identify this placeholder. The other parameters are the information that are required by mutator. They can be accessed from ``node.operation.parameters`` as a dict, it could include any information that users want to put to pass it to user defined mutator. The complete example code can be found in :githublink:`Mnasnet base model <examples/nas/multi-trial/mnasnet/base_mnasnet.py>`.
Contributor

.. that is required by ...

Contributor Author

updated

Multi-trial NAS
===============

In multi-trial NAS, users need a model evaluator to evaluate the performance of each sampled model, and an exploration strategy to sample models from a defined model space. Users can use the model evaluators provided by NNI or write their own model evaluator, and can simply choose an exploration strategy. Advanced users can also customize a new exploration strategy. For a simple example of how to run a multi-trial NAS experiment, please refer to `Quick Start <./QuickStart.rst>`__.
Contributor

Maybe it would be better to introduce strategy first and then evaluator? (including the toctree)

Contributor Author

both evaluator and strategy is necessary, let's leave the update in the next pr
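
The division of labour discussed in this thread (a strategy samples models, an evaluator scores them) can be sketched as a plain loop. This is a toy illustration, assuming a random strategy and a made-up scoring function instead of real training. `random_strategy`, `toy_evaluator`, and `run_multi_trial` are hypothetical names, not NNI APIs.

```python
import random


def random_strategy(model_space, rng):
    """Exploration strategy: pick one value per choice uniformly at random."""
    return {label: rng.choice(candidates) for label, candidates in model_space.items()}


def toy_evaluator(model):
    """Model evaluator stand-in: returns a made-up score instead of training."""
    return model["width"] / 256 - 0.05 * model["depth"]


def run_multi_trial(model_space, evaluator, num_trials, seed=0):
    """Sample and evaluate `num_trials` models; return the best (score, model)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_trials):
        model = random_strategy(model_space, rng)  # strategy proposes a model
        trials.append((evaluator(model), model))   # evaluator scores it
    return max(trials, key=lambda trial: trial[0])
```

Calling `run_multi_trial({"width": [64, 128, 256], "depth": [2, 4]}, toy_evaluator, 20)` returns the best-scoring sampled model. Swapping in a different strategy or evaluator leaves the loop unchanged, which is the point of keeping the two roles separate.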

Usage of Model Evaluator
------------------------

In multi-trial NAS, a sampled model should be able to be executed on a remote machine or a training platform (e.g., AzureML, OpenPAI). Thus, both the model and its model evaluator should be correctly serialized. To make NNI correctly serialize model evaluator, users should apply ``serialize`` on some of their functions and objects
Contributor

A full stop is missing

Contributor Author

updated
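
As background for the serialization requirement in the snippet above, the sketch below shows the general idea of recording how an object was constructed so that it can be rebuilt on another machine. It is an assumption-laden illustration, not NNI's `serialize` implementation. `record`, `Recorded`, `load`, and `ToyDataLoader` are hypothetical names.

```python
import json


class Recorded:
    """Holds an object together with the recipe needed to rebuild it."""

    def __init__(self, factory, kwargs):
        self.factory = factory
        self.kwargs = kwargs
        self.obj = factory(**kwargs)

    def dump(self):
        # Only the factory name and keyword arguments are sent elsewhere,
        # not the live object itself.
        spec = {"factory": self.factory.__name__, "kwargs": self.kwargs}
        return json.dumps(spec, sort_keys=True)


def record(factory, **kwargs):
    """Build an object while remembering its constructor arguments."""
    return Recorded(factory, kwargs)


def load(payload, registry):
    """Rebuild a recorded object from its JSON recipe on the receiving side."""
    spec = json.loads(payload)
    return record(registry[spec["factory"]], **spec["kwargs"])


class ToyDataLoader:
    """Illustrative class whose constructor arguments are JSON-friendly."""

    def __init__(self, path, batch_size):
        self.path = path
        self.batch_size = batch_size
```

A round trip looks like `load(record(ToyDataLoader, path="data/mnist", batch_size=32).dump(), {"ToyDataLoader": ToyDataLoader})`, which reconstructs an equivalent loader from the recipe alone.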


NNI provides some commonly used model evaluators for users' convenience. If these model evaluators do not meet users' requirement, they can customize new model evaluators following the tutorial `here <./WriteTrainer.rst>`__.

.. autoclass:: nni.retiarii.evaluator.pytorch.lightning.Classification
Contributor

the Lightning documentation in the docstring is invalid, also in the below class.

Contributor

fix in this commit

Contributor

@kvartet kvartet left a comment

some questions:

  1. the API reference in the One-shot algorithms doc(like NAS/ENAS.rst) is NAS 1.0 API, please update them to the Retiarii API.
  2. There is a typo on the first line of the NAS/Benchmarks.rst, it should be "improve" instead of "imporve", would you mind fixing it in this pr?
  3. Shall we remove PPO Tuner and Network Morphism Tuner in the HPO part in this pr?
  4. I saw this comment, but I didn't find how to do that in the documentation. Did I miss some part?
  5. In multi-trial NAS, we can use various training platforms supported by NNI. Do we need to tell users how to use it in this documentation?

Model Search with One-shot Trainer
----------------------------------

With a defined model space, users can explore the space in two ways. One is using strategy and single-arch evaluator as demonstrated `here <./Tutorial.rst#explore-the-defined-model-space>`__. The other is using one-shot trainer, which consumes much less computational resource compared to the first one. In this tutorial we focus on this one-shot approach. The principle of one-shot approach is combining all the models in a model space into one big model (usually called super-model or super-graph). It takes charge of both search, training and testing, by training and evaluating this big model.
Contributor

broken link

Contributor Author

thanks for finding out the broken link. updated
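
The one-shot principle in the snippet above (all candidate models folded into one super-model) can be sketched with a toy weight-sharing example. Scalars stand in for layer weights, and the update rule is a placeholder rather than real gradient descent. `SuperModel` and its methods are hypothetical and not part of NNI.

```python
import random


class SuperModel:
    """Keeps one shared weight for every (layer, candidate op) pair."""

    def __init__(self, layer_candidates):
        self.layer_candidates = layer_candidates
        self.weights = {
            (layer, op): 1.0
            for layer, ops in layer_candidates.items()
            for op in ops
        }

    def sample_path(self, rng):
        """Pick one candidate per layer, i.e. one sub-model of the super-model."""
        return {layer: rng.choice(ops) for layer, ops in self.layer_candidates.items()}

    def forward(self, x, path):
        """Run only the ops on the sampled path, using the shared weights."""
        for layer in self.layer_candidates:
            x = x * self.weights[(layer, path[layer])]
        return x

    def train_step(self, path, lr=0.1):
        """Placeholder update: only weights on the sampled path change."""
        for layer, op in path.items():
            self.weights[(layer, op)] -= lr
```

Because weights are shared, training one sampled path also changes the output of every other sub-model that reuses one of its ops, which is how a single training run can serve the whole model space.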

Contributor Author

QuanluZhang commented Jun 11, 2021

some questions:

  1. the API reference in the One-shot algorithms doc(like NAS/ENAS.rst) is NAS 1.0 API, please update them to the Retiarii API.
    -> updated
  2. There is a typo on the first line of the NAS/Benchmarks.rst, it should be "improve" instead of "imporve", would you mind fixing it in this pr?
    -> updated
  3. Shall we remove PPO Tuner and Network Morphism Tuner in the HPO part in this pr?
    -> removed PPO, keep Network Morphism for now
  4. I saw this comment, but I didn't find how to do that in the documentation. Did I miss some part?
    -> replied in this issue
  5. In multi-trial NAS, we can use various training platforms supported by NNI. Do we need to tell users how to use it in this documentation?
    -> added

Contributor

kvartet commented Jun 11, 2021

need to update toctree in the docs/en_US/nas.rst, README.md, and docs/en_US/_templates/index.html.

when updating README.md, should update docs/en_US/_templates/index.html too, which is the homepage of the readthedocs

Contributor

kvartet commented Jun 11, 2021

  1. the API reference in the One-shot algorithms doc(like NAS/ENAS.rst) is NAS 1.0 API, please update them to the Retiarii API.

not just in ENAS.rst, the API reference in DARTS.rst, SPOS.rst, Proxylessnas.rst are all outdated.

@QuanluZhang
Contributor Author

  1. the API reference in the One-shot algorithms doc(like NAS/ENAS.rst) is NAS 1.0 API, please update them to the Retiarii API.

not just in ENAS.rst, the API reference in DARTS.rst, SPOS.rst, Proxylessnas.rst are all outdated.

updated, thanks for finding out this issue!

@ultmaster ultmaster merged commit 4146c71 into microsoft:master Jun 13, 2021
4 participants