QuanluZhang
commented
Jun 9, 2021
- promote the Retiarii framework to the default NAS framework on NNI
- remove the documentation of NNI NAS 1.0 (added a reference to the NNI NAS 1.0 docs of v2.2)
- make the pure-python execution engine the default execution engine for Retiarii
docs/en_US/NAS/ExecutionEngines.rst
Outdated
2. Add ``config.execution_engine = 'py'`` to ``RetiariiExeConfig``.
3. If you need to export top models, formatter needs to be set to ``dict``. Exporting ``code`` won't work with this engine.

.. note:: You should always use ``super().__init__()` instead of ``super(MyNetwork, self).__init__()`` in the PyTorch model, because the latter one has issues with model wrapper.
Missing ` at ``super().__init__()`
fixed
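For context on the note under review, here is a minimal sketch of one way the two-argument form can go wrong when a class decorator rebinds the class name. The `wrap_model` decorator below is hypothetical and is not NNI's `model_wrapper`; it only assumes that a wrapper returns a subclass bound to the original name.

```python
def wrap_model(cls):
    """Hypothetical class decorator: returns a subclass that takes over the original name."""

    class Wrapped(cls):
        def __init__(self, *args, **kwargs):
            # Record constructor arguments, then defer to the original __init__.
            self.init_args = (args, kwargs)
            super().__init__(*args, **kwargs)

    Wrapped.__name__ = cls.__name__
    return Wrapped


class Base:
    def __init__(self):
        self.base_ready = True


@wrap_model
class GoodNetwork(Base):
    def __init__(self):
        # Zero-argument super() binds to the class being defined, so it is
        # unaffected by the decorator rebinding the name GoodNetwork.
        super().__init__()


@wrap_model
class BadNetwork(Base):
    def __init__(self):
        # The name BadNetwork now refers to the wrapper subclass, so this
        # call resolves back to this very __init__ and recurses.
        super(BadNetwork, self).__init__()


good = GoodNetwork()

try:
    BadNetwork()
    bad_failed = False
except RecursionError:
    bad_failed = True
```

In this toy setup the zero-argument form keeps working, while the two-argument form recurses until Python raises `RecursionError`. Whether NNI's wrapper fails in exactly this way is an assumption.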
* - `TPE Strategy <./ApiReference.rst#nni.retiarii.strategy.TPEStrategy>`__
- Sampling new model(s) from user defined model space using `TPE algorithm <https://papers.nips.cc/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf>`__ . (``nni.retiarii.strategy.TPEStrategy``)
* - `RL Strategy <./ApiReference.rst#nni.retiarii.strategy.PolicyBasedRL>`__
- It uses `PPO algorithm <https://arxiv.org/abs/1707.06347>`__ to sample new model(s) from user defined model space. (``nni.retiarii.strategy.PolicyBasedRL``)
It's not fair to say so. It supports a series of policy-based RL policies based on tianshou. PPO is used by default.

* `nn.Repeat <./ApiReference.rst#nni.retiarii.nn.pytorch.Repeat>`__. Repeat a block by a variable number of times.

* `nn.Cell <./ApiReference.rst#nni.retiarii.nn.pytorch.Cell>`__. `This cell structure is popularly used in NAS literature <https://arxiv.org/abs/1611.01578>`__. A cell consists of multiple "nodes". Each node is a sum of multiple operators. Each operator is chosen from ``op_candidates``, and takes one input from previous nodes and predecessors. Predecessor means the input of cell. The output of cell is the concatenation of some of the nodes in the cell (currently all the nodes).
We should cite the high-level idea from papers and leave the rest of the technical details to API documentation.
I changed some wording. We can leave the refactoring to the next PR.
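To illustrate the cell description quoted in this thread, here is a toy sketch in plain Python. The function `run_cell` and its argument layout are hypothetical and unrelated to the real `nn.Cell` signature, and the operators are fixed here rather than chosen from ``op_candidates``; the sketch only mirrors the stated structure (a node sums its operators, the cell output concatenates all nodes).

```python
def run_cell(predecessors, node_specs):
    """Evaluate a toy cell.

    predecessors: list of cell inputs, each a list of floats.
    node_specs: one entry per node; each entry is a list of (op, input_index)
        pairs, where input_index refers to a predecessor or an earlier node.
    """
    states = list(predecessors)
    for spec in node_specs:
        op_outputs = [op(states[index]) for op, index in spec]
        # A node is the elementwise sum of its operators' outputs.
        states.append([sum(values) for values in zip(*op_outputs)])
    # The cell output concatenates all nodes (predecessors are excluded).
    output = []
    for node in states[len(predecessors):]:
        output.extend(node)
    return output


def identity(xs):
    return list(xs)


def double(xs):
    return [2 * x for x in xs]


cell_output = run_cell(
    predecessors=[[1.0, 2.0]],
    node_specs=[
        [(identity, 0), (double, 0)],  # node 0 reads the predecessor twice
        [(double, 1)],                 # node 1 reads node 0
    ],
)
```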
need to update |
minor inline changings
minor words inline changes
first batch of review comments
docs/en_US/NAS/Overview.rst
Outdated

With this motivation, our ambition is to provide a unified architecture in NNI, accelerate innovations on NAS, and apply state-of-the-art algorithms to real-world problems faster.
Please refer to `Quick Start <./QuickStart.rst>`__ for how to run a NAS experiment on NNI.
Could this go at the end as a reference link? It feels odd for a Quick Start to suddenly show up in the overview.
removed
One-shot NAS
============

Before reading this tutorial, we highly recommend you to first go through the tutorial of how to `define a model space <./QuickStart.rst#define-your-model-space>`__.
same comment as above.

Before reading this tutorial, we highly recommend you to first go through the tutorial of how to `define a model space <./QuickStart.rst#define-your-model-space>`__.

Model Search with One-shot Trainer
We might want to make the terminology used in the doc consistent. Sometimes we use "model search" and sometimes "model explore". Given that the paper positions Retiarii as a model exploration framework, we might want to always use "model exploration".
One more: in the overview, we call "one-shot" an approach for exploration, and here we call it Trainer...
trainer.fit()
final_architecture = trainer.export()

**Format of the exported architecture.** TBD.
Is this not implemented yet, or is the doc just not written yet?
@@ -0,0 +1,11 @@
#####################
Which file will refer to this rst?
docs/en_US/NAS/WriteTrainer.rst
Outdated
@@ -0,0 +1,96 @@
Customize A New Model Evaluator
Is a trainer the same as an evaluator?
docs/en_US/NAS/WriteTrainer.rst
Outdated
Customize A New Model Evaluator
===============================

Model Evaluator is necessary to evaluate the performance of new explored models. A model evaluator usually includes training, validating and testing of a single model. We provide two ways for users to write a new model evaluator, which will be demonstrated below respectively.
I have quite a few points of confusion about this; we should have a call.
docs/en_US/NAS/ExecutionEngines.rst
Outdated
Execution Engines
=================

Execution engine is for running NAS experiment. NNI supports three execution engines, each of them has their own characteristics. **Pure-python execution engine** does not have special requirement on user model, it supports the model space expressed with `inline mutation APIs <./MutationPrimitives.rst>`__. **Graph-based execution engine** requires that user's model should be able to be parsed by `TorchScript <https://pytorch.org/docs/stable/jit.html>`__, it supports model space expressed with both `inline mutation APIs <./MutationPrimitives.rst>`__ and `mutators <./Mutators.rst>`__. **CGO execution engine** has the same requirement and ability to graph-based execution engine, it further enables cross-model optimizations, which makes model space exploration faster.
NAS experiment --> Retiarii Experiment (proper noun)
I made some inline changes and committed the change.
Question: if the CGO execution engine does all the same things but faster, why don't we support only the CGO engine (once it is ready)?
> Question: if CGO execution engine are all the same but faster, why don't we only support CGO engine? (when it is ready)
In the short term, no. Users may not want such optimization and may just want to use a mutator for debugging. In that case, the graph-based execution engine is a good choice.

1. Add ``@nni.retiarii.model_wrapper`` decorator outside the whole PyTorch model.
2. If you need to export top models, formatter needs to be set to ``dict``. Exporting ``code`` won't work with this engine.

I don't get this part.... formatter?? exporting code????
Removed this point, as the pure-python execution engine has become the default one; there is no need to set the formatter any more.
To answer your question, top models can be exported in different formats: we can directly export the source code of a top model, which is the `code` format, or export a dict that records all the chosen values, which is the `dict` format.
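To make the two export formats concrete, here is a small sketch of the idea only. The model space, the `validate` and `render_code` helpers, and the generated snippet are all hypothetical and do not reflect NNI's actual exporter; they just contrast a dict of chosen values with generated source code.

```python
# Hypothetical model space: each label maps to its candidate values.
model_space = {
    "conv_kernel": [3, 5, 7],
    "activation": ["relu", "gelu"],
}

# "dict" format: only the chosen value per label is recorded.
top_model_dict = {"conv_kernel": 5, "activation": "relu"}


def validate(choice, space):
    """Check that every label is present and every chosen value is a candidate."""
    return choice.keys() == space.keys() and all(
        choice[label] in space[label] for label in space
    )


def render_code(choice):
    """'code' format: emit source text for the concrete model (illustrative only)."""
    return (
        f"conv = Conv2d(kernel_size={choice['conv_kernel']})\n"
        f"act = {choice['activation']}()"
    )


is_valid = validate(top_model_dict, model_space)
generated = render_code(top_model_dict)
```

The dict stays meaningful only together with the model space it was sampled from, whereas the generated code stands alone; that trade-off is the point of the sketch, not a statement about how NNI implements either format.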
docs/en_US/NAS/ExecutionEngines.rst
Outdated
CGO Execution Engine
--------------------

CGO execution engine does cross-model optimizations based on the graph-based execution engine. This execution engine will be release in v2.4.
add an issue for this work.
added
docs/en_US/NAS/FBNet.rst
Outdated
@@ -1,6 +1,8 @@
FBNet
======

.. note:: This one-shot NAS is still implemented under NNI NAS 1.0, and will migrated to Retiarii framework in v2.4.
add an issue for this migration.
added
docs/en_US/nas.rst
Outdated
to accelerate innovations on NAS, and apply state-of-art algorithms on real world problems faster.
For details, please refer to the following tutorials:
Thus, we design `Retiarii <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__. It is a deep learning framework that supports the exploratory training on a neural network model space, rather than on a single neural network model.
Exploratory training with Retiarii allows user to express various search space for *Neural Architecture Search* and *Hyper-Parameter Tuning* with high flexibility.
various search space -> various search spaces
updated
docs/en_US/NAS/QuickStart.rst
Outdated
@@ -197,9 +177,17 @@ After all the above are prepared, it is time to start an experiment to do the mo

The complete code of a simple MNIST example can be found :githublink:`here <examples/nas/multi-trial/mnist/search.py>`.
Shall we mention that Retiarii can run on different training platforms, not just in local mode?
Good suggestion, updated.

To make users easily express a model space within their PyTorch/TensorFlow model, NNI provides some inline mutation APIs as shown below.

* `nn.LayerChoice <./ApiReference.rst#nni.retiarii.nn.pytorch.LayerChoice>`__. It allows users to put several candidate operations (e.g., PyTorch modules), one of them is chosen in each explored model. Note that if the candidate is a user-defined module, it should be decorated as a `basic unit <./Advanced.rst>`__ with ``@basic_unit``. In the following example, ``ops.PoolBN`` and ``ops.SepConv`` should be decorated.
`basic unit` is a broken link
good catch! removed this sentence
docs/en_US/NAS/Mutators.rst
Outdated
stride=stride
)

``label`` is used by mutator to identify this placeholder. The other parameters are the information that are required by mutator. They can be accessed from ``node.operation.parameters`` as a dict, it could include any information that users want to put to pass it to user defined mutator. The complete example code can be found in :githublink:`Mnasnet base model <examples/nas/multi-trial/mnasnet/base_mnasnet.py>`.
.. that is required by ...
updated
Multi-trial NAS
===============

In multi-trial NAS, users need model evaluator to evaluate the performance of each sampled model, and need an exploration strategy to sample models from a defined model space. Here, users could use NNI provided model evaluators or write their own model evalutor. They can simply choose a exploration strategy. Advanced users can also customize new exploration strategy. For a simple example about how to run a multi-trial NAS experiment, please refer to `Quick Start <./QuickStart.rst>`__.
Maybe it would be better to introduce strategy first and then evaluator? (including the toctree)
Both evaluator and strategy are necessary; let's leave the update to the next PR.
docs/en_US/NAS/ModelEvaluators.rst
Outdated
Usage of Model Evaluator
------------------------

In multi-trial NAS, a sampled model should be able to be executed on a remote machine or a training platform (e.g., AzureML, OpenPAI). Thus, both the model and its model evaluator should be correctly serialized. To make NNI correctly serialize model evaluator, users should apply ``serialize`` on some of their functions and objects
A full stop is missing
updated
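As a rough illustration of why serialization matters for the paragraph under review: an evaluator that has to run on another machine must be reconstructible from plain data. The sketch below uses a hypothetical `trace_init` decorator and a hypothetical `REGISTRY` lookup; it is not NNI's ``serialize``, whose actual behavior should be checked in the API reference.

```python
import json


def trace_init(cls):
    """Hypothetical decorator: remember constructor kwargs so the object can be rebuilt."""
    original_init = cls.__init__

    def __init__(self, **kwargs):
        self._init_kwargs = dict(kwargs)
        original_init(self, **kwargs)

    cls.__init__ = __init__
    return cls


@trace_init
class ToyEvaluator:
    def __init__(self, learning_rate, epochs):
        self.learning_rate = learning_rate
        self.epochs = epochs


# The receiving side needs a way to map a class name back to a class.
REGISTRY = {"ToyEvaluator": ToyEvaluator}


def dump(obj):
    """Serialize to a JSON string holding the class name and constructor kwargs."""
    return json.dumps({"type": type(obj).__name__, "kwargs": obj._init_kwargs})


def load(payload):
    """Rebuild the object on the receiving side from the JSON payload."""
    data = json.loads(payload)
    return REGISTRY[data["type"]](**data["kwargs"])


payload = dump(ToyEvaluator(learning_rate=0.01, epochs=3))
restored = load(payload)
```

The design choice sketched here is to ship constructor arguments instead of live objects, which keeps the payload small and independent of in-memory state.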

NNI provides some commonly used model evaluators for users' convenience. If these model evaluators do not meet users' requirement, they can customize new model evaluators following the tutorial `here <./WriteTrainer.rst>`__.

.. autoclass:: nni.retiarii.evaluator.pytorch.lightning.Classification
the Lightning documentation in the docstring is invalid, also in the below class.
fix in this commit
Some questions:
- The API reference in the one-shot algorithm docs (like NAS/ENAS.rst) is the NAS 1.0 API; please update them to the Retiarii API.
- There is a typo on the first line of NAS/Benchmarks.rst: it should be "improve" instead of "imporve". Would you mind fixing it in this PR?
- Shall we remove PPO Tuner and Network Morphism Tuner in the HPO part in this PR?
- I saw this comment, but I didn't find how to do that in the documentation. Did I miss some part?
- In multi-trial NAS, we can use various training platforms supported by NNI. Do we need to tell users how to use it in this documentation?
docs/en_US/NAS/OneshotTrainer.rst
Outdated
Model Search with One-shot Trainer
----------------------------------

With a defined model space, users can explore the space in two ways. One is using strategy and single-arch evaluator as demonstrated `here <./Tutorial.rst#explore-the-defined-model-space>`__. The other is using one-shot trainer, which consumes much less computational resource compared to the first one. In this tutorial we focus on this one-shot approach. The principle of one-shot approach is combining all the models in a model space into one big model (usually called super-model or super-graph). It takes charge of both search, training and testing, by training and evaluating this big model.
broken link
thanks for finding out the broken link. updated
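The one-shot principle described in the paragraph under review can be sketched without any deep learning library. This toy uses hypothetical names (`SuperLayer`, `sample_architecture`, `run`) and plain functions in place of trainable operators; it only shows the structure: every candidate lives in one big model, and each sampled architecture runs one path through it while sharing the same candidate objects.

```python
import random


class SuperLayer:
    """Holds all candidate operations for one position in the super-model."""

    def __init__(self, label, candidates):
        self.label = label
        self.candidates = candidates  # name -> callable, shared by all sub-models

    def forward(self, x, choice):
        return self.candidates[choice](x)


super_model = [
    SuperLayer("layer1", {"double": lambda x: 2 * x, "inc": lambda x: x + 1}),
    SuperLayer("layer2", {"square": lambda x: x * x, "neg": lambda x: -x}),
]


def sample_architecture(model, rng):
    """Pick one candidate per layer; the result is one model from the space."""
    return {layer.label: rng.choice(sorted(layer.candidates)) for layer in model}


def run(model, architecture, x):
    """Evaluate a single sampled path through the super-model."""
    for layer in model:
        x = layer.forward(x, architecture[layer.label])
    return x


rng = random.Random(0)
arch = sample_architecture(super_model, rng)

fixed = {"layer1": "double", "layer2": "square"}
result = run(super_model, fixed, 3)  # (2 * 3) squared
```

In a real one-shot trainer the candidates would carry trainable weights, so training the super-model once updates parameters that every sampled sub-model reuses; that weight sharing is what saves computation compared with training each sampled model separately.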
to fix the broken link
when updating
not just in
updated, thanks for finding out this issue!