Update guide and reference of NAS #1972

ultmaster · 2020-01-19T10:19:11Z

No description provided.

# Conflicts: # README.md # docs/en_US/NAS/Overview.md

QuanluZhang · 2020-02-01T07:31:58Z

docs/en_US/NAS/NasGuide.md

+
+## Implement a Search Space
+
+Assuming now we've got a baseline model, what should we do to be empowered with NAS? Take [MNIST on PyTorch](https://github.com/pytorch/examples/blob/master/mnist/main.py) as an example, the story looks like this:


QuanluZhang · 2020-02-01T07:37:48Z

docs/en_US/NAS/NasGuide.md

+        return output
+```
+
+The general purpose of InputChoice is a callable module that receives a list of tensors and output the concatenation/sum/mean of some of them, or `None` if none is selected. As a module, we **need to initialize `InputChoice` in `__init__`**, and use it in `forward`. We will see later that this is to allow search algorithms to identify these choices, and do necessary preparation.


better to tell the default op: concat or sum or mean?

"As a module, we need to initialize InputChoice in __init__, and use it in forward.", this sentence is not clear, you can either make it clear or add reference to show where you explained this in detail

QuanluZhang · 2020-02-01T07:42:03Z

docs/en_US/NAS/NasGuide.md

+
+The general purpose of InputChoice is a callable module that receives a list of tensors and output the concatenation/sum/mean of some of them, or `None` if none is selected. As a module, we **need to initialize `InputChoice` in `__init__`**, and use it in `forward`. We will see later that this is to allow search algorithms to identify these choices, and do necessary preparation.
+
+Users can specify a **key** for each mutable. By default NNI will assign one for you that is globally unique, but in case users want to share choices (for example, there are two `LayerChoice` with the same candidate operations, and you want them to have the same choice, i.e., if first one chooses the i-th op, the second one also chooses the i-th op), they can give them the same key. The key marks the identity for this choice, and will be used in dumped checkpoint. So if you want to increase the readability of your exported architecture, manually assigning keys to each mutable would be a good idea.


"for each mutable", seems you have not explained mutable

QuanluZhang · 2020-02-01T07:52:40Z

docs/en_US/NAS/NasGuide.md

+
+## Use a Search Algorithm
+
+Different in how the search space is explored and trials are spawned, there are at least two different ways users can do search. One runs NAS distributedly, which can be as naive as enumerating all the architectures and training each one from scratch, or leveraging more advanced technique. Since training many different architectures are known to be expensive, another family of methods, called one-shot NAS, builds a supernet containing every candidate in the search space as its subnetwork, and in each step a subnetwork or combination of several subnetworks is trained.


“leveraging more advanced technique” better to add reference here

QuanluZhang · 2020-02-01T07:56:45Z

docs/en_US/NAS/NasGuide.md

+trainer.export(file="model_dir/final_architecture.json")  # export the final architecture to file
+```
+
+Users can directly run their training file by `python(3) train.py`, without `nnictl`. After training, users could export the best one of the found models through `trainer.export()`.


you can simply write "python3 train.py"

QuanluZhang · 2020-02-01T07:57:43Z

docs/en_US/NAS/NasGuide.md

+
+Users can directly run their training file by `python(3) train.py`, without `nnictl`. After training, users could export the best one of the found models through `trainer.export()`.
+
+Normally, the trainer exposes a few things that you can customize, for example, loss function, metrics function, optimizer, and datasets. These should satisfy the needs from most usages, and we do our best to make sure our built-in trainers work on as many models, tasks and datasets as possible. But there is no guarantee. For example, some trainers have assumption that the task has to be a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps). So after a few tryouts, if you want to actually use the trainers on your very customized applications, you might very soon need to [customize your trainer](#extend-the-ability-of-one-shot-trainers).


"exposes a few things" -> "exposes a few arguments"

QuanluZhang · 2020-02-01T08:03:08Z

@ultmaster suggest to add a quick start to show usage, basically it could include how to run a NAS example, how to use users' own dataset, how to export model, how to retrain a model, how to visualize results, etc.

QuanluZhang · 2020-02-01T08:11:44Z

docs/en_US/NAS/NasGuide.md

+nni.report_final_result(acc)  # report the performance of the chosen architecture
+```
+
+The search space should be automatically generated and sent to tuner. As with NNI NAS API the search space is embedded in user code, users could use "[nnictl ss_gen](../Tutorial/Nnictl.md)" to generate search space file. Then, put the path of the generated search space in the field `searchSpacePath` of `config.yml`. The other fields in `config.yml` can be filled by referring [this tutorial](../Tutorial/QuickStart.md).


remove "automatically"

QuanluZhang · 2020-02-01T08:13:03Z

docs/en_US/NAS/NasGuide.md

+
+You could use [NNI tuners](../Tuner/BuiltinTuner.md) to do the search. Currently, only PPO Tuner supports NAS search space.
+
+We support standalone mode for easy debugging, where you could directly run the trial command without launching an NNI experiment. This is for checking whether your trial code can correctly run. The first candidate(s) are chosen for `LayerChoice` and `InputChoice` in this standalone mode.


I think we should highlight this part, in dedicated section or in quick start.

QuanluZhang · 2020-02-01T08:15:49Z

docs/en_US/NAS/NasGuide.md

+apply_fixed_architecture(model, "model_dir/final_architecture.json")
+```
+
+After applying, the model is then fixed and ready for a final training. The model works as a single model, although it might contain more parameters than expected. For deeper reasons and possible workaround, see [Trainers](./NasReference.md#retrain).


suggest to give a link to an example code of retrain

also need to explain how to load checkpointed model weights

better to explain the format of exported architecture

QuanluZhang · 2020-02-01T08:20:26Z

docs/en_US/NAS/NasGuide.md

+Users can directly run their training file by `python(3) train.py`, without `nnictl`. After training, users could export the best one of the found models through `trainer.export()`.
+
+Normally, the trainer exposes a few things that you can customize, for example, loss function, metrics function, optimizer, and datasets. These should satisfy the needs from most usages, and we do our best to make sure our built-in trainers work on as many models, tasks and datasets as possible. But there is no guarantee. For example, some trainers have assumption that the task has to be a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps). So after a few tryouts, if you want to actually use the trainers on your very customized applications, you might very soon need to [customize your trainer](#extend-the-ability-of-one-shot-trainers).
+


we also should describe distributed training for one-shot nas

QuanluZhang · 2020-02-01T08:21:42Z

docs/en_US/NAS/NasGuide.md

+
+After applying, the model is then fixed and ready for a final training. The model works as a single model, although it might contain more parameters than expected. For deeper reasons and possible workaround, see [Trainers](./NasReference.md#retrain).
+
+## Customize a Search Algorithm


suggest to use a dedicated doc file for "customize a search algorithm"

QuanluZhang · 2020-02-01T08:28:53Z

docs/en_US/NAS/NasGuide.md

+
+Users might want to do multiple things if they are using the trainers on real tasks, for example, distributed training, half-precision training, logging periodically, writing tensorboard, dumping checkpoints and so on. As mentioned previously, some trainers do have support for some of the items listed above; others might not. Generally, there are two recommended ways to add anything you want to an existing trainer: inherit an existing trainer and override, or copy an existing trainer and modify.
+
+Either way, you are walking into the scope of implementing a new trainer. Basically, implementing a one-shot trainer is no different from any traditional deep learning trainer, except that a new concept called mutator will reveal itself. So that the implementation will be different in at least two places:


you should briefly explain mutator here, so that users could get a high level idea of what it is.

for example why we introduce mutator

QuanluZhang · 2020-02-01T08:33:47Z

docs/en_US/NAS/NasGuide.md

+
+Meanwhile, NNI provides some useful tools so that users can implement trainers more easily. See [Trainers](./NasReference.md#trainers) for details.
+
+### Invent New Mutators


write a new mutator

QuanluZhang · 2020-02-01T08:34:54Z

docs/en_US/NAS/NasGuide.md

+    will immediately set one possible candidate in the search space as the active one.
+```
+
+### Search with Distribution


we can think about a better title

xuehui1991 · 2020-02-05T02:51:23Z

src/sdk/pynni/nni/nas/pytorch/utils.py

+    """
+    Computes and stores the average and current value.
+
+    Parameters


Parameters? Or attribute?

xuehui1991 · 2020-02-05T02:53:14Z

src/sdk/pynni/nni/nas/pytorch/spos/trainer.py

@@ -15,43 +15,42 @@
 class SPOSSupernetTrainer(Trainer):
    """
    This trainer trains a supernet that can be used for evolution search.
+
+    Parameters


xuehui1991 · 2020-02-05T02:53:31Z

src/sdk/pynni/nni/nas/pytorch/spos/mutator.py

+    """
+    A random mutator with flops limit.
+
+    Parameters


xuehui1991 · 2020-02-05T02:53:44Z

src/sdk/pynni/nni/nas/pytorch/spos/evolution.py

+    """
+    SPOS evolution tuner.
+
+    Parameters


xuehui1991 · 2020-02-05T02:54:49Z

src/sdk/pynni/nni/nas/pytorch/mutables.py

+                x4 = torch.zeros_like(x)
+                return self.input_choice([x1, x2, x3, x4])
+
+    Parameters


xuehui1991 · 2020-02-05T03:05:37Z

src/sdk/pynni/nni/nas/pytorch/enas/mutator.py

+    """
+    A mutator that mutates the graph with RL.
+
+    Parameters


xuehui1991 · 2020-02-05T03:05:44Z

src/sdk/pynni/nni/nas/pytorch/darts/trainer.py

+    """
+    DARTS trainer.
+
+    Parameters


xuehui1991 · 2020-02-05T03:06:07Z

src/sdk/pynni/nni/nas/pytorch/cdarts/trainer.py

+    """
+    CDARTS trainer.
+
+    Parameters


xuehui1991 · 2020-02-05T03:06:15Z

src/sdk/pynni/nni/nas/pytorch/cdarts/mutator.py

@@ -127,18 +127,15 @@ def export(self, logger):
 class DartsDiscreteMutator(Mutator):
    """
    A mutator that applies the final sampling result of a parent mutator on another model to train.
+
+    Parameters


xuehui1991 · 2020-02-05T03:06:35Z

src/sdk/pynni/nni/nas/pytorch/base_mutator.py

-    callbacks that are called in ``forward`` in Mutables.
+    callbacks that are called in ``forward`` in mutables.
+
+    Parameters


scarlett2018 · 2020-02-07T04:12:16Z

docs/en_US/NAS/DARTS.md

+
+## Limitations
+
+* DARTS doesn't support DataParallel and needs to be customized in order to support DistributedDataParallel.


do we have reference for how to customize the support for distributeddataparallel?

scarlett2018 · 2020-02-07T05:50:41Z

docs/en_US/NAS/Overview.md

 * To [report a bug](https://github.com/microsoft/nni/issues/new?template=bug-report.md) for this feature in GitHub;
-* To [file a feature or improvement request](https://github.com/microsoft/nni/issues/new?template=enhancement.md) for this feature in GitHub;
-* To know more about [Feature Engineering with NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/FeatureEngineering/Overview.md);


let's keep the references for FE, Model compression and HPO. That's just for users who would like to explore more features from NNI. no hurt for putting the links there.

scarlett2018 · 2020-02-07T05:54:08Z

docs/en_US/NAS/NasGuide.md

+
+Assuming now we've got a baseline model, what should we do to be empowered with NAS? Take [MNIST on PyTorch](https://github.com/pytorch/examples/blob/master/mnist/main.py) as an example, the code might look like this:
+
+```python


after implementing codes for search space, do we need to talk about the SearchSpace json file?

penghouwen and others added 30 commits January 14, 2020 14:35

integrate c-darts algorithm

ab8fc7b

revise cdarts2nni after code review

83a7aa4

update readme

4543d3f

new revisions

28b579c

Merge remote-tracking branch 'upstream/master' into cdarts2nni

5f66784

# Conflicts: # README.md # docs/en_US/NAS/Overview.md

fix code style

022faa1

fix code style

25778f5

disable wrong import order

f959050

fix syntax issues

2e96bdf

update nas docs

386a4a9

remove trailing whitespace in trainer

457be73

mock apex import

2320cdd

elaborate documentation

2dcab31

update

36882f0

update docs

19eb69e

fix crossref

46c229d

fix typo

bbb8873

update mutator and trainer

3861109

remove trailing whitespace

cf0471f

fix dosctring format

7c87a7f

add license

1a167ba

resolve comments in ops.py

5d39705

update

6bc3ef7

update

67f3f77

update

6d89107

update

18de19e

Merge branch 'master' into doc-nas-2

ba75d68

update

8fa0a82

update

757c2c3

update

b511c06

update

45125a6

ultmaster changed the base branch from master to dev-refactor-doc January 19, 2020 12:12

remove trailing space

950bdb8

scarlett2018 requested review from QuanluZhang and scarlett2018 January 20, 2020 02:36

QuanluZhang reviewed Feb 1, 2020

View reviewed changes

update

3e6ffeb

QuanluZhang requested a review from xuehui1991 February 5, 2020 02:49

xuehui1991 reviewed Feb 5, 2020

View reviewed changes

scarlett2018 reviewed Feb 7, 2020

View reviewed changes

QuanluZhang merged commit d2c610a into microsoft:dev-refactor-doc Feb 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update guide and reference of NAS #1972

Update guide and reference of NAS #1972

ultmaster commented Jan 19, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang commented Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

QuanluZhang Feb 1, 2020

xuehui1991 Feb 5, 2020

xuehui1991 Feb 5, 2020

xuehui1991 Feb 5, 2020

xuehui1991 Feb 5, 2020

xuehui1991 Feb 5, 2020

xuehui1991 Feb 5, 2020

xuehui1991 Feb 5, 2020

xuehui1991 Feb 5, 2020

xuehui1991 Feb 5, 2020

xuehui1991 Feb 5, 2020

scarlett2018 Feb 7, 2020

scarlett2018 Feb 7, 2020

scarlett2018 Feb 7, 2020


		## Implement a Search Space

		Assuming now we've got a baseline model, what should we do to be empowered with NAS? Take [MNIST on PyTorch](https://github.com/pytorch/examples/blob/master/mnist/main.py) as an example, the story looks like this:


		The general purpose of InputChoice is a callable module that receives a list of tensors and output the concatenation/sum/mean of some of them, or `None` if none is selected. As a module, we need to initialize `InputChoice` in `__init__`, and use it in `forward`. We will see later that this is to allow search algorithms to identify these choices, and do necessary preparation.

		Users can specify a key for each mutable. By default NNI will assign one for you that is globally unique, but in case users want to share choices (for example, there are two `LayerChoice` with the same candidate operations, and you want them to have the same choice, i.e., if first one chooses the i-th op, the second one also chooses the i-th op), they can give them the same key. The key marks the identity for this choice, and will be used in dumped checkpoint. So if you want to increase the readability of your exported architecture, manually assigning keys to each mutable would be a good idea.


		## Use a Search Algorithm

		Different in how the search space is explored and trials are spawned, there are at least two different ways users can do search. One runs NAS distributedly, which can be as naive as enumerating all the architectures and training each one from scratch, or leveraging more advanced technique. Since training many different architectures are known to be expensive, another family of methods, called one-shot NAS, builds a supernet containing every candidate in the search space as its subnetwork, and in each step a subnetwork or combination of several subnetworks is trained.


		Users can directly run their training file by `python(3) train.py`, without `nnictl`. After training, users could export the best one of the found models through `trainer.export()`.

		Normally, the trainer exposes a few things that you can customize, for example, loss function, metrics function, optimizer, and datasets. These should satisfy the needs from most usages, and we do our best to make sure our built-in trainers work on as many models, tasks and datasets as possible. But there is no guarantee. For example, some trainers have assumption that the task has to be a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps). So after a few tryouts, if you want to actually use the trainers on your very customized applications, you might very soon need to [customize your trainer](#extend-the-ability-of-one-shot-trainers).


		You could use [NNI tuners](../Tuner/BuiltinTuner.md) to do the search. Currently, only PPO Tuner supports NAS search space.

		We support standalone mode for easy debugging, where you could directly run the trial command without launching an NNI experiment. This is for checking whether your trial code can correctly run. The first candidate(s) are chosen for `LayerChoice` and `InputChoice` in this standalone mode.


		After applying, the model is then fixed and ready for a final training. The model works as a single model, although it might contain more parameters than expected. For deeper reasons and possible workaround, see [Trainers](./NasReference.md#retrain).

		## Customize a Search Algorithm


		Users might want to do multiple things if they are using the trainers on real tasks, for example, distributed training, half-precision training, logging periodically, writing tensorboard, dumping checkpoints and so on. As mentioned previously, some trainers do have support for some of the items listed above; others might not. Generally, there are two recommended ways to add anything you want to an existing trainer: inherit an existing trainer and override, or copy an existing trainer and modify.

		Either way, you are walking into the scope of implementing a new trainer. Basically, implementing a one-shot trainer is no different from any traditional deep learning trainer, except that a new concept called mutator will reveal itself. So that the implementation will be different in at least two places:


		Meanwhile, NNI provides some useful tools so that users can implement trainers more easily. See [Trainers](./NasReference.md#trainers) for details.

		### Invent New Mutators


		## Limitations

		* DARTS doesn't support DataParallel and needs to be customized in order to support DistributedDataParallel.


		Assuming now we've got a baseline model, what should we do to be empowered with NAS? Take [MNIST on PyTorch](https://github.com/pytorch/examples/blob/master/mnist/main.py) as an example, the code might look like this:

		```python

Update guide and reference of NAS #1972

Update guide and reference of NAS #1972

Conversation

ultmaster commented Jan 19, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

QuanluZhang commented Feb 1, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment