Conversation
# Conflicts:
#	README.md
#	docs/en_US/NAS/Overview.md
docs/en_US/NAS/NasGuide.md
Outdated
## Implement a Search Space

Assuming now we've got a baseline model, what should we do to be empowered with NAS? Take [MNIST on PyTorch](https://github.com/pytorch/examples/blob/master/mnist/main.py) as an example, the story looks like this:
story?
docs/en_US/NAS/NasGuide.md
Outdated
return output
```

The general purpose of InputChoice is a callable module that receives a list of tensors and outputs the concatenation/sum/mean of some of them, or `None` if none is selected. As a module, we **need to initialize `InputChoice` in `__init__`**, and use it in `forward`. We will see later that this is to allow search algorithms to identify these choices and do necessary preparation.
Better to state which op is the default: concat, sum, or mean?
"As a module, we need to initialize InputChoice
in __init__
, and use it in forward
.", this sentence is not clear, you can either make it clear or add reference to show where you explained this in detail
docs/en_US/NAS/NasGuide.md
Outdated
The general purpose of InputChoice is a callable module that receives a list of tensors and outputs the concatenation/sum/mean of some of them, or `None` if none is selected. As a module, we **need to initialize `InputChoice` in `__init__`**, and use it in `forward`. We will see later that this is to allow search algorithms to identify these choices and do necessary preparation.

Users can specify a **key** for each mutable. By default, NNI will assign one for you that is globally unique, but in case users want to share choices (for example, there are two `LayerChoice`s with the same candidate operations and you want them to make the same choice, i.e., if the first one chooses the i-th op, the second one also chooses the i-th op), they can give them the same key. The key marks the identity of this choice and will be used in the dumped checkpoint. So if you want to increase the readability of your exported architecture, manually assigning keys to each mutable would be a good idea.
"for each mutable", seems you have not explained mutable
docs/en_US/NAS/NasGuide.md
Outdated
## Use a Search Algorithm

Depending on how the search space is explored and how trials are spawned, there are at least two ways users can run a search. One runs NAS in a distributed manner, which can be as naive as enumerating all the architectures and training each one from scratch, or can leverage more advanced techniques. Since training many different architectures is known to be expensive, another family of methods, called one-shot NAS, builds a supernet containing every candidate in the search space as a subnetwork, and in each step trains one subnetwork or a combination of several.
"leveraging more advanced techniques": better to add a reference here.
docs/en_US/NAS/NasGuide.md
Outdated
trainer.export(file="model_dir/final_architecture.json") # export the final architecture to file
```
Users can directly run their training file by `python(3) train.py`, without `nnictl`. After training, users could export the best one of the found models through `trainer.export()`. |
you can simply write "python3 train.py"
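For context, a minimal sketch of this train-then-export flow, assuming a v1.x one-shot trainer (`DartsTrainer` here); `model`, `criterion`, `accuracy`, `optimizer`, and the datasets are hypothetical placeholders defined elsewhere in `train.py`:

```python
from nni.nas.pytorch.darts import DartsTrainer

# model, criterion, accuracy, optimizer, dataset_train, dataset_valid:
# hypothetical names, defined earlier in train.py
trainer = DartsTrainer(model, loss=criterion, metrics=accuracy,
                       optimizer=optimizer, num_epochs=50,
                       dataset_train=dataset_train, dataset_valid=dataset_valid)
trainer.train()                                           # run the search
trainer.export(file="model_dir/final_architecture.json")  # dump the best architecture
```

The file is then run directly, e.g. `python3 train.py`, with no `nnictl` involved.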
docs/en_US/NAS/NasGuide.md
Outdated
Users can directly run their training file by `python(3) train.py`, without `nnictl`. After training, users could export the best one of the found models through `trainer.export()`.

Normally, the trainer exposes a few things that you can customize, for example, loss function, metrics function, optimizer, and datasets. These should satisfy the needs from most usages, and we do our best to make sure our built-in trainers work on as many models, tasks and datasets as possible. But there is no guarantee. For example, some trainers have assumption that the task has to be a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps). So after a few tryouts, if you want to actually use the trainers on your very customized applications, you might very soon need to [customize your trainer](#extend-the-ability-of-one-shot-trainers). |
"exposes a few things" -> "exposes a few arguments"
@ultmaster suggests adding a quick start to show usage; basically, it could include how to run a NAS example, how to use your own dataset, how to export a model, how to retrain a model, how to visualize results, etc.
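As a sketch of the customization points mentioned above (loss function, metrics function, optimizer, and datasets), all names here are hypothetical stand-ins:

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

criterion = nn.CrossEntropyLoss()                          # custom loss
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)  # custom optimizer; `model` defined elsewhere

def metrics(output, target):
    # a metrics callable maps (output, target) to a dict of named scalars
    return {"acc": (output.argmax(dim=1) == target).float().mean().item()}

transform = transforms.ToTensor()
dataset_train = datasets.MNIST("data", train=True, download=True, transform=transform)
dataset_valid = datasets.MNIST("data", train=False, transform=transform)
```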
docs/en_US/NAS/NasGuide.md
Outdated
nni.report_final_result(acc) # report the performance of the chosen architecture
```
The search space should be automatically generated and sent to the tuner. Since with the NNI NAS API the search space is embedded in user code, users can use "[nnictl ss_gen](../Tutorial/Nnictl.md)" to generate the search space file. Then, put the path of the generated search space in the field `searchSpacePath` of `config.yml`. The other fields in `config.yml` can be filled in by referring to [this tutorial](../Tutorial/QuickStart.md).
remove "automatically"
You could use [NNI tuners](../Tuner/BuiltinTuner.md) to do the search. Currently, only PPO Tuner supports NAS search space.
We support standalone mode for easy debugging, where you can directly run the trial command without launching an NNI experiment. This is for checking whether your trial code can run correctly. The first candidate(s) are chosen for `LayerChoice` and `InputChoice` in this standalone mode.
I think we should highlight this part, in a dedicated section or in the quick start.
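For context, standalone debugging is just a direct run of the trial file (`train.py` is the hypothetical trial file):

```bash
# no NNI experiment is launched; the first candidate of each
# LayerChoice/InputChoice is selected automatically
python3 train.py
```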
docs/en_US/NAS/NasGuide.md
Outdated
apply_fixed_architecture(model, "model_dir/final_architecture.json")
```
After applying, the model is fixed and ready for final training. The model works as a single model, although it might contain more parameters than expected. For the deeper reasons and possible workarounds, see [Trainers](./NasReference.md#retrain).
Suggest giving a link to example code for retraining.
We also need to explain how to load checkpointed model weights.
Better to explain the format of the exported architecture.
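For context, a minimal sketch of the retraining step under discussion, assuming the v1.x `apply_fixed_architecture` helper; `Net` and the subsequent training loop are hypothetical:

```python
from nni.nas.pytorch.fixed import apply_fixed_architecture

model = Net()  # hypothetical: the same model definition used during search
apply_fixed_architecture(model, "model_dir/final_architecture.json")
# from here on, train `model` like any ordinary PyTorch model
```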
Users can directly run their training file by `python(3) train.py`, without `nnictl`. After training, users could export the best one of the found models through `trainer.export()`.

Normally, the trainer exposes a few things that you can customize, for example, loss function, metrics function, optimizer, and datasets. These should satisfy the needs from most usages, and we do our best to make sure our built-in trainers work on as many models, tasks and datasets as possible. But there is no guarantee. For example, some trainers have assumption that the task has to be a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps). So after a few tryouts, if you want to actually use the trainers on your very customized applications, you might very soon need to [customize your trainer](#extend-the-ability-of-one-shot-trainers).
We should also describe distributed training for one-shot NAS.
After applying, the model is fixed and ready for final training. The model works as a single model, although it might contain more parameters than expected. For the deeper reasons and possible workarounds, see [Trainers](./NasReference.md#retrain).

## Customize a Search Algorithm |
Suggest using a dedicated doc file for "Customize a Search Algorithm".
Users might want to do multiple things when using the trainers on real tasks, for example, distributed training, half-precision training, periodic logging, writing TensorBoard summaries, dumping checkpoints, and so on. As mentioned previously, some trainers do have support for some of the items listed above; others might not. Generally, there are two recommended ways to add anything you want to an existing trainer: inherit an existing trainer and override, or copy an existing trainer and modify.
Either way, you are walking into the scope of implementing a new trainer. Basically, implementing a one-shot trainer is no different from implementing any traditional deep learning trainer, except that a new concept called a mutator will reveal itself, so the implementation will be different in at least two places:
You should briefly explain the mutator here, so that users can get a high-level idea of what it is.
For example, why we introduce the mutator.
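For illustration, a sketch of the inherit-and-override route mentioned above, assuming the v1.x `DartsTrainer`; the periodic checkpointing is a hypothetical addition:

```python
import torch
from nni.nas.pytorch.darts import DartsTrainer

class CheckpointingDartsTrainer(DartsTrainer):
    def train_one_epoch(self, epoch):
        # keep the original search step, then add custom behavior
        super().train_one_epoch(epoch)
        if (epoch + 1) % 10 == 0:
            torch.save(self.model.state_dict(), f"checkpoint_{epoch + 1}.pt")
```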
docs/en_US/NAS/NasGuide.md
Outdated
Meanwhile, NNI provides some useful tools so that users can implement trainers more easily. See [Trainers](./NasReference.md#trainers) for details.
### Invent New Mutators |
write a new mutator
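For context, a sketch of what a new mutator can look like, modeled on the v1.x `Mutator` base class and its `sample_search`/`sample_final` hooks; the uniform-random sampling policy is illustrative only:

```python
import torch
import torch.nn.functional as F
from nni.nas.pytorch.mutator import Mutator
from nni.nas.pytorch.mutables import LayerChoice, InputChoice

class UniformRandomMutator(Mutator):
    def sample_search(self):
        # return one boolean mask per mutable key, marking the active candidate(s)
        result = {}
        for mutable in self.mutables:
            if isinstance(mutable, LayerChoice):
                idx = torch.randint(high=mutable.length, size=(1,))
                result[mutable.key] = F.one_hot(idx, num_classes=mutable.length).view(-1).bool()
            elif isinstance(mutable, InputChoice):
                result[mutable.key] = torch.randint(high=2, size=(mutable.n_candidates,)).bool()
        return result

    def sample_final(self):
        return self.sample_search()
```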
docs/en_US/NAS/NasGuide.md
Outdated
will immediately set one possible candidate in the search space as the active one.
```
### Search with Distribution |
we can think about a better title
""" | ||
Computes and stores the average and current value. | ||
|
||
Parameters |
Parameters? Or attribute?
@@ -15,43 +15,42 @@
class SPOSSupernetTrainer(Trainer):
"""
This trainer trains a supernet that can be used for evolution search.
Parameters |
Attribute?
""" | ||
A random mutator with flops limit. | ||
|
||
Parameters |
same here
""" | ||
SPOS evolution tuner. | ||
|
||
Parameters |
same
x4 = torch.zeros_like(x)
return self.input_choice([x1, x2, x3, x4])
Parameters |
same
""" | ||
A mutator that mutates the graph with RL. | ||
|
||
Parameters |
Attributes
""" | ||
DARTS trainer. | ||
|
||
Parameters |
Attributes
""" | ||
CDARTS trainer. | ||
|
||
Parameters |
Attributes
@@ -127,18 +127,15 @@ def export(self, logger):
class DartsDiscreteMutator(Mutator):
"""
A mutator that applies the final sampling result of a parent mutator on another model to train.
Parameters |
Attributes
callbacks that are called in ``forward`` in Mutables.
callbacks that are called in ``forward`` in mutables.
Parameters |
Attributes
## Limitations

* DARTS doesn't support DataParallel and needs to be customized in order to support DistributedDataParallel.
Do we have a reference for how to customize support for DistributedDataParallel?
* To [report a bug](https://github.com/microsoft/nni/issues/new?template=bug-report.md) for this feature in GitHub;
* To [file a feature or improvement request](https://github.com/microsoft/nni/issues/new?template=enhancement.md) for this feature in GitHub;
* To know more about [Feature Engineering with NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/FeatureEngineering/Overview.md);
Let's keep the references for FE, model compression, and HPO. That's just for users who would like to explore more features from NNI; there's no harm in putting the links there.
Assuming now we've got a baseline model, what should we do to be empowered with NAS? Take [MNIST on PyTorch](https://github.com/pytorch/examples/blob/master/mnist/main.py) as an example, the code might look like this:
```python |
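# a hypothetical sketch (the original snippet was not shown): the linked
# MNIST model with its second conv layer replaced by a LayerChoice,
# assuming the v1.x nni.nas.pytorch.mutables API
import torch
import torch.nn as nn
import torch.nn.functional as F
from nni.nas.pytorch import mutables

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        # the fixed conv2 becomes a choice between two candidate ops
        self.conv2 = mutables.LayerChoice([
            nn.Conv2d(32, 64, 3, 1),
            nn.Conv2d(32, 64, 5, 1, padding=1),
        ])
        self.fc = nn.Linear(64 * 12 * 12, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = torch.flatten(x, 1)
        return self.fc(x)
```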
After implementing the code for the search space, do we need to talk about the SearchSpace JSON file?