This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Merge dev-nas-tuner back to master #1531

Merged: 12 commits merged into master from dev-nas-tuner on Sep 16, 2019
2 changes: 1 addition & 1 deletion azure-pipelines.yml
@@ -15,7 +15,7 @@ jobs:
displayName: 'Install nni toolkit via source code'
- script: |
python3 -m pip install flake8 --user
IGNORE=./tools/nni_annotation/testcase/*:F821,./examples/trials/mnist-nas/*/mnist*.py:F821
IGNORE=./tools/nni_annotation/testcase/*:F821,./examples/trials/mnist-nas/*/mnist*.py:F821,./examples/trials/nas_cifar10/src/cifar10/general_child.py:F821
python3 -m flake8 . --count --per-file-ignores=$IGNORE --select=E9,F63,F72,F82 --show-source --statistics
displayName: 'Run flake8 tests to find Python syntax errors and undefined names'
- script: |
57 changes: 47 additions & 10 deletions docs/en_US/Tuner/BuiltinTuner.md
@@ -20,6 +20,7 @@ Currently we support the following algorithms:
|[__Metis Tuner__](#MetisTuner)|Metis offers the following benefits when it comes to tuning parameters: While most tools only predict the optimal configuration, Metis gives you two outputs: (a) current prediction of optimal configuration, and (b) suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter. [Reference Paper](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/)|
|[__BOHB__](#BOHB)|BOHB is a follow-up work of Hyperband. It targets the weakness of Hyperband that new configurations are generated randomly without leveraging finished trials. For the name BOHB, HB means Hyperband, BO means Bayesian Optimization. BOHB leverages finished trials by building multiple TPE models, a proportion of new configurations are generated through these models. [Reference Paper](https://arxiv.org/abs/1807.01774)|
|[__GP Tuner__](#GPTuner)|Gaussian Process Tuner is a sequential model-based optimization (SMBO) approach with Gaussian Process as the surrogate. [Reference Paper](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf), [Github Repo](https://github.com/fmfn/BayesianOptimization)|
|[__PPO Tuner__](#PPOTuner)|PPO Tuner is a reinforcement learning tuner based on the PPO algorithm. [Reference Paper](https://arxiv.org/abs/1707.06347)|

## Usage of Built-in Tuners

@@ -38,7 +39,7 @@ Note: Please follow the format when you write your `config.yml` file. Some built
TPE, as a black-box optimization method, can be used in various scenarios and shows good performance in general, especially when you have limited computation resources and can only try a small number of trials. From a large number of experiments, we found that TPE is far better than Random Search. [Detailed Description](./HyperoptTuner.md)


**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
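
A minimal `config.yml` sketch for TPE, in the same format as the usage examples shown for the other tuners in this document:

```yaml
# config.yml
tuner:
  builtinTunerName: TPE
  classArgs:
    optimize_mode: maximize
```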

@@ -66,7 +67,7 @@ tuner:

Random Search is suggested when each trial does not take too long (e.g., each trial completes quickly or is stopped early by the assessor), and you have enough computation resources, or when you want to explore the search space uniformly. Random Search can be considered a baseline search algorithm. [Detailed Description](./HyperoptTuner.md)

**Requirement of classArg:**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.

@@ -91,7 +92,7 @@ tuner:
Anneal is suggested when each trial does not take too long and you have enough computation resources (much the same as Random Search), or when the variables in the search space can be sampled from some prior distribution. [Detailed Description](./HyperoptTuner.md)


**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.

@@ -117,7 +118,7 @@ tuner:

Its requirement for computation resources is relatively high; specifically, it needs a large initial population to avoid falling into a local optimum. If your trials are short or leverage the assessor, this tuner is a good choice. It is especially suggested when your trial code supports weight transfer, that is, when a trial can inherit the converged weights from its parent(s); this can greatly speed up training. [Detailed Description](./EvolutionTuner.md)

**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.

@@ -156,7 +157,7 @@ nnictl package install --name=SMAC

Similar to TPE, SMAC is also a black-box tuner that can be tried in various scenarios and is suggested when computation resources are limited. It is optimized for discrete hyperparameters and is therefore suggested when most of your hyperparameters are discrete. [Detailed Description](./SmacTuner.md)

**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.

@@ -243,7 +244,7 @@ tuner:

It is suggested when you have limited computation resources but a relatively large search space. It performs well in scenarios where the intermediate result (e.g., accuracy) reflects, to some extent, how good the final result will be. [Detailed Description](./HyperbandAdvisor.md)

**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **R** (*int, optional, default = 60*) - the maximum budget (could be the number of mini-batches or epochs) that can be allocated to a trial. Each trial should use `TRIAL_BUDGET` to control how long it runs (a configuration sketch follows this parameter list).
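
Hyperband is exposed through NNI's advisor configuration rather than the `tuner` field. A minimal `config.yml` sketch, assuming the standard advisor format and showing only the arguments listed above:

```yaml
# config.yml
advisor:
  builtinAdvisorName: Hyperband
  classArgs:
    optimize_mode: maximize
    R: 60
```

Each trial then uses the `TRIAL_BUDGET` value it receives to decide how many mini-batches or epochs to run.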
@@ -277,7 +278,7 @@ NetworkMorphism requires [PyTorch](https://pytorch.org/get-started/locally) and

It is suggested when you want to apply deep learning methods to your task (your own dataset) but have no idea how to choose or design a network. You can modify the [example](https://github.com/Microsoft/nni/tree/master/examples/trials/network_morphism/cifar10/cifar10_keras.py) to fit your own dataset and data augmentation method, and you can also change the batch size, learning rate, or optimizer. It can find a good network architecture for different tasks. Currently this tuner only supports the computer vision domain. [Detailed Description](./NetworkmorphismTuner.md)

**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **task** (*('cv'), optional, default = 'cv'*) - The domain of the experiment. For now, this tuner only supports the computer vision (cv) domain (see the configuration sketch after this list).
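
A minimal `config.yml` sketch for NetworkMorphism, showing only the two class arguments listed above (further arguments are covered in the detailed documentation):

```yaml
# config.yml
tuner:
  builtinTunerName: NetworkMorphism
  classArgs:
    optimize_mode: maximize
    task: cv
```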
@@ -313,7 +314,7 @@ Note that the only acceptable types of search space are `choice`, `quniform`, `u

Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than other approaches such as Random Search. Furthermore, Metis provides guidance on the subsequent trial. Here is an [example](https://github.com/Microsoft/nni/tree/master/examples/trials/auto-gbdt/search_space_metis.json) of the use of Metis. Users only need to send the final result, such as `accuracy`, to the tuner by calling the NNI SDK. [Detailed Description](./MetisTuner.md)

**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.

@@ -347,7 +348,7 @@ nnictl package install --name=BOHB

Similar to Hyperband, it is suggested when you have limited computation resources but a relatively large search space. It performs well in scenarios where the intermediate result (e.g., accuracy) reflects, to some extent, how good the final result will be. In this case, it may converge to a better configuration thanks to its use of Bayesian optimization. [Detailed Description](./BohbAdvisor.md)

**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **min_budget** (*int, optional, default = 1*) - The smallest budget assigned to a trial job (the budget could be the number of mini-batches or epochs). Needs to be positive. A minimal configuration sketch follows this parameter list.
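
Like Hyperband, BOHB is exposed as an advisor and must first be installed with `nnictl package install --name=BOHB`. A minimal `config.yml` sketch, assuming the same advisor format and showing only the arguments listed above:

```yaml
# config.yml
advisor:
  builtinAdvisorName: BOHB
  classArgs:
    optimize_mode: maximize
    min_budget: 1
```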
@@ -386,7 +387,7 @@ Note that the only acceptable types of search space are `choice`, `randint`, `un

As a Sequential Model-Based Global Optimization (SMBO) strategy, GP Tuner uses a proxy optimization problem (finding the maximum of the acquisition function) that, albeit still a hard problem, is computationally cheaper and can be tackled with common tools. GP Tuner is therefore most suitable for situations where the function to be optimized is very expensive to evaluate, and it can be used when computation resources are limited. However, GP Tuner has a computational cost that grows as *O(N^3)* due to the need to invert the Gram matrix, so it is not suitable when many trials are needed. [Detailed Description](./GPTuner.md)

**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **utility** (*'ei', 'ucb' or 'poi', optional, default = 'ei'*) - The kind of utility function (acquisition function). 'ei', 'ucb' and 'poi' correspond to 'Expected Improvement', 'Upper Confidence Bound' and 'Probability of Improvement', respectively.
@@ -415,3 +416,39 @@ tuner:
    selection_num_warm_up: 100000
    selection_num_starting_points: 250
```

<a name="PPOTuner"></a>

![](https://placehold.it/15/1589F0/000000?text=+) `PPO Tuner`

> Built-in Tuner Name: **PPOTuner**

Note that the only acceptable type of search space is `mutable_layer`. `optional_input_size` can only be 0, 1, or [0, 1].

**Suggested scenario**

PPOTuner is a reinforcement learning tuner based on the PPO algorithm. PPOTuner is recommended when you are using the NNI NAS interface in your trial code to do neural architecture search. It has relatively high data efficiency but is suggested when you have a large amount of computation resources. You could try it on a very simple task, such as the [mnist-nas](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas) example. [Detailed Description](./PPOTuner.md)

**Requirement of classArgs**

* **optimize_mode** (*'maximize' or 'minimize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **trials_per_update** (*int, optional, default = 20*) - The number of trials to be used for one update. This number is recommended to be larger than `trialConcurrency`, and `trialConcurrency` should be an exact divisor of `trials_per_update`. Note that `trials_per_update` should also be divisible by `minibatch_size` (a consistent example is sketched after the usage example below).
* **epochs_per_update** (*int, optional, default = 4*) - The number of epochs for one update.
* **minibatch_size** (*int, optional, default = 4*) - Mini-batch size (i.e., the number of trials in a mini-batch) for the update. Note that `trials_per_update` should be divisible by `minibatch_size`.
* **ent_coef** (*float, optional, default = 0.0*) - Policy entropy coefficient in the optimization objective.
* **lr** (*float, optional, default = 3e-4*) - Learning rate of the model (lstm network), constant.
* **vf_coef** (*float, optional, default = 0.5*) - Value function loss coefficient in the optimization objective.
* **max_grad_norm** (*float, optional, default = 0.5*) - Gradient norm clipping coefficient.
* **gamma** (*float, optional, default = 0.99*) - Discounting factor.
* **lam** (*float, optional, default = 0.95*) - Advantage estimation discounting factor (lambda in the paper).
* **cliprange** (*float, optional, default = 0.2*) - Cliprange in the PPO algorithm, constant.

**Usage example**

```yaml
# config.yml
tuner:
  builtinTunerName: PPOTuner
  classArgs:
    optimize_mode: maximize
```
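
As a worked example of the divisibility constraints on `trials_per_update`, the values used by the `config_pai_ppo.yml` added in this PR satisfy both conditions: `trialConcurrency` (20) divides `trials_per_update` (60), and `trials_per_update` is divisible by `minibatch_size` (60 / 6 = 10):

```yaml
# excerpt consistent with examples/trials/nas_cifar10/config_pai_ppo.yml
trialConcurrency: 20
tuner:
  builtinTunerName: PPOTuner
  classArgs:
    optimize_mode: maximize
    trials_per_update: 60   # 60 = 3 * 20 = 10 * 6
    epochs_per_update: 20
    minibatch_size: 6
```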
20 changes: 20 additions & 0 deletions docs/en_US/Tuner/PPOTuner.md
@@ -0,0 +1,20 @@
PPO Tuner on NNI
===

## PPOTuner

This is a tuner intended for NNI's NAS interface. It uses the [PPO algorithm](https://arxiv.org/abs/1707.06347). The implementation inherits the main logic of the implementation [here](https://github.com/openai/baselines/tree/master/baselines/ppo2) (i.e., ppo2 from OpenAI) and is adapted for the NAS scenario.

It can successfully tune the [mnist-nas example](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas), with the following result:

![](../../img/ppo_mnist.png)

We also tuned [the macro search space for image classification from the ENAS paper](https://github.com/microsoft/nni/tree/master/examples/trials/nas_cifar10) (with a limited number of epochs for each trial, i.e., 8 epochs), which is implemented using the NAS interface and tuned with PPOTuner. We use Figure 7 from the [ENAS paper](https://arxiv.org/pdf/1802.03268.pdf) to show what the search space looks like:

![](../../img/enas_search_space.png)

The figure above shows one chosen architecture; we use it to illustrate what the search space looks like. Each square is a layer whose operation can be chosen from 6 operations. Each dashed line is a skip connection, and each square layer can choose 0 or 1 skip connection that takes the output of a previous layer. __Note that__ in the original macro search space each square layer could choose any number of skip connections, while in our implementation only 0 or 1 is allowed.

The result is shown in the figure below (with the experiment config [here](https://github.com/microsoft/nni/blob/master/examples/trials/nas_cifar10/config_ppo.yml)):

![](../../img/ppo_cifar10.png)
Binary file added docs/img/enas_search_space.png
Binary file added docs/img/ppo_cifar10.png
Binary file added docs/img/ppo_mnist.png
16 changes: 16 additions & 0 deletions examples/trials/mnist-nas/classic_mode/config_hpo.yml
@@ -0,0 +1,16 @@
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
tuner:
  builtinTunerName: TPE
trial:
  command: python3 mnist.py --batch_num 200
  codeDir: .
  gpuNum: 0
  nasMode: classic_mode
19 changes: 19 additions & 0 deletions examples/trials/mnist-nas/config_ppo.yml
@@ -0,0 +1,19 @@
authorName: NNI-example
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 100h
maxTrialNum: 10000
#choice: local, remote, pai
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner
  #SMAC, PPO (SMAC and PPO should be installed through nnictl)
  builtinTunerName: PPOTuner
  classArgs:
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
9 changes: 8 additions & 1 deletion examples/trials/nas_cifar10/README.md
@@ -2,7 +2,14 @@
===

Now we have a NAS example, [NNI-NAS-Example](https://github.com/Crysple/NNI-NAS-Example), that runs in NNI using the NAS interface, contributed by our community.

We have included its trial code in this folder and provided example config files to show how to use the PPO tuner to tune the trial code.

> Download data

- `cd data && . download.sh`
- `tar xzf cifar-10-python.tar.gz && mv cifar-10-batches-py cifar10`

Thanks to our lovely contributors.

And welcome more and more people to join us!
31 changes: 31 additions & 0 deletions examples/trials/nas_cifar10/config_pai_ppo.yml
@@ -0,0 +1,31 @@
authorName: Unknown
experimentName: enas_macro
trialConcurrency: 20
maxExecDuration: 2400h
maxTrialNum: 20000
#choice: local, remote
trainingServicePlatform: pai
#choice: true, false
useAnnotation: true
multiPhase: false
versionCheck: false
nniManagerIp: 0.0.0.0
tuner:
  builtinTunerName: PPOTuner
  classArgs:
    optimize_mode: maximize
    trials_per_update: 60
    epochs_per_update: 20
    minibatch_size: 6
trial:
  command: sh ./macro_cifar10_pai.sh
  codeDir: ./
  gpuNum: 1
  cpuNum: 1
  memoryMB: 8196
  image: msranni/nni:latest
  virtualCluster: nni
paiConfig:
  userName: your_account
  passWord: your_pwd
  host: 0.0.0.0
21 changes: 21 additions & 0 deletions examples/trials/nas_cifar10/config_ppo.yml
@@ -0,0 +1,21 @@
authorName: Unknown
experimentName: enas_macro
trialConcurrency: 4
maxExecDuration: 2400h
maxTrialNum: 20000
#choice: local, remote
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
multiPhase: false
tuner:
  builtinTunerName: PPOTuner
  classArgs:
    optimize_mode: maximize
    trials_per_update: 60
    epochs_per_update: 12
    minibatch_size: 10
trial:
  command: sh ./macro_cifar10.sh
  codeDir: ./
  gpuNum: 1
1 change: 1 addition & 0 deletions examples/trials/nas_cifar10/data/download.sh
@@ -0,0 +1 @@
wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
35 changes: 35 additions & 0 deletions examples/trials/nas_cifar10/macro_cifar10.sh
@@ -0,0 +1,35 @@
#!/bin/bash
set -e
export PYTHONPATH="$(pwd)"

python3 src/cifar10/nni_child_cifar10.py \
--data_format="NCHW" \
--search_for="macro" \
--reset_output_dir \
--data_path="data/cifar10" \
--output_dir="outputs" \
--train_data_size=45000 \
--batch_size=100 \
--num_epochs=8 \
--log_every=50 \
--eval_every_epochs=1 \
--child_use_aux_heads \
--child_num_layers=12 \
--child_out_filters=36 \
--child_l2_reg=0.0002 \
--child_num_branches=6 \
--child_num_cell_layers=5 \
--child_keep_prob=0.50 \
--child_drop_path_keep_prob=0.60 \
--child_lr_cosine \
--child_lr_max=0.05 \
--child_lr_min=0.001 \
--child_lr_T_0=10 \
--child_lr_T_mul=2 \
--controller_search_whole_channels \
--controller_train_every=1 \
--controller_num_aggregate=20 \
--controller_train_steps=50 \
--child_mode="subgraph" \
"$@"

35 changes: 35 additions & 0 deletions examples/trials/nas_cifar10/macro_cifar10_pai.sh
@@ -0,0 +1,35 @@
#!/bin/bash
set -e
export PYTHONPATH="$(pwd)"

python3 src/cifar10/nni_child_cifar10.py \
--data_format="NCHW" \
--search_for="macro" \
--reset_output_dir \
--data_path="data/cifar10" \
--output_dir="outputs" \
--train_data_size=45000 \
--batch_size=100 \
--num_epochs=30 \
--log_every=50 \
--eval_every_epochs=1 \
--child_use_aux_heads \
--child_num_layers=12 \
--child_out_filters=36 \
--child_l2_reg=0.0002 \
--child_num_branches=6 \
--child_num_cell_layers=5 \
--child_keep_prob=0.50 \
--child_drop_path_keep_prob=0.60 \
--child_lr_cosine \
--child_lr_max=0.05 \
--child_lr_min=0.001 \
--child_lr_T_0=10 \
--child_lr_T_mul=2 \
--controller_search_whole_channels \
--controller_train_every=1 \
--controller_num_aggregate=20 \
--controller_train_steps=50 \
--child_mode="subgraph" \
"$@"
