This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Merge dev-nas-tuner back to master #1531

Merged: 12 commits merged into master from dev-nas-tuner on Sep 16, 2019
2 changes: 1 addition & 1 deletion azure-pipelines.yml
@@ -15,7 +15,7 @@ jobs:
displayName: 'Install nni toolkit via source code'
- script: |
python3 -m pip install flake8 --user
IGNORE=./tools/nni_annotation/testcase/*:F821,./examples/trials/mnist-nas/*/mnist*.py:F821
IGNORE=./tools/nni_annotation/testcase/*:F821,./examples/trials/mnist-nas/*/mnist*.py:F821,./examples/trials/nas_cifar10/src/cifar10/general_child.py:F821
python3 -m flake8 . --count --per-file-ignores=$IGNORE --select=E9,F63,F72,F82 --show-source --statistics
displayName: 'Run flake8 tests to find Python syntax errors and undefined names'
- script: |
57 changes: 47 additions & 10 deletions docs/en_US/Tuner/BuiltinTuner.md
@@ -20,6 +20,7 @@ Currently we support the following algorithms:
|[__Metis Tuner__](#MetisTuner)|Metis offers the following benefits when it comes to tuning parameters: While most tools only predict the optimal configuration, Metis gives you two outputs: (a) current prediction of optimal configuration, and (b) suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter. [Reference Paper](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/)|
|[__BOHB__](#BOHB)|BOHB is a follow-up work of Hyperband. It targets the weakness of Hyperband that new configurations are generated randomly without leveraging finished trials. For the name BOHB, HB means Hyperband, BO means Bayesian Optimization. BOHB leverages finished trials by building multiple TPE models, a proportion of new configurations are generated through these models. [Reference Paper](https://arxiv.org/abs/1807.01774)|
|[__GP Tuner__](#GPTuner)|Gaussian Process Tuner is a sequential model-based optimization (SMBO) approach with Gaussian Process as the surrogate. [Reference Paper](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf), [Github Repo](https://github.com/fmfn/BayesianOptimization)|
|[__PPO Tuner__](#PPOTuner)|PPO Tuner is a reinforcement learning tuner based on the PPO algorithm. [Reference Paper](https://arxiv.org/abs/1707.06347)|

## Usage of Built-in Tuners

@@ -38,7 +39,7 @@ Note: Please follow the format when you write your `config.yml` file. Some built
TPE, as a black-box optimization method, can be used in various scenarios and shows good performance in general, especially when you have limited computation resources and can only try a small number of trials. From a large number of experiments, we found that TPE is far better than Random Search. [Detailed Description](./HyperoptTuner.md)


**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
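
A minimal `config.yml` sketch for TPE, in the same format as the usage examples shown for the other tuners in this document:

```yaml
# config.yml
tuner:
  builtinTunerName: TPE
  classArgs:
    optimize_mode: maximize
```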

@@ -66,7 +67,7 @@ tuner:

Random Search is suggested when each trial does not take too long (e.g., each trial completes quickly or is stopped early by the assessor), and you have enough computation resources, or when you want to explore the search space uniformly. Random Search can be considered a baseline search algorithm. [Detailed Description](./HyperoptTuner.md)

**Requirement of classArg:**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.

@@ -91,7 +92,7 @@ tuner:
Anneal is suggested when each trial does not take too long and you have enough computation resources (much the same as Random Search), or when the variables in the search space can be sampled from some prior distribution. [Detailed Description](./HyperoptTuner.md)


**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.

@@ -117,7 +118,7 @@ tuner:

Its requirement for computation resources is relatively high; specifically, it needs a large initial population to avoid falling into a local optimum. If your trials are short or leverage the assessor, this tuner is a good choice. It is especially suggested when your trial code supports weight transfer, that is, when a trial can inherit the converged weights from its parent(s); this can greatly speed up training. [Detailed Description](./EvolutionTuner.md)

**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.

@@ -156,7 +157,7 @@ nnictl package install --name=SMAC

Similar to TPE, SMAC is also a black-box tuner that can be tried in various scenarios and is suggested when computation resources are limited. It is optimized for discrete hyperparameters and is therefore suggested when most of your hyperparameters are discrete. [Detailed Description](./SmacTuner.md)

**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.

@@ -243,7 +244,7 @@ tuner:

It is suggested when you have limited computation resources but a relatively large search space. It performs well in scenarios where the intermediate result (e.g., accuracy) reflects, to some extent, how good the final result will be. [Detailed Description](./HyperbandAdvisor.md)

**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **R** (*int, optional, default = 60*) - the maximum budget (could be the number of mini-batches or epochs) that can be allocated to a trial. Each trial should use `TRIAL_BUDGET` to control how long it runs (a configuration sketch follows this parameter list).
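
Hyperband is exposed through NNI's advisor configuration rather than the `tuner` field. A minimal `config.yml` sketch, assuming the standard advisor format and showing only the arguments listed above:

```yaml
# config.yml
advisor:
  builtinAdvisorName: Hyperband
  classArgs:
    optimize_mode: maximize
    R: 60
```

Each trial then uses the `TRIAL_BUDGET` value it receives to decide how many mini-batches or epochs to run.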
@@ -277,7 +278,7 @@ NetworkMorphism requires [PyTorch](https://pytorch.org/get-started/locally) and

It is suggested when you want to apply deep learning methods to your task (your own dataset) but have no idea how to choose or design a network. You can modify the [example](https://github.com/Microsoft/nni/tree/master/examples/trials/network_morphism/cifar10/cifar10_keras.py) to fit your own dataset and data augmentation method, and you can also change the batch size, learning rate, or optimizer. It can find a good network architecture for different tasks. Currently this tuner only supports the computer vision domain. [Detailed Description](./NetworkmorphismTuner.md)

**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **task** (*('cv'), optional, default = 'cv'*) - The domain of the experiment. For now, this tuner only supports the computer vision (cv) domain (see the configuration sketch after this list).
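
A minimal `config.yml` sketch for NetworkMorphism, showing only the two class arguments listed above (further arguments are covered in the detailed documentation):

```yaml
# config.yml
tuner:
  builtinTunerName: NetworkMorphism
  classArgs:
    optimize_mode: maximize
    task: cv
```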
@@ -313,7 +314,7 @@ Note that the only acceptable types of search space are `choice`, `quniform`, `u

Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than other approaches such as Random Search. Furthermore, Metis provides guidance on the subsequent trial. Here is an [example](https://github.com/Microsoft/nni/tree/master/examples/trials/auto-gbdt/search_space_metis.json) of the use of Metis. Users only need to send the final result, such as `accuracy`, to the tuner by calling the NNI SDK. [Detailed Description](./MetisTuner.md)

**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.

@@ -347,7 +348,7 @@ nnictl package install --name=BOHB

Similar to Hyperband, it is suggested when you have limited computation resources but a relatively large search space. It performs well in scenarios where the intermediate result (e.g., accuracy) reflects, to some extent, how good the final result will be. In this case, it may converge to a better configuration thanks to its use of Bayesian optimization. [Detailed Description](./BohbAdvisor.md)

**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **min_budget** (*int, optional, default = 1*) - The smallest budget assigned to a trial job (the budget could be the number of mini-batches or epochs). Needs to be positive. A minimal configuration sketch follows this parameter list.
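
Like Hyperband, BOHB is exposed as an advisor and must first be installed with `nnictl package install --name=BOHB`. A minimal `config.yml` sketch, assuming the same advisor format and showing only the arguments listed above:

```yaml
# config.yml
advisor:
  builtinAdvisorName: BOHB
  classArgs:
    optimize_mode: maximize
    min_budget: 1
```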
@@ -386,7 +387,7 @@ Note that the only acceptable types of search space are `choice`, `randint`, `un

As a Sequential Model-Based Global Optimization (SMBO) strategy, GP Tuner uses a proxy optimization problem (finding the maximum of the acquisition function) that, albeit still a hard problem, is computationally cheaper and can be tackled with common tools. GP Tuner is therefore most suitable for situations where the function to be optimized is very expensive to evaluate, and it can be used when computation resources are limited. However, GP Tuner has a computational cost that grows as *O(N^3)* due to the need to invert the Gram matrix, so it is not suitable when many trials are needed. [Detailed Description](./GPTuner.md)

**Requirement of classArg**
**Requirement of classArgs**

* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **utility** (*'ei', 'ucb' or 'poi', optional, default = 'ei'*) - The kind of utility function (acquisition function). 'ei', 'ucb' and 'poi' correspond to 'Expected Improvement', 'Upper Confidence Bound' and 'Probability of Improvement', respectively.
@@ -415,3 +416,39 @@ tuner:
    selection_num_warm_up: 100000
    selection_num_starting_points: 250
```

<a name="PPOTuner"></a>

![](https://placehold.it/15/1589F0/000000?text=+) `PPO Tuner`

> Built-in Tuner Name: **PPOTuner**

Note that the only acceptable type of search space is `mutable_layer`. `optional_input_size` can only be 0, 1, or [0, 1].

**Suggested scenario**

PPOTuner is a reinforcement learning tuner based on the PPO algorithm. PPOTuner is recommended when you are using the NNI NAS interface in your trial code to do neural architecture search. It has relatively high data efficiency but is suggested when you have a large amount of computation resources. You could try it on a very simple task, such as the [mnist-nas](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas) example. [Detailed Description](./PPOTuner.md)

**Requirement of classArgs**

* **optimize_mode** (*'maximize' or 'minimize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **trials_per_update** (*int, optional, default = 20*) - The number of trials to be used for one update. This number is recommended to be larger than `trialConcurrency`, and `trialConcurrency` should be an exact divisor of `trials_per_update`. Note that `trials_per_update` should also be divisible by `minibatch_size` (a consistent example is sketched after the usage example below).
* **epochs_per_update** (*int, optional, default = 4*) - The number of epochs for one update.
* **minibatch_size** (*int, optional, default = 4*) - Mini-batch size (i.e., the number of trials in a mini-batch) for the update. Note that `trials_per_update` should be divisible by `minibatch_size`.
* **ent_coef** (*float, optional, default = 0.0*) - Policy entropy coefficient in the optimization objective.
* **lr** (*float, optional, default = 3e-4*) - Learning rate of the model (lstm network), constant.
* **vf_coef** (*float, optional, default = 0.5*) - Value function loss coefficient in the optimization objective.
* **max_grad_norm** (*float, optional, default = 0.5*) - Gradient norm clipping coefficient.
* **gamma** (*float, optional, default = 0.99*) - Discounting factor.
* **lam** (*float, optional, default = 0.95*) - Advantage estimation discounting factor (lambda in the paper).
* **cliprange** (*float, optional, default = 0.2*) - Cliprange in the PPO algorithm, constant.

**Usage example**

```yaml
# config.yml
tuner:
  builtinTunerName: PPOTuner
  classArgs:
    optimize_mode: maximize
```
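
As a worked example of the divisibility constraints on `trials_per_update`, the values used by the `config_pai_ppo.yml` added in this PR satisfy both conditions: `trialConcurrency` (20) divides `trials_per_update` (60), and `trials_per_update` is divisible by `minibatch_size` (60 / 6 = 10):

```yaml
# excerpt consistent with examples/trials/nas_cifar10/config_pai_ppo.yml
trialConcurrency: 20
tuner:
  builtinTunerName: PPOTuner
  classArgs:
    optimize_mode: maximize
    trials_per_update: 60   # 60 = 3 * 20 = 10 * 6
    epochs_per_update: 20
    minibatch_size: 6
```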
20 changes: 20 additions & 0 deletions docs/en_US/Tuner/PPOTuner.md
@@ -0,0 +1,20 @@
PPO Tuner on NNI
===

## PPOTuner

This is a tuner intended for NNI's NAS interface. It uses the [PPO algorithm](https://arxiv.org/abs/1707.06347). The implementation inherits the main logic of the implementation [here](https://github.com/openai/baselines/tree/master/baselines/ppo2) (i.e., ppo2 from OpenAI) and is adapted for the NAS scenario.

It can successfully tune the [mnist-nas example](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas), with the following result:

![](../../img/ppo_mnist.png)

We also tuned [the macro search space for image classification from the ENAS paper](https://github.com/microsoft/nni/tree/master/examples/trials/nas_cifar10) (with a limited number of epochs for each trial, i.e., 8 epochs), which is implemented using the NAS interface and tuned with PPOTuner. We use Figure 7 from the [ENAS paper](https://arxiv.org/pdf/1802.03268.pdf) to show what the search space looks like:

![](../../img/enas_search_space.png)

The figure above shows one chosen architecture; we use it to illustrate what the search space looks like. Each square is a layer whose operation can be chosen from 6 operations. Each dashed line is a skip connection, and each square layer can choose 0 or 1 skip connection that takes the output of a previous layer. __Note that__ in the original macro search space each square layer could choose any number of skip connections, while in our implementation only 0 or 1 is allowed.

The result is shown in the figure below (with the experiment config [here](https://github.com/microsoft/nni/blob/master/examples/trials/nas_cifar10/config_ppo.yml)):

![](../../img/ppo_cifar10.png)
Binary file added docs/img/enas_search_space.png
Binary file added docs/img/ppo_cifar10.png
Binary file added docs/img/ppo_mnist.png
16 changes: 16 additions & 0 deletions examples/trials/mnist-nas/classic_mode/config_hpo.yml
@@ -0,0 +1,16 @@
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
tuner:
  builtinTunerName: TPE
trial:
  command: python3 mnist.py --batch_num 200
  codeDir: .
  gpuNum: 0
  nasMode: classic_mode
19 changes: 19 additions & 0 deletions examples/trials/mnist-nas/config_ppo.yml
@@ -0,0 +1,19 @@
authorName: NNI-example
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 100h
maxTrialNum: 10000
#choice: local, remote, pai
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner
  #SMAC, PPO (SMAC and PPO should be installed through nnictl)
  builtinTunerName: PPOTuner
  classArgs:
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
9 changes: 8 additions & 1 deletion examples/trials/nas_cifar10/README.md
@@ -2,7 +2,14 @@
===

Now we have a NAS example, [NNI-NAS-Example](https://github.com/Crysple/NNI-NAS-Example), that runs in NNI using the NAS interface, contributed by our community.

We have included its trial code in this folder and provided example config files to show how to use the PPO tuner to tune the trial code.

> Download data

- `cd data && . download.sh`
- `tar xzf cifar-10-python.tar.gz && mv cifar-10-batches-py cifar10`

Thanks to our lovely contributors.

And welcome more and more people to join us!
31 changes: 31 additions & 0 deletions examples/trials/nas_cifar10/config_pai_ppo.yml
@@ -0,0 +1,31 @@
authorName: Unknown
experimentName: enas_macro
trialConcurrency: 20
maxExecDuration: 2400h
maxTrialNum: 20000
#choice: local, remote
trainingServicePlatform: pai
#choice: true, false
useAnnotation: true
multiPhase: false
versionCheck: false
nniManagerIp: 0.0.0.0
tuner:
  builtinTunerName: PPOTuner
  classArgs:
    optimize_mode: maximize
    trials_per_update: 60
    epochs_per_update: 20
    minibatch_size: 6
trial:
  command: sh ./macro_cifar10_pai.sh
  codeDir: ./
  gpuNum: 1
  cpuNum: 1
  memoryMB: 8196
  image: msranni/nni:latest
  virtualCluster: nni
paiConfig:
  userName: your_account
  passWord: your_pwd
  host: 0.0.0.0
21 changes: 21 additions & 0 deletions examples/trials/nas_cifar10/config_ppo.yml
@@ -0,0 +1,21 @@
authorName: Unknown
experimentName: enas_macro
trialConcurrency: 4
maxExecDuration: 2400h
maxTrialNum: 20000
#choice: local, remote
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
multiPhase: false
tuner:
  builtinTunerName: PPOTuner
  classArgs:
    optimize_mode: maximize
    trials_per_update: 60
    epochs_per_update: 12
    minibatch_size: 10
trial:
  command: sh ./macro_cifar10.sh
  codeDir: ./
  gpuNum: 1
1 change: 1 addition & 0 deletions examples/trials/nas_cifar10/data/download.sh
@@ -0,0 +1 @@
wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
35 changes: 35 additions & 0 deletions examples/trials/nas_cifar10/macro_cifar10.sh
@@ -0,0 +1,35 @@
#!/bin/bash
set -e
export PYTHONPATH="$(pwd)"

python3 src/cifar10/nni_child_cifar10.py \
--data_format="NCHW" \
--search_for="macro" \
--reset_output_dir \
--data_path="data/cifar10" \
--output_dir="outputs" \
--train_data_size=45000 \
--batch_size=100 \
--num_epochs=8 \
--log_every=50 \
--eval_every_epochs=1 \
--child_use_aux_heads \
--child_num_layers=12 \
--child_out_filters=36 \
--child_l2_reg=0.0002 \
--child_num_branches=6 \
--child_num_cell_layers=5 \
--child_keep_prob=0.50 \
--child_drop_path_keep_prob=0.60 \
--child_lr_cosine \
--child_lr_max=0.05 \
--child_lr_min=0.001 \
--child_lr_T_0=10 \
--child_lr_T_mul=2 \
--controller_search_whole_channels \
--controller_train_every=1 \
--controller_num_aggregate=20 \
--controller_train_steps=50 \
--child_mode="subgraph" \
"$@"

35 changes: 35 additions & 0 deletions examples/trials/nas_cifar10/macro_cifar10_pai.sh
@@ -0,0 +1,35 @@
#!/bin/bash
set -e
export PYTHONPATH="$(pwd)"

python3 src/cifar10/nni_child_cifar10.py \
--data_format="NCHW" \
--search_for="macro" \
--reset_output_dir \
--data_path="data/cifar10" \
--output_dir="outputs" \
--train_data_size=45000 \
--batch_size=100 \
--num_epochs=30 \
--log_every=50 \
--eval_every_epochs=1 \
--child_use_aux_heads \
--child_num_layers=12 \
--child_out_filters=36 \
--child_l2_reg=0.0002 \
--child_num_branches=6 \
--child_num_cell_layers=5 \
--child_keep_prob=0.50 \
--child_drop_path_keep_prob=0.60 \
--child_lr_cosine \
--child_lr_max=0.05 \
--child_lr_min=0.001 \
--child_lr_T_0=10 \
--child_lr_T_mul=2 \
--controller_search_whole_channels \
--controller_train_every=1 \
--controller_num_aggregate=20 \
--controller_train_steps=50 \
--child_mode="subgraph" \
"$@"
