From b34845fe68bcb9c35e7d8ead6093e6a2d60b1d72 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Wed, 11 Mar 2020 19:12:26 -0700
Subject: [PATCH 01/34] Fix broken English in Overview
Fix a lot of awkward or misleading phrasing.
Fix a few spelling errors.
Fix past tense vs. present tense (can vs. could, supports vs. supported)
---
docs/en_US/Overview.md | 46 +++++++++++++++++++++---------------------
1 file changed, 23 insertions(+), 23 deletions(-)
diff --git a/docs/en_US/Overview.md b/docs/en_US/Overview.md
index eb54a61f4f..2b47934d55 100644
--- a/docs/en_US/Overview.md
+++ b/docs/en_US/Overview.md
@@ -1,11 +1,11 @@
# Overview
-NNI (Neural Network Intelligence) is a toolkit to help users design and tune machine learning models (e.g., hyperparameters), neural network architectures, or complex system's parameters, in an efficient and automatic way. NNI has several appealing properties: easy-to-use, scalability, flexibility, and efficiency.
+NNI (Neural Network Intelligence) is a toolkit to help users design and tune machine learning models (e.g., hyperparameters), neural network architectures, or a complex system's parameters in an efficient and automatic way. NNI has several appealing properties: ease-of-use, scalability, flexibility, and efficiency.
-* **Easy-to-use**: NNI can be easily installed through python pip. Only several lines need to be added to your code in order to use NNI's power. You can use both commandline tool and WebUI to work with your experiments.
-* **Scalability**: Tuning hyperparameters or neural architecture often demands large amount of computation resource, while NNI is designed to fully leverage different computation resources, such as remote machines, training platforms (e.g., OpenPAI, Kubernetes). Hundreds of trials could run in parallel by depending on the capacity of your configured training platforms.
-* **Flexibility**: Besides rich built-in algorithms, NNI allows users to customize various hyperparameter tuning algorithms, neural architecture search algorithms, early stopping algorithms, etc. Users could also extend NNI with more training platforms, such as virtual machines, kubernetes service on the cloud. Moreover, NNI can connect to external environments to tune special applications/models on them.
-* **Efficiency**: We are intensively working on more efficient model tuning from both system level and algorithm level. For example, leveraging early feedback to speedup tuning procedure.
+* **Ease-of-use**: NNI can be easily installed through pip. Only a few lines need to be added to your code in order to use NNI's power. You can use both the command line tool and the WebUI to work with your experiments.
+* **Scalability**: Tuning hyperparameters or the neural architecture often demands a large amount of computational resources, while NNI is designed to fully leverage different computation resources, such as remote machines and training platforms (e.g., OpenPAI, Kubernetes). Hundreds of trials can run in parallel, depending on the capacity of your configured training platforms.
+* **Flexibility**: Besides rich built-in algorithms, NNI allows users to customize various hyperparameter tuning algorithms, neural architecture search algorithms, early stopping algorithms, etc. Users can also extend NNI with more training platforms, such as virtual machines or Kubernetes services in the cloud. Moreover, NNI can connect to external environments to tune special applications/models on them.
+* **Efficiency**: We are intensively working on more efficient model tuning at both the system and algorithm levels. For example, we leverage early feedback to speed up the tuning procedure.
The figure below shows the high-level architecture of NNI.
@@ -15,23 +15,23 @@ The figure below shows high-level architecture of NNI.
## Key Concepts
-* *Experiment*: An experiment is one task of, for example, finding out the best hyperparameters of a model, finding out the best neural network architecture. It consists of trials and AutoML algorithms.
+* *Experiment*: One task of, for example, finding the best hyperparameters of a model or the best neural network architecture. It consists of trials and AutoML algorithms.
-* *Search Space*: It means the feasible region for tuning the model. For example, the value range of each hyperparameters.
+* *Search Space*: The feasible region for tuning the model. For example, the value range of each hyperparameter.
-* *Configuration*: A configuration is an instance from the search space, that is, each hyperparameter has a specific value.
+* *Configuration*: An instance from the search space, that is, each hyperparameter has a specific value.
-* *Trial*: Trial is an individual attempt at applying a new configuration (e.g., a set of hyperparameter values, a specific nerual architecture). Trial code should be able to run with the provided configuration.
+* *Trial*: An individual attempt at applying a new configuration (e.g., a set of hyperparameter values, a specific neural architecture, etc.). Trial code should be able to run with the provided configuration.
-* *Tuner*: Tuner is an AutoML algorithm, which generates a new configuration for the next try. A new trial will run with this configuration.
+* *Tuner*: An AutoML algorithm, which generates a new configuration for the next try. A new trial will run with this configuration.
-* *Assessor*: Assessor analyzes trial's intermediate results (e.g., periodically evaluated accuracy on test dataset) to tell whether this trial can be early stopped or not.
+* *Assessor*: Analyzes a trial's intermediate results (e.g., periodically evaluated accuracy on the test dataset) to tell whether the trial can be stopped early or not.
-* *Training Platform*: It means where trials are executed. Depending on your experiment's configuration, it could be your local machine, or remote servers, or large-scale training platform (e.g., OpenPAI, Kubernetes).
+* *Training Platform*: Where trials are executed. Depending on your experiment's configuration, it could be your local machine, remote servers, or a large-scale training platform (e.g., OpenPAI, Kubernetes).
-Basically, an experiment runs as follows: Tuner receives search space and generates configurations. These configurations will be submitted to training platforms, such as local machine, remote machines, or training clusters. Their performances are reported back to Tuner. Then, new configurations are generated and submitted.
+Basically, an experiment runs as follows: the Tuner receives the search space and generates configurations. These configurations are submitted to training platforms, such as the local machine, remote machines, or training clusters. Their performance is reported back to the Tuner. Then, new configurations are generated and submitted.
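To make this loop concrete, here is a toy sketch of that flow in plain Python. Random sampling stands in for a real tuner and `run_trial` stands in for trial code running on a training platform; this is an illustration, not NNI's actual implementation.

```python
import random

# Toy search space: each hyperparameter with its feasible values.
search_space = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [16, 32, 64]}

def run_trial(configuration):
    """Stand-in for trial code running on a training platform."""
    return random.random()  # pretend this is the reported accuracy

best_configuration, best_result = None, float("-inf")
for _ in range(10):  # bounded by the experiment's max trial number
    # "Tuner" step: generate a configuration from the search space.
    configuration = {name: random.choice(values) for name, values in search_space.items()}
    result = run_trial(configuration)  # performance reported back
    if result > best_result:
        best_configuration, best_result = configuration, result

print(best_configuration, best_result)
```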
-For each experiment, user only needs to define a search space and update a few lines of code, and then leverage NNI built-in Tuner/Assessor and training platforms to search the best hyperparameters and/or neural architecture. There are basically 3 steps:
+For each experiment, the user only needs to define a search space and update a few lines of code, and then leverage NNI's built-in Tuner/Assessor and training platforms to search for the best hyperparameters and/or neural architecture. There are basically 3 steps:
>Step 1: [Define search space](Tutorial/SearchSpaceSpec.md)
@@ -44,31 +44,31 @@ For each experiment, user only needs to define a search space and update a few l
-More details about how to run an experiment, please refer to [Get Started](Tutorial/QuickStart.md).
+For more details about how to run an experiment, please refer to [Get Started](Tutorial/QuickStart.md).
## Core Features
-NNI provides a key capacity to run multiple instances in parallel to find best combinations of parameters. This feature can be used in various domains, like find best hyperparameters for a deep learning model, or find best configuration for database and other complex system with real data.
+NNI provides a key capability to run multiple instances in parallel to find the best combinations of parameters. This feature can be used in various domains, like finding the best hyperparameters for a deep learning model or finding the best configuration for a database or other complex systems with real data.
-NNI is also like to provide algorithm toolkits for machine learning and deep learning, especially neural architecture search (NAS) algorithms, model compression algorithms, and feature engineering algorithms.
+NNI also provides algorithm toolkits for machine learning and deep learning, especially neural architecture search (NAS) algorithms, model compression algorithms, and feature engineering algorithms.
### Hyperparameter Tuning
-This is a core and basic feature of NNI, we provide many popular [automatic tuning algorithms](Tuner/BuiltinTuner.md) (i.e., tuner) and [early stop algorithms](Assessor/BuiltinAssessor.md) (i.e., assessor). You could follow [Quick Start](Tutorial/QuickStart.md) to tune your model (or system). Basically, there are the above three steps and then start an NNI experiment.
+This is a core and basic feature of NNI; we provide many popular [automatic tuning algorithms](Tuner/BuiltinTuner.md) (i.e., tuners) and [early stop algorithms](Assessor/BuiltinAssessor.md) (i.e., assessors). You can follow [Quick Start](Tutorial/QuickStart.md) to tune your model (or system). Basically, follow the three steps above and then start an NNI experiment.
### General NAS Framework
-This NAS framework is for users to easily specify candidate neural architectures, for example, could specify multiple candidate operations (e.g., separable conv, dilated conv) for a single layer, and specify possible skip connections. NNI will find the best candidate automatically. On the other hand, the NAS framework provides simple interface for another type of users (e.g., NAS algorithm researchers) to implement new NAS algorithms. Detailed description and usage can be found [here](NAS/Overview.md).
+This NAS framework is for users to easily specify candidate neural architectures; for example, one can specify multiple candidate operations (e.g., separable conv, dilated conv) for a single layer and specify possible skip connections. NNI will find the best candidate automatically. On the other hand, the NAS framework provides a simple interface for another type of user (e.g., NAS algorithm researchers) to implement new NAS algorithms. A detailed description of NAS and its usage can be found [here](NAS/Overview.md).
-NNI has supported many one-shot NAS algorithms, such as ENAS, DARTS, through NNI trial SDK. To use these algorithms you do not have to start an NNI experiment. Instead, to import an algorithm in your trial code, and simply run your trial code. If you want to tune the hyperparameters in the algorithms or want to run multiple instances, you could choose a tuner and start an NNI experiment.
+NNI supports many one-shot NAS algorithms, such as ENAS and DARTS, through the NNI trial SDK. To use these algorithms you do not have to start an NNI experiment. Instead, import an algorithm in your trial code and simply run your trial code. If you want to tune the hyperparameters in the algorithms or want to run multiple instances, you can choose a tuner and start an NNI experiment.
Other than one-shot NAS, NAS can also run in a classic mode where each candidate architecture runs as an independent trial job. In this mode, similar to hyperparameter tuning, users have to start an NNI experiment and choose a tuner for NAS.
### Model Compression
-Model Compression on NNI includes pruning algorithms and quantization algorithms. These algorithms are provided through NNI trial SDK. Users could directly use them in their trial code and run the trial code without starting an NNI experiment. Detailed description and usage can be found [here](Compressor/Overview.md).
+Model Compression on NNI includes pruning algorithms and quantization algorithms. These algorithms are provided through the NNI trial SDK. Users can directly use them in their trial code and run the trial code without starting an NNI experiment. A detailed description of model compression and its usage can be found [here](Compressor/Overview.md).
-There are different types of hyperparamters in model compression. One type is the hyperparameters in input configuration, e.g., sparsity, quantization bits, to a compression algorithm. The other type is the hyperparamters in compression algorithms. Here, Hyperparameter tuning of NNI could help a lot in finding the best compressed model automatically. A simple example can be found [here](Compressor/AutoCompression.md).
+There are different types of hyperparameters in model compression. One type is the hyperparameters in the input configuration (e.g., sparsity, quantization bits) to a compression algorithm. The other type is the hyperparameters in the compression algorithms themselves. Here, hyperparameter tuning with NNI can help a lot in finding the best compressed model automatically. A simple example can be found [here](Compressor/AutoCompression.md).
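As a rough sketch of the first type, the snippet below passes a sparsity setting to a pruner. It assumes the NNI v1.x PyTorch compression API (`LevelPruner`) and uses a toy model; see the compression docs linked above for authoritative usage.

```python
# Sketch only: assumes NNI v1.x's PyTorch compression API.
import torch.nn as nn
from nni.compression.torch import LevelPruner

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Input-configuration hyperparameters of the first type described above:
# the target sparsity and which operation types to compress.
config_list = [{"sparsity": 0.8, "op_types": ["default"]}]

pruner = LevelPruner(model, config_list)
model = pruner.compress()  # wraps the model with pruning logic
```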
### Automatic Feature Engineering
-Automatic feature engineering is for users to find the best features for the following tasks. Detailed description and usage can be found [here](FeatureEngineering/Overview.md). It is supported through NNI trial SDK, which means you do not have to create an NNI experiment. Instead, simply import a built-in auto-feature-engineering algorithm in your trial code and directly run your trial code.
+Automatic feature engineering is for users to find the best features for their tasks. A detailed description of automatic feature engineering and its usage can be found [here](FeatureEngineering/Overview.md). It is supported through the NNI trial SDK, which means you do not have to create an NNI experiment. Instead, simply import a built-in auto-feature-engineering algorithm in your trial code and directly run your trial code.
The auto-feature-engineering algorithms usually have a bunch of hyperparameters themselves. If you want to automatically tune those hyperparameters, you can leverage hyperparameter tuning of NNI, that is, choose a tuning algorithm (i.e., tuner) and start an NNI experiment for it.
From 4d07201a1b1354faa75da45533400d992fa1192c Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Wed, 11 Mar 2020 19:17:38 -0700
Subject: [PATCH 02/34] Sentences shouldn't typically begin with "and" in
installation.rst
---
docs/en_US/installation.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/en_US/installation.rst b/docs/en_US/installation.rst
index 2606ceaa05..5688b3b4c8 100644
--- a/docs/en_US/installation.rst
+++ b/docs/en_US/installation.rst
@@ -2,7 +2,7 @@
Installation
############
-Currently we support installation on Linux, Mac and Windows. And also allow you to use docker.
+Currently, we support installation on Linux, Mac, and Windows. We also allow you to use Docker.
.. toctree::
:maxdepth: 2
From c6ff99bf051a5eccda0a022b26d28b20bb05e5b9 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Wed, 11 Mar 2020 19:19:31 -0700
Subject: [PATCH 03/34] Fixed a bit of bad grammar and awkward phrasing in
Linux installation instructions.
---
docs/en_US/Tutorial/InstallationLinux.md | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/docs/en_US/Tutorial/InstallationLinux.md b/docs/en_US/Tutorial/InstallationLinux.md
index 20987f2f84..9d6710d3af 100644
--- a/docs/en_US/Tutorial/InstallationLinux.md
+++ b/docs/en_US/Tutorial/InstallationLinux.md
@@ -2,7 +2,7 @@
## Installation
-Installation on Linux and macOS follow the same instruction below.
+Installation on Linux and macOS follows the same instructions, given below.
### Install NNI through pip
@@ -14,7 +14,7 @@ Installation on Linux and macOS follow the same instruction below.
### Install NNI through source code
- If you are interested on special or latest code version, you can install NNI through source code.
+ If you are interested in a specific or the latest code version, you can install NNI through source code.
Prerequisites: `python 64-bit >=3.5`, `git`, `wget`
@@ -26,7 +26,7 @@ Installation on Linux and macOS follow the same instruction below.
### Use NNI in a docker image
- You can also install NNI in a docker image. Please follow the instructions [here](https://github.com/Microsoft/nni/tree/master/deployment/docker/README.md) to build NNI docker image. The NNI docker image can also be retrieved from Docker Hub through the command `docker pull msranni/nni:latest`.
+ You can also install NNI in a docker image. Please follow the instructions [here](https://github.com/Microsoft/nni/tree/master/deployment/docker/README.md) to build an NNI docker image. The NNI docker image can also be retrieved from Docker Hub through the command `docker pull msranni/nni:latest`.
## Verify installation
@@ -72,7 +72,7 @@ You can use these commands to get more information about the experiment
-----------------------------------------------------------------------
```
-* Open the `Web UI url` in your browser, you can view detail information of the experiment and all the submitted trial jobs as shown below. [Here](../Tutorial/WebUI.md) are more Web UI pages.
+* Open the `Web UI url` in your browser; you can view detailed information about the experiment and all the submitted trial jobs, as shown below. [Here](../Tutorial/WebUI.md) are more Web UI pages.
![overview](../../img/webui_overview_page.png)
From 2ffe8a09c04bfa8821ebc8e7239ccd6456e57810 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Wed, 11 Mar 2020 19:26:07 -0700
Subject: [PATCH 04/34] Additional, single correction in linux instructions.
---
docs/en_US/Tutorial/InstallationLinux.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/en_US/Tutorial/InstallationLinux.md b/docs/en_US/Tutorial/InstallationLinux.md
index 9d6710d3af..adc475fce8 100644
--- a/docs/en_US/Tutorial/InstallationLinux.md
+++ b/docs/en_US/Tutorial/InstallationLinux.md
@@ -32,7 +32,7 @@ Installation on Linux and macOS follow the same instructions, given below.
The following example is built on TensorFlow 1.x. Make sure **TensorFlow 1.x is used** when running it.
-* Download the examples via clone the source code.
+* Download the examples by cloning the source code.
```bash
git clone -b v1.4 https://github.com/Microsoft/nni.git
From b85e2e4f2abe99acb1a93d3650e792e637ac4671 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Wed, 11 Mar 2020 19:28:15 -0700
Subject: [PATCH 05/34] Fix a lot of bad grammar and phrasing in Windows
installation instructions
---
docs/en_US/Tutorial/InstallationWin.md | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/docs/en_US/Tutorial/InstallationWin.md b/docs/en_US/Tutorial/InstallationWin.md
index d33dcbb3c3..d6b7a0a121 100644
--- a/docs/en_US/Tutorial/InstallationWin.md
+++ b/docs/en_US/Tutorial/InstallationWin.md
@@ -14,7 +14,7 @@ Anaconda or Miniconda is highly recommended to manage multiple Python environmen
### Install NNI through source code
- If you are interested on special or latest code version, you can install NNI through source code.
+ If you are interested in a specific or the latest code version, you can install NNI through source code.
Prerequisites: `python 64-bit >=3.5`, `git`, `PowerShell`.
@@ -70,7 +70,7 @@ You can use these commands to get more information about the experiment
-----------------------------------------------------------------------
```
-* Open the `Web UI url` in your browser, you can view detail information of the experiment and all the submitted trial jobs as shown below. [Here](../Tutorial/WebUI.md) are more Web UI pages.
+* Open the `Web UI url` in your browser; you can view detailed information about the experiment and all the submitted trial jobs, as shown below. [Here](../Tutorial/WebUI.md) are more Web UI pages.
![overview](../../img/webui_overview_page.png)
@@ -94,35 +94,35 @@ Below are the minimum system requirements for NNI on Windows, Windows 10.1809 is
### simplejson failed when installing NNI
-Make sure C++ 14.0 compiler installed.
+Make sure a C++ 14.0 compiler is installed.
>building 'simplejson._speedups' extension error: [WinError 3] The system cannot find the path specified
### Trial failed with missing DLL in command line or PowerShell
-This error caused by missing LIBIFCOREMD.DLL and LIBMMD.DLL and fail to install SciPy. Using Anaconda or Miniconda with Python(64-bit) can solve it.
+This error is caused by missing LIBIFCOREMD.DLL and LIBMMD.DLL files and a failed SciPy installation. Using Anaconda or Miniconda with Python (64-bit) can solve it.
>ImportError: DLL load failed
### Trial failed on webUI
Please check the trial log file stderr for more details.
-If there is a stderr file, please check out. Two possible cases are as follows:
+If there is a stderr file, please check it. Two possible cases are:
-* forget to change the trial command `python3` into `python` in each experiment YAML.
-* forget to install experiment dependencies such as TensorFlow, Keras and so on.
+* forgetting to change the trial command `python3` to `python` in each experiment YAML.
+* forgetting to install experiment dependencies such as TensorFlow, Keras and so on.
### Fail to use BOHB on Windows
-Make sure C++ 14.0 compiler installed then try to run `nnictl package install --name=BOHB` to install the dependencies.
+Make sure a C++ 14.0 compiler is installed, then run `nnictl package install --name=BOHB` to install the dependencies.
### Tuners not supported on Windows
-SMAC is not supported currently, the specific reason can be referred to this [GitHub issue](https://github.com/automl/SMAC3/issues/483).
+SMAC is currently not supported; for the specific reason, refer to this [GitHub issue](https://github.com/automl/SMAC3/issues/483).
### Use a Windows server as a remote worker
-Currently you can't.
+Currently, you can't.
Note:
-* If there is any error like `Segmentation fault`, please refer to [FAQ](FAQ.md)
+* If an error like `Segmentation fault` is encountered, please refer to [FAQ](FAQ.md)
## Further reading
From 476247af99d24721888197f2bd472bd47f5d9aa5 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Wed, 11 Mar 2020 19:50:08 -0700
Subject: [PATCH 06/34] Fix a variety of grammar and spelling problems in
Docker instructions
Lots of awkward phrasing.
Lots of tense issues (could vs can)
Lots of spelling errors (especially "offical")
Lots of missing articles
Docker is a proper noun and should be capitalized
---
docs/en_US/Tutorial/HowToUseDocker.md | 74 +++++++++++++--------------
1 file changed, 37 insertions(+), 37 deletions(-)
diff --git a/docs/en_US/Tutorial/HowToUseDocker.md b/docs/en_US/Tutorial/HowToUseDocker.md
index d480094329..b028b50056 100644
--- a/docs/en_US/Tutorial/HowToUseDocker.md
+++ b/docs/en_US/Tutorial/HowToUseDocker.md
@@ -3,89 +3,89 @@
## Overview
-[Docker](https://www.docker.com/) is a tool to make it easier for users to deploy and run applications based on their own operating system by starting containers. Docker is not a virtual machine, it does not create a virtual operating system, bug it allows different applications to use the same OS kernel, and isolate different applications by container.
+[Docker](https://www.docker.com/) is a tool that makes it easier for users to deploy and run applications on their own operating system by starting containers. Docker is not a virtual machine; it does not create a virtual operating system, but it allows different applications to use the same OS kernel while isolating them in containers.
-Users could start NNI experiments using docker, and NNI provides an offical docker image [msranni/nni](https://hub.docker.com/r/msranni/nni) in docker hub.
+Users can start NNI experiments using Docker. NNI also provides an official Docker image [msranni/nni](https://hub.docker.com/r/msranni/nni) on Docker Hub.
-## Using docker in local machine
+## Using Docker on a local machine
-### Step 1: Installation of docker
-Before you start using docker to start NNI experiments, you should install a docker software in your local machine. [Refer](https://docs.docker.com/install/linux/docker-ce/ubuntu/)
+### Step 1: Installation of Docker
+Before you start using Docker for NNI experiments, you should install Docker on your local machine. [Refer to this](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
-### Step2: Start docker container
-If you have installed the docker package in your local machine, you could start a docker container instance to run NNI examples. You should notice that because NNI will start a web UI process in container and continue to listen to a port, you need to specify the port mapping between your host machine and docker container to give access to web UI outside the container. By visting the host ip address and port, you could redirect to the web UI process started in docker container, and visit web UI content.
+### Step 2: Start a Docker container
+If you have installed the Docker package on your local machine, you can start a Docker container instance to run NNI examples. Note that because NNI will start a web UI process in the container and listen on a port, you need to specify a port mapping between your host machine and the Docker container to give access to the web UI outside the container. By visiting the host IP address and port, you will be redirected to the web UI process started in the Docker container and can view the web UI content.
-For example, you could start a new docker container from following command:
+For example, you could start a new Docker container with the following command:
```
docker run -i -t -p [hostPort]:[containerPort] [image]
```
-`-i:` Start a docker in an interactive mode.
+`-i:` Start a Docker container in interactive mode.
-`-t:` Docker assign the container a input terminal.
+`-t:` Docker assigns the container an input terminal.
`-p:` Port mapping; maps a host port to a container port.
-For more information about docker command, please [refer](https://docs.docker.com/v17.09/edge/engine/reference/run/)
+For more information about Docker commands, please [refer to this](https://docs.docker.com/v17.09/edge/engine/reference/run/).
Note:
```
- NNI only support Ubuntu and MacOS system in local mode for the moment, please use correct docker image type.If you want to use gpu in docker container, please use nvidia-docker.
+ NNI only supports Ubuntu and macOS systems in local mode for the moment; please use the correct Docker image type. If you want to use a GPU in a Docker container, please use nvidia-docker.
```
-### Step3: Run NNI in docker container
+### Step 3: Run NNI in a Docker container
-If you start a docker image using NNI's offical image `msranni/nni`, you could directly start NNI experiments by using `nnictl` command. Our offical image has NNI's running environment and basic python and deep learning frameworks environment.
+If you start a Docker container using NNI's official image `msranni/nni`, you can directly start NNI experiments by using the `nnictl` command. Our official image has NNI's runtime environment and basic Python and deep learning frameworks preinstalled.
-If you start your own docker image, you may need to install NNI package first, please refer to [NNI installation](InstallationLinux.md).
+If you start your own Docker image, you may need to install the NNI package first; please refer to [NNI installation](InstallationLinux.md).
-If you want to run NNI's offical examples, you may need to clone NNI repo in github using
+If you want to run NNI's official examples, you may need to clone the NNI repo on GitHub using
```
git clone https://github.com/Microsoft/nni.git
```
-then you could enter `nni/examples/trials` to start an experiment.
+then you can enter `nni/examples/trials` to start an experiment.
-After you prepare NNI's environment, you could start a new experiment using `nnictl` command, [refer](QuickStart.md)
+After you prepare NNI's environment, you can start a new experiment using the `nnictl` command. [Refer to this](QuickStart.md).
-## Using docker in remote platform
+## Using Docker on a remote platform
-NNI support starting experiments in [remoteTrainingService](../TrainingService/RemoteMachineMode.md), and run trial jobs in remote machines. As docker could start an independent Ubuntu system as SSH server, docker container could be used as the remote machine in NNI's remot mode.
+NNI supports starting experiments in [remoteTrainingService](../TrainingService/RemoteMachineMode.md), and running trial jobs on remote machines. As Docker can start an independent Ubuntu system as an SSH server, a Docker container can be used as the remote machine in NNI's remote mode.
-### Step 1: Setting docker environment
+### Step 1: Setting up a Docker environment
-You should install a docker software in your remote machine first, please [refer](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
+You should install the Docker software on your remote machine first; please [refer to this](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
-To make sure your docker container could be connected by NNI experiments, you should build your own docker image to set SSH server or use images with SSH configuration. If you want to use docker container as SSH server, you should configure SSH password login or private key login, please [refer](https://docs.docker.com/engine/examples/running_ssh_service/).
+To make sure your Docker container can be connected to by NNI experiments, you should build your own Docker image with an SSH server set up or use images with an SSH configuration. If you want to use a Docker container as an SSH server, you should configure SSH password login or private key login; please [refer to this](https://docs.docker.com/engine/examples/running_ssh_service/).
Note:
```
-NNI's offical image msranni/nni does not support SSH server for the time being, you should build your own docker image with SSH configuration or use other images as remote server.
+NNI's official image msranni/nni does not support SSH servers for the time being; you should build your own Docker image with an SSH configuration or use other images as a remote server.
```
-### Step2: Start docker container in remote machine
+### Step 2: Start a Docker container on a remote machine
-SSH server need a port, you need to expose docker's SSH port to NNI as the connection port. For example, if you set your container's SSH port as **`A`**, you should map container's port **`A`** to your remote host machine's another port **`B`**, NNI will connect port **`B`** as SSH port, and your host machine will map the connection from port **`B`** to port **`A`**, then NNI could connect to your docker container.
+An SSH server needs a port; you need to expose Docker's SSH port to NNI as the connection port. For example, if you set your container's SSH port as **`A`**, you should map the container's port **`A`** to another port **`B`** on your remote host machine. NNI will connect to port **`B`** as the SSH port, and your host machine will map the connection from port **`B`** to port **`A`**; NNI can then connect to your Docker container.
-For example, you could start your docker container using following commands:
+For example, you could start your Docker container using the following command:
```
docker run -dit -p [hostPort]:[containerPort] [image]
```
-The `containerPort` is the SSH port used in your docker container, and the `hostPort` is your host machine's port exposed to NNI. You could set your NNI's config file to connect to `hostPort`, and the connection will be transmitted to your docker container.
-For more information about docker command, please [refer](https://docs.docker.com/v17.09/edge/engine/reference/run/).
+The `containerPort` is the SSH port used in your Docker container and the `hostPort` is your host machine's port exposed to NNI. You can set your NNI config file to connect to `hostPort`, and the connection will be forwarded to your Docker container.
+For more information about Docker commands, please [refer to this](https://docs.docker.com/v17.09/edge/engine/reference/run/).
Note:
```
-If you use your own docker image as remote server, please make sure that this image has basic python environment and NNI SDK runtime environment. If you want to use gpu in docker container, please use nvidia-docker.
+If you use your own Docker image as a remote server, please make sure that this image has a basic Python environment and an NNI SDK runtime environment. If you want to use a GPU in a Docker container, please use nvidia-docker.
```
-### Step3: Run NNI experiments
+### Step 3: Run NNI experiments
-You could set your config file as remote platform, and setting the `machineList` configuration to connect your docker SSH server, [refer](../TrainingService/RemoteMachineMode.md). Note that you should set correct `port`,`username` and `passwd` or `sshKeyPath` of your host machine.
+You can set the platform in your config file to remote and set the `machineList` configuration to connect to your Docker SSH server; [refer to this](../TrainingService/RemoteMachineMode.md). Note that you should set the correct `port`, `username`, and `passWd` or `sshKeyPath` of your host machine.
-`port:` The host machine's port, mapping to docker's SSH port.
+`port:` The host machine's port, mapping to Docker's SSH port.
-`username:` The username of docker container.
+`username:` The username of the Docker container.
-`passWd:` The password of docker container.
+`passWd:` The password of the Docker container.
-`sshKeyPath:` The path of private key of docker container.
+`sshKeyPath:` The path of the private key of the Docker container.
-After the configuration of config file, you could start an experiment, [refer](QuickStart.md)
+After configuring the config file, you can start an experiment; [refer to this](QuickStart.md).
From 67ebb181ef11b900d4e32139daa6d8e91763baa9 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Wed, 11 Mar 2020 19:50:21 -0700
Subject: [PATCH 07/34] Missing article in windows install instructions.
---
docs/en_US/Tutorial/InstallationWin.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/en_US/Tutorial/InstallationWin.md b/docs/en_US/Tutorial/InstallationWin.md
index d6b7a0a121..9180492164 100644
--- a/docs/en_US/Tutorial/InstallationWin.md
+++ b/docs/en_US/Tutorial/InstallationWin.md
@@ -122,7 +122,7 @@ Currently, you can't.
Note:
-* If an error like `Segmentation fault` is encountered, please refer to [FAQ](FAQ.md)
+* If an error like `Segmentation fault` is encountered, please refer to the [FAQ](FAQ.md)
## Further reading
From 5a827a00e88f630b68e2e16a1180725c3953deeb Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Wed, 11 Mar 2020 20:00:40 -0700
Subject: [PATCH 08/34] Change some "refer to this"s to "see here"s.
---
docs/en_US/Tutorial/HowToUseDocker.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/en_US/Tutorial/HowToUseDocker.md b/docs/en_US/Tutorial/HowToUseDocker.md
index b028b50056..0f2d2c1b47 100644
--- a/docs/en_US/Tutorial/HowToUseDocker.md
+++ b/docs/en_US/Tutorial/HowToUseDocker.md
@@ -10,7 +10,7 @@ Users can start NNI experiments using Docker. NNI also provides an official Dock
## Using Docker on a local machine
### Step 1: Installation of Docker
-Before you start using Docker for NNI experiments, you should install Docker on your local machine. [Refer to this](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
+Before you start using Docker for NNI experiments, you should install Docker on your local machine. [See here](https://docs.docker.com/install/linux/docker-ce/ubuntu/).
### Step 2: Start a Docker container
If you have installed the Docker package on your local machine, you can start a Docker container instance to run NNI examples. Note that because NNI will start a web UI process in the container and listen on a port, you need to specify a port mapping between your host machine and the Docker container to give access to the web UI outside the container. By visiting the host IP address and port, you will be redirected to the web UI process started in the Docker container and can view the web UI content.
@@ -43,7 +43,7 @@ git clone https://github.com/Microsoft/nni.git
```
then you can enter `nni/examples/trials` to start an experiment.
-After you prepare NNI's environment, you can start a new experiment using the `nnictl` command. [Refer to this](QuickStart.md).
+After you prepare NNI's environment, you can start a new experiment using the `nnictl` command. [See here](QuickStart.md).
## Using Docker on a remote platform
From 7c0ba6624d3bdaefc57c0f66a5a919855fb95247 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Wed, 11 Mar 2020 20:19:14 -0700
Subject: [PATCH 09/34] Fix a lot of bad grammar and confusing wording in Quick
Start
tab "something" should be the "something" tab.
Tense issues (e.g., Modified vs. Modify).
---
docs/en_US/Tutorial/QuickStart.md | 70 +++++++++++++++----------------
1 file changed, 35 insertions(+), 35 deletions(-)
diff --git a/docs/en_US/Tutorial/QuickStart.md b/docs/en_US/Tutorial/QuickStart.md
index 9990d234d1..81c5b9e05d 100644
--- a/docs/en_US/Tutorial/QuickStart.md
+++ b/docs/en_US/Tutorial/QuickStart.md
@@ -2,7 +2,7 @@
## Installation
-We support Linux macOS and Windows in current stage, Ubuntu 16.04 or higher, macOS 10.14.1 and Windows 10.1809 are tested and supported. Simply run the following `pip install` in an environment that has `python >= 3.5`.
+We currently support Linux, macOS, and Windows. Ubuntu 16.04 or higher, macOS 10.14.1, and Windows 10.1809 are tested and supported. Simply run the following `pip install` in an environment that has `python >= 3.5`.
**Linux and macOS**
@@ -18,15 +18,15 @@ We support Linux macOS and Windows in current stage, Ubuntu 16.04 or higher, mac
Note:
-* For Linux and macOS `--user` can be added if you want to install NNI in your home directory, which does not require any special privileges.
-* If there is any error like `Segmentation fault`, please refer to [FAQ](FAQ.md)
-* For the `system requirements` of NNI, please refer to [Install NNI on Linux&Mac](InstallationLinux.md) or [Windows](InstallationWin.md)
+* For Linux and macOS, `--user` can be added if you want to install NNI in your home directory; this does not require any special privileges.
+* If there is an error like `Segmentation fault`, please refer to the [FAQ](FAQ.md).
+* For the `system requirements` of NNI, please refer to [Install NNI on Linux&Mac](InstallationLinux.md) or [Windows](InstallationWin.md).
## "Hello World" example on MNIST
-NNI is a toolkit to help users run automated machine learning experiments. It can automatically do the cyclic process of getting hyperparameters, running trials, testing results, tuning hyperparameters. Now, we show how to use NNI to help you find the optimal hyperparameters.
+NNI is a toolkit to help users run automated machine learning experiments. It can automatically do the cyclic process of getting hyperparameters, running trials, testing results, and tuning hyperparameters. Here, we'll show how to use NNI to help you find the optimal hyperparameters for an MNIST model.
-Here is an example script to train a CNN on MNIST dataset **without NNI**:
+Here is an example script to train a CNN on the MNIST dataset **without NNI**:
```python
def run_trial(params):
@@ -48,11 +48,11 @@ if __name__ == '__main__':
run_trial(params)
```
-Note: If you want to see the full implementation, please refer to [examples/trials/mnist-tfv1/mnist_before.py](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-tfv1/mnist_before.py)
+Note: If you want to see the full implementation, please refer to [examples/trials/mnist-tfv1/mnist_before.py](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-tfv1/mnist_before.py).
-The above code can only try one set of parameters at a time, if we want to tune learning rate, we need to manually modify the hyperparameter and start the trial again and again.
+The above code can only try one set of parameters at a time; if we want to tune the learning rate, we need to manually modify the hyperparameter and start the trial again and again.
-NNI is born for helping user do the tuning jobs, the NNI working process is presented below:
+NNI was born to help users with tuning jobs; the NNI working process is presented below:
```text
input: search space, trial code, config file
@@ -67,11 +67,11 @@ output: one optimal hyperparameter configuration
7: return hyperparameter value with best final result
```
-If you want to use NNI to automatically train your model and find the optimal hyper-parameters, you need to do three changes base on your code:
+If you want to use NNI to automatically train your model and find the optimal hyper-parameters, you need to make three changes to your code:
**Three steps to start an experiment**
-**Step 1**: Give a `Search Space` file in JSON, includes the `name` and the `distribution` (discrete valued or continuous valued) of all the hyperparameters you need to search.
+**Step 1**: Provide a `Search Space` file in JSON, including the `name` and the `distribution` (discrete-valued or continuous-valued) of all the hyperparameters you need to search.
```diff
- params = {'data_dir': '/tmp/tensorflow/mnist/input_data', 'dropout_rate': 0.5, 'channel_1_num': 32, 'channel_2_num': 64,
@@ -87,7 +87,7 @@ If you want to use NNI to automatically train your model and find the optimal hy
*Implemented code directory: [search_space.json](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-tfv1/search_space.json)*
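For reference, a search space of the kind this step produces might look like the following. The entries are illustrative and written as a Python literal for readability; the actual file is JSON (see the link above).

```python
# Illustrative search space entries (the real file is JSON).
# `_type` names a distribution; `_value` gives its support.
search_space = {
    "dropout_rate": {"_type": "uniform", "_value": [0.5, 0.9]},  # continuous
    "conv_size": {"_type": "choice", "_value": [2, 3, 5, 7]},    # discrete
    "learning_rate": {"_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1]},
}
```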
-**Step 2**: Modified your `Trial` file to get the hyperparameter set from NNI and report the final result to NNI.
+**Step 2**: Modify your `Trial` file to get the hyperparameter set from NNI and report the final result to NNI.
```diff
+ import nni
@@ -112,7 +112,7 @@ If you want to use NNI to automatically train your model and find the optimal hy
*Implemented code directory: [mnist.py](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-tfv1/mnist.py)*
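Putting the pieces together, a minimal version of the modified trial might look like the sketch below, where `run_trial` is a placeholder for your model-building and training code; see the linked mnist.py for the real implementation.

```python
import nni

def run_trial(params):
    """Placeholder: build and train the model with `params`, return accuracy."""
    return 0.0

if __name__ == "__main__":
    params = nni.get_next_parameter()  # hyperparameter set chosen by the tuner
    accuracy = run_trial(params)
    nni.report_final_result(accuracy)  # final result reported back to NNI
```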
-**Step 3**: Define a `config` file in YAML, which declare the `path` to search space and trial, also give `other information` such as tuning algorithm, max trial number and max duration arguments.
+**Step 3**: Define a `config` file in YAML which declares the `path` to the search space and trial files. It also gives other information such as the tuning algorithm, max trial number, and max duration arguments.
```yaml
authorName: default
@@ -133,15 +133,15 @@ trial:
gpuNum: 0
```
-Note, **for Windows, you need to change trial command `python3` to `python`**
+Note, **for Windows, you need to change the trial command from `python3` to `python`**.
*Implemented code directory: [config.yml](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-tfv1/config.yml)*
-All the codes above are already prepared and stored in [examples/trials/mnist-tfv1/](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-tfv1).
+All the code above is already prepared and stored in [examples/trials/mnist-tfv1/](https://github.com/Microsoft/nni/tree/master/examples/trials/mnist-tfv1).
**Linux and macOS**
-Run the **config.yml** file from your command line to start MNIST experiment.
+Run the **config.yml** file from your command line to start an MNIST experiment.
```bash
nnictl create --config nni/examples/trials/mnist-tfv1/config.yml
@@ -149,17 +149,17 @@ Run the **config.yml** file from your command line to start MNIST experiment.
**Windows**
-Run the **config_windows.yml** file from your command line to start MNIST experiment.
+Run the **config_windows.yml** file from your command line to start an MNIST experiment.
-Note, if you're using NNI on Windows, it needs to change `python3` to `python` in the config.yml file, or use the config_windows.yml file to start the experiment.
+Note: if you're using NNI on Windows, you need to change `python3` to `python` in the config.yml file or use the config_windows.yml file to start the experiment.
```bash
nnictl create --config nni\examples\trials\mnist-tfv1\config_windows.yml
```
-Note, `nnictl` is a command line tool, which can be used to control experiments, such as start/stop/resume an experiment, start/stop NNIBoard, etc. Click [here](Nnictl.md) for more usage of `nnictl`
+Note: `nnictl` is a command line tool that can be used to control experiments, such as starting/stopping/resuming an experiment, starting/stopping NNIBoard, etc. Click [here](Nnictl.md) for more usage of `nnictl`.
-Wait for the message `INFO: Successfully started experiment!` in the command line. This message indicates that your experiment has been successfully started. And this is what we expected to get:
+Wait for the message `INFO: Successfully started experiment!` in the command line. This message indicates that your experiment has been successfully started. This is what we expect to see:
```text
INFO: Starting restful server...
@@ -187,53 +187,53 @@ You can use these commands to get more information about the experiment
-----------------------------------------------------------------------
```
-If you prepare `trial`, `search space` and `config` according to the above steps and successfully create a NNI job, NNI will automatically tune the optimal hyper-parameters and run different hyper-parameters sets for each trial according to the requirements you set. You can clearly sees its progress by NNI WebUI.
+If you prepare the `trial`, `search space`, and `config` according to the above steps and successfully create an NNI job, NNI will automatically tune the optimal hyper-parameters and run different hyper-parameter sets for each trial according to the requirements you set. You can clearly see its progress through the NNI WebUI.
## WebUI
-After you start your experiment in NNI successfully, you can find a message in the command-line interface to tell you `Web UI url` like this:
+After you start your experiment in NNI successfully, you can find a message in the command-line interface that tells you the `Web UI url` like this:
```text
The Web UI urls are: [Your IP]:8080
```
-Open the `Web UI url`(In this information is: `[Your IP]:8080`) in your browser, you can view detail information of the experiment and all the submitted trial jobs as shown below. If you can not open the WebUI link in your terminal, you can refer to [FAQ](FAQ.md).
+Open the `Web UI url` (in this case, `[Your IP]:8080`) in your browser; you can view detailed information about the experiment and all the submitted trial jobs as shown below. If you cannot open the WebUI link in your terminal, please refer to the [FAQ](FAQ.md).
### View summary page
-Click the tab "Overview".
+Click the "Overview" tab.
-Information about this experiment will be shown in the WebUI, including the experiment trial profile and search space message. NNI also support `download these information and parameters` through the **Download** button. You can download the experiment result anytime in the middle for the running or at the end of the execution, etc.
+Information about this experiment will be shown in the WebUI, including the experiment trial profile and search space message. NNI also supports downloading this information and the parameters through the **Download** button. You can download the experiment results anytime while the experiment is running, or wait until the end of its execution.
![](../../img/QuickStart1.png)
-Top 10 trials will be listed in the Overview page, you can browse all the trials in "Trials Detail" page.
+The top 10 trials will be listed on the Overview page. You can browse all the trials on the "Trials Detail" page.
![](../../img/QuickStart2.png)
### View trials detail page
-Click the tab "Default Metric" to see the point graph of all trials. Hover to see its specific default metric and search space message.
+Click the "Default Metric" tab to see the point graph of all trials. Hover to see specific default metrics and search space messages.
![](../../img/QuickStart3.png)
-Click the tab "Hyper Parameter" to see the parallel graph.
+Click the "Hyper Parameter" tab to see the parallel graph.
-* You can select the percentage to see top trials.
-* Choose two axis to swap its positions
+* You can select the percentage to see the top trials.
+* Choose two axes to swap their positions.
![](../../img/QuickStart4.png)
-Click the tab "Trial Duration" to see the bar graph.
+Click the "Trial Duration" tab to see the bar graph.
![](../../img/QuickStart5.png)
-Below is the status of the all trials. Specifically:
+Below is the status of all trials. Specifically:
-* Trial detail: trial's id, trial's duration, start time, end time, status, accuracy and search space file.
+* Trial detail: trial's id, duration, start time, end time, status, accuracy, and search space file.
* If you run on the OpenPAI platform, you can also see the hdfsLogPath.
-* Kill: you can kill a job that status is running.
-* Support to search for a specific trial.
+* Kill: you can kill a job that has the `Running` status.
+* Search: you can search for a specific trial.
![](../../img/QuickStart6.png)
From 03a5d45a72be7dc0adfb96b634f71364642e7e74 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Thu, 12 Mar 2020 09:56:59 -0700
Subject: [PATCH 10/34] Fix some awkward phrasing in hyperparameter tuning
directory.
---
docs/en_US/hyperparameter_tune.rst | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/docs/en_US/hyperparameter_tune.rst b/docs/en_US/hyperparameter_tune.rst
index d49c3109af..69078abf71 100644
--- a/docs/en_US/hyperparameter_tune.rst
+++ b/docs/en_US/hyperparameter_tune.rst
@@ -2,16 +2,16 @@
Auto (Hyper-parameter) Tuning
#############################
-Auto tuning is one of the key features provided by NNI, a main application scenario is
-hyper-parameter tuning. Trial code is the one to be tuned, we provide a lot of popular
+Auto tuning is one of the key features provided by NNI; a main application scenario is
+hyper-parameter tuning. Tuning specifically applies to trial code. We provide many popular
auto tuning algorithms (called Tuner), and some early stop algorithms (called Assessor).
-NNI supports running trial on various training platforms, for example, on a local machine,
-on several servers in a distributed manner, or on platforms such as OpenPAI, Kubernetes.
+NNI supports running trials on various training platforms, for example, on a local machine,
+on several servers in a distributed manner, or on platforms such as OpenPAI, Kubernetes, etc.
Other key features of NNI, such as model compression, feature engineering, can also be further
-enhanced by auto tuning, which is described when introduing those features.
+enhanced by auto tuning, which we'll describe when introducing those features.
-NNI has high extensibility, advanced users could customized their own Tuner, Assessor, and Training Service
+NNI is highly extensible; advanced users can customize their own Tuner, Assessor, and Training Service
according to their needs.
.. toctree::
From ba5a0306662bf12966ff9c46bb0e9b7f27412ac1 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Thu, 12 Mar 2020 11:50:01 -0700
Subject: [PATCH 11/34] Clean up grammar and phrasing in trial setup.
---
docs/en_US/TrialExample/Trials.md | 44 +++++++++++++++----------------
1 file changed, 22 insertions(+), 22 deletions(-)
diff --git a/docs/en_US/TrialExample/Trials.md b/docs/en_US/TrialExample/Trials.md
index 2c00b1a146..1212fdc97b 100644
--- a/docs/en_US/TrialExample/Trials.md
+++ b/docs/en_US/TrialExample/Trials.md
@@ -1,8 +1,8 @@
# Write a Trial Run on NNI
-A **Trial** in NNI is an individual attempt at applying a configuration (e.g., a set of hyper-parameters) on a model.
+A **Trial** in NNI is an individual attempt at applying a configuration (e.g., a set of hyper-parameters) to a model.
-To define an NNI trial, you need to firstly define the set of parameters (i.e., search space) and then update the model. NNI provide two approaches for you to define a trial: [NNI API](#nni-api) and [NNI Python annotation](#nni-annotation). You could also refer to [here](#more-examples) for more trial examples.
+To define an NNI trial, you need to first define the set of parameters (i.e., the search space) and then update the model. NNI provides two approaches for you to define a trial: [NNI API](#nni-api) and [NNI Python annotation](#nni-annotation). You can also refer to [here](#more-examples) for more trial examples.
## NNI API
@@ -20,9 +20,9 @@ An example is shown below:
}
```
-Refer to [SearchSpaceSpec.md](../Tutorial/SearchSpaceSpec.md) to learn more about search space. Tuner will generate configurations from this search space, that is, choosing a value for each hyperparameter from the range.
+Refer to [SearchSpaceSpec.md](../Tutorial/SearchSpaceSpec.md) to learn more about search spaces. The tuner will generate configurations from this search space, that is, choose a value for each hyperparameter from its range.
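For instance, given a search space with a `dropout_rate` range and a `conv_size` choice, one generated configuration might look like the following (names and values are illustrative):

```python
# One configuration a tuner might generate: a concrete value per
# hyperparameter, drawn from its range in the search space.
params = {
    "dropout_rate": 0.7314,  # sampled from a uniform range
    "conv_size": 5,          # picked from a choice list
}
```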
-### Step 2 - Update model codes
+### Step 2 - Update model code
- Import NNI
@@ -44,18 +44,18 @@ RECEIVED_PARAMS = nni.get_next_parameter()
nni.report_intermediate_result(metrics)
```
-`metrics` could be any python object. If users use NNI built-in tuner/assessor, `metrics` can only have two formats: 1) a number e.g., float, int, 2) a dict object that has a key named `default` whose value is a number. This `metrics` is reported to [assessor](../Assessor/BuiltinAssessor.md). Usually, `metrics` could be periodically evaluated loss or accuracy.
+`metrics` can be any Python object. If users use an NNI built-in tuner/assessor, `metrics` can only have two formats: 1) a number (e.g., float, int), or 2) a dict object that has a key named `default` whose value is a number. These `metrics` are reported to the [assessor](../Assessor/BuiltinAssessor.md). Often, `metrics` includes the periodically evaluated loss or accuracy.
- Report performance of the configuration
```python
nni.report_final_result(metrics)
```
-`metrics` also could be any python object. If users use NNI built-in tuner/assessor, `metrics` follows the same format rule as that in `report_intermediate_result`, the number indicates the model's performance, for example, the model's accuracy, loss etc. This `metrics` is reported to [tuner](../Tuner/BuiltinTuner.md).
+`metrics` can also be any Python object. If users use an NNI built-in tuner/assessor, `metrics` follows the same format rules as in `report_intermediate_result`; the number indicates the model's performance, for example, the model's accuracy, loss, etc. These `metrics` are reported to the [tuner](../Tuner/BuiltinTuner.md).
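A minimal sketch combining both reporting calls is shown below; `train_one_epoch` and `evaluate` are hypothetical stand-ins for your own training and evaluation code.

```python
import random
import nni

def train_one_epoch(params):
    """Hypothetical stand-in for one epoch of real training."""

def evaluate():
    """Hypothetical stand-in for real evaluation."""
    return random.random()

params = nni.get_next_parameter()
accuracy = 0.0
for epoch in range(10):
    train_one_epoch(params)
    accuracy = evaluate()
    nni.report_intermediate_result(accuracy)  # periodic metric, sent to the assessor
nni.report_final_result(accuracy)  # final metric, sent to the tuner
```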
### Step 3 - Enable NNI API
-To enable NNI API mode, you need to set useAnnotation to *false* and provide the path of SearchSpace file (you just defined in step 1):
+To enable NNI API mode, you need to set useAnnotation to *false* and provide the path of the SearchSpace file you defined in step 1:
```yaml
useAnnotation: false
@@ -69,23 +69,23 @@ You can refer to [here](../Tutorial/ExperimentConfig.md) for more information ab
## NNI Python Annotation
-An alternative to writing a trial is to use NNI's syntax for python. Simple as any annotation, NNI annotation is working like comments in your codes. You don't have to make structure or any other big changes to your existing codes. With a few lines of NNI annotation, you will be able to:
+An alternative to writing a trial is to use NNI's syntax for Python. NNI annotations are simple, working like comments in your code. You don't have to make structural changes to your existing code. With a few lines of NNI annotation, you will be able to:
* annotate the variables you want to tune
-* specify in which range you want to tune the variables
-* annotate which variable you want to report as intermediate result to `assessor`
+* specify the range in which you want to tune the variables
+* annotate which variable you want to report as an intermediate result to `assessor`
* annotate which variable you want to report as the final result (e.g. model accuracy) to `tuner`.
Again, taking MNIST as an example, it only requires 2 steps to write a trial with NNI Annotation.
### Step 1 - Update codes with annotations
-The following is a tensorflow code snippet for NNI Annotation, where the highlighted four lines are annotations that help you to:
+The following is a TensorFlow code snippet for NNI Annotation, where the four highlighted lines are annotations that:
1. tune batch\_size and dropout\_rate
2. report test\_acc every 100 steps
- 3. at last report test\_acc as final result.
+ 3. lastly report test\_acc as the final result.
-What noteworthy is: as these newly added codes are annotations, it does not actually change your previous codes logic, you can still run your code as usual in environments without NNI installed.
+It's worth noting that, since these newly added lines are merely annotations, you can still run your code as usual in environments without NNI installed.
```diff
with tf.Session() as sess:
@@ -114,7 +114,7 @@ with tf.Session() as sess:
```
**NOTE**:
-- `@nni.variable` will take effect on its following line, which is an assignment statement whose leftvalue must be specified by the keyword `name` in `@nni.variable`.
+- `@nni.variable` affects the line that follows it, which must be an assignment statement whose left-hand side matches the keyword `name` given in the `@nni.variable` statement (see the sketch below).
- `@nni.report_intermediate_result`/`@nni.report_final_result` will send the data to assessor/tuner at that line.
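+For instance, a minimal annotated assignment might look like this (a sketch; the search space values are illustrative):
+```python
+"""@nni.variable(nni.choice(0.25, 0.5, 0.75), name=dropout_rate)"""
+dropout_rate = 0.5   # `name` matches the left-hand side; 0.5 is the default without NNI
+```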
For more information about annotation syntax and its usage, please refer to [Annotation](../Tutorial/AnnotationSpec.md).
@@ -127,9 +127,9 @@ In the YAML configure file, you need to set *useAnnotation* to true to enable NN
useAnnotation: true
```
-## Standalone mode for debug
+## Standalone mode for debugging
-NNI supports standalone mode for trial code to run without starting an NNI experiment. This is for finding out bugs in trial code more conveniently. NNI annotation natively supports standalone mode, as the added NNI related lines are comments. For NNI trial APIs, the APIs have changed behaviors in standalone mode, some APIs return dummy values, and some APIs do not really report values. Please refer to the following table for the full list of these APIs.
+NNI supports a standalone mode in which trial code can run without starting an NNI experiment. This makes it more convenient to find bugs in trial code. NNI annotation natively supports standalone mode, as the added NNI-related lines are comments. NNI's trial APIs behave differently in standalone mode: some APIs return dummy values, and some do not actually report values. Please refer to the list below for these APIs.
```python
# NOTE: please assign default values to the hyperparameters in your trial code
nni.get_next_parameter # return {}
@@ -140,17 +140,17 @@ nni.get_trial_id # return "STANDALONE"
nni.get_sequence_id # return 0
```
-You can try standalone mode with the [mnist example](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-tfv1). Simply run `python3 mnist.py` under the code directory. The trial code successfully runs with default hyperparameter values.
+You can try standalone mode with the [mnist example](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-tfv1). Simply run `python3 mnist.py` under the code directory. The trial code should successfully run with the default hyperparameter values.
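+A common pattern for assigning such default values is the following sketch (the parameter names are hypothetical):
+```python
+import nni
+
+params = {"batch_size": 32, "lr": 0.01}   # defaults used in standalone mode
+params.update(nni.get_next_parameter())   # returns {} standalone, so defaults survive
+```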
-For more debuggability, please refer to [How to Debug](../Tutorial/HowToDebug.md)
+For more information on debugging, please refer to [How to Debug](../Tutorial/HowToDebug.md).
## Where are my trials?
### Local Mode
-In NNI, every trial has a dedicated directory for them to output their own data. In each trial, an environment variable called `NNI_OUTPUT_DIR` is exported. Under this directory, you could find each trial's code, data and other possible log. In addition, each trial's log (including stdout) will be re-directed to a file named `trial.log` under that directory.
+In NNI, every trial has a dedicated directory to which it outputs its own data. In each trial, an environment variable called `NNI_OUTPUT_DIR` is exported. Under this directory, you can find each trial's code, data, and other logs. In addition, each trial's log (including stdout) will be redirected to a file named `trial.log` under that directory.
-If NNI Annotation is used, trial's converted code is in another temporary directory. You can check that in a file named `run.sh` under the directory indicated by `NNI_OUTPUT_DIR`. The second line (i.e., the `cd` command) of this file will change directory to the actual directory where code is located. Below is an example of `run.sh`:
+If NNI Annotation is used, the trial's converted code is in another temporary directory. You can check this in a file named `run.sh` under the directory indicated by `NNI_OUTPUT_DIR`. The second line (i.e., the `cd` command) of this file changes to the actual directory where the code is located. Below is an example of `run.sh`:
```bash
#!/bin/bash
@@ -168,9 +168,9 @@ echo $? `date +%s%3N` >/home/user_name/nni/experiments/$experiment_id$/trials/$t
### Other Modes
-When running trials on other platform like remote machine or PAI, the environment variable `NNI_OUTPUT_DIR` only refers to the output directory of the trial, while trial code and `run.sh` might not be there. However, the `trial.log` will be transmitted back to local machine in trial's directory, which defaults to `~/nni/experiments/$experiment_id$/trials/$trial_id$/`
+When running trials on other platforms like remote machines or PAI, the environment variable `NNI_OUTPUT_DIR` refers only to the output directory of the trial; the trial code and `run.sh` might not be there. However, `trial.log` will be transmitted back to the local machine in the trial's directory, which defaults to `~/nni/experiments/$experiment_id$/trials/$trial_id$/`.
-For more information, please refer to [HowToDebug](../Tutorial/HowToDebug.md)
+For more information, please refer to [HowToDebug](../Tutorial/HowToDebug.md).
## More Trial Examples
From 0eead79a8c5ad27fb16727538015d5b762fada9c Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Thu, 12 Mar 2020 11:51:36 -0700
Subject: [PATCH 12/34] Fix broken English in the tuner directory.
---
docs/en_US/builtin_tuner.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/en_US/builtin_tuner.rst b/docs/en_US/builtin_tuner.rst
index f8eb7546cd..d2785ea848 100644
--- a/docs/en_US/builtin_tuner.rst
+++ b/docs/en_US/builtin_tuner.rst
@@ -3,7 +3,7 @@ Builtin-Tuners
NNI provides an easy way to adopt an approach to set up parameter tuning algorithms, we call them **Tuner**.
-Tuner receives metrics from `Trial` to evaluate the performance of a specific parameters/architecture configures. And tuner sends next hyper-parameter or architecture configure to Trial.
+Tuner receives metrics from `Trial` to evaluate the performance of a specific parameter/architecture configuration. The tuner then sends the next hyper-parameter or architecture configuration to Trial.
.. toctree::
From a79dbb2fe29fe0da00c4c90960c79eab23cc844e Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Fri, 13 Mar 2020 11:15:10 -0700
Subject: [PATCH 13/34] Correct a bunch of bad wording throughout the
Hyperparameter Tuning overview
Lots of missing articles.
Swapped out "Example usage" for "Example config", because that's what it is. Usage isn't exemplified at all.
I have no idea what the note at the end of the TPE section is trying to say, so I left it untouched, but it should be changed to something that makes sense.
---
docs/en_US/Tuner/BuiltinTuner.md | 180 +++++++++++++++----------------
1 file changed, 90 insertions(+), 90 deletions(-)
diff --git a/docs/en_US/Tuner/BuiltinTuner.md b/docs/en_US/Tuner/BuiltinTuner.md
index f438239067..176c299131 100644
--- a/docs/en_US/Tuner/BuiltinTuner.md
+++ b/docs/en_US/Tuner/BuiltinTuner.md
@@ -1,32 +1,32 @@
# Built-in Tuners for Hyperparameter Tuning
-NNI provides state-of-the-art tuning algorithms as our built-in tuners and makes them easy to use. Below is the brief summary of NNI currently built-in tuners:
+NNI provides state-of-the-art tuning algorithms as part of our built-in tuners and makes them easy to use. Below is a brief summary of NNI's current built-in tuners:
-Note: Click the **Tuner's name** to get the Tuner's installation requirements, suggested scenario and using example. The link for a detailed description of the algorithm is at the end of the suggested scenario of each tuner. Here is an [article](../CommunitySharings/HpoComparision.md) about the comparison of different Tuners on several problems.
+Note: Click the **Tuner's name** to get the Tuner's installation requirements, suggested scenario, and an example configuration. A link for a detailed description of each algorithm is located at the end of the suggested scenario for each tuner. Here is an [article](../CommunitySharings/HpoComparision.md) comparing different Tuners on several problems.
-Currently we support the following algorithms:
+Currently, we support the following algorithms:
|Tuner|Brief Introduction of Algorithm|
|---|---|
|[__TPE__](#TPE)|The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model. [Reference Paper](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf)|
|[__Random Search__](#Random)|In Random Search for Hyper-Parameter Optimization show that Random Search might be surprisingly simple and effective. We suggest that we could use Random Search as the baseline when we have no knowledge about the prior distribution of hyper-parameters. [Reference Paper](http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf)|
|[__Anneal__](#Anneal)|This simple annealing algorithm begins by sampling from the prior, but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on the random search that leverages smoothness in the response surface. The annealing rate is not adaptive.|
-|[__Naïve Evolution__](#Evolution)|Naïve Evolution comes from Large-Scale Evolution of Image Classifiers. It randomly initializes a population-based on search space. For each generation, it chooses better ones and does some mutation (e.g., change a hyperparameter, add/remove one layer) on them to get the next generation. Naïve Evolution requires many trials to works, but it's very simple and easy to expand new features. [Reference paper](https://arxiv.org/pdf/1703.01041.pdf)|
-|[__SMAC__](#SMAC)|SMAC is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO, in order to handle categorical parameters. The SMAC supported by NNI is a wrapper on the SMAC3 GitHub repo. Notice, SMAC need to be installed by `nnictl package` command. [Reference Paper,](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) [GitHub Repo](https://github.com/automl/SMAC3)|
+|[__Naïve Evolution__](#Evolution)|Naïve Evolution comes from Large-Scale Evolution of Image Classifiers. It randomly initializes a population based on the search space. For each generation, it chooses better ones and does some mutation (e.g., change a hyperparameter, add/remove one layer) on them to get the next generation. Naïve Evolution requires many trials to work, but it's very simple and easy to extend with new features. [Reference paper](https://arxiv.org/pdf/1703.01041.pdf)|
+|[__SMAC__](#SMAC)|SMAC is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO in order to handle categorical parameters. The SMAC supported by NNI is a wrapper on the SMAC3 GitHub repo. Note that SMAC needs to be installed with the `nnictl package` command. [Reference Paper,](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) [GitHub Repo](https://github.com/automl/SMAC3)|
|[__Batch tuner__](#Batch)|Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type choice in search space spec.|
|[__Grid Search__](#GridSearch)|Grid Search performs an exhaustive searching through a manually specified subset of the hyperparameter space defined in the searchspace file. Note that the only acceptable types of search space are choice, quniform, randint. |
-|[__Hyperband__](#Hyperband)|Hyperband tries to use the limited resource to explore as many configurations as possible, and finds out the promising ones to get the final result. The basic idea is generating many configurations and to run them for the small number of trial budget to find out promising one, then further training those promising ones to select several more promising one.[Reference Paper](https://arxiv.org/pdf/1603.06560.pdf)|
-|[__Network Morphism__](#NetworkMorphism)|Network Morphism provides functions to automatically search for architecture of deep learning models. Every child network inherits the knowledge from its parent network and morphs into diverse types of networks, including changes of depth, width, and skip-connection. Next, it estimates the value of a child network using the historic architecture and metric pairs. Then it selects the most promising one to train. [Reference Paper](https://arxiv.org/abs/1806.10282)|
+|[__Hyperband__](#Hyperband)|Hyperband tries to use limited resources to explore as many configurations as possible and returns the most promising ones as a final result. The basic idea is to generate many configurations and run each for a small budget; the least promising half is thrown out, and the remainder are trained further along with a selection of new configurations. The size of these populations is sensitive to resource constraints (e.g., allotted search time). [Reference Paper](https://arxiv.org/pdf/1603.06560.pdf)|
+|[__Network Morphism__](#NetworkMorphism)|Network Morphism provides functions to automatically search for deep learning architectures. It generates child networks that inherit knowledge from the parent network they were morphed from. This includes changes in depth, width, and skip-connections. Next, it estimates the value of a child network using historic architecture and metric pairs. Then it selects the most promising one to train. [Reference Paper](https://arxiv.org/abs/1806.10282)|
|[__Metis Tuner__](#MetisTuner)|Metis offers the following benefits when it comes to tuning parameters: While most tools only predict the optimal configuration, Metis gives you two outputs: (a) current prediction of optimal configuration, and (b) suggestion for the next trial. No more guesswork. While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter. [Reference Paper](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/)|
-|[__BOHB__](#BOHB)|BOHB is a follow-up work of Hyperband. It targets the weakness of Hyperband that new configurations are generated randomly without leveraging finished trials. For the name BOHB, HB means Hyperband, BO means Bayesian Optimization. BOHB leverages finished trials by building multiple TPE models, a proportion of new configurations are generated through these models. [Reference Paper](https://arxiv.org/abs/1807.01774)|
+|[__BOHB__](#BOHB)|BOHB is a follow-up work to Hyperband. It targets the weakness of Hyperband that new configurations are generated randomly without leveraging finished trials. In the name BOHB, HB means Hyperband and BO means Bayesian Optimization. BOHB leverages finished trials by building multiple TPE models; a proportion of new configurations is generated through these models. [Reference Paper](https://arxiv.org/abs/1807.01774)|
|[__GP Tuner__](#GPTuner)|Gaussian Process Tuner is a sequential model-based optimization (SMBO) approach with Gaussian Process as the surrogate. [Reference Paper](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf), [Github Repo](https://github.com/fmfn/BayesianOptimization)|
|[__PPO Tuner__](#PPOTuner)|PPO Tuner is a Reinforcement Learning tuner based on PPO algorithm. [Reference Paper](https://arxiv.org/abs/1707.06347)|
## Usage of Built-in Tuners
-Use built-in tuner provided by NNI SDK requires to declare the **builtinTunerName** and **classArgs** in `config.yml` file. In this part, we will introduce the detailed usage about the suggested scenarios, classArg requirements and example for each tuner.
+Using a built-in tuner provided by the NNI SDK requires declaring the **builtinTunerName** and **classArgs** in the `config.yml` file. In this section, we introduce each tuner along with its suggested scenarios, classArgs requirements, and an example configuration.
-Note: Please follow the format when you write your `config.yml` file. Some built-in tuner need to be installed by `nnictl package`, like SMAC.
+Note: Please follow the format when you write your `config.yml` file. Some built-in tuners need to be installed using `nnictl package`, like SMAC.
@@ -36,16 +36,16 @@ Note: Please follow the format when you write your `config.yml` file. Some built
**Suggested scenario**
-TPE, as a black-box optimization, can be used in various scenarios and shows good performance in general. Especially when you have limited computation resource and can only try a small number of trials. From a large amount of experiments, we could found that TPE is far better than Random Search. [Detailed Description](./HyperoptTuner.md)
+TPE, as a black-box optimization method, can be used in various scenarios and shows good performance in general, especially when you have limited computation resources and can only try a small number of trials. From a large number of experiments, we found that TPE is far better than Random Search. [Detailed Description](./HyperoptTuner.md)
-**Requirement of classArgs**
+**classArgs Requirements:**
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
-Note: We have optimized the parallelism of TPE for large-scale trial-concurrency. For the principle of optimization or turn-on optimization, please refer to [TPE document](./HyperoptTuner.md).
+Note: We have optimized the parallelism of TPE for large-scale trial concurrency. For the principle of optimization or turn-on optimization, please refer to [TPE document](./HyperoptTuner.md).
-**Usage example:**
+**Example Configuration:**
```yaml
# config.yml
@@ -65,13 +65,13 @@ tuner:
**Suggested scenario**
-Random search is suggested when each trial does not take too long (e.g., each trial can be completed very soon, or early stopped by assessor quickly), and you have enough computation resource. Or you want to uniformly explore the search space. Random Search could be considered as baseline of search algorithm. [Detailed Description](./HyperoptTuner.md)
+Random search is suggested when each trial does not take very long (e.g., each trial can be completed very quickly, or early stopped by the assessor), and you have enough computational resources. It's also useful if you want to uniformly explore the search space. Random Search can be considered a baseline search algorithm. [Detailed Description](./HyperoptTuner.md)
-**Requirement of classArgs**
+**classArgs Requirements:**
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
-**Usage example**
+**Example Configuration:**
```yaml
# config.yml
@@ -89,14 +89,14 @@ tuner:
**Suggested scenario**
-Anneal is suggested when each trial does not take too long, and you have enough computation resource(almost same with Random Search). Or the variables in search space could be sample from some prior distribution. [Detailed Description](./HyperoptTuner.md)
+Anneal is suggested when each trial does not take very long and you have enough computation resources (very similar to Random Search). It's also useful when the variables in the search space can be sampled from some prior distribution. [Detailed Description](./HyperoptTuner.md)
-**Requirement of classArgs**
+**classArgs Requirements:**
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
-**Usage example**
+**Example Configuration:**
```yaml
# config.yml
@@ -116,15 +116,15 @@ tuner:
**Suggested scenario**
-Its requirement of computation resource is relatively high. Specifically, it requires large initial population to avoid falling into local optimum. If your trial is short or leverages assessor, this tuner is a good choice. And, it is more suggested when your trial code supports weight transfer, that is, the trial could inherit the converged weights from its parent(s). This can greatly speed up the training progress. [Detailed Description](./EvolutionTuner.md)
+Its computational resource requirements are relatively high. Specifically, it requires a large initial population to avoid falling into a local optimum. If your trial is short or leverages an assessor, this tuner is a good choice. It is also suggested when your trial code supports weight transfer; that is, when a trial can inherit the converged weights from its parent(s). This can greatly speed up the training process. [Detailed Description](./EvolutionTuner.md)
-**Requirement of classArgs**
+**classArgs Requirements:**
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
-* **population_size** (*int value (should > 0), optional, default = 20*) - the initial size of the population (trial num) in evolution tuner. Suggests `population_size` be much larger than `concurrency`, so users can get the most out of the algorithm (and at least `concurrency`, or the tuner will fail on their first generation of parameters).
+* **population_size** (*int value (should be > 0), optional, default = 20*) - the initial size of the population (trial num) in the evolution tuner. It's suggested that `population_size` be much larger than `concurrency` so users can get the most out of the algorithm (and at least `concurrency`, or the tuner will fail on its first generation of parameters).
-**Usage example**
+**Example Configuration:**
```yaml
# config.yml
@@ -143,11 +143,11 @@ tuner:
> Built-in Tuner Name: **SMAC**
-**Please note that SMAC doesn't support running on Windows currently. The specific reason can be referred to this [GitHub issue](https://github.com/automl/SMAC3/issues/483).**
+**Please note that SMAC doesn't support running on Windows currently. For the specific reason, please refer to this [GitHub issue](https://github.com/automl/SMAC3/issues/483).**
**Installation**
-SMAC need to be installed by following command before first use. As a reminder, `swig` is required for SMAC: for Ubuntu `swig` can be installed with `apt`.
+SMAC needs to be installed with the following command before the first use. As a reminder, `swig` is required for SMAC; on Ubuntu, `swig` can be installed with `apt`.
```bash
nnictl package install --name=SMAC
@@ -155,14 +155,14 @@ nnictl package install --name=SMAC
**Suggested scenario**
-Similar to TPE, SMAC is also a black-box tuner which can be tried in various scenarios, and is suggested when computation resource is limited. It is optimized for discrete hyperparameters, thus, suggested when most of your hyperparameters are discrete. [Detailed Description](./SmacTuner.md)
+Similar to TPE, SMAC is also a black-box tuner that can be tried in various scenarios and is suggested when computational resources are limited. It is optimized for discrete hyperparameters, thus, it's suggested when most of your hyperparameters are discrete. [Detailed Description](./SmacTuner.md)
-**Requirement of classArgs**
+**classArgs Requirements:**
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
-* **config_dedup** (*True or False, optional, default = False*) - If True, the tuner will not generate a configuration that has been already generated. If False, a configuration may be generated twice, but it is rare for relatively large search space.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+* **config_dedup** (*True or False, optional, default = False*) - If True, the tuner will not generate a configuration that has already been generated. If False, a configuration may be generated twice, but this is rare in a relatively large search space.
-**Usage example**
+**Example Configuration:**
```yaml
# config.yml
@@ -182,9 +182,9 @@ tuner:
**Suggested scenario**
-If the configurations you want to try have been decided, you can list them in searchspace file (using `choice`) and run them using batch tuner. [Detailed Description](./BatchTuner.md)
+If the configurations you want to try have been decided beforehand, you can list them in the search space file (using `choice`) and run them using the batch tuner. [Detailed Description](./BatchTuner.md)
-**Usage example**
+**Example Configuration:**
```yaml
# config.yml
@@ -194,7 +194,7 @@ tuner:
-Note that the search space that BatchTuner supported like:
+Note that the search space for BatchTuner should look like:
```json
{
@@ -211,7 +211,7 @@ Note that the search space that BatchTuner supported like:
}
```
-The search space file including the high-level key `combine_params`. The type of params in search space must be `choice` and the `values` including all the combined-params value.
+The search space file should include the high-level key `combine_params`. The type of params in the search space must be `choice` and the `values` must include all the combined params values.
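+For illustration, a search space of this shape might look as follows (shown as a Python literal mirroring the JSON; the values are hypothetical):
+```python
+search_space = {
+    "combine_params": {
+        "_type": "choice",
+        "_value": [
+            {"optimizer": "Adam", "learning_rate": 0.0001},
+            {"optimizer": "SGD", "learning_rate": 0.01},
+        ],
+    }
+}
+```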
@@ -221,11 +221,11 @@ The search space file including the high-level key `combine_params`. The type of
**Suggested scenario**
-Note that the only acceptable types of search space are `choice`, `quniform`, `randint`.
+Note that the only acceptable types within the search space are `choice`, `quniform`, and `randint`.
-It is suggested when search space is small, it is feasible to exhaustively sweeping the whole search space. [Detailed Description](./GridsearchTuner.md)
+This is suggested when the search space is small, i.e., when it is feasible to exhaustively sweep the whole search space. [Detailed Description](./GridsearchTuner.md)
-**Usage example**
+**Example Configuration:**
```yaml
# config.yml
@@ -243,15 +243,15 @@ tuner:
**Suggested scenario**
-It is suggested when you have limited computation resource but have relatively large search space. It performs well in the scenario that intermediate result (e.g., accuracy) can reflect good or bad of final result (e.g., accuracy) to some extent. [Detailed Description](./HyperbandAdvisor.md)
+This is suggested when you have limited computational resources but have a relatively large search space. It performs well in scenarios where intermediate results can indicate good or bad final results to some extent. For example, when models that are more accurate early on in training are also more accurate later on. [Detailed Description](./HyperbandAdvisor.md)
-**Requirement of classArgs**
+**classArgs Requirements:**
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
-* **R** (*int, optional, default = 60*) - the maximum budget given to a trial (could be the number of mini-batches or epochs) can be allocated to a trial. Each trial should use TRIAL_BUDGET to control how long it runs.
-* **eta** (*int, optional, default = 3*) - `(eta-1)/eta` is the proportion of discarded trials
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+* **R** (*int, optional, default = 60*) - the maximum budget given to a trial (could be the number of mini-batches or epochs). Each trial should use TRIAL_BUDGET to control how long it runs.
+* **eta** (*int, optional, default = 3*) - `(eta-1)/eta` is the proportion of trials discarded in each round of successive halving (see the sketch below).
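+As a rough sketch of how `R` and `eta` interact, the bracket arithmetic from the reference paper (not NNI's internal code) looks like:
+```python
+import math
+
+R, eta = 60, 3                                      # the defaults documented above
+s_max = int(math.log(R, eta))                       # number of brackets minus one
+for s in range(s_max, -1, -1):
+    n = math.ceil((s_max + 1) * eta**s / (s + 1))   # initial number of configurations
+    r = R / eta**s                                  # initial per-trial budget
+    print(f"bracket s={s}: {n} configs, starting budget {r:.1f}")
+```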
-**Usage example**
+**Example Configuration:**
```yaml
# config.yml
@@ -277,17 +277,17 @@ NetworkMorphism requires [PyTorch](https://pytorch.org/get-started/locally) and
**Suggested scenario**
-It is suggested that you want to apply deep learning methods to your task (your own dataset) but you have no idea of how to choose or design a network. You modify the [example](https://github.com/Microsoft/nni/tree/master/examples/trials/network_morphism/cifar10/cifar10_keras.py) to fit your own dataset and your own data augmentation method. Also you can change the batch size, learning rate or optimizer. It is feasible for different tasks to find a good network architecture. Now this tuner only supports the computer vision domain. [Detailed Description](./NetworkmorphismTuner.md)
+This is suggested when you want to apply deep learning methods to your task but you have no idea how to choose or design a network. You may modify this [example](https://github.com/Microsoft/nni/tree/master/examples/trials/network_morphism/cifar10/cifar10_keras.py) to fit your own dataset and your own data augmentation method. You can also change the batch size, learning rate, or optimizer. Currently, this tuner only supports the computer vision domain. [Detailed Description](./NetworkmorphismTuner.md)
-**Requirement of classArgs**
+**classArgs Requirements:**
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
-* **task** (*('cv'), optional, default = 'cv'*) - The domain of experiment, for now, this tuner only supports the computer vision(cv) domain.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+* **task** (*('cv'), optional, default = 'cv'*) - The domain of the experiment. For now, this tuner only supports the computer vision (CV) domain.
* **input_width** (*int, optional, default = 32*) - input image width
* **input_channel** (*int, optional, default = 3*) - input image channel
* **n_output_node** (*int, optional, default = 10*) - number of classes
-**Usage example**
+**Example Configuration:**
```yaml
# config.yml
@@ -309,17 +309,17 @@ tuner:
> Built-in Tuner Name: **MetisTuner**
-Note that the only acceptable types of search space are `quniform`, `uniform` and `randint` and numerical `choice`. Only numerical values are supported since the values will be used to evaluate the 'distance' between different points.
+Note that the only acceptable search space types are `quniform`, `uniform`, `randint`, and numerical `choice`. Only numerical values are supported since the values will be used to evaluate the 'distance' between different points.
**Suggested scenario**
-Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than other approaches such as random search. Furthermore, Metis provides guidance on the subsequent trial. Here is an [example](https://github.com/Microsoft/nni/tree/master/examples/trials/auto-gbdt/search_space_metis.json) about the use of Metis. User only need to send the final result like `accuracy` to tuner, by calling the NNI SDK. [Detailed Description](./MetisTuner.md)
+Similar to TPE and SMAC, Metis is a black-box tuner. If your system takes a long time to finish each trial, Metis is more favorable than other approaches such as random search. Furthermore, Metis provides guidance on subsequent trials. Here is an [example](https://github.com/Microsoft/nni/tree/master/examples/trials/auto-gbdt/search_space_metis.json) on the use of Metis. Users only need to send the final result, such as `accuracy`, to the tuner by calling the NNI SDK. [Detailed Description](./MetisTuner.md)
-**Requirement of classArgs**
+**classArgs Requirements:**
-* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
+* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
-**Usage example**
+**Example Configuration:**
```yaml
# config.yml
@@ -339,7 +339,7 @@ tuner:
**Installation**
-BOHB advisor requires [ConfigSpace](https://github.com/automl/ConfigSpace) package, ConfigSpace need to be installed by following command before first use.
+BOHB advisor requires the [ConfigSpace](https://github.com/automl/ConfigSpace) package. ConfigSpace can be installed using the following command.
```bash
nnictl package install --name=BOHB
@@ -347,24 +347,24 @@ nnictl package install --name=BOHB
**Suggested scenario**
-Similar to Hyperband, it is suggested when you have limited computation resource but have relatively large search space. It performs well in the scenario that intermediate result (e.g., accuracy) can reflect good or bad of final result (e.g., accuracy) to some extent. In this case, it may converges to a better configuration due to Bayesian optimization usage. [Detailed Description](./BohbAdvisor.md)
+Similar to Hyperband, BOHB is suggested when you have limited computational resources but have a relatively large search space. It performs well in scenarios where intermediate results can indicate good or bad final results to some extent. In this case, it may converge to a better configuration than Hyperband due to its usage of Bayesian optimization. [Detailed Description](./BohbAdvisor.md)
-**Requirement of classArgs**
+**classArgs Requirements:**
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will target to maximize metrics. If 'minimize', tuner will target to minimize metrics.
-* **min_budget** (*int, optional, default = 1*) - The smallest budget assign to a trial job, (budget could be the number of mini-batches or epochs). Needs to be positive.
-* **max_budget** (*int, optional, default = 3*) - The largest budget assign to a trial job, (budget could be the number of mini-batches or epochs). Needs to be larger than min_budget.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+* **min_budget** (*int, optional, default = 1*) - The smallest budget to assign to a trial job (the budget can be the number of mini-batches or epochs). Needs to be positive.
+* **max_budget** (*int, optional, default = 3*) - The largest budget to assign to a trial job (the budget can be the number of mini-batches or epochs). Needs to be larger than min_budget.
* **eta** (*int, optional, default = 3*) - In each iteration, a complete run of sequential halving is executed. In it, after evaluating each configuration on the same subset size, only a fraction of 1/eta of them 'advances' to the next round. Must be greater or equal to 2.
-* **min_points_in_model**(*int, optional, default = None*): number of observations to start building a KDE. Default 'None' means dim+1, when the number of completed trial in this budget is equal or larger than `max{dim+1, min_points_in_model}`, BOHB will start to build a KDE model of this budget, then use KDE model to guide the configuration selection. Need to be positive.(dim means the number of hyperparameters in search space)
-* **top_n_percent**(*int, optional, default = 15*): percentage (between 1 and 99, default 15) of the observations that are considered good. Good points and bad points are used for building KDE models. For example, if you have 100 observed trials and top_n_percent is 15, then top 15 point will used for building good point models "l(x)", the remaining 85 point will used for building bad point models "g(x)".
-* **num_samples**(*int, optional, default = 64*): number of samples to optimize EI (default 64). In this case, we will sample "num_samples"(default = 64) points, and compare the result of l(x)/g(x), then return one with the maximum l(x)/g(x) value as the next configuration if the optimize_mode is maximize. Otherwise, we return the smallest one.
+* **min_points_in_model**(*int, optional, default = None*): number of observations to start building a KDE. Default 'None' means dim+1; when the number of completed trials in this budget is equal to or larger than `max{dim+1, min_points_in_model}`, BOHB will start to build a KDE model for this budget and then use that KDE model to guide configuration selection. Needs to be positive. (dim means the number of hyperparameters in the search space)
+* **top_n_percent**(*int, optional, default = 15*): percentage (between 1 and 99) of the observations which are considered good. Good points and bad points are used for building KDE models. For example, if you have 100 observed trials and top_n_percent is 15, then the top 15% of points will be used for building the good points models "l(x)". The remaining 85% of points will be used for building the bad point models "g(x)".
+* **num_samples**(*int, optional, default = 64*): the number of samples used to optimize EI. We sample "num_samples" points and compare their l(x)/g(x) values. We then return the one with the maximum l(x)/g(x) value as the next configuration if optimize_mode is `maximize`; otherwise, we return the one with the smallest value (see the sketch after this list).
* **random_fraction**(*float, optional, default = 0.33*): fraction of purely random configurations that are sampled from the prior without the model.
-* **bandwidth_factor**(*float, optional, default = 3.0*): to encourage diversity, the points proposed to optimize EI, are sampled from a 'widened' KDE where the bandwidth is multiplied by this factor. Suggest to use default value if you are not familiar with KDE.
-* **min_bandwidth**(*float, optional, default = 0.001*): to keep diversity, even when all (good) samples have the same value for one of the parameters, a minimum bandwidth (default: 1e-3) is used instead of zero. Suggest to use default value if you are not familiar with KDE.
+* **bandwidth_factor**(*float, optional, default = 3.0*): to encourage diversity, the points proposed to optimize EI are sampled from a 'widened' KDE where the bandwidth is multiplied by this factor. We suggest using the default value if you are not familiar with KDE.
+* **min_bandwidth**(*float, optional, default = 0.001*): to keep diversity, even when all (good) samples have the same value for one of the parameters, a minimum bandwidth (default: 1e-3) is used instead of zero. We suggest using the default value if you are not familiar with KDE.
-*Please note that currently float type only support decimal representation, you have to use 0.333 instead of 1/3 and 0.001 instead of 1e-3.*
+*Please note that the float type currently only supports decimal representations. You have to use 0.333 instead of 1/3 and 0.001 instead of 1e-3.*
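+To make the `top_n_percent`/`num_samples` interplay concrete, here is an illustrative one-dimensional version of the good/bad KDE split; it is a sketch of the idea with synthetic data, not BOHB's actual implementation:
+```python
+import numpy as np
+from scipy.stats import gaussian_kde
+
+rng = np.random.default_rng(0)
+x = rng.uniform(0, 1, 100)                       # observed hyperparameter values
+y = (x - 0.3) ** 2 + rng.normal(0, 0.01, 100)    # observed losses (lower is better)
+
+top_n_percent = 15
+order = np.argsort(y)
+n_good = max(2, len(x) * top_n_percent // 100)
+l = gaussian_kde(x[order[:n_good]])              # good-point density, l(x)
+g = gaussian_kde(x[order[n_good:]])              # bad-point density, g(x)
+
+candidates = l.resample(64).ravel()              # num_samples = 64
+best = candidates[np.argmax(l(candidates) / g(candidates))]
+print(f"suggested next value: {best:.3f}")
+```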
-**Usage example**
+**Example Configuration:**
```yaml
advisor:
@@ -382,25 +382,25 @@ advisor:
> Built-in Tuner Name: **GPTuner**
-Note that the only acceptable types of search space are `randint`, `uniform`, `quniform`, `loguniform`, `qloguniform`, and numerical `choice`. Only numerical values are supported since the values will be used to evaluate the 'distance' between different points.
+Note that the only acceptable types within the search space are `randint`, `uniform`, `quniform`, `loguniform`, `qloguniform`, and numerical `choice`. Only numerical values are supported since the values will be used to evaluate the 'distance' between different points.
**Suggested scenario**
-As a strategy in Sequential Model-based Global Optimization(SMBO) algorithm, GP Tuner uses a proxy optimization problem (finding the maximum of the acquisition function) that, albeit still a hard problem, is cheaper (in the computational sense) and common tools can be employed. Therefore GP Tuner is most adequate for situations where the function to be optimized is a very expensive endeavor. GP can be used when the computation resource is limited. While GP Tuner has a computational cost that grows at *O(N^3)* due to the requirement of inverting the Gram matrix, so it's not suitable when lots of trials are needed. [Detailed Description](./GPTuner.md)
+As a strategy in a Sequential Model-based Global Optimization (SMBO) algorithm, GP Tuner uses a proxy optimization problem (finding the maximum of the acquisition function) that, albeit still a hard problem, is cheaper to solve in the computational sense, and common tools can be employed to solve it. Therefore, GP Tuner is most adequate for situations where the function to be optimized is very expensive to evaluate. GP can be used when computational resources are limited. However, GP Tuner has a computational cost that grows at *O(N^3)* due to the need to invert the Gram matrix, so it's not suitable when many trials are needed. [Detailed Description](./GPTuner.md)
-**Requirement of classArgs**
+**classArgs Requirements:**
-* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
-* **utility** (*'ei', 'ucb' or 'poi', optional, default = 'ei'*) - The kind of utility function(acquisition function). 'ei', 'ucb' and 'poi' corresponds to 'Expected Improvement', 'Upper Confidence Bound' and 'Probability of Improvement' respectively.
-* **kappa** (*float, optional, default = 5*) - Used by utility function 'ucb'. The bigger `kappa` is, the more the tuner will be exploratory.
-* **xi** (*float, optional, default = 0*) - Used by utility function 'ei' and 'poi'. The bigger `xi` is, the more the tuner will be exploratory.
-* **nu** (*float, optional, default = 2.5*) - Used to specify Matern kernel. The smaller nu, the less smooth the approximated function is.
-* **alpha** (*float, optional, default = 1e-6*) - Used to specify Gaussian Process Regressor. Larger values correspond to increased noise level in the observations.
-* **cold_start_num** (*int, optional, default = 10*) - Number of random exploration to perform before Gaussian Process. Random exploration can help by diversifying the exploration space.
-* **selection_num_warm_up** (*int, optional, default = 1e5*) - Number of random points to evaluate for getting the point which maximizes the acquisition function.
+* **optimize_mode** (*'maximize' or 'minimize', optional, default = 'maximize'*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+* **utility** (*'ei', 'ucb' or 'poi', optional, default = 'ei'*) - The utility function (acquisition function). 'ei', 'ucb', and 'poi' correspond to 'Expected Improvement', 'Upper Confidence Bound', and 'Probability of Improvement', respectively (see the sketch after this list).
+* **kappa** (*float, optional, default = 5*) - Used by the 'ucb' utility function. The bigger `kappa` is, the more exploratory the tuner will be.
+* **xi** (*float, optional, default = 0*) - Used by the 'ei' and 'poi' utility functions. The bigger `xi` is, the more exploratory the tuner will be.
+* **nu** (*float, optional, default = 2.5*) - Used to specify the Matern kernel. The smaller `nu` is, the less smooth the approximated function is.
+* **alpha** (*float, optional, default = 1e-6*) - Used to specify the Gaussian Process Regressor. Larger values correspond to an increased noise level in the observations.
+* **cold_start_num** (*int, optional, default = 10*) - Number of random explorations to perform before the Gaussian Process. Random exploration can help by diversifying the exploration space.
+* **selection_num_warm_up** (*int, optional, default = 1e5*) - Number of random points to evaluate when getting the point which maximizes the acquisition function.
* **selection_num_starting_points** (*int, optional, default = 250*) - Number of times to run L-BFGS-B from a random starting point after the warmup.
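+For reference, the three utility functions typically take the following forms for maximization (a sketch using the standard textbook definitions; NNI's exact implementation may differ):
+```python
+from scipy.stats import norm
+
+def ucb(mu, sigma, kappa=5.0):
+    return mu + kappa * sigma                     # larger kappa => more exploration
+
+def ei(mu, sigma, y_best, xi=0.0):
+    z = (mu - y_best - xi) / sigma
+    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)
+
+def poi(mu, sigma, y_best, xi=0.0):
+    return norm.cdf((mu - y_best - xi) / sigma)
+```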
-**Usage example**
+**Example Configuration:**
```yaml
# config.yml
@@ -424,27 +424,27 @@ tuner:
> Built-in Tuner Name: **PPOTuner**
-Note that the only acceptable type of search space is `mutable_layer`. `optional_input_size` can only be 0, 1, or [0, 1].
+Note that the only acceptable type within the search space is `mutable_layer`. `optional_input_size` can only be 0, 1, or [0, 1].
**Suggested scenario**
-PPOTuner is a Reinforcement Learning tuner based on PPO algorithm. When you are using NNI NAS interface in your trial code to do neural architecture search, PPOTuner can be used. In general, Reinforcement Learning algorithm need more computing resource, though PPO algorithm is more efficient than others relatively. So it's recommended to use this tuner when there are large amount of computing resource. You could try it on very simple task, such as the [mnist-nas](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas) example. [See details](./PPOTuner.md)
+PPOTuner is a Reinforcement Learning tuner based on the PPO algorithm. PPOTuner can be used when using the NNI NAS interface to do neural architecture search. In general, Reinforcement Learning algorithms need more computing resources, though the PPO algorithm is relatively more efficient than others. It's recommended to use this tuner when you have a large amount of computational resources available. You could try it on a very simple task, such as the [mnist-nas](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas) example. [See details](./PPOTuner.md)
-**Requirement of classArgs**
+**classArgs Requirements:**
-* **optimize_mode** (*'maximize' or 'minimize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
+* **optimize_mode** (*'maximize' or 'minimize'*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
* **trials_per_update** (*int, optional, default = 20*) - The number of trials to be used for one update. It must be divisible by minibatch_size. `trials_per_update` is recommended to be an exact multiple of `trialConcurrency` for better concurrency of trials.
* **epochs_per_update** (*int, optional, default = 4*) - The number of epochs for one update.
-* **minibatch_size** (*int, optional, default = 4*) - Mini-batch size (i.e., number of trials for a mini-batch) for the update. Note that, trials_per_update must be divisible by minibatch_size.
+* **minibatch_size** (*int, optional, default = 4*) - Mini-batch size (i.e., number of trials for a mini-batch) for the update. Note that trials_per_update must be divisible by minibatch_size.
* **ent_coef** (*float, optional, default = 0.0*) - Policy entropy coefficient in the optimization objective.
-* **lr** (*float, optional, default = 3e-4*) - Learning rate of the model (lstm network), constant.
+* **lr** (*float, optional, default = 3e-4*) - Learning rate of the model (lstm network); constant.
* **vf_coef** (*float, optional, default = 0.5*) - Value function loss coefficient in the optimization objective.
* **max_grad_norm** (*float, optional, default = 0.5*) - Gradient norm clipping coefficient.
* **gamma** (*float, optional, default = 0.99*) - Discounting factor.
* **lam** (*float, optional, default = 0.95*) - Advantage estimation discounting factor (lambda in the paper).
* **cliprange** (*float, optional, default = 0.2*) - Cliprange in the PPO algorithm, constant.
-**Usage example**
+**Example Configuration:**
```yaml
# config.yml
From 5bdd14bb7412d4fa4aa8bc01266156617d8c3736 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Fri, 13 Mar 2020 21:05:28 -0700
Subject: [PATCH 14/34] Fixing, as best I can, weird wording in TPE, Random
Search, Anneal Tuners
Fixed many incomplete sentences and bad wording.
The first sentence in the Parallel TPE optimization section doesn't make sense to me, but I left it in case it's supposed to be that way. That sentence was copied from the blog post.
---
docs/en_US/Tuner/HyperoptTuner.md | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/docs/en_US/Tuner/HyperoptTuner.md b/docs/en_US/Tuner/HyperoptTuner.md
index 71506c2f5d..7fd366e68b 100644
--- a/docs/en_US/Tuner/HyperoptTuner.md
+++ b/docs/en_US/Tuner/HyperoptTuner.md
@@ -3,11 +3,11 @@ TPE, Random Search, Anneal Tuners on NNI
## TPE
-The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model. The TPE approach models P(x|y) and P(y) where x represents hyperparameters and y the associated evaluate matric. P(x|y) is modeled by transforming the generative process of hyperparameters, replacing the distributions of the configuration prior with non-parametric densities. This optimization approach is described in detail in [Algorithms for Hyper-Parameter Optimization](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf).
+The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization (SMBO) approach. SMBO methods sequentially construct models to approximate the performance of hyperparameters based on historical measurements, and then subsequently choose new hyperparameters to test based on this model. The TPE approach models P(x|y) and P(y) where x represents hyperparameters and y the associated evaluation metric. P(x|y) is modeled by transforming the generative process of hyperparameters, replacing the distributions of the configuration prior with non-parametric densities. This optimization approach is described in detail in [Algorithms for Hyper-Parameter Optimization](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf).
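+Concretely, the reference paper chooses a threshold y* at a quantile γ of the observed y values and models the two conditional densities, then maximizes a quantity proportional to expected improvement (notation follows the paper, not NNI's code):
+$$
+p(x \mid y) = \begin{cases} \ell(x) & y < y^{*} \\ g(x) & y \ge y^{*} \end{cases},
+\qquad
+EI_{y^{*}}(x) \propto \left( \gamma + \frac{g(x)}{\ell(x)} (1 - \gamma) \right)^{-1}
+$$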
### Parallel TPE optimization
-TPE approaches were actually run asynchronously in order to make use of multiple compute nodes and to avoid wasting time waiting for trial evaluations to complete. The original intention of the algorithm design is to optimize sequential. When we use TPE with a large concurrency, its performance will be bad. We have optimized this phenomenon using Constant Liar algorithm. For the principle of optimization, please refer to our [research blog](../CommunitySharings/ParallelizingTpeSearch.md).
+TPE approaches were actually run asynchronously in order to make use of multiple compute nodes and to avoid wasting time waiting for trial evaluations to complete. The original algorithm design was optimized for sequential computation, so TPE's performance degrades when it is used with high concurrency. We have optimized for this case using the Constant Liar algorithm. For the principles behind this optimization, please refer to our [research blog](../CommunitySharings/ParallelizingTpeSearch.md).
### Usage
@@ -23,15 +23,15 @@ tuner:
```
**Requirement of classArg**
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will target to maximize metrics. If 'minimize', tuner will target to minimize metrics.
-* **parallel_optimize** (*bool, optional, default = False*) - If True, TPE will use Constant Liar algorithm to optimize parallel hyperparameter tuning. Otherwise, TPE will not discriminate between sequential or parallel situations.
-* **constant_liar_type** (*min or max or mean, optional, default = min*) - The type of constant liar to use, will logically be determined on the basis of the values taken by y at X. Corresponding to three values, min{Y}, max{Y}, and mean{Y}.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+* **parallel_optimize** (*bool, optional, default = False*) - If True, TPE will use the Constant Liar algorithm to optimize parallel hyperparameter tuning. Otherwise, TPE will not discriminate between sequential and parallel situations.
+* **constant_liar_type** (*min or max or mean, optional, default = min*) - The type of constant liar to use; it is determined from the values taken by y at X. There are three possible values: min{Y}, max{Y}, and mean{Y} (see the sketch below).
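+As a sketch of the Constant Liar idea (illustrative only; the configuration tuples and metric values are hypothetical):
+```python
+observed = {(0.10, 32): 0.85, (0.01, 64): 0.91}   # finished trials: config -> metric
+pending = [(0.05, 32)]                            # trials still running
+
+lie = min(observed.values())                      # constant_liar_type = min
+history = dict(observed)
+for cfg in pending:
+    history[cfg] = lie    # pretend pending trials finished with the "lie" value
+# `history` then feeds the TPE model, steering new suggestions away from
+# points that are already being evaluated.
+```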
## Random Search
-In [Random Search for Hyper-Parameter Optimization](http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf) show that Random Search might be surprisingly simple and effective. We suggests that we could use Random Search as baseline when we have no knowledge about the prior distribution of hyper-parameters.
+In [Random Search for Hyper-Parameter Optimization](http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf), the authors show that Random Search might be surprisingly effective despite its simplicity. We suggest using Random Search as a baseline when no knowledge about the prior distribution of hyper-parameters is available.
## Anneal
-This simple annealing algorithm begins by sampling from the prior, but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
+This simple annealing algorithm begins by sampling from the prior but tends over time to sample from points closer and closer to the best ones observed. This algorithm is a simple variation on random search that leverages smoothness in the response surface. The annealing rate is not adaptive.
From 0b48e4a675f71f186dd1bd20c14299b7cfff1ecf Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Fri, 13 Mar 2020 21:43:13 -0700
Subject: [PATCH 15/34] Improve wording in naive evolution description.
---
docs/en_US/Tuner/EvolutionTuner.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/en_US/Tuner/EvolutionTuner.md b/docs/en_US/Tuner/EvolutionTuner.md
index e257178813..7785439122 100644
--- a/docs/en_US/Tuner/EvolutionTuner.md
+++ b/docs/en_US/Tuner/EvolutionTuner.md
@@ -3,4 +3,4 @@ Naive Evolution Tuners on NNI
## Naive Evolution
-Naive Evolution comes from [Large-Scale Evolution of Image Classifiers](https://arxiv.org/pdf/1703.01041.pdf). It randomly initializes a population based on search space. For each generation, it chooses better ones and do some mutation (e.g., change a hyperparameter, add/remove one layer) on them to get the next generation. Naive Evolution requires many trials to works, but it's very simple and easily to expand new features.
\ No newline at end of file
+Naive Evolution comes from [Large-Scale Evolution of Image Classifiers](https://arxiv.org/pdf/1703.01041.pdf). It randomly initializes a population based on the search space. For each generation, it chooses better ones and does some mutation (e.g., changes a hyperparameter, adds/removes one layer, etc.) on them to get the next generation. Naive Evolution requires many trials to work, but it's very simple and easily extended with new features.
From b7617dd33a93266091c8b7d8092b58fdd1d17d3a Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Fri, 13 Mar 2020 22:11:46 -0700
Subject: [PATCH 16/34] Minor changes to SMAC page wording.
---
docs/en_US/Tuner/SmacTuner.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/en_US/Tuner/SmacTuner.md b/docs/en_US/Tuner/SmacTuner.md
index 301e02da65..d65ab07212 100644
--- a/docs/en_US/Tuner/SmacTuner.md
+++ b/docs/en_US/Tuner/SmacTuner.md
@@ -3,6 +3,6 @@ SMAC Tuner on NNI
## SMAC
-[SMAC](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO, in order to handle categorical parameters. The SMAC supported by nni is a wrapper on [the SMAC3 github repo](https://github.com/automl/SMAC3).
+[SMAC](https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf) is based on Sequential Model-Based Optimization (SMBO). It adapts the most prominent previously used model class (Gaussian stochastic process models) and introduces the model class of random forests to SMBO in order to handle categorical parameters. The SMAC supported by NNI is a wrapper around [the SMAC3 GitHub repo](https://github.com/automl/SMAC3).
-Note that SMAC on nni only supports a subset of the types in [search space spec](../Tutorial/SearchSpaceSpec.md), including `choice`, `randint`, `uniform`, `loguniform`, `quniform`.
\ No newline at end of file
+Note that SMAC on NNI only supports a subset of the types in the [search space spec](../Tutorial/SearchSpaceSpec.md): `choice`, `randint`, `uniform`, `loguniform`, and `quniform`.
\ No newline at end of file
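For example, a search space that stays within this supported subset might look like the following sketch (the hyperparameter names and ranges are hypothetical):

```python
import json

# A hypothetical search space using only SMAC-supported types.
search_space = {
    "optimizer":     {"_type": "choice",     "_value": ["SGD", "Adam"]},
    "hidden_size":   {"_type": "randint",    "_value": [64, 512]},
    "dropout_rate":  {"_type": "uniform",    "_value": [0.1, 0.5]},
    "learning_rate": {"_type": "loguniform", "_value": [0.0001, 0.1]},
    "batch_size":    {"_type": "quniform",   "_value": [16, 128, 16]},
}
print(json.dumps(search_space, indent=2))  # contents of search_space.json
```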
From 32255cc7537351c88c975bf70b624788b568996a Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Fri, 13 Mar 2020 22:17:25 -0700
Subject: [PATCH 17/34] Improve some wording, but mostly formatting, on Metis
Tuner page.
---
docs/en_US/Tuner/MetisTuner.md | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/docs/en_US/Tuner/MetisTuner.md b/docs/en_US/Tuner/MetisTuner.md
index d653796e01..e11dc7520f 100644
--- a/docs/en_US/Tuner/MetisTuner.md
+++ b/docs/en_US/Tuner/MetisTuner.md
@@ -3,18 +3,18 @@ Metis Tuner on NNI
## Metis Tuner
-[Metis](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/) offers the following benefits when it comes to tuning parameters: While most tools only predicts the optimal configuration, Metis gives you two outputs: (a) current prediction of optimal configuration, and (b) suggestion for the next trial. No more guess work!
+[Metis](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/) offers several benefits over other tuning algorithms. While most tools only predict the optimal configuration, Metis gives you two outputs: a prediction for the optimal configuration and a suggestion for the next trial. No more guesswork!
-While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to re-sample a particular hyper-parameter.
+While most tools assume training datasets do not have noisy data, Metis actually tells you if you need to resample a particular hyper-parameter.
-While most tools have problems of being exploitation-heavy, Metis' search strategy balances exploration, exploitation, and (optional) re-sampling.
+While most tools have problems of being exploitation-heavy, Metis' search strategy balances exploration, exploitation, and (optional) resampling.
-Metis belongs to the class of sequential model-based optimization (SMBO), and it is based on the Bayesian Optimization framework. To model the parameter-vs-performance space, Metis uses both Gaussian Process and GMM. Since each trial can impose a high time cost, Metis heavily trades inference computations with naive trial. At each iteration, Metis does two tasks:
+Metis belongs to the class of sequential model-based optimization (SMBO) algorithms and it is based on the Bayesian Optimization framework. To model the parameter-vs-performance space, Metis uses both a Gaussian Process and GMM. Since each trial can impose a high time cost, Metis heavily trades inference computations with naive trial. At each iteration, Metis does two tasks:
-It finds the global optimal point in the Gaussian Process space. This point represents the optimal configuration.
+* It finds the global optimal point in the Gaussian Process space. This point represents the optimal configuration.
-It identifies the next hyper-parameter candidate. This is achieved by inferring the potential information gain of exploration, exploitation, and re-sampling.
+* It identifies the next hyper-parameter candidate. This is achieved by inferring the potential information gain of exploration, exploitation, and resampling.
-Note that the only acceptable types of search space are `quniform`, `uniform` and `randint` and numerical `choice`.
+Note that the only acceptable types within the search space are `quniform`, `uniform`, `randint`, and numerical `choice`.
-More details can be found in our paper: https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/
\ No newline at end of file
+More details can be found in our [paper](https://www.microsoft.com/en-us/research/publication/metis-robustly-tuning-tail-latencies-cloud-systems/).
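To make the two per-iteration tasks concrete, here is a heavily reduced sketch (not the Metis code; a plain exploration bonus stands in for Metis' information-gain analysis, and resampling is omitted):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(10, 1))   # observed configurations
y = -(X[:, 0] - 0.6) ** 2             # observed metrics (toy)

gp = GaussianProcessRegressor().fit(X, y)
grid = np.linspace(0, 1, 201).reshape(-1, 1)
mean, std = gp.predict(grid, return_std=True)

# Task 1: the current prediction of the optimal configuration.
best_config = grid[np.argmax(mean)]

# Task 2: the next candidate, trading exploitation (high mean)
# against exploration (high uncertainty).
next_candidate = grid[np.argmax(mean + std)]
print(best_config, next_candidate)
```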
From d53978b8555c6a37a5eac011dd07c457fa61024c Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Fri, 13 Mar 2020 22:18:20 -0700
Subject: [PATCH 18/34] Minor grammatical fix in Metis page.
---
docs/en_US/Tuner/MetisTuner.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/en_US/Tuner/MetisTuner.md b/docs/en_US/Tuner/MetisTuner.md
index e11dc7520f..9709708512 100644
--- a/docs/en_US/Tuner/MetisTuner.md
+++ b/docs/en_US/Tuner/MetisTuner.md
@@ -9,7 +9,7 @@ While most tools assume training datasets do not have noisy data, Metis actually
While most tools have problems of being exploitation-heavy, Metis' search strategy balances exploration, exploitation, and (optional) resampling.
-Metis belongs to the class of sequential model-based optimization (SMBO) algorithms and it is based on the Bayesian Optimization framework. To model the parameter-vs-performance space, Metis uses both a Gaussian Process and GMM. Since each trial can impose a high time cost, Metis heavily trades inference computations with naive trial. At each iteration, Metis does two tasks:
+Metis belongs to the class of sequential model-based optimization (SMBO) algorithms and it is based on the Bayesian Optimization framework. To model the parameter-vs-performance space, Metis uses both a Gaussian Process and GMM. Since each trial can impose a high time cost, Metis heavily trades inference computations with naive trials. At each iteration, Metis does two tasks:
* It finds the global optimal point in the Gaussian Process space. This point represents the optimal configuration.
From 080e048c6ef306c521f0fd0e082150d2a2ffd09e Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Fri, 13 Mar 2020 22:34:35 -0700
Subject: [PATCH 19/34] Minor edits to Batch tuner description.
---
docs/en_US/Tuner/BatchTuner.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/en_US/Tuner/BatchTuner.md b/docs/en_US/Tuner/BatchTuner.md
index 35fcf9bab3..060ab883b1 100644
--- a/docs/en_US/Tuner/BatchTuner.md
+++ b/docs/en_US/Tuner/BatchTuner.md
@@ -3,6 +3,6 @@ Batch Tuner on NNI
## Batch Tuner
-Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type choice in [search space spec](../Tutorial/SearchSpaceSpec.md).
+Batch tuner allows users to simply provide several configurations (i.e., choices of hyper-parameters) for their trial code. After finishing all the configurations, the experiment is done. Batch tuner only supports the type `choice` in the [search space spec](../Tutorial/SearchSpaceSpec.md).
-Suggested scenario: If the configurations you want to try have been decided, you can list them in SearchSpace file (using choice) and run them using batch tuner.
+Suggested scenario: If the configurations you want to try have been decided, you can list them in the SearchSpace file (using `choice`) and run them using the batch tuner.
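For example, a SearchSpace file for the batch tuner might enumerate the configurations like this (a sketch; the key name `combine_params` and the listed values are illustrative):

```python
import json

# Every configuration to try is listed explicitly under one `choice`.
search_space = {
    "combine_params": {
        "_type": "choice",
        "_value": [
            {"optimizer": "Adam", "learning_rate": 0.001},
            {"optimizer": "Adam", "learning_rate": 0.0001},
            {"optimizer": "SGD",  "learning_rate": 0.01},
        ],
    }
}
print(json.dumps(search_space, indent=2))  # save as search_space.json
```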
From 94948eea598858f448b6aa2858436447ef19371d Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Fri, 13 Mar 2020 22:38:46 -0700
Subject: [PATCH 20/34] Minor fixes to gridsearch description.
---
docs/en_US/Tuner/GridsearchTuner.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/en_US/Tuner/GridsearchTuner.md b/docs/en_US/Tuner/GridsearchTuner.md
index b8b1a9230e..6f7c007a44 100644
--- a/docs/en_US/Tuner/GridsearchTuner.md
+++ b/docs/en_US/Tuner/GridsearchTuner.md
@@ -3,6 +3,6 @@ Grid Search on NNI
## Grid Search
-Grid Search performs an exhaustive searching through a manually specified subset of the hyperparameter space defined in the searchspace file.
+Grid Search performs an exhaustive search through a manually specified subset of the hyperparameter space defined in the searchspace file.
-Note that the only acceptable types of search space are `choice`, `quniform`, `randint`.
\ No newline at end of file
+Note that the only acceptable types within the search space are `choice`, `quniform`, and `randint`.
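The exhaustive enumeration itself amounts to a Cartesian product, as in this sketch (illustrative only; the real tuner also discretizes `quniform` and `randint` ranges, which are hardcoded here):

```python
from itertools import product

search_space = {
    "batch_size": [16, 32, 64],    # a `choice`
    "hidden_size": [128, 256],     # a `quniform`, already discretized
}

keys = list(search_space)
for values in product(*search_space.values()):
    config = dict(zip(keys, values))
    print(config)                  # each combination becomes one trial
```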
From f7e3d0b5dc8f025403215349161644dbd539b19e Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Sat, 14 Mar 2020 16:28:08 -0700
Subject: [PATCH 21/34] Better wording for GPTuner description.
---
docs/en_US/Tuner/GPTuner.md | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/docs/en_US/Tuner/GPTuner.md b/docs/en_US/Tuner/GPTuner.md
index 0159c3d5c9..42a1af7531 100644
--- a/docs/en_US/Tuner/GPTuner.md
+++ b/docs/en_US/Tuner/GPTuner.md
@@ -3,10 +3,10 @@ GP Tuner on NNI
## GP Tuner
-Bayesian optimization works by constructing a posterior distribution of functions (Gaussian Process here) that best describes the function you want to optimize. As the number of observations grows, the posterior distribution improves, and the algorithm becomes more certain of which regions in parameter space are worth exploring and which are not.
+Bayesian optimization works by constructing a posterior distribution of functions (a Gaussian Process) that best describes the function you want to optimize. As the number of observations grows, the posterior distribution improves, and the algorithm becomes more certain of which regions in parameter space are worth exploring and which are not.
-GP Tuner is designed to minimize/maximize the number of steps required to find a combination of parameters that are close to the optimal combination. To do so, this method uses a proxy optimization problem (finding the maximum of the acquisition function) that, albeit still a hard problem, is cheaper (in the computational sense) and common tools can be employed. Therefore Bayesian Optimization is most adequate for situations where sampling the function to be optimized is a very expensive endeavor.
+GP Tuner is designed to minimize the number of steps required to find a combination of parameters close to the optimal combination. To do so, this method uses a proxy optimization problem (finding the maximum of the acquisition function) that, albeit still a hard problem, is cheaper (in the computational sense) to solve, and it's amenable to common tools. Therefore, Bayesian Optimization is suggested for situations where sampling the function to be optimized is very expensive.
-Note that the only acceptable types of search space are `randint`, `uniform`, `quniform`, `loguniform`, `qloguniform`, and numerical `choice`.
+Note that the only acceptable types within the search space are `randint`, `uniform`, `quniform`, `loguniform`, `qloguniform`, and numerical `choice`.
-This optimization approach is described in Section 3 of [Algorithms for Hyper-Parameter Optimization](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf).
+This optimization approach is described in Section 3 of [Algorithms for Hyper-Parameter Optimization](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf).
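For intuition, here is a toy version of the proxy problem using an expected-improvement acquisition over a Gaussian Process (an illustrative sketch, not GP Tuner's implementation):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gp, candidates, best_y, xi=0.01):
    mean, std = gp.predict(candidates, return_std=True)
    std = np.maximum(std, 1e-9)          # avoid division by zero
    z = (mean - best_y - xi) / std
    return (mean - best_y - xi) * norm.cdf(z) + std * norm.pdf(z)

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(5, 1))      # a few expensive evaluations
y = -X[:, 0] ** 2                        # toy objective to maximize
gp = GaussianProcessRegressor().fit(X, y)

grid = np.linspace(-2, 2, 401).reshape(-1, 1)
ei = expected_improvement(gp, grid, best_y=y.max())
print("next configuration to try:", grid[np.argmax(ei)])
```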
From 36a2fa42d7ad2d46211f72b032e861e49eef9ab6 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Sat, 14 Mar 2020 18:43:43 -0700
Subject: [PATCH 22/34] Fix a lot of wording in the Network Morphism
description.
---
docs/en_US/Tuner/NetworkmorphismTuner.md | 54 ++++++++++++------------
1 file changed, 27 insertions(+), 27 deletions(-)
diff --git a/docs/en_US/Tuner/NetworkmorphismTuner.md b/docs/en_US/Tuner/NetworkmorphismTuner.md
index ced08e3f9a..3bc2047fcf 100644
--- a/docs/en_US/Tuner/NetworkmorphismTuner.md
+++ b/docs/en_US/Tuner/NetworkmorphismTuner.md
@@ -2,9 +2,9 @@
## 1. Introduction
-[Autokeras](https://arxiv.org/abs/1806.10282) is a popular automl tools using Network Morphism. The basic idea of Autokeras is to use Bayesian Regression to estimate the metric of the Neural Network Architecture. Each time, it generates several child networks from father networks. Then it uses a naïve Bayesian regression estimate its metric value from history trained results of network and metric value pair. Next, it chooses the the child which has best estimated performance and adds it to the training queue. Inspired by its work and referring to its [code](https://github.com/jhfjhfj1/autokeras), we implement our Network Morphism method in our NNI platform.
+[Autokeras](https://arxiv.org/abs/1806.10282) is a popular autoML tool that uses Network Morphism. The basic idea of Autokeras is to use Bayesian Regression to estimate the metric of a Neural Network Architecture. Each time, it generates several child networks from the parent networks. Then it uses naïve Bayesian regression to estimate each child's metric value from the history of (network, metric value) pairs. Next, it chooses the child with the best estimated performance and adds it to the training queue. Inspired by the work of Autokeras and referring to its [code](https://github.com/jhfjhfj1/autokeras), we implemented our Network Morphism method on the NNI platform.
-If you want to know about network morphism trial usage, please check [Readme.md](https://github.com/Microsoft/nni/blob/master/examples/trials/network_morphism/README.md) of the trial to get more detail.
+If you want to know more about network morphism trial usage, please see the [Readme.md](https://github.com/Microsoft/nni/blob/master/examples/trials/network_morphism/README.md).
## 2. Usage
@@ -29,7 +29,7 @@ tuner:
-In the training procedure, it generate a JSON file which represent a Network Graph. Users can call "json\_to\_graph()" function to build a pytorch model or keras model from this JSON file.
+In the training procedure, it generates a JSON file which represents a Network Graph. Users can call the "json\_to\_graph()" function to build a PyTorch or Keras model from this JSON file.
```python
import nni
@@ -54,7 +54,7 @@ net = build_graph_from_json(RCV_CONFIG)
nni.report_final_result(best_acc)
```
-If you want to save and **load the best model**, the following methods are recommended.
+If you want to save and load the **best model**, the following methods are recommended.
```python
# 1. Use NNI API
@@ -102,27 +102,27 @@ loaded_model = torch.load("model-{}.pt".format(model_id))
## 3. File Structure
-The tuner has a lot of different files, functions and classes. Here we will only give most of those files a brief introduction:
+The tuner has a lot of different files, functions, and classes. Here, we will give most of those files only a brief introduction:
-- `networkmorphism_tuner.py` is a tuner which using network morphism techniques.
+- `networkmorphism_tuner.py` is a tuner which uses network morphism techniques.
-- `bayesian.py` is Bayesian method to estimate the metric of unseen model based on the models we have already searched.
-- `graph.py` is the meta graph data structure. Class Graph is representing the neural architecture graph of a model.
+- `bayesian.py` is a Bayesian method to estimate the metric of an unseen model based on the models we have already searched.
+- `graph.py` is the meta graph data structure. The class Graph represents the neural architecture graph of a model.
- Graph extracts the neural architecture graph from a model.
- - Each node in the graph is a intermediate tensor between layers.
+ - Each node in the graph is an intermediate tensor between layers.
- Each layer is an edge in the graph.
- Notably, multiple edges may refer to the same layer.
-- `graph_transformer.py` includes some graph transformer to wider, deeper or add a skip-connection into the graph.
+- `graph_transformer.py` includes some graph transformers which widen, deepen, or add skip-connections to the graph.
- `layers.py` includes all the layers we use in our model.
-- `layer_transformer.py` includes some layer transformer to wider, deeper or add a skip-connection into the layer.
-- `nn.py` includes the class to generate network class initially.
+- `layer_transformer.py` includes some layer transformers which widen, deepen, or add skip-connections to the layer.
+- `nn.py` includes the class which generates the initial network.
- `metric.py` some metric classes including Accuracy and MSE.
-- `utils.py` is the example search network architectures in dataset `cifar10` by using Keras.
+- `utils.py` contains the example that searches network architectures for the `cifar10` dataset using Keras.
## 4. The Network Representation Json Example
-Here is an example of the intermediate representation JSON file we defined, which is passed from the tuner to the trial in the architecture search procedure. Users can call "json\_to\_graph()" function in trial code to build a pytorch model or keras model from this JSON file. The example is as follows.
+Here is an example of the intermediate representation JSON file we defined, which is passed from the tuner to the trial in the architecture search procedure. Users can call the "json\_to\_graph()" function in the trial code to build a PyTorch or Keras model from this JSON file.
```json
{
@@ -215,29 +215,29 @@ Here is an example of the intermediate representation JSON file we defined, whic
}
```
-The definition of each model is a JSON object(also you can consider the model as a [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph)), where:
+You can consider the model to be a [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph). The definition of each model is a JSON object with the following fields (a small sketch of walking this structure follows the list):
-- `input_shape` is a list of integers, which does not include the batch axis.
+- `input_shape` is a list of integers, which does not include the batch axis.
- `weighted` means whether the weights and biases in the neural network should be included in the graph.
- `operation_history` is a list saving all the network morphism operations.
-- `layer_id_to_input_node_ids` is a dictionary instance mapping from layer identifiers to their input nodes identifiers.
-- `layer_id_to_output_node_ids` is a dictionary instance mapping from layer identifiers to their output nodes identifiers
-- `adj_list` is a two dimensional list. The adjacency list of the graph. The first dimension is identified by tensor identifiers. In each edge list, the elements are two-element tuples of (tensor identifier, layer identifier).
-- `reverse_adj_list` is a A reverse adjacent list in the same format as adj_list.
+- `layer_id_to_input_node_ids` is a dictionary mapping layer identifiers to their input node identifiers.
+- `layer_id_to_output_node_ids` is a dictionary mapping layer identifiers to their output node identifiers.
+- `adj_list` is a two-dimensional list; the adjacency list of the graph. The first dimension is identified by tensor identifiers. In each edge list, the elements are two-element tuples of (tensor identifier, layer identifier).
+- `reverse_adj_list` is a reverse adjacency list in the same format as adj_list.
- `node_list` is a list of integers. The indices of the list are the identifiers.
- `layer_list` is a list of stub layers. The indices of the list are the identifiers.
- - For `StubConv (StubConv1d, StubConv2d, StubConv3d)`, the number follows is its node input id(or id list), node output id, input_channel, filters, kernel_size, stride and padding.
+ - For `StubConv (StubConv1d, StubConv2d, StubConv3d)`, the numbering follows the format: its node input id (or id list), node output id, input_channel, filters, kernel_size, stride, and padding.
- - For `StubDense`, the number follows is its node input id(or id list), node output id, input_units and units.
+ - For `StubDense`, the numbering follows the format: its node input id (or id list), node output id, input_units, and units.
- - For `StubBatchNormalization (StubBatchNormalization1d, StubBatchNormalization2d, StubBatchNormalization3d)`, the number follows is its node input id(or id list), node output id and features numbers.
+ - For `StubBatchNormalization (StubBatchNormalization1d, StubBatchNormalization2d, StubBatchNormalization3d)`, the numbering follows the format: its node input id (or id list), node output id, and features numbers.
- - For `StubDropout(StubDropout1d, StubDropout2d, StubDropout3d)`, the number follows is its node input id(or id list), node output id and dropout rate.
+ - For `StubDropout(StubDropout1d, StubDropout2d, StubDropout3d)`, the numbering follows the format: its node input id (or id list), node output id, and dropout rate.
- - For `StubPooling (StubPooling1d, StubPooling2d, StubPooling3d)`, the number follows is its node input id(or id list), node output id, kernel_size, stride and padding.
+ - For `StubPooling (StubPooling1d, StubPooling2d, StubPooling3d)`, the numbering follows the format: its node input id (or id list), node output id, kernel_size, stride, and padding.
- - For else layers, the number follows is its node input id(or id list) and node output id.
+ - For all other layers, the numbering follows the format: its node input id (or id list) and node output id.
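Here is a tiny, hand-made instance in the spirit of this representation, together with a loop that walks it (real files produced by the tuner are much larger, and the per-layer encodings are simplified for the sketch):

```python
graph = {
    "input_shape": [32, 32, 3],
    "weighted": False,
    "operation_history": [],
    "layer_id_to_input_node_ids": {0: [0], 1: [1]},
    "layer_id_to_output_node_ids": {0: [1], 1: [2]},
    # adj_list[tensor_id] -> list of (tensor_id, layer_id) edges
    "adj_list": [[(1, 0)], [(2, 1)], []],
    "node_list": [0, 1, 2],
    "layer_list": [("StubReLU",), ("StubDense", 1, 2, 64, 10)],
}

# Walk every edge of the DAG: tensors are nodes, layers are edges.
for src, edges in enumerate(graph["adj_list"]):
    for dst, layer_id in edges:
        name = graph["layer_list"][layer_id][0]
        print(f"tensor {src} --{name} (layer {layer_id})--> tensor {dst}")
```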
## 5. TODO
-Next step, we will change the API from fixed network generator to more network operator generator. Besides, we will use ONNX instead of JSON later as the intermediate representation spec in the future.
+Next step, we will change the API from a fixed network generator to a network generator with more available operators. In the future, we will also use ONNX instead of JSON as the intermediate representation spec.
From ce777703c64f49f13210cd63e7ab48d8b203ec63 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Sat, 14 Mar 2020 20:14:15 -0700
Subject: [PATCH 23/34] Improve wording in the Hyperband description.
---
docs/en_US/Tuner/HyperbandAdvisor.md | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/docs/en_US/Tuner/HyperbandAdvisor.md b/docs/en_US/Tuner/HyperbandAdvisor.md
index b7787af199..14f6ad3614 100644
--- a/docs/en_US/Tuner/HyperbandAdvisor.md
+++ b/docs/en_US/Tuner/HyperbandAdvisor.md
@@ -2,12 +2,12 @@ Hyperband on NNI
===
## 1. Introduction
-[Hyperband][1] is a popular automl algorithm. The basic idea of Hyperband is that it creates several buckets, each bucket has `n` randomly generated hyperparameter configurations, each configuration uses `r` resource (e.g., epoch number, batch number). After the `n` configurations is finished, it chooses top `n/eta` configurations and runs them using increased `r*eta` resource. At last, it chooses the best configuration it has found so far.
+[Hyperband][1] is a popular autoML algorithm. The basic idea of Hyperband is to create several buckets, each having `n` randomly generated hyperparameter configurations, each configuration using `r` resources (e.g., epoch number, batch number). After the `n` configurations are finished, it chooses the top `n/eta` configurations and runs them using increased `r*eta` resources. At last, it chooses the best configuration it has found so far.
-## 2. Implementation with fully parallelism
-First, this is an example of how to write an automl algorithm based on MsgDispatcherBase, rather than Tuner and Assessor. Hyperband is implemented in this way because it integrates the functions of both Tuner and Assessor, thus, we call it advisor.
+## 2. Implementation with full parallelism
+First, this is an example of how to write an autoML algorithm based on MsgDispatcherBase, rather than Tuner and Assessor. Hyperband is implemented in this way because it integrates the functions of both Tuner and Assessor, thus, we call it Advisor.
-Second, this implementation fully leverages Hyperband's internal parallelism. More specifically, the next bucket is not started strictly after the current bucket, instead, it starts when there is available resource.
+Second, this implementation fully leverages Hyperband's internal parallelism. Specifically, the next bucket is not started strictly after the current bucket. Instead, it starts when there are available resources.
## 3. Usage
To use Hyperband, you should add the following spec in your experiment's YAML config file:
@@ -25,8 +25,7 @@ advisor:
optimize_mode: maximize
```
-Note that once you use advisor, it is not allowed to add tuner and assessor spec in the config file any more.
-If you use Hyperband, among the hyperparameters (i.e., key-value pairs) received by a trial, there is one more key called `TRIAL_BUDGET` besides the hyperparameters defined by user. **By using this `TRIAL_BUDGET`, the trial can control how long it runs**.
+Note that once you use Advisor, you are not allowed to add a Tuner and Assessor spec in the config file. If you use Hyperband, among the hyperparameters (i.e., key-value pairs) received by a trial, there will be one more key called `TRIAL_BUDGET` in addition to the hyperparameters defined by the user. **By using this `TRIAL_BUDGET`, the trial can control how long it runs**.
For `report_intermediate_result(metric)` and `report_final_result(metric)` in your trial code, **`metric` should be either a number or a dict which has a key `default` with a number as its value**. This number is the one you want to maximize or minimize, for example, accuracy or loss.
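Putting the two together, a trial might be structured like this minimal sketch (`train_and_evaluate` is a placeholder for the user's own training code):

```python
import nni

def train_and_evaluate(params):
    """Placeholder for the user's real per-epoch training code."""
    return 0.9  # pretend accuracy

params = nni.get_next_parameter()   # hyperparameters for this trial
budget = params['TRIAL_BUDGET']     # e.g., how many epochs to run

accuracy = 0.0
for epoch in range(budget):
    accuracy = train_and_evaluate(params)
    nni.report_intermediate_result(accuracy)  # a number, or a dict with
                                              # a 'default' key
nni.report_final_result(accuracy)
```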
@@ -47,11 +46,11 @@ Here is a concrete example of `R=81` and `eta=3`:
`s` means bucket, `n` means the number of configurations that are generated, the corresponding `r` means how many budgets these configurations run. `i` means round, for example, bucket 4 has 5 rounds, bucket 3 has 4 rounds.
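The bucket arithmetic is easy to reproduce; the following sketch prints the full schedule using the formulas from the [paper][1]:

```python
from math import ceil, floor

R, eta = 81, 3
s_max = 0
while eta ** (s_max + 1) <= R:                  # s_max = floor(log_eta(R)) = 4
    s_max += 1

for s in range(s_max, -1, -1):                  # one bucket per value of s
    n = ceil((s_max + 1) * eta ** s / (s + 1))  # initial configurations
    r = R / eta ** s                            # initial budget for each
    print(f"bucket s={s}:")
    for i in range(s + 1):                      # successive-halving rounds
        n_i = floor(n * eta ** -i)
        r_i = r * eta ** i
        print(f"  round {i}: {n_i} configs, budget {r_i:g} each")
```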
-About how to write trial code, please refer to the instructions under `examples/trials/mnist-hyperband/`.
+For information about writing trial code, please refer to the instructions under `examples/trials/mnist-hyperband/`.
-## 4. To be improved
-The current implementation of Hyperband can be further improved by supporting simple early stop algorithm, because it is possible that not all the configurations in the top `n/eta` perform good. The unpromising configurations can be stopped early.
+## 4. Future improvements
+The current implementation of Hyperband can be further improved by supporting a simple early stop algorithm since it's possible that not all the configurations in the top `n/eta` perform well. Any unpromising configurations should be stopped early.
-In the current implementation, configurations are generated randomly, which follows the design in the [paper][1]. To further improve, configurations could be generated more wisely by leveraging advanced algorithms.
+In the current implementation, configurations are generated randomly which follows the design in the [paper][1]. As an improvement, configurations could be generated more wisely by leveraging advanced algorithms.
[1]: https://arxiv.org/pdf/1603.06560.pdf
From 73dd35e1044d66f45106a0fc728dfb322da58547 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Sat, 14 Mar 2020 21:14:04 -0700
Subject: [PATCH 24/34] Fix a lot of confusing wording, spelling, and
grammatical errors in the BOHB description.
---
docs/en_US/Tuner/BohbAdvisor.md | 52 ++++++++++++++++-----------------
1 file changed, 26 insertions(+), 26 deletions(-)
diff --git a/docs/en_US/Tuner/BohbAdvisor.md b/docs/en_US/Tuner/BohbAdvisor.md
index 19be53e417..0e431ee4bb 100644
--- a/docs/en_US/Tuner/BohbAdvisor.md
+++ b/docs/en_US/Tuner/BohbAdvisor.md
@@ -2,48 +2,48 @@ BOHB Advisor on NNI
===
## 1. Introduction
-BOHB is a robust and efficient hyperparameter tuning algorithm mentioned in [reference paper](https://arxiv.org/abs/1807.01774). BO is the abbreviation of Bayesian optimization and HB is the abbreviation of Hyperband.
+BOHB is a robust and efficient hyperparameter tuning algorithm mentioned in [this reference paper](https://arxiv.org/abs/1807.01774). BO is an abbreviation for "Bayesian Optimization" and HB is an abbreviation for "Hyperband".
-BOHB relies on HB(Hyperband) to determine how many configurations to evaluate with which budget, but it **replaces the random selection of configurations at the beginning of each HB iteration by a model-based search(Byesian Optimization)**. Once the desired number of configurations for the iteration is reached, the standard successive halving procedure is carried out using these configurations. We keep track of the performance of all function evaluations g(x, b) of configurations x on all budgets b to use as a basis for our models in later iterations.
+BOHB relies on HB (Hyperband) to determine how many configurations to evaluate with which budget, but it **replaces the random selection of configurations at the beginning of each HB iteration by a model-based search (Bayesian Optimization)**. Once the desired number of configurations for the iteration is reached, the standard successive halving procedure is carried out using these configurations. We keep track of the performance of all function evaluations g(x, b) of configurations x on all budgets b to use as a basis for our models in later iterations.
-Below we divide introduction of the BOHB process into two parts:
+Below we divide the introduction of the BOHB process into two parts:
### HB (Hyperband)
-We follow Hyperband’s way of choosing the budgets and continue to use SuccessiveHalving, for more details, you can refer to the [Hyperband in NNI](HyperbandAdvisor.md) and [reference paper of Hyperband](https://arxiv.org/abs/1603.06560). This procedure is summarized by the pseudocode below.
+We follow Hyperband’s way of choosing the budgets and continue to use SuccessiveHalving. For more details, you can refer to the [Hyperband in NNI](HyperbandAdvisor.md) and the [reference paper for Hyperband](https://arxiv.org/abs/1603.06560). This procedure is summarized by the pseudocode below.
![](../../img/bohb_1.png)
### BO (Bayesian Optimization)
-The BO part of BOHB closely resembles TPE, with one major difference: we opted for a single multidimensional KDE compared to the hierarchy of one-dimensional KDEs used in TPE in order to better handle interaction effects in the input space.
+The BO part of BOHB closely resembles TPE with one major difference: we opted for a single multidimensional KDE compared to the hierarchy of one-dimensional KDEs used in TPE in order to better handle interaction effects in the input space.
-Tree Parzen Estimator(TPE): uses a KDE(kernel density estimator) to model the densities.
+The Tree Parzen Estimator (TPE) uses a KDE (kernel density estimator) to model the densities.
![](../../img/bohb_2.png)
-To fit useful KDEs, we require a minimum number of data points Nmin; this is set to d + 1 for our experiments, where d is the number of hyperparameters. To build a model as early as possible, we do not wait until Nb = |Db|, the number of observations for budget b, is large enough to satisfy q · Nb ≥ Nmin. Instead, after initializing with Nmin + 2 random configurations, we choose the
+To fit useful KDEs, we require a minimum number of data points Nmin; this is set to d + 1 for our experiments, where d is the number of hyperparameters. To build a model as early as possible, we do not wait until Nb = |Db| (the number of observations for budget b) is large enough to satisfy q · Nb ≥ Nmin. Instead, after initializing with Nmin + 2 random configurations, we choose the
![](../../img/bohb_3.png)
best and worst configurations, respectively, to model the two densities.
-Note that we alse sample a constant fraction named **random fraction** of the configurations uniformly at random.
+Note that we also sample a constant fraction named **random fraction** of the configurations uniformly at random.
## 2. Workflow
![](../../img/bohb_6.jpg)
-This image shows the workflow of BOHB. Here we set max_budget = 9, min_budget = 1, eta = 3, others as default. In this case, s_max = 2, so we will continuesly run the {s=2, s=1, s=0, s=2, s=1, s=0, ...} cycle. In each stage of SuccessiveHalving (the orange box), we will pick the top 1/eta configurations and run them again with more budget, repeated SuccessiveHalving stage until the end of this iteration. At the same time, we collect the configurations, budgets and final metrics of each trial, and use this to build a multidimensional KDEmodel with the key "budget".
+This image shows the workflow of BOHB. Here we set max_budget = 9, min_budget = 1, eta = 3, and others as default. In this case, s_max = 2, so we will continuously run the {s=2, s=1, s=0, s=2, s=1, s=0, ...} cycle. In each stage of SuccessiveHalving (the orange box), we will pick the top 1/eta configurations and run them again with more budget, repeating the SuccessiveHalving stage until the end of this iteration. At the same time, we collect the configurations, budgets, and final metrics of each trial and use these to build a multidimensional KDE model with the key "budget".
Multidimensional KDE is used to guide the selection of configurations for the next iteration.
-The way of sampling procedure(use Multidimensional KDE to guide the selection) is summarized by the pseudocode below.
+The sampling procedure (using Multidimensional KDE to guide selection) is summarized by the pseudocode below.
![](../../img/bohb_4.png)
## 3. Usage
-BOHB advisor requires [ConfigSpace](https://github.com/automl/ConfigSpace) package, ConfigSpace need to be installed by following command before first use.
+BOHB advisor requires the [ConfigSpace](https://github.com/automl/ConfigSpace) package. ConfigSpace can be installed using the following command.
```bash
nnictl package install --name=BOHB
@@ -67,26 +67,26 @@ advisor:
min_bandwidth: 0.001
```
-**Requirement of classArg**
+**classArgs Requirements:**
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will target to maximize metrics. If 'minimize', tuner will target to minimize metrics.
-* **min_budget** (*int, optional, default = 1*) - The smallest budget assign to a trial job, (budget could be the number of mini-batches or epochs). Needs to be positive.
-* **max_budget** (*int, optional, default = 3*) - The largest budget assign to a trial job, (budget could be the number of mini-batches or epochs). Needs to be larger than min_budget.
+* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
+* **min_budget** (*int, optional, default = 1*) - The smallest budget to assign to a trial job (budget can be the number of mini-batches or epochs). Needs to be positive.
+* **max_budget** (*int, optional, default = 3*) - The largest budget to assign to a trial job (budget can be the number of mini-batches or epochs). Needs to be larger than min_budget.
* **eta** (*int, optional, default = 3*) - In each iteration, a complete run of sequential halving is executed. In it, after evaluating each configuration on the same subset size, only a fraction of 1/eta of them 'advances' to the next round. Must be greater or equal to 2.
-* **min_points_in_model**(*int, optional, default = None*): number of observations to start building a KDE. Default 'None' means dim+1, when the number of completed trial in this budget is equal or larger than `max{dim+1, min_points_in_model}`, BOHB will start to build a KDE model of this budget, then use KDE model to guide the configuration selection. Need to be positive.(dim means the number of hyperparameters in search space)
-* **top_n_percent**(*int, optional, default = 15*): percentage (between 1 and 99, default 15) of the observations that are considered good. Good points and bad points are used for building KDE models. For example, if you have 100 observed trials and top_n_percent is 15, then top 15 point will used for building good point models "l(x)", the remaining 85 point will used for building bad point models "g(x)".
-* **num_samples**(*int, optional, default = 64*): number of samples to optimize EI (default 64). In this case, we will sample "num_samples"(default = 64) points, and compare the result of l(x)/g(x), then return one with the maximum l(x)/g(x) value as the next configuration if the optimize_mode is maximize. Otherwise, we return the smallest one.
+* **min_points_in_model** (*int, optional, default = None*): number of observations needed to start building a KDE. Default 'None' means dim+1; when the number of completed trials in this budget is equal to or larger than `max{dim+1, min_points_in_model}`, BOHB will start to build a KDE model for this budget and then use that KDE model to guide configuration selection. Needs to be positive. (dim means the number of hyperparameters in the search space)
+* **top_n_percent** (*int, optional, default = 15*): percentage (between 1 and 99) of the observations which are considered good. Good points and bad points are used for building the KDE models. For example, if you have 100 observed trials and top_n_percent is 15, then the top 15 points will be used for building the good-point model "l(x)". The remaining 85 points will be used for building the bad-point model "g(x)".
+* **num_samples** (*int, optional, default = 64*): number of samples used to optimize EI. We sample "num_samples" points and compare their l(x)/g(x) ratios. We then return the one with the maximum l(x)/g(x) value as the next configuration if the optimize_mode is `maximize`; otherwise, we return the one with the smallest value. (See the sketch after this list for an illustration.)
* **random_fraction**(*float, optional, default = 0.33*): fraction of purely random configurations that are sampled from the prior without the model.
-* **bandwidth_factor**(*float, optional, default = 3.0*): to encourage diversity, the points proposed to optimize EI, are sampled from a 'widened' KDE where the bandwidth is multiplied by this factor. Suggest to use default value if you are not familiar with KDE.
-* **min_bandwidth**(*float, optional, default = 0.001*): to keep diversity, even when all (good) samples have the same value for one of the parameters, a minimum bandwidth (default: 1e-3) is used instead of zero. Suggest to use default value if you are not familiar with KDE.
+* **bandwidth_factor**(*float, optional, default = 3.0*): to encourage diversity, the points proposed to optimize EI are sampled from a 'widened' KDE where the bandwidth is multiplied by this factor. We suggest using the default value if you are not familiar with KDE.
+* **min_bandwidth**(*float, optional, default = 0.001*): to keep diversity, even when all (good) samples have the same value for one of the parameters, a minimum bandwidth (default: 1e-3) is used instead of zero. We suggest using the default value if you are not familiar with KDE.
-*Please note that currently float type only support decimal representation, you have to use 0.333 instead of 1/3 and 0.001 instead of 1e-3.*
+*Please note that the float type currently only supports decimal representations. You have to use 0.333 instead of 1/3 and 0.001 instead of 1e-3.*
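To make the l(x)/g(x) selection concrete, here is a one-dimensional toy sketch (BOHB's real KDEs are multidimensional and built per budget; scipy's `gaussian_kde` stands in for its KDE here, and the metric is made up):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
observations = rng.uniform(0, 1, 100)     # 100 observed configurations
scores = -(observations - 0.7) ** 2       # toy metric (higher is better)

top_n_percent = 15
cut = np.percentile(scores, 100 - top_n_percent)
good = observations[scores >= cut]        # build l(x) from good points
bad = observations[scores < cut]          # build g(x) from the rest
l, g = gaussian_kde(good), gaussian_kde(bad)

num_samples = 64
candidates = l.resample(num_samples).ravel()   # sample from l(x)
ratio = l(candidates) / np.maximum(g(candidates), 1e-12)
print("next configuration:", candidates[np.argmax(ratio)])
```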
## 4. File Structure
-The advisor has a lot of different files, functions and classes. Here we will only give most of those files a brief introduction:
+The advisor has a lot of different files, functions, and classes. Here, we will only give most of those files a brief introduction:
-* `bohb_advisor.py` Defination of BOHB, handle the interaction with the dispatcher, including generating new trial and processing results. Also includes the implementation of HB(Hyperband) part.
-* `config_generator.py` includes the implementation of BO(Bayesian Optimization) part. The function *get_config* can generate new configuration base on BO, the function *new_result* will update model with the new result.
+* `bohb_advisor.py` Definition of BOHB, handles interaction with the dispatcher, including generating new trials and processing results. Also includes the implementation of the HB (Hyperband) part.
+* `config_generator.py` Includes the implementation of the BO (Bayesian Optimization) part. The function *get_config* can generate new configurations based on BO; the function *new_result* will update the model with the new result.
## 5. Experiment
@@ -94,8 +94,8 @@ The advisor has a lot of different files, functions and classes. Here we will on
code implementation: [examples/trials/mnist-advisor](https://github.com/Microsoft/nni/tree/master/examples/trials/)
-We chose BOHB to build CNN on the MNIST dataset. The following is our experimental final results:
+We chose BOHB to build a CNN on the MNIST dataset. The following are our final experimental results:
![](../../img/bohb_5.png)
-More experimental result can be found in the [reference paper](https://arxiv.org/abs/1807.01774), we can see that BOHB makes good use of previous results, and has a balance trade-off in exploration and exploitation.
\ No newline at end of file
+More experimental results can be found in the [reference paper](https://arxiv.org/abs/1807.01774). We can see that BOHB makes good use of previous results and has a balanced trade-off in exploration and exploitation.
\ No newline at end of file
From 60834458bcb7da56e04569a6efe0aef50c720fc7 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Sat, 14 Mar 2020 21:22:51 -0700
Subject: [PATCH 25/34] Fix a lot of confusing and some redundant wording in
the PPOTuner description.
---
docs/en_US/Tuner/PPOTuner.md | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/docs/en_US/Tuner/PPOTuner.md b/docs/en_US/Tuner/PPOTuner.md
index 6afdf89503..488d824f95 100644
--- a/docs/en_US/Tuner/PPOTuner.md
+++ b/docs/en_US/Tuner/PPOTuner.md
@@ -3,18 +3,18 @@ PPO Tuner on NNI
## PPOTuner
-This is a tuner generally for NNI's NAS interface, it uses [ppo algorithm](https://arxiv.org/abs/1707.06347). The implementation inherits the main logic of the implementation [here](https://github.com/openai/baselines/tree/master/baselines/ppo2) (i.e., ppo2 from OpenAI), and is adapted for NAS scenario.
+This is a tuner geared for NNI's Neural Architecture Search (NAS) interface. It uses the [PPO algorithm](https://arxiv.org/abs/1707.06347). The implementation inherits the main logic of the ppo2 OpenAI implementation [here](https://github.com/openai/baselines/tree/master/baselines/ppo2) and is adapted for the NAS scenario.
-It could successfully tune the [mnist-nas example](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas), and has the following result:
+It can successfully tune the [mnist-nas example](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-nas), and has the following result:
![](../../img/ppo_mnist.png)
-We also tune [the macro search space for image classification in the enas paper](https://github.com/microsoft/nni/tree/master/examples/trials/nas_cifar10) (with limited epoch number for each trial, i.e., 8 epochs), which is implemented using the NAS interface and tuned with PPOTuner. Use Figure 7 in the [enas paper](https://arxiv.org/pdf/1802.03268.pdf) to show how the search space looks like
+We also tune [the macro search space for image classification in the enas paper](https://github.com/microsoft/nni/tree/master/examples/trials/nas_cifar10) (with a limited epoch number for each trial, i.e., 8 epochs), which is implemented using the NAS interface and tuned with PPOTuner. Here is Figure 7 from the [enas paper](https://arxiv.org/pdf/1802.03268.pdf) to show what the search space looks like:
![](../../img/enas_search_space.png)
-The figure above is a chosen architecture, we use it to show how the search space looks like. Each square is a layer whose operation can be chosen from 6 operations. Each dash line is a skip connection, each square layer could choose 0 or 1 skip connection getting the output of a previous layer. __Note that__ in original macro search space each square layer could choose any number of skip connections, while in our implementation it is only allowed to choose 0 or 1.
+The figure above shows the chosen architecture; we use it to illustrate what the search space looks like. Each square is a layer whose operation can be chosen from 6 options. Each dashed line is a skip connection; each square layer can choose 0 or 1 skip connections, getting the output from a previous layer. __Note that__, in the original macro search space, each square layer could choose any number of skip connections, while in our implementation, it is only allowed to choose 0 or 1.
-The result is shown in figure below (with the experiment config [here](https://github.com/microsoft/nni/blob/master/examples/trials/nas_cifar10/config_ppo.yml)):
+The results are shown in the figure below (see the experimental config [here](https://github.com/microsoft/nni/blob/master/examples/trials/nas_cifar10/config_ppo.yml)):
![](../../img/ppo_cifar10.png)
From e80d7c47aea4bba791775257363949930ee7e697 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Sun, 15 Mar 2020 16:21:32 -0700
Subject: [PATCH 26/34] Improve wording in Builtin Assesors overview.
---
docs/en_US/builtin_assessor.rst | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/docs/en_US/builtin_assessor.rst b/docs/en_US/builtin_assessor.rst
index c7307a9ef9..f0ba1849b8 100644
--- a/docs/en_US/builtin_assessor.rst
+++ b/docs/en_US/builtin_assessor.rst
@@ -1,13 +1,13 @@
Builtin-Assessors
=================
-In order to save our computing resources, NNI supports an early stop policy and creates **Assessor** to finish this job.
+In order to save on computing resources, NNI supports an early stopping policy and has an interface called **Assessor** to do this job.
-Assessor receives the intermediate result from Trial and decides whether the Trial should be killed by specific algorithm. Once the Trial experiment meets the early stop conditions(which means assessor is pessimistic about the final results), the assessor will kill the trial and the status of trial will be `"EARLY_STOPPED"`.
+Assessor receives the intermediate result from a trial and decides whether the trial should be killed using a specific algorithm. Once the trial experiment meets the early stopping conditions (which means Assessor is pessimistic about the final results), the assessor will kill the trial and the status of the trial will be `EARLY_STOPPED`.
-Here is an experimental result of MNIST after using 'Curvefitting' Assessor in 'maximize' mode, you can see that assessor successfully **early stopped** many trials with bad hyperparameters in advance. If you use assessor, we may get better hyperparameters under the same computing resources.
+Here is an experimental result of MNIST after using the 'Curvefitting' Assessor in 'maximize' mode. You can see that Assessor successfully **early stopped** many trials with bad hyperparameters in advance. If you use Assessor, you may get better hyperparameters using the same computing resources.
-*Implemented code directory: config_assessor.yml *
+*Implemented code directory:* `config_assessor.yml <https://github.com/Microsoft/nni/blob/master/examples/trials/mnist-tfv1/config_assessor.yml>`_
.. image:: ../img/Assessor.png
From cb2dca3324a3b148e1640cc86923b20c054e3aae Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Sun, 15 Mar 2020 17:34:47 -0700
Subject: [PATCH 27/34] Fix some wording in Assessor overview.
---
docs/en_US/Assessor/BuiltinAssessor.md | 28 +++++++++++++-------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/docs/en_US/Assessor/BuiltinAssessor.md b/docs/en_US/Assessor/BuiltinAssessor.md
index cacaeb2b53..9b0b5c0a96 100644
--- a/docs/en_US/Assessor/BuiltinAssessor.md
+++ b/docs/en_US/Assessor/BuiltinAssessor.md
@@ -1,21 +1,21 @@
# Built-in Assessors
-NNI provides state-of-the-art tuning algorithm in our builtin-assessors and makes them easy to use. Below is the brief overview of NNI current builtin Assessors:
+NNI provides state-of-the-art early stopping algorithms within our builtin-assessors and makes them easy to use. Below is a brief overview of NNI's current builtin Assessors.
-Note: Click the **Assessor's name** to get the Assessor's installation requirements, suggested scenario and using example. The link for a detailed description of the algorithm is at the end of the suggested scenario of each Assessor.
+Note: Click the **Assessor's name** to get each Assessor's installation requirements, suggested usage scenario, and a config example. A link to a detailed description of each algorithm is provided at the end of the suggested scenario for each Assessor.
-Currently we support the following Assessors:
+Currently, we support the following Assessors:
|Assessor|Brief Introduction of Algorithm|
|---|---|
|[__Medianstop__](#MedianStop)|Medianstop is a simple early stopping rule. It stops a pending trial X at step S if the trial’s best objective value by step S is strictly worse than the median value of the running averages of all completed trials’ objectives reported up to step S. [Reference Paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf)|
-|[__Curvefitting__](#Curvefitting)|Curve Fitting Assessor is a LPA(learning, predicting, assessing) algorithm. It stops a pending trial X at step S if the prediction of final epoch's performance worse than the best final performance in the trial history. In this algorithm, we use 12 curves to fit the accuracy curve. [Reference Paper](http://aad.informatik.uni-freiburg.de/papers/15-IJCAI-Extrapolation_of_Learning_Curves.pdf)|
+|[__Curvefitting__](#Curvefitting)|Curve Fitting Assessor is an LPA (learning, predicting, assessing) algorithm. It stops a pending trial X at step S if the prediction of the final epoch's performance is worse than the best final performance in the trial history. In this algorithm, we use 12 curves to fit the accuracy curve. [Reference Paper](http://aad.informatik.uni-freiburg.de/papers/15-IJCAI-Extrapolation_of_Learning_Curves.pdf)|
## Usage of Builtin Assessors
-Use builtin assessors provided by NNI SDK requires to declare the **builtinAssessorName** and **classArgs** in `config.yml` file. In this part, we will introduce the detailed usage about the suggested scenarios, classArg requirements, and example for each assessor.
+To use a builtin assessor provided by the NNI SDK, you need to declare the **builtinAssessorName** and **classArgs** in the `config.yml` file. In this part, we introduce the suggested scenarios, classArgs requirements, and a usage example for each assessor.
-Note: Please follow the format when you write your `config.yml` file.
+Note: Please follow the provided format when writing your `config.yml` file.
@@ -25,12 +25,12 @@ Note: Please follow the format when you write your `config.yml` file.
**Suggested scenario**
-It is applicable in a wide range of performance curves, thus, can be used in various scenarios to speed up the tuning progress. [Detailed Description](./MedianstopAssessor.md)
+It's applicable in a wide range of performance curves, thus, it can be used in various scenarios to speed up the tuning progress. [Detailed Description](./MedianstopAssessor.md)
**Requirement of classArg**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', assessor will **stop** the trial with smaller expectation. If 'minimize', assessor will **stop** the trial with larger expectation.
-* **start_step** (*int, optional, default = 0*) - A trial is determined to be stopped or not, only after receiving start_step number of reported intermediate results.
+* **start_step** (*int, optional, default = 0*) - The assessor decides whether to stop a trial only after it has received start_step intermediate results.
**Usage example:**
@@ -53,15 +53,15 @@ assessor:
**Suggested scenario**
-It is applicable in a wide range of performance curves, thus, can be used in various scenarios to speed up the tuning progress. Even better, it's able to handle and assess curves with similar performance. [Detailed Description](./CurvefittingAssessor.md)
+It's applicable in a wide range of performance curves, thus, it can be used in various scenarios to speed up the tuning progress. Even better, it's able to handle and assess curves with similar performance. [Detailed Description](./CurvefittingAssessor.md)
-**Requirement of classArg**
+**classArgs requirements:**
-* **epoch_num** (*int, **required***) - The total number of epoch. We need to know the number of epoch to determine which point we need to predict.
+* **epoch_num** (*int, **required***) - The total number of epochs. We need to know the number of epochs to determine which points we need to predict.
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', assessor will **stop** the trial with smaller expectation. If 'minimize', assessor will **stop** the trial with larger expectation.
-* **start_step** (*int, optional, default = 6*) - A trial is determined to be stopped or not, we start to predict only after receiving start_step number of reported intermediate results.
-* **threshold** (*float, optional, default = 0.95*) - The threshold that we decide to early stop the worse performance curve. For example: if threshold = 0.95, optimize_mode = maximize, best performance in the history is 0.9, then we will stop the trial which predict value is lower than 0.95 * 0.9 = 0.855.
-* **gap** (*int, optional, default = 1*) - The gap interval between Assesor judgements. For example: if gap = 2, start_step = 6, then we will assess the result when we get 6, 8, 10, 12...intermedian result.
+* **start_step** (*int, optional, default = 6*) - We only start predicting, and thus deciding whether to stop a trial, after receiving start_step intermediate results.
+* **threshold** (*float, optional, default = 0.95*) - The threshold used when deciding whether to early stop the worst-performing curves. For example: if threshold = 0.95, optimize_mode = maximize, and the best performance in the history is 0.9, then we will stop the trial whose predicted value is lower than 0.95 * 0.9 = 0.855.
+* **gap** (*int, optional, default = 1*) - The gap interval between Assessor judgements. For example: if gap = 2 and start_step = 6, then we will assess the result upon receiving 6, 8, 10, 12, ... intermediate results.
**Usage example:**
From 0ff8125e7daa7e59ef63c6d8dc75c4da54edcc90 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Sun, 15 Mar 2020 20:44:34 -0700
Subject: [PATCH 28/34] Improved some wording in Median Assessor's description.
---
docs/en_US/Assessor/MedianstopAssessor.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/en_US/Assessor/MedianstopAssessor.md b/docs/en_US/Assessor/MedianstopAssessor.md
index 138726f67e..246ca82576 100644
--- a/docs/en_US/Assessor/MedianstopAssessor.md
+++ b/docs/en_US/Assessor/MedianstopAssessor.md
@@ -3,4 +3,4 @@ Medianstop Assessor on NNI
## Median Stop
-Medianstop is a simple early stopping rule mentioned in the [paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf). It stops a pending trial X at step S if the trial’s best objective value by step S is strictly worse than the median value of the running averages of all completed trials’ objectives reported up to step S.
\ No newline at end of file
+Medianstop is a simple early stopping rule mentioned in this [paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf). It stops a pending trial X after step S if the trial’s best objective value by step S is strictly worse than the median value of the running averages of all completed trials’ objectives reported up to step S.
\ No newline at end of file
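The rule itself fits in a few lines; here is an illustrative sketch (not the NNI assessor's code), assuming higher metric values are better:

```python
from statistics import median

def median_stop(trial_history, completed_histories, step):
    """True if the pending trial should be stopped at `step`.

    trial_history: the pending trial's intermediate results so far.
    completed_histories: one result list per completed trial.
    """
    running_avgs = [sum(h[:step]) / step for h in completed_histories]
    return max(trial_history[:step]) < median(running_avgs)

completed = [[0.5, 0.6, 0.7], [0.4, 0.5, 0.6], [0.6, 0.7, 0.8]]
print(median_stop([0.20, 0.30, 0.35], completed, step=3))  # True -> stop
```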
From 2aefa0346775d737cd3f93fd07d952e1f6d84643 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Sun, 15 Mar 2020 21:01:21 -0700
Subject: [PATCH 29/34] Improve wording and grammar on curve fitting assessor
description.
---
docs/en_US/Assessor/CurvefittingAssessor.md | 26 ++++++++++-----------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/docs/en_US/Assessor/CurvefittingAssessor.md b/docs/en_US/Assessor/CurvefittingAssessor.md
index 48b7193d44..6e4c776528 100644
--- a/docs/en_US/Assessor/CurvefittingAssessor.md
+++ b/docs/en_US/Assessor/CurvefittingAssessor.md
@@ -2,9 +2,9 @@ Curve Fitting Assessor on NNI
===
## 1. Introduction
-Curve Fitting Assessor is a LPA(learning, predicting, assessing) algorithm. It stops a pending trial X at step S if the prediction of final epoch's performance is worse than the best final performance in the trial history.
+The Curve Fitting Assessor is an LPA (learning, predicting, assessing) algorithm. It stops a pending trial X at step S if the prediction of the final epoch's performance is worse than the best final performance in the trial history.
-In this algorithm, we use 12 curves to fit the learning curve, the large set of parametric curve models are chosen from [reference paper][1]. The learning curves' shape coincides with our prior knowlwdge about the form of learning curves: They are typically increasing, saturating functions.
+In this algorithm, we use 12 curves to fit the learning curve. The set of parametric curve models is chosen from this [reference paper][1]. The learning curves' shape coincides with our prior knowledge about the form of learning curves: they are typically increasing, saturating functions.
![](../../img/curvefitting_learning_curve.PNG)
@@ -12,21 +12,21 @@ We combine all learning curve models into a single, more powerful model. This co
![](../../img/curvefitting_f_comb.gif)
-where the new combined parameter vector
+with the new combined parameter vector
![](../../img/curvefitting_expression_xi.gif)
-Assuming additive a Gaussian noise and the noise parameter is initialized to its maximum likelihood estimate.
+We assume additive Gaussian noise, with the noise parameter initialized to its maximum likelihood estimate.
-We determine the maximum probability value of the new combined parameter vector by learing the historical data. Use such value to predict the future trial performance, and stop the inadequate experiments to save computing resource.
+We determine the maximum probability value of the new combined parameter vector by learning from the historical data. We use this value to predict future trial performance and stop inadequate experiments to save computing resources.
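Conceptually, the combined model is a weighted mixture of the individual curve models, with the weights and per-curve parameters together forming \xi. A rough sketch, with illustrative names:

```python
# Rough sketch: f_comb as a weighted combination of the 12 curve models.
# The weights w_k and per-curve parameters theta_k together form \xi.
def f_comb(x, weights, curve_models, thetas):
    return sum(w * f(x, theta) for w, f, theta in zip(weights, curve_models, thetas))
```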
-Concretely,this algorithm goes through three stages of learning, predicting and assessing.
+Concretely, this algorithm goes through three stages of learning, predicting, and assessing.
-* Step1: Learning. We will learning about the trial history of the current trial and determine the \xi at Bayesian angle. First of all, We fit each curve using the least squares method(implement by `fit_theta`) to save our time. After we obtained the parameters, we filter the curve and remove the outliers(implement by `filter_curve`). Finally, we use the MCMC sampling method(implement by `mcmc_sampling`) to adjust the weight of each curve. Up to now, we have dertermined all the parameters in \xi.
+* Step1: Learning. We learn from the trial history of the current trial and determine \xi from a Bayesian perspective. First, we fit each curve using the least-squares method, implemented by `fit_theta`. After we obtain the parameters, we filter the curves and remove the outliers, implemented by `filter_curve`. Finally, we use the MCMC sampling method, implemented by `mcmc_sampling`, to adjust the weight of each curve. At this point, we have determined all the parameters in \xi.
-* Step2: Predicting. Calculates the expected final result accuracy(implement by `f_comb`) at target position(ie the total number of epoch) by the \xi and the formula of the combined model.
+* Step2: Predicting. It calculates the expected accuracy at the target position (i.e., the total number of epochs), implemented by `f_comb`, using \xi and the formula of the combined model.
-* Step3: If the fitting result doesn't converge, the predicted value will be `None`, in this case we return `AssessResult.Good` to ask for future accuracy information and predict again. Furthermore, we will get a positive value by `predict()` function, if this value is strictly greater than the best final performance in history * `THRESHOLD`(default value = 0.95), return `AssessResult.Good`, otherwise, return `AssessResult.Bad`
+* Step3: Assessing. If the fitting result doesn't converge, the predicted value will be `None`. In this case, we return `AssessResult.Good` to ask for more accuracy information and predict again. Otherwise, we get a positive value from the `predict()` function. If this value is strictly greater than the best final performance in the history * `THRESHOLD` (default value = 0.95), we return `AssessResult.Good`; otherwise, we return `AssessResult.Bad`, as sketched below.
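Putting Step3 together, the decision amounts to something like the following sketch (maximize mode, illustrative values):

```python
THRESHOLD = 0.95  # default value

def assess(predicted, best_final_in_history):
    """Sketch of the Step3 decision; `predicted` is the output of predict()."""
    if predicted is None:           # the curve fitting did not converge
        return "AssessResult.Good"  # ask for more intermediate results
    if predicted > best_final_in_history * THRESHOLD:
        return "AssessResult.Good"
    return "AssessResult.Bad"

print(assess(0.80, 0.9))  # 0.80 < 0.9 * 0.95 = 0.855, so AssessResult.Bad
```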
The figure below is the result of our algorithm on MNIST trial history data, where the green point represents the data obtained by Assessor, the blue point represents the future but unknown data, and the red line is the Curve predicted by the Curve fitting assessor.
@@ -60,11 +60,11 @@ assessor:
```
## 3. File Structure
-The assessor has a lot of different files, functions and classes. Here we will only give most of those files a brief introduction:
+The assessor has a lot of different files, functions, and classes. Here we briefly describe a few of them.
-* `curvefunctions.py` includes all the function expression and default parameters.
-* `modelfactory.py` includes learning and predicting, the corresponding calculation part is also implemented here.
-* `curvefitting_assessor.py` is a assessor which receives the trial history and assess whether to early stop the trial.
+* `curvefunctions.py` includes all the function expressions and default parameters.
+* `modelfactory.py` includes learning and predicting; the corresponding calculation part is also implemented here.
+* `curvefitting_assessor.py` is the assessor which receives the trial history and assesses whether to stop the trial early.
## 4. TODO
* Further improve the accuracy of the prediction and test it on more models.
From e4e234c12d876b5a9a38eb541840ca5b2fa3cd67 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Sun, 15 Mar 2020 21:18:12 -0700
Subject: [PATCH 30/34] Improved some grammar and wording in the WebUI
tutorial page.
---
docs/en_US/Tutorial/WebUI.md | 30 ++++++++++++++----------------
1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/docs/en_US/Tutorial/WebUI.md b/docs/en_US/Tutorial/WebUI.md
index 438f779182..a4eec711b5 100644
--- a/docs/en_US/Tutorial/WebUI.md
+++ b/docs/en_US/Tutorial/WebUI.md
@@ -4,22 +4,22 @@
Click the tab "Overview".
-* See the experiment trial profile/search space and performanced good trials.
+* On the overview tab, you can see the experiment trial profile/search space and the performance of top trials.
![](../../img/webui-img/over1.png)
![](../../img/webui-img/over2.png)
-* If your experiment have many trials, you can change the refresh interval on here.
+* If your experiment has many trials, you can change the refresh interval here.
![](../../img/webui-img/refresh-interval.png)
-* Support to review and download the experiment result and nni-manager/dispatcher log file from the "View" button.
+* You can review and download the experiment results and nni-manager/dispatcher log files from the "View" button.
![](../../img/webui-img/download.png)
-* You can click the learn about in the error box to track experiment log message if the experiment's status is error.
+* You can click the exclamation point in the error box to see a log message if the experiment's status is "error".
![](../../img/webui-img/log-error.png)
![](../../img/webui-img/review-log.png)
-* You can click "Feedback" to report it if you have any questions.
+* You can click "Feedback" to report any questions.
## View job default metric
@@ -46,25 +46,23 @@ Click the tab "Trial Duration" to see the bar graph.
![](../../img/trial_duration.png)
## View Trial Intermediate Result Graph
-Click the tab "Intermediate Result" to see the lines graph.
+Click the tab "Intermediate Result" to see the line graph.
![](../../img/webui-img/trials_intermeidate.png)
-The trial may have many intermediate results in the training progress. In order to see the trend of some trials more clearly, we set a filtering function for the intermediate result graph.
+The trial may have many intermediate results in the training process. In order to see the trend of some trials more clearly, we provide a filtering function for the intermediate result graph.
-You may find that these trials will get better or worse at one of intermediate results. In other words, this is an important and relevant intermediate result. To take a closer look at the point here, you need to enter its corresponding abscissa value at #Intermediate.
-
-And then input the range of metrics on this intermedia result. Like below picture, it chooses No. 4 intermediate result and set the range of metrics to 0.8-1.
+You may find that these trials will get better or worse at an intermediate result. This indicates that it is an important and relevant intermediate result. To take a closer look at this point, you need to enter its corresponding X-value at #Intermediate. Then input the range of metrics on this intermediate result. In the picture below, we choose the No. 4 intermediate result and set the range of metrics to 0.8-1.
![](../../img/webui-img/filter-intermediate.png)
## View trials status
-Click the tab "Trials Detail" to see the status of the all trials. Specifically:
+Click the tab "Trials Detail" to see the status of all trials. Specifically:
-* Trial detail: trial's id, trial's duration, start time, end time, status, accuracy and search space file.
+* Trial detail: trial's id, trial's duration, start time, end time, status, accuracy, and search space file.
![](../../img/webui-img/detail-local.png)
-* The button named "Add column" can select which column to show in the table. If you run an experiment that final result is dict, you can see other keys in the table. You can choose the column "Intermediate count" to watch the trial's progress.
+* The button named "Add column" lets you select which columns to show in the table. If you run an experiment whose final result is a dict, you can see the other keys in the table. You can choose the column "Intermediate count" to watch the trial's progress.
![](../../img/webui-img/addColumn.png)
* If you want to compare some trials, you can select them and then click "Compare" to see the results.
@@ -74,13 +72,13 @@ Click the tab "Trials Detail" to see the status of the all trials. Specifically:
* Support to search for a specific trial by it's id, status, Trial No. and parameters.
![](../../img/webui-img/search-trial.png)
-* You can use the button named "Copy as python" to copy trial's parameters.
+* You can use the button named "Copy as python" to copy the trial's parameters.
![](../../img/webui-img/copyParameter.png)
-* If you run on OpenPAI or Kubeflow platform, you can also see the hdfsLog.
+* If you run on the OpenPAI or Kubeflow platform, you can also see the hdfsLog.
![](../../img/webui-img/detail-pai.png)
-* Intermediate Result Graph: you can see default and other keys in this graph by click the operation column button.
+* Intermediate Result Graph: you can see the default and other keys in this graph by clicking the operation column button.
![](../../img/webui-img/intermediate-btn.png)
![](../../img/webui-img/intermediate.png)
From 17ea82afc13d666f3ea9775399993d83c121e42e Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Wed, 18 Mar 2020 10:59:11 -0700
Subject: [PATCH 31/34] Improved wording and grammar in NAS overview.
Also deletes one redundant copy of a note that was stated twice.
---
docs/en_US/NAS/Overview.md | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md
index 9e7bc687d4..163b5883e5 100644
--- a/docs/en_US/NAS/Overview.md
+++ b/docs/en_US/NAS/Overview.md
@@ -1,27 +1,27 @@
# Neural Architecture Search (NAS) on NNI
-Automatic neural architecture search is taking an increasingly important role on finding better models. Recent research works have proved the feasibility of automatic NAS, and also found some models that could beat manually designed and tuned models. Some of representative works are [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5]. There are new innovations keeping emerging.
+Automatic neural architecture search is taking an increasingly important role in finding better models. Recent research has proved the feasibility of automatic NAS and has led to models that beat many manually designed and tuned models. Some representative works are [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5]. Further, new innovations keep emerging.
-However, it takes great efforts to implement NAS algorithms, and it is hard to reuse code base of existing algorithms in new one. To facilitate NAS innovations (e.g., design and implement new NAS models, compare different NAS models side-by-side), an easy-to-use and flexible programming interface is crucial.
+However, it takes a great effort to implement NAS algorithms, and it's hard to reuse the code base of existing algorithms for new ones. To facilitate NAS innovations (e.g., the design and implementation of new NAS models, the comparison of different NAS models side-by-side, etc.), an easy-to-use and flexible programming interface is crucial.
-With this motivation, our ambition is to provide a unified architecture in NNI, to accelerate innovations on NAS, and apply state-of-art algorithms on real world problems faster.
+With this motivation, our ambition is to provide a unified architecture in NNI, accelerate innovations on NAS, and apply state-of-the-art algorithms to real-world problems faster.
-With the unified interface, there are two different modes for the architecture search. [One](#supported-one-shot-nas-algorithms) is the so-called one-shot NAS, where a super-net is built based on search space, and using one shot training to generate good-performing child model. [The other](#supported-distributed-nas-algorithms) is the traditional searching approach, where each child model in search space runs as an independent trial, the performance result is sent to tuner and the tuner generates new child model.
+With the unified interface, there are two different modes for architecture search. [One](#supported-one-shot-nas-algorithms) is the so-called one-shot NAS, where a super-net is built based on a search space and one-shot training is used to generate a good-performing child model. [The other](#supported-distributed-nas-algorithms) is the traditional search-based approach, where each child model within the search space runs as an independent trial. The performance result is then sent to the tuner, which generates a new child model.
## Supported One-shot NAS Algorithms
-NNI supports below NAS algorithms now and is adding more. User can reproduce an algorithm or use it on their own dataset. We also encourage users to implement other algorithms with [NNI API](#use-nni-api), to benefit more people.
+NNI currently supports the NAS algorithms listed below and is adding more. Users can reproduce an algorithm or use it on their own dataset. We also encourage users to implement other algorithms with [NNI API](#use-nni-api), to benefit more people.
|Name|Brief Introduction of Algorithm|
|---|---|
| [ENAS](ENAS.md) | [Efficient Neural Architecture Search via Parameter Sharing](https://arxiv.org/abs/1802.03268). In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. It uses parameter sharing between child models to achieve fast speed and excellent performance. |
| [DARTS](DARTS.md) | [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055) introduces a novel algorithm for differentiable network architecture search on bilevel optimization. |
| [P-DARTS](PDARTS.md) | [Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation](https://arxiv.org/abs/1904.12760) is based on DARTS. It introduces an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure. |
-| [SPOS](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with an uniform path sampling method, and applies an evolutionary algorithm to efficiently search for the best-performing architectures. |
+| [SPOS](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with a uniform path sampling method and applies an evolutionary algorithm to efficiently search for the best-performing architectures. |
| [CDARTS](CDARTS.md) | [Cyclic Differentiable Architecture Search](https://arxiv.org/abs/****) builds a cyclic feedback mechanism between the search and evaluation networks. It introduces a cyclic differentiable architecture search framework which integrates the two networks into a unified architecture.|
| [ProxylessNAS](Proxylessnas.md) | [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332).|
-One-shot algorithms run **standalone without nnictl**. Only PyTorch version has been implemented. Tensorflow 2.x will be supported in future release.
+One-shot algorithms run **standalone without nnictl**. Only the PyTorch version has been implemented. TensorFlow 2.x will be supported in a future release.
Here are some common dependencies to run the examples. PyTorch needs to be above 1.2 to use ``BoolTensor``.
@@ -34,20 +34,20 @@ Here are some common dependencies to run the examples. PyTorch needs to be above
|Name|Brief Introduction of Algorithm|
|---|---|
-| [SPOS's 2nd stage](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with an uniform path sampling method, and applies an evolutionary algorithm to efficiently search for the best-performing architectures. _Note:: SPOS is a two-stage algorithm, whose first stage is one-shot and second stage is distributed, leveraging result of first stage as a checkpoint._|
+| [SPOS's 2nd stage](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with a uniform path sampling method, and applies an evolutionary algorithm to efficiently search for the best-performing architectures.|
```eval_rst
-.. Note:: SPOS is a two-stage algorithm, whose first stage is one-shot and second stage is distributed, leveraging result of first stage as a checkpoint.
+.. Note:: SPOS is a two-stage algorithm, whose first stage is one-shot and the second stage is distributed, leveraging the result of the first stage as a checkpoint.
```
-## Use NNI API
+## Using the NNI API
The programming interface of designing and searching a model is often demanded in two scenarios.
-1. When designing a neural network, there may be multiple operation choices on a layer, sub-model, or connection, and it's undetermined which one or combination performs best. So, it needs an easy way to express the candidate layers or sub-models.
-2. When applying NAS on a neural network, it needs an unified way to express the search space of architectures, so that it doesn't need to update trial code for different searching algorithms.
+1. When designing a neural network, there may be multiple operation choices on a layer, sub-model, or connection, and it's undetermined which one or which combination performs best. So, there needs to be an easy way to express the candidate layers or sub-models.
+2. When applying NAS on a neural network, there needs to be a unified way to express the search space of architectures, so that trial code doesn't need to be updated for different search algorithms.
-[Here](./NasGuide.md) is a user guide to get started with using NAS on NNI.
+[Here](./NasGuide.md) is the user guide to get started with using NAS on NNI.
## Reference and Feedback
From 9dc14d59d973177f27c26e3ea290d49eba37edbf Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Wed, 18 Mar 2020 11:55:05 -0700
Subject: [PATCH 32/34] Improved grammar and wording in NAS quickstart.
---
docs/en_US/NAS/QuickStart.md | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/docs/en_US/NAS/QuickStart.md b/docs/en_US/NAS/QuickStart.md
index c63bf20733..d8c77a7950 100644
--- a/docs/en_US/NAS/QuickStart.md
+++ b/docs/en_US/NAS/QuickStart.md
@@ -1,12 +1,12 @@
# NAS Quick Start
-The NAS feature provided by NNI has two key components: APIs for expressing search space, and NAS training approaches. The former is for users to easily specify a class of models (i.e., the candidate models specified by search space) which may perform well. The latter is for users to easily apply state-of-the-art NAS training approaches on their own model.
+The NAS feature provided by NNI has two key components: APIs for expressing the search space and NAS training approaches. The former is for users to easily specify a class of models (i.e., the candidate models specified by the search space) which may perform well. The latter is for users to easily apply state-of-the-art NAS training approaches on their own model.
-Here we use a simple example to demonstrate how to tune your model architecture with NNI NAS APIs step by step. The complete code of this example can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/naive).
+Here we use a simple example to demonstrate how to tune your model architecture with the NNI NAS APIs step by step. The complete code of this example can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/naive).
## Write your model with NAS APIs
-Instead of writing a concrete neural model, you can write a class of neural models using two NAS APIs `LayerChoice` and `InputChoice`. For example, you think either of two operations might work in the first convolution layer, then you can get one from them using `LayerChoice` as shown by `self.conv1` in the code. Similarly, the second convolution layer `self.conv2` also chooses one from two operations. To this line, four candidate neural networks are specified. `self.skipconnect` uses `InputChoice` to specify two choices, i.e., adding skip connection or not.
+Instead of writing a concrete neural model, you can write a class of neural models using two NAS API functions, `LayerChoice` and `InputChoice`. For example, if you think either of two operations might work in the first convolution layer, then you can choose one of them using `LayerChoice`, as shown by `self.conv1` in the code. Similarly, the second convolution layer `self.conv2` also chooses one from two operations. At this point, four candidate neural networks are specified. `self.skipconnect` uses `InputChoice` to specify two choices: adding a skip connection or not.
```python
import torch.nn as nn
@@ -29,11 +29,11 @@ class Net(nn.Module):
self.fc3 = nn.Linear(84, 10)
```
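Since the diff shows only fragments of the example model, here is a self-contained sketch of the idea; the layer shapes are hypothetical, and the real code is in the linked example:

```python
import torch.nn as nn
from nni.nas.pytorch import mutables

class SketchNet(nn.Module):
    """Hypothetical model expressing 2 x 2 = 4 candidate networks."""
    def __init__(self):
        super().__init__()
        self.conv1 = mutables.LayerChoice([nn.Conv2d(1, 6, 3, padding=1),
                                           nn.Conv2d(1, 6, 5, padding=2)])
        self.conv2 = mutables.LayerChoice([nn.Conv2d(6, 16, 3, padding=1),
                                           nn.Conv2d(6, 16, 5, padding=2)])
        # whether or not to add a skip connection around conv2
        self.skipconnect = mutables.InputChoice(n_candidates=1)
```

In `forward`, these members are called like ordinary modules; the actual choice is made by the NAS trainer during the search.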
-For detailed description of `LayerChoice` and `InputChoice`, please refer to [the guidance](NasGuide.md)
+For a detailed description of `LayerChoice` and `InputChoice`, please refer to [the NAS guide](NasGuide.md).
## Choose a NAS trainer
-After the model is instantiated, it is time to train the model using NAS trainer. Different trainers use different approaches to search for the best one from a class of neural models that you specified. NNI provides popular NAS training approaches, such as DARTS, ENAS. Here we use `DartsTrainer` as an example below. After the trainer is instantiated, invoke `trainer.train()` to do the search.
+After the model is instantiated, it is time to train the model using a NAS trainer. Different trainers use different approaches to search for the best one from a class of neural models that you specified. NNI provides several popular NAS training approaches such as DARTS and ENAS. Here we use `DartsTrainer` in the example below. After the trainer is instantiated, invoke `trainer.train()` to do the search.
```python
trainer = DartsTrainer(net,
@@ -50,15 +50,15 @@ trainer.train()
## Export the best model
-After the search (i.e., `trainer.train()`) is done, we want to get the best performing model, then simply call `trainer.export("final_arch.json")` to export the found neural architecture to a file.
+After the search (i.e., `trainer.train()`) is done, to get the best performing model we simply call `trainer.export("final_arch.json")` to export the found neural architecture to a file.
## NAS visualization
-We are working on visualization of NAS and will release soon.
+We are working on NAS visualization and will release this feature soon.
## Retrain the exported best model
-It is simple to retrain the found (exported) neural architecture. Step one, instantiate the model you defined above. Step two, invoke `apply_fixed_architecture` on the model. Then the model becomes the found (exported) one, you can use traditional model training to train this model.
+It is simple to retrain the found (exported) neural architecture. Step one, instantiate the model you defined above. Step two, invoke `apply_fixed_architecture` on the model. Then the model becomes the found (exported) one. Afterward, you can use traditional training to train this model.
```python
model = Net()
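# A sketch of the remaining steps; the import path below is assumed from
# NNI's examples, and "final_arch.json" is the file written by trainer.export:
from nni.nas.pytorch.fixed import apply_fixed_architecture
apply_fixed_architecture(model, "final_arch.json")
# `model` is now fixed to the exported architecture; train it as usual.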
From 8ce046f57855f766a8cf9f1f7cf960b1aa65db8e Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Wed, 18 Mar 2020 12:50:36 -0700
Subject: [PATCH 33/34] Improve much of the wording and grammar in the NAS
guide.
---
docs/en_US/NAS/NasGuide.md | 42 +++++++++++++++++++-------------------
1 file changed, 21 insertions(+), 21 deletions(-)
diff --git a/docs/en_US/NAS/NasGuide.md b/docs/en_US/NAS/NasGuide.md
index 07147ea41b..5093449fe3 100644
--- a/docs/en_US/NAS/NasGuide.md
+++ b/docs/en_US/NAS/NasGuide.md
@@ -8,11 +8,11 @@
![](../../img/nas_abstract_illustration.png)
-Modern Neural Architecture Search (NAS) methods usually incorporate [three dimensions][1]: search space, search strategy, and performance estimation strategy. Search space often contains a limited neural network architectures to explore, while search strategy samples architectures from search space, gets estimations of their performance, and evolves itself. Ideally, search strategy should find the best architecture in the search space and report it to users. After users obtain such "best architecture", many methods use a "retrain step", which trains the network with the same pipeline as any traditional model.
+Modern Neural Architecture Search (NAS) methods usually incorporate [three dimensions][1]: search space, search strategy, and performance estimation strategy. The search space often contains a limited number of neural network architectures to explore, while the search strategy samples architectures from the search space, gets estimations of their performance, and evolves itself. Ideally, the search strategy should find the best architecture in the search space and report it to users. After users obtain the "best architecture", many methods use a "retrain step", which trains the network with the same pipeline as any traditional model.
## Implement a Search Space
-Assuming now we've got a baseline model, what should we do to be empowered with NAS? Take [MNIST on PyTorch](https://github.com/pytorch/examples/blob/master/mnist/main.py) as an example, the code might look like this:
+Assuming we've got a baseline model, what should we do to be empowered with NAS? Taking [MNIST on PyTorch](https://github.com/pytorch/examples/blob/master/mnist/main.py) as an example, the code might look like this:
```python
from nni.nas.pytorch import mutables
@@ -37,9 +37,9 @@ class Net(nn.Module):
return output
```
-The example above adds an option of choosing conv5x5 at conv1. The modification is as simple as declaring a `LayerChoice` with original conv3x3 and a new conv5x5 as its parameter. That's it! You don't have to modify the forward function in anyway. You can imagine conv1 as any another module without NAS.
+The example above adds an option of choosing conv5x5 at conv1. The modification is as simple as declaring a `LayerChoice` with the original conv3x3 and a new conv5x5 as its parameter. That's it! You don't have to modify the forward function in any way. You can imagine conv1 as any other module without NAS.
-So how about the possibilities of connections? This can be done by `InputChoice`. To allow for a skipconnection on an MNIST example, we add another layer called conv3. In the following example, a possible connection from conv2 is added to the output of conv3.
+So how about the possibilities of connections? This can be done using `InputChoice`. To allow for a skip connection on the MNIST example, we add another layer called conv3. In the following example, a possible connection from conv2 is added to the output of conv3.
```python
from nni.nas.pytorch import mutables
@@ -67,21 +67,21 @@ class Net(nn.Module):
return output
```
-Input choice can be thought of as a callable module that receives a list of tensors and output the concatenation/sum/mean of some of them (sum by default), or `None` if none is selected. Like layer choices, input choices should be **initialized in `__init__` and called in `forward`**. We will see later that this is to allow search algorithms to identify these choices, and do necessary preparation.
+Input choice can be thought of as a callable module that receives a list of tensors and outputs the concatenation/sum/mean of some of them (sum by default), or `None` if none is selected. Like layer choices, input choices should be **initialized in `__init__` and called in `forward`**. We will see later that this is to allow search algorithms to identify these choices and do necessary preparations.
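As a conceptual sketch of that behavior in plain PyTorch (not NNI's actual implementation):

```python
import torch

def input_choice_semantics(optional_inputs, chosen_mask, reduction="sum"):
    """What an input choice computes once a choice (mask) has been made."""
    picked = [t for t, m in zip(optional_inputs, chosen_mask) if m]
    if not picked:
        return None                      # nothing selected
    stacked = torch.stack(picked)
    if reduction == "sum":
        return stacked.sum(dim=0)
    if reduction == "mean":
        return stacked.mean(dim=0)
    return torch.cat(picked, dim=1)      # "concat"
```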
-`LayerChoice` and `InputChoice` are both **mutables**. Mutable means "changeable". As opposed to traditional deep learning layers/modules which have fixed operation type once defined, models with mutables are essentially a series of possible models.
+`LayerChoice` and `InputChoice` are both **mutables**. Mutable means "changeable". As opposed to traditional deep learning layers/modules, which have fixed operation types once defined, models with mutables are essentially a series of possible models.
-Users can specify a **key** for each mutable. By default NNI will assign one for you that is globally unique, but in case users want to share choices (for example, there are two `LayerChoice` with the same candidate operations, and you want them to have the same choice, i.e., if first one chooses the i-th op, the second one also chooses the i-th op), they can give them the same key. The key marks the identity for this choice, and will be used in dumped checkpoint. So if you want to increase the readability of your exported architecture, manually assigning keys to each mutable would be a good idea. For advanced usage on mutables, see [Mutables](./NasReference.md).
+Users can specify a **key** for each mutable. By default, NNI will assign one for you that is globally unique, but in case users want to share choices (for example, there are two `LayerChoice`s with the same candidate operations and you want them to have the same choice, i.e., if the first one chooses the i-th op, the second one also chooses the i-th op), they can give them the same key. The key marks the identity for this choice and will be used in the dumped checkpoint. So if you want to increase the readability of your exported architecture, manually assigning keys to each mutable would be a good idea. For advanced usage on mutables, see [Mutables](./NasReference.md).
## Use a Search Algorithm
-Different in how the search space is explored and trials are spawned, there are at least two different ways users can do search. One runs NAS distributedly, which can be as naive as enumerating all the architectures and training each one from scratch, or leveraging more advanced technique, such as [SMASH][8], [ENAS][2], [DARTS][1], [FBNet][3], [ProxylessNAS][4], [SPOS][5], [Single-Path NAS][6], [Understanding One-shot][7] and [GDAS][9]. Since training many different architectures are known to be expensive, another family of methods, called one-shot NAS, builds a supernet containing every candidate in the search space as its subnetwork, and in each step a subnetwork or combination of several subnetworks is trained.
+Depending on how the search space is explored and how trials are spawned, there are at least two different ways users can do a search. One runs NAS distributedly, which can be as naive as enumerating all the architectures and training each one from scratch, or can involve leveraging more advanced techniques, such as [SMASH][8], [ENAS][2], [DARTS][1], [FBNet][3], [ProxylessNAS][4], [SPOS][5], [Single-Path NAS][6], [Understanding One-shot][7] and [GDAS][9]. Since training many different architectures is known to be expensive, another family of methods, called one-shot NAS, builds a supernet containing every candidate in the search space as its subnetwork, and in each step, a subnetwork or a combination of several subnetworks is trained.
-Currently, several one-shot NAS methods have been supported on NNI. For example, `DartsTrainer` which uses SGD to train architecture weights and model weights iteratively, `ENASTrainer` which [uses a controller to train the model][2]. New and more efficient NAS trainers keep emerging in research community.
+Currently, several one-shot NAS methods are supported on NNI. For example, `DartsTrainer`, which uses SGD to train architecture weights and model weights iteratively, and `ENASTrainer`, which [uses a controller to train the model][2]. New and more efficient NAS trainers keep emerging in the research community and some will be implemented in future releases of NNI.
### One-Shot NAS
-Each one-shot NAS implements a trainer, which users can find detailed usages in the description of each algorithm. Here is a simple example, demonstrating how users can use `EnasTrainer`.
+Each one-shot NAS algorithm implements a trainer, for which users can find usage details in the description of each algorithm. Here is a simple example, demonstrating how users can use `EnasTrainer`.
```python
# this is exactly same as traditional model training
@@ -117,15 +117,15 @@ trainer.train() # training
trainer.export(file="model_dir/final_architecture.json") # export the final architecture to file
```
-Users can directly run their training file by `python3 train.py`, without `nnictl`. After training, users could export the best one of the found models through `trainer.export()`.
+Users can directly run their training file through `python3 train.py` without `nnictl`. After training, users can export the best one of the found models through `trainer.export()`.
-Normally, the trainer exposes a few arguments that you can customize, for example, loss function, metrics function, optimizer, and datasets. These should satisfy the needs from most usages, and we do our best to make sure our built-in trainers work on as many models, tasks and datasets as possible. But there is no guarantee. For example, some trainers have assumption that the task has to be a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps); most trainers do not have support for distributed training: they won't wrap your model with `DataParallel` or `DistributedDataParallel` to do that. So after a few tryouts, if you want to actually use the trainers on your very customized applications, you might very soon need to [customize your trainer](#extend-the-ability-of-one-shot-trainers).
+Normally, the trainer exposes a few arguments that you can customize, for example, the loss function, the metrics function, the optimizer, and the datasets. These should satisfy most usage needs, and we do our best to make sure our built-in trainers work on as many models, tasks, and datasets as possible. But there is no guarantee. For example, some trainers assume the task is a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps); and most trainers do not support distributed training: they won't wrap your model with `DataParallel` or `DistributedDataParallel` to do that. So, after a few tryouts, if you want to use the trainers on your very customized applications, you might need to [customize your trainer](#extend-the-ability-of-one-shot-trainers).
### Distributed NAS
-Neural architecture search is originally executed by running each child model independently as a trial job. We also support this searching approach, and it naturally fits in NNI hyper-parameter tuning framework, where tuner generates child model for next trial and trials run in training service.
+Neural architecture search was originally executed by running each child model independently as a trial job. We also support this searching approach, and it naturally fits within the NNI hyper-parameter tuning framework, where Tuner generates child models for the next trial and trials run in the training service.
-To use this mode, there is no need to change the search space expressed with NNI NAS API (i.e., `LayerChoice`, `InputChoice`, `MutableScope`). After the model is initialized, apply the function `get_and_apply_next_architecture` on the model. One-shot NAS trainers are not used in this mode. Here is a simple example:
+To use this mode, there is no need to change the search space expressed with the NNI NAS API (i.e., `LayerChoice`, `InputChoice`, `MutableScope`). After the model is initialized, apply the function `get_and_apply_next_architecture` on the model. One-shot NAS trainers are not used in this mode. Here is a simple example:
```python
model = Net()
@@ -137,17 +137,17 @@ acc = test(model) # test the trained model
nni.report_final_result(acc) # report the performance of the chosen architecture
```
-The search space should be generated and sent to tuner. As with NNI NAS API the search space is embedded in user code, users could use "[nnictl ss_gen](../Tutorial/Nnictl.md)" to generate search space file. Then, put the path of the generated search space in the field `searchSpacePath` of `config.yml`. The other fields in `config.yml` can be filled by referring [this tutorial](../Tutorial/QuickStart.md).
+The search space should be generated and sent to Tuner. As with the NNI NAS API, the search space is embedded in the user code. Users can use "[nnictl ss_gen](../Tutorial/Nnictl.md)" to generate the search space file. Then put the path of the generated search space in the field `searchSpacePath` of `config.yml`. The other fields in `config.yml` can be filled in by referring to [this tutorial](../Tutorial/QuickStart.md).
-You could use [NNI tuners](../Tuner/BuiltinTuner.md) to do the search. Currently, only PPO Tuner supports NAS search space.
+You can use the [NNI tuners](../Tuner/BuiltinTuner.md) to do the search. Currently, only PPO Tuner supports NAS search spaces.
-We support standalone mode for easy debugging, where you could directly run the trial command without launching an NNI experiment. This is for checking whether your trial code can correctly run. The first candidate(s) are chosen for `LayerChoice` and `InputChoice` in this standalone mode.
+We support a standalone mode for easy debugging, where you can directly run the trial command without launching an NNI experiment. This is for checking whether your trial code can correctly run. The first candidate(s) are chosen for `LayerChoice` and `InputChoice` in this standalone mode.
A complete example can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/classic_nas/config_nas.yml).
### Retrain with Exported Architecture
-After the searching phase, it's time to train the architecture found. Unlike many open-source NAS algorithms who write a whole new model specifically for retraining. We found that searching model and retraining model are usual very similar, and therefore you can construct your final model with the exact model code. For example
+After the search phase, it's time to train the found architecture. Unlike many open-source NAS algorithms, which write a whole new model specifically for retraining, we found that the search model and the retraining model are usually very similar, and therefore you can construct your final model with the exact same model code. For example:
```python
model = Net()
@@ -163,9 +163,9 @@ The JSON is simply a mapping from mutable keys to one-hot or multi-hot represent
}
```
-After applying, the model is then fixed and ready for a final training. The model works as a single model, although it might contain more parameters than expected. This comes with pros and cons. The good side is, you can directly load the checkpoint dumped from supernet during search phase and start retrain from there. However, this is also a model with redundant parameters, which may cause problems when trying to count the number of parameters in model. For deeper reasons and possible workaround, see [Trainers](./NasReference.md).
+After applying, the model is then fixed and ready for final training. The model works as a single model, although it might contain more parameters than expected. This comes with pros and cons. The good side is, you can directly load the checkpoint dumped from supernet during the search phase and start retraining from there. However, this is also a model with redundant parameters and this may cause problems when trying to count the number of parameters in the model. For deeper reasons and possible workarounds, see [Trainers](./NasReference.md).
-Also refer to [DARTS](./DARTS.md) for example code of retraining.
+Also, refer to [DARTS](./DARTS.md) for code exemplifying retraining.
[1]: https://arxiv.org/abs/1808.05377
[2]: https://arxiv.org/abs/1802.03268
@@ -175,4 +175,4 @@ Also refer to [DARTS](./DARTS.md) for example code of retraining.
[6]: https://arxiv.org/abs/1904.02877
[7]: http://proceedings.mlr.press/v80/bender18a
[8]: https://arxiv.org/abs/1708.05344
-[9]: https://arxiv.org/abs/1910.04465
\ No newline at end of file
+[9]: https://arxiv.org/abs/1910.04465
From 6ee99f350a3fac2ce89b854ca8120392fe453902 Mon Sep 17 00:00:00 2001
From: AHartNtkn
Date: Mon, 23 Mar 2020 08:45:40 -0700
Subject: [PATCH 34/34] Replace "Requirement of classArg" with "classArgs
requirements:" in two files
tuner and builtin assessor. One instance in HyperoptTuner.md and BuiltinAssessor.md.
---
docs/en_US/Assessor/BuiltinAssessor.md | 2 +-
docs/en_US/Tuner/HyperoptTuner.md | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/en_US/Assessor/BuiltinAssessor.md b/docs/en_US/Assessor/BuiltinAssessor.md
index 9b0b5c0a96..0e1266331e 100644
--- a/docs/en_US/Assessor/BuiltinAssessor.md
+++ b/docs/en_US/Assessor/BuiltinAssessor.md
@@ -27,7 +27,7 @@ Note: Please follow the provided format when writing your `config.yml` file.
It's applicable in a wide range of performance curves, thus, it can be used in various scenarios to speed up the tuning progress. [Detailed Description](./MedianstopAssessor.md)
-**Requirement of classArg**
+**classArgs requirements:**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', assessor will **stop** the trial with smaller expectation. If 'minimize', assessor will **stop** the trial with larger expectation.
* **start_step** (*int, optional, default = 0*) - A trial is determined to be stopped or not only after receiving start_step number of reported intermediate results.
diff --git a/docs/en_US/Tuner/HyperoptTuner.md b/docs/en_US/Tuner/HyperoptTuner.md
index 7fd366e68b..fdfea6f9e7 100644
--- a/docs/en_US/Tuner/HyperoptTuner.md
+++ b/docs/en_US/Tuner/HyperoptTuner.md
@@ -22,7 +22,7 @@ tuner:
constant_liar_type: min
```
-**Requirement of classArg**
+**classArgs requirements:**
* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', tuners will try to maximize metrics. If 'minimize', tuner will try to minimize metrics.
* **parallel_optimize** (*bool, optional, default = False*) - If True, TPE will use the Constant Liar algorithm to optimize parallel hyperparameter tuning. Otherwise, TPE will not discriminate between sequential or parallel situations.
* **constant_liar_type** (*min or max or mean, optional, default = min*) - The type of constant liar to use, will logically be determined on the basis of the values taken by y at X. There are three possible values, min{Y}, max{Y}, and mean{Y}.
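As a rough illustration of the Constant Liar idea (an illustrative helper, not TPE's internals): while some trials are still running, the tuner temporarily pretends each of them returned a fixed "lie" derived from the observed results, so that parallel suggestions spread out instead of collapsing onto the same point.

```python
def constant_liar(observed_y, liar_type="min"):
    """Sketch: the fake objective temporarily assigned to still-running trials."""
    return {
        "min": min(observed_y),
        "max": max(observed_y),
        "mean": sum(observed_y) / len(observed_y),
    }[liar_type]

print(constant_liar([0.91, 0.87, 0.95], "min"))  # 0.87
```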