
Commit

Merge pull request #80 from microsoft/master
pull code
chicm-ms authored Apr 11, 2020
2 parents 7095716 + d2c5777 commit c51263a
Showing 100 changed files with 2,916 additions and 799 deletions.
50 changes: 25 additions & 25 deletions README_zh_CN.md
@@ -19,7 +19,7 @@ NNI manages automated machine learning (AutoML) experiments; **scheduling and running** is handled by the tuning
* Researchers and data scientists who want to more easily **implement or experiment with new AutoML algorithms**, including hyperparameter tuning algorithms, neural architecture search algorithms, and model compression algorithms.
* ML platform owners who want to **support AutoML** in their platforms.

### **NNI v1.3 has been released! &nbsp;[<img width="48" src="docs/img/release_icon.png" />](#nni-released-reminder)**
### **NNI v1.4 has been released! &nbsp;[<img width="48" src="docs/img/release_icon.png" />](#nni-released-reminder)**

## **NNI capabilities at a glance**

@@ -100,31 +100,33 @@ NNI provides a command line tool and a user-friendly WebUI to manage training experiments.
<b>Heuristic Search</b>
<ul>
<li><a href="docs/zh_CN/Tuner/BuiltinTuner.md#Evolution">Naïve Evolution(朴素进化)</a></li>
<li><a href="docs/zh_CN/Tuner/BuiltinTuner.md#Anneal">Anneal(退火算法)</a></li>
<li><a href="docs/zh_CN/Tuner/BuiltinTuner.md#Anneal">Anneal(退火算法)</a></li>
<li><a href="docs/zh_CN/Tuner/BuiltinTuner.md#Hyperband">Hyperband</a></li>
</ul>
<b>Bayesian Optimization</b>
<ul>
<li><a href="docs/zh_CN/Tuner/BuiltinTuner.md#BOHB">BOHB</a></li>
<li><a href="docs/zh_CN/Tuner/BuiltinTuner.md#BOHB">BOHB</a></li>
<li><a href="docs/zh_CN/Tuner/BuiltinTuner.md#TPE">TPE</a></li>
<li><a href="docs/zh_CN/Tuner/BuiltinTuner.md#SMAC">SMAC</a></li>
<li><a href="docs/zh_CN/Tuner/BuiltinTuner.md#SMAC">SMAC</a></li>
<li><a href="docs/zh_CN/Tuner/BuiltinTuner.md#MetisTuner">Metis Tuner</a></li>
<li><a href="docs/zh_CN/Tuner/BuiltinTuner.md#GPTuner">GP Tuner</a> </li>
</ul>
<li><a href="docs/zh_CN/Tuner/BuiltinTuner.md#GPTuner">GP Tuner</a></li>
</ul>
<b>Reinforcement Learning Based</b>
<ul>
<li><a href="docs/zh_CN/Tuner/BuiltinTuner.md#PPOTuner">PPO Tuner</a> </li>
</ul>
</ul>
<a href="docs/zh_CN/NAS/Overview.md">神经网络架构搜索</a>
<ul>
<ul>
<ul>
<li><a href="docs/zh_CN/NAS/Overview.md#enas">ENAS</a></li>
<li><a href="docs/zh_CN/NAS/Overview.md#darts">DARTS</a></li>
<li><a href="docs/zh_CN/NAS/Overview.md#p-darts">P-DARTS</a></li>
<li><a href="docs/zh_CN/NAS/Overview.md#cdarts">CDARTS</a></li>
<li><a href="docs/zh_CN/NAS/ENAS.md">ENAS</a></li>
<li><a href="docs/zh_CN/NAS/DARTS.md">DARTS</a></li>
<li><a href="docs/zh_CN/NAS/PDARTS.md">P-DARTS</a></li>
<li><a href="docs/zh_CN/NAS/CDARTS.md">CDARTS</a></li>
<li><a href="docs/zh_CN/NAS/SPOS.md">SPOS</a></li>
<li><a href="docs/zh_CN/NAS/Proxylessnas.md">ProxylessNAS</a></li>
<li><a href="docs/zh_CN/Tuner/BuiltinTuner.md#NetworkMorphism">Network Morphism</a> </li>
</ul>
</ul>
</ul>
<a href="docs/zh_CN/Compressor/Overview.md">模型压缩</a>
<ul>
@@ -148,7 +150,7 @@ NNI provides a command line tool and a user-friendly WebUI to manage training experiments.
<a href="docs/zh_CN/Assessor/BuiltinAssessor.md">提前终止算法</a>
<ul>
<li><a href="docs/zh_CN/Assessor/BuiltinAssessor.md#Medianstop">Median Stop(中位数终止)</a></li>
<li><a href="docs/zh_CN/Assessor/BuiltinAssessor.md#Curvefitting">Curve Fitting(曲线拟合)</a></li>
<li><a href="docs/zh_CN/Assessor/BuiltinAssessor.md#Curvefitting">Curve Fitting(曲线拟合)</a></li>
</ul>
</td>
<td>
@@ -219,7 +221,7 @@ For NNI system requirements on Linux and macOS, [see here](https://nni.readthedocs.io/zh

* If you encounter any permission issues, add `--user` to install NNI in the user directory.
* Currently, NNI on Windows supports local, remote, and OpenPAI modes. Anaconda or Miniconda is highly recommended for installing NNI on Windows.
* If you encounter errors such as `Segmentation fault`, see the [FAQ](docs/zh_CN/Tutorial/FAQ.md). For the FAQ on Windows, see [NNI on Windows](docs/zh_CN/Tutorial/NniOnWindows.md).
* If you encounter errors such as `Segmentation fault`, see the [FAQ](docs/zh_CN/Tutorial/FAQ.md). For the FAQ on Windows, see [NNI on Windows](docs/zh_CN/Tutorial/InstallationWin.md#faq).

### **Verify installation**

@@ -228,7 +230,7 @@ For NNI system requirements on Linux and macOS, [see here](https://nni.readthedocs.io/zh
* Download the examples by cloning the source code.

```bash
git clone -b v1.3 https://github.com/Microsoft/nni.git
git clone -b v1.4 https://github.com/Microsoft/nni.git
```

* Run the MNIST example.
@@ -283,9 +285,9 @@ You can use these commands to get more information about the experiment

## **Documentation**

* To learn about what NNI is, read the [NNI Overview](https://nni.readthedocs.io/zh/latest/Overview.html).
* To get familiar with how to use NNI, read the [documentation](https://nni.readthedocs.io/zh/latest/index.html).
* To install NNI, refer to [Install NNI](docs/zh_CN/Tutorial/Installation.md).
* To install and use NNI, refer to the [installation guide](https://nni.readthedocs.io/zh/latest/installation.html).

## **Contributing**

@@ -303,22 +305,20 @@
* If you have questions about usage, check the [FAQ](https://github.com/microsoft/nni/blob/master/docs/zh_CN/Tutorial/FAQ.md) first. If that does not solve the problem, contact the NNI development team via [Gitter](https://gitter.im/Microsoft/nni?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) or [file an issue](https://github.com/microsoft/nni/issues/new/choose) on GitHub.
* [Customize a Tuner](docs/zh_CN/Tuner/CustomizeTuner.md)
* [Implement a customized training service](docs/zh_CN/TrainingService/HowToImplementTrainingService.md)
* [Implement a new NAS trainer on NNI](https://github.com/microsoft/nni/blob/master/docs/zh_CN/NAS/NasInterface.md#implement-a-new-nas-trainer-on-nni)
* [Implement a new NAS trainer on NNI](docs/zh_CN/NAS/Advanced.md)
* [Customize an Advisor](docs/zh_CN/Tuner/CustomizeAdvisor.md)

## **External Repositories and References**

Some NNI usage examples and relevant documents, shared with the authors' permission.

* ### **External Repositories**
* Run [ENAS](examples/tuners/enas_nni/README_zh_CN.md) in NNI
* Run [Neural Architecture Search](examples/trials/nas_cifar10/README_zh_CN.md) in NNI
* [Automatic feature engineering with NNI](examples/feature_engineering/auto-feature-engineering/README_zh_CN.md)
* [Hyperparameter tuning for matrix factorization](https://github.com/microsoft/recommenders/blob/master/notebooks/04_model_select_and_optimize/nni_surprise_svd.ipynb) with NNI
* [scikit-nni](https://github.com/ksachdeva/scikit-nni) Hyperparameter search for scikit-learn, powered by NNI.
* ### **Relevant Articles**
* [A comparison of hyperparameter optimization algorithms](docs/zh_CN/CommunitySharings/HpoComparision.md)
* [A comparison of neural architecture search algorithms](docs/zh_CN/CommunitySharings/NasComparision.md)
* [Parallelizing a sequential algorithm: TPE](docs/zh_CN/CommunitySharings/ParallelizingTpeSearch.md)
44 changes: 42 additions & 2 deletions docs/en_US/Compressor/Pruner.md
@@ -13,6 +13,8 @@ Index of supported pruning algorithms
* [Filter Pruners with Activation Rank](#activationrankfilterpruner)
* [APoZ Rank Pruner](#activationapozrankfilterpruner)
* [Activation Mean Rank Pruner](#activationmeanrankfilterpruner)
* [Filter Pruners with Gradient Rank](#gradientrankfilterpruner)
* [Taylor FO On Weight Pruner](#taylorfoweightfilterpruner)

## Level Pruner

@@ -281,7 +283,7 @@ pruner.compress()
- **op_types:** Only Conv1d and Conv2d are supported in L2Filter Pruner.

## ActivationRankFilterPruner
ActivationRankFilterPruner is a series of pruners that prune the filters with the smallest importance criterion, calculated from the output activations of convolution layers, to achieve a preset level of network sparsity.

### ActivationAPoZRankFilterPruner

@@ -341,4 +343,42 @@ You can view example for more information
#### User configuration for ActivationMeanRankFilterPruner

- **sparsity:** The percentage of convolutional filters to be pruned.
- **op_types:** Only Conv2d is supported in ActivationMeanRankFilterPruner.


## GradientRankFilterPruner

GradientRankFilterPruner is a series of pruners which prune the filters with the smallest importance criterion calculated from the gradients of convolution layers to achieve a preset level of network sparsity.

### TaylorFOWeightFilterPruner

We implemented it as a one-shot pruner. It prunes convolutional layers based on a first-order Taylor expansion on the weights; the estimated importance of filters is defined in the paper [Importance Estimation for Neural Network Pruning](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf). Other pruning criteria mentioned in the paper will be supported in a future release.

![](../../img/importance_estimation_sum.png)
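
The image above summarizes the importance criterion. As a reference, a paraphrase of the first-order criterion from the cited paper (our rendering, not a formula taken from this repository) is:

```latex
\hat{\mathcal{I}}^{(1)}_{\mathcal{S}}(\mathbf{W}) = \sum_{s \in \mathcal{S}} \left( g_s \, w_s \right)^2
```

where \(\mathcal{S}\) is the set of weights belonging to one filter, \(w_s\) is a weight, and \(g_s\) is its gradient.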

#### Usage

PyTorch code

```python
from nni.compression.torch import TaylorFOWeightFilterPruner

# Prune 50% of the filters in every Conv2d layer of the (already constructed) model.
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
# `model` and `optimizer` are the user's PyTorch model and training optimizer;
# the pruner uses the optimizer to collect the gradients it needs.
pruner = TaylorFOWeightFilterPruner(model, config_list, optimizer)
pruner.compress()
```
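
After `compress()`, the pruned weights and masks can usually be exported for later use. The follow-up sketch below assumes the generic `export_model` method of the compression pruners; treat the method and file paths as assumptions if your NNI version differs.

```python
# Export the pruned model weights and the binary masks (paths are illustrative).
pruner.export_model(model_path='pruned_taylorfo.pth', mask_path='taylorfo_mask.pth')
```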

You can view the example for more information.

#### User configuration for TaylorFOWeightFilterPruner

- **sparsity:** The percentage of convolutional filters to be pruned.
- **op_types:** Currently only Conv2d is supported in TaylorFOWeightFilterPruner.




80 changes: 80 additions & 0 deletions docs/en_US/NAS/TextNAS.md
@@ -0,0 +1,80 @@
# TextNAS

## Introduction

This is the implementation of the TextNAS algorithm proposed in the paper [TextNAS: A Neural Architecture Search Space tailored for Text Representation](https://arxiv.org/pdf/1912.10729.pdf). TextNAS is a neural architecture search algorithm tailored for text representation. More specifically, it is based on a novel search space consisting of operators widely adopted to solve various NLP tasks, and it supports multi-path ensemble within a single network to balance the width and depth of the architecture.

The search space of TextNAS contains:

* 1-D convolutional operator with filter sizes 1, 3, 5, and 7
* recurrent operator (bi-directional GRU)
* self-attention operator
* pooling operator (max/average)

Following the ENAS algorithm, TextNAS also utilizes parameter sharing to accelerate the search speed and adopts a reinforcement-learning controller for the architecture sampling and generation. Please refer to the paper for more details of TextNAS.
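
As a rough illustration of how such a search space maps onto NNI's NAS APIs, the sketch below declares one searchable position with `LayerChoice`, showing only the 1-D convolutional candidates. This is a simplified, assumption-laden sketch rather than the code in `examples/nas/textnas`: the real example wraps every operator (convolution, bi-GRU, attention, pooling) in small modules in `ops.py` so that all candidates share the same tensor interface, and the ENAS trainer is what actually samples among the choices.

```python
import torch.nn as nn
from nni.nas.pytorch import mutables

hidden = 256  # illustrative channel size

# One searchable position in the network: the controller later picks one of
# these candidate operators (only the convolutional candidates are shown here).
conv_choice = mutables.LayerChoice([
    nn.Conv1d(hidden, hidden, kernel_size=1, padding=0),
    nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
    nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
    nn.Conv1d(hidden, hidden, kernel_size=7, padding=3),
])
```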

## Preparation

Prepare the word vectors and the SST dataset, and organize them in the data directory as shown below:

```
textnas
├── data
│ ├── sst
│ │ └── trees
│ │ ├── dev.txt
│ │ ├── test.txt
│ │ └── train.txt
│ └── glove.840B.300d.txt
├── dataloader.py
├── model.py
├── ops.py
├── README.md
├── search.py
└── utils.py
```

The following links might be helpful for finding and downloading the corresponding datasets:

* [GloVe: Global Vectors for Word Representation](https://nlp.stanford.edu/projects/glove/)
* [glove.840B.300d.txt](http://nlp.stanford.edu/data/glove.840B.300d.zip)
* [Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank](https://nlp.stanford.edu/sentiment/)
* [trainDevTestTrees_PTB.zip](https://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip)

## Examples

### Search Space

[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/textnas)

```bash
# Clone the NNI code if it has not been cloned yet; if it has, skip this line and enter the code folder.
git clone https://github.com/Microsoft/nni.git

# search the best architecture
cd examples/nas/textnas

# view more options for search
python3 search.py -h
```

After each search epoch, 10 sampled architectures will be tested directly. Their performance is expected to be 40% - 42% after 10 epochs.

By default, 20 sampled architectures will be exported into the `checkpoints` directory for the next step.

### Retrain

```bash
# Clone the NNI code if it has not been cloned yet; if it has, skip this line and enter the code folder.
git clone https://github.com/Microsoft/nni.git

# enter the code folder
cd examples/nas/textnas

# default to retrain on sst-2
sh run_retrain.sh
```

## Reference

TextNAS directly uses EnasTrainer; please refer to [ENAS](./ENAS.md) for the trainer APIs.
29 changes: 29 additions & 0 deletions docs/en_US/Tuner/BuiltinTuner.md
@@ -21,6 +21,7 @@ Currently, we support the following algorithms:
|[__BOHB__](#BOHB)|BOHB is a follow-up work to Hyperband. It targets the weakness of Hyperband that new configurations are generated randomly without leveraging finished trials. For the name BOHB, HB means Hyperband, BO means Bayesian Optimization. BOHB leverages finished trials by building multiple TPE models, a proportion of new configurations are generated through these models. [Reference Paper](https://arxiv.org/abs/1807.01774)|
|[__GP Tuner__](#GPTuner)|Gaussian Process Tuner is a sequential model-based optimization (SMBO) approach with Gaussian Process as the surrogate. [Reference Paper](https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf), [Github Repo](https://github.com/fmfn/BayesianOptimization)|
|[__PPO Tuner__](#PPOTuner)|PPO Tuner is a Reinforcement Learning tuner based on PPO algorithm. [Reference Paper](https://arxiv.org/abs/1707.06347)|
|[__PBT Tuner__](#PBTTuner)|PBT Tuner is a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. [Reference Paper](https://arxiv.org/abs/1711.09846v1)|

## Usage of Built-in Tuners

@@ -453,6 +454,34 @@ tuner:
classArgs:
optimize_mode: maximize
```

<a name="PBTTuner"></a>

![](https://placehold.it/15/1589F0/000000?text=+) `PBT Tuner`

> Built-in Tuner Name: **PBTTuner**

**Suggested scenario**

Population Based Training (PBT) bridges and extends parallel search methods and sequential optimization methods. Its wall-clock run time is no greater than that of a single optimization process, it does not require sequential runs, and it can use fewer computational resources than naive search methods. It is therefore effective when you want to save computational resources and time. Also, PBT returns a hyperparameter schedule rather than a single configuration; if you do not need one specific configuration but simply expect good results, you can choose this tuner. Note that our implementation involves handling checkpoint storage locations: a trial corresponds to several training epochs, so loading and saving checkpoints must be specified in the trial code, which is different from other tuners. In addition, if the experiment is not in local mode, users should provide a path in shared storage that can be accessed by all trials. You could try it on a very simple task, such as the [mnist-pbt-tuner-pytorch](https://github.com/microsoft/nni/tree/master/examples/trials/mnist-pbt-tuner-pytorch) example. [See details](./PBTTuner.md)

**classArgs requirements:**

* **optimize_mode** (*'maximize' or 'minimize'*) - If 'maximize', the tuner will target to maximize metrics. If 'minimize', the tuner will target to minimize metrics.
* **all_checkpoint_dir** (*str, optional, default = None*) - Directory for trials to load and save checkpoints. If not specified, it defaults to "~/nni/checkpoint/<exp-id>". Note that if the experiment is not in local mode, users should provide a path in shared storage that can be accessed by all trials.
* **population_size** (*int, optional, default = 10*) - Number of trials in each step. In our implementation, one step means running each trial for the specific number of training epochs set by the user.
* **factors** (*tuple, optional, default = (1.2, 0.8)*) - Factors used to perturb the hyperparameters (see the sketch after this list).
* **fraction** (*float, optional, default = 0.2*) - Fraction of the population used to select the bottom and top trials.
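
To illustrate what `factors` and `fraction` control, here is a toy sketch of one exploit-and-explore step. It is not NNI's actual implementation, just an illustration under the assumption that the bottom fraction of trials copies hyperparameters and checkpoints from the top fraction and that numeric hyperparameters are perturbed by multiplying with one of the factors.

```python
import random

def exploit_and_explore(population, factors=(1.2, 0.8), fraction=0.2):
    """Toy illustration of one PBT step (not NNI's actual implementation)."""
    ranked = sorted(population, key=lambda t: t['metric'], reverse=True)
    cutoff = max(1, int(len(ranked) * fraction))
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    for bad in bottom:
        good = random.choice(top)
        # Exploit: copy the better trial's hyperparameters and resume from its checkpoint.
        bad['hyperparameters'] = dict(good['hyperparameters'])
        bad['load_checkpoint_dir'] = good['save_checkpoint_dir']
        # Explore: perturb each numeric hyperparameter by one of the factors.
        for name, value in bad['hyperparameters'].items():
            if isinstance(value, (int, float)):
                bad['hyperparameters'][name] = value * random.choice(factors)
    return ranked
```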

**Usage example**

```yaml
# config.yml
tuner:
builtinTunerName: PBTTuner
classArgs:
optimize_mode: maximize
```

## **Reference and Feedback**
* To [report a bug](https://github.com/microsoft/nni/issues/new?template=bug-report.md) for this feature in GitHub;
* To [file a feature or improvement request](https://github.com/microsoft/nni/issues/new?template=enhancement.md) for this feature in GitHub;
12 changes: 12 additions & 0 deletions docs/en_US/Tuner/PBTTuner.md
@@ -0,0 +1,12 @@
PBT Tuner on NNI
===

## PBTTuner

Population Based Training (PBT) comes from [Population Based Training of Neural Networks](https://arxiv.org/abs/1711.09846v1). It's a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training.

PBTTuner initializes a population of several trials. Users can set a specific number of training epochs per step. After each step, the parameters and hyperparameters of trials with bad metrics are replaced with those of a better trial (exploit), and the hyperparameters are then perturbed (explore).

In our implementation, a fixed number of training epochs in the trial code is regarded as one step of PBT, which is different from other tuners. At the end of each step, the PBT tuner performs exploitation and exploration -- replacing some trials with new trials. This is implemented by constantly modifying the values of `load_checkpoint_dir` and `save_checkpoint_dir`: changing `load_checkpoint_dir` replaces the parameters and hyperparameters, while `save_checkpoint_dir` stores a checkpoint that will be loaded in the next step. To this end, we need a shared folder that is accessible to all trials.

If the experiment is running in local mode, users can provide the argument `all_checkpoint_dir`, which will be the base folder of `load_checkpoint_dir` and `save_checkpoint_dir` (`checkpoint_dir` is set to `all_checkpoint_dir/<population-id>/<step>`). By default, `all_checkpoint_dir` is set to `~/nni/experiments/<exp-id>/checkpoint`. If the experiment is in non-local mode, users should provide a path in a shared storage folder which is mounted at `all_checkpoint_dir` on the worker machines (though it does not necessarily have to be available on the machine that runs the tuner).
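
The sketch below shows what a PyTorch trial written for PBTTuner might look like. It is a minimal sketch, assuming that the parameter dict returned by `nni.get_next_parameter()` carries the `load_checkpoint_dir` and `save_checkpoint_dir` values described above; `build_model`, `train_one_epoch`, `evaluate`, and `epochs_per_step` are hypothetical user-side helpers, and `'lr'` is an illustrative hyperparameter.

```python
import os
import torch
import nni

params = nni.get_next_parameter()          # hyperparameters plus checkpoint dirs from PBTTuner
load_dir = params['load_checkpoint_dir']   # key names assumed from the description above
save_dir = params['save_checkpoint_dir']

model = build_model(params)                # hypothetical user-defined model factory
optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'])  # 'lr' is illustrative

# Exploit: resume from the checkpoint selected by the tuner, if one exists.
ckpt = os.path.join(load_dir, 'model.pth')
if os.path.isfile(ckpt):
    model.load_state_dict(torch.load(ckpt))

for epoch in range(epochs_per_step):       # epochs_per_step: user-chosen epochs per PBT step
    train_one_epoch(model, optimizer)      # hypothetical training helper
    metric = evaluate(model)               # hypothetical validation helper
    nni.report_intermediate_result(metric)

# Save a checkpoint so the next PBT step can load it.
os.makedirs(save_dir, exist_ok=True)
torch.save(model.state_dict(), os.path.join(save_dir, 'model.pth'))
nni.report_final_result(metric)
```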
1 change: 1 addition & 0 deletions docs/en_US/builtin_tuner.rst
@@ -23,3 +23,4 @@ Tuner receives metrics from `Trial` to evaluate the performance of a specific pa
Hyperband <Tuner/HyperbandAdvisor>
BOHB <Tuner/BohbAdvisor>
PPO Tuner <Tuner/PPOTuner>
PBT Tuner <Tuner/PBTTuner>
1 change: 1 addition & 0 deletions docs/en_US/nas.rst
@@ -26,5 +26,6 @@ For details, please refer to the following tutorials:
SPOS <NAS/SPOS>
CDARTS <NAS/CDARTS>
ProxylessNAS <NAS/Proxylessnas>
TextNAS <NAS/TextNAS>
Customize a NAS Algorithm <NAS/Advanced>
API Reference <NAS/NasReference>
Binary file added docs/img/importance_estimation_sum.png
