Merge pull request #116 from microsoft/master

pull
chicm-ms · Sep 21, 2020 · 0c2f59b
2 parents f974b2c + ec5af41
commit 0c2f59b

Showing 189 changed files with 9,736 additions and 4,357 deletions.
diff --git a/README_zh_CN.md b/README_zh_CN.md
@@ -10,7 +10,7 @@
 
 **NNI (Neural Network Intelligence)** 是一个轻量但强大的工具包，帮助用户**自动**的进行[特征工程](docs/zh_CN/FeatureEngineering/Overview.md)，[神经网络架构搜索](docs/zh_CN/NAS/Overview.md)，[超参调优](docs/zh_CN/Tuner/BuiltinTuner.md)以及[模型压缩](docs/zh_CN/Compressor/Overview.md)。
 
-NNI 管理自动机器学习 (AutoML) 的 Experiment，**调度运行**由调优算法生成的 Trial 任务来找到最好的神经网络架构和/或超参，支持**各种训练环境**，如[本机](docs/zh_CN/TrainingService/LocalMode.md)，[远程服务器](docs/zh_CN/TrainingService/RemoteMachineMode.md)，[OpenPAI](docs/zh_CN/TrainingService/PaiMode.md)，[Kubeflow](docs/zh_CN/TrainingService/KubeflowMode.md)，[基于 K8S 的 FrameworkController（如，AKS 等)](docs/zh_CN/TrainingService/FrameworkControllerMode.md)， [DLWorkspace (又称 DLTS)](docs/zh_CN/TrainingService/DLTSMode.md) 和其它云服务。
+NNI 管理自动机器学习 (AutoML) 的 Experiment，**调度运行**由调优算法生成的 Trial 任务来找到最好的神经网络架构和/或超参，支持**各种训练环境**，如[本机](docs/zh_CN/TrainingService/LocalMode.md)，[远程服务器](docs/zh_CN/TrainingService/RemoteMachineMode.md)，[OpenPAI](docs/zh_CN/TrainingService/PaiMode.md)，[Kubeflow](docs/zh_CN/TrainingService/KubeflowMode.md)，[基于 K8S 的 FrameworkController（如，AKS 等)](docs/zh_CN/TrainingService/FrameworkControllerMode.md)，[DLWorkspace (又称 DLTS)](docs/zh_CN/TrainingService/DLTSMode.md)，[AML (Azure Machine Learning)](docs/zh_CN/TrainingService/AMLMode.md) 以及其它环境。
 
 ## **使用场景**
 
@@ -19,7 +19,7 @@ NNI 管理自动机器学习 (AutoML) 的 Experiment，**调度运行**由调优
 * 想要更容易**实现或试验新的自动机器学习算法**的研究员或数据科学家，包括：超参调优算法，神经网络搜索算法以及模型压缩算法。
 * 在机器学习平台中**支持自动机器学习**。
 
-### **[NNI v1.6 已发布！](https://github.com/microsoft/nni/releases) &nbsp;[<img width="48" src="docs/img/release_icon.png" />](#nni-released-reminder)**
+### **[NNI v1.8 已发布！](https://github.com/microsoft/nni/releases) &nbsp;[<img width="48" src="docs/img/release_icon.png" />](#nni-released-reminder)**
 
 ## **NNI 功能一览**
 
@@ -164,6 +164,7 @@ NNI 提供命令行工具以及友好的 WebUI 来管理训练的 Experiment。
       <ul>
         <li><a href="docs/zh_CN/TrainingService/LocalMode.md">本机</a></li>
         <li><a href="docs/zh_CN/TrainingService/RemoteMachineMode.md">远程计算机</a></li>
+        <li><a href="docs/zh_CN/TrainingService/AMLMode.md">AML(Azure Machine Learning)</a></li>
         <li><b>基于 Kubernetes 的平台</b></li>
             <ul><li><a href="docs/zh_CN/TrainingService/PaiMode.md">OpenPAI</a></li>
             <li><a href="docs/zh_CN/TrainingService/KubeflowMode.md">Kubeflow</a></li>
@@ -208,7 +209,7 @@ NNI 提供命令行工具以及友好的 WebUI 来管理训练的 Experiment。
 
 ### **安装**
 
-NNI 支持并在 Ubuntu >= 16.04, macOS >= 10.14.1, 和 Windows 10 >= 1809 通过了测试。 在 `python 64-bit >= 3.5` 的环境中，只需要运行 `pip install` 即可完成安装。
+NNI 支持并在 Ubuntu >= 16.04, macOS >= 10.14.1, 和 Windows 10 >= 1809 通过了测试。 在 `python 64-bit >= 3.6` 的环境中，只需要运行 `pip install` 即可完成安装。
 
 Linux 或 macOS
 
@@ -239,7 +240,7 @@ Linux 和 macOS 下 NNI 系统需求[参考这里](https://nni.readthedocs.io/zh
 * 通过克隆源代码下载示例。
 
    ```bash
-   git clone -b v1.6 https://github.com/Microsoft/nni.git
+   git clone -b v1.8 https://github.com/Microsoft/nni.git
    ```
 
 * 运行 MNIST 示例。

diff --git a/deployment/docker/README_zh_CN.md b/deployment/docker/README_zh_CN.md
@@ -1,18 +1,20 @@
-# Dockerfile
+# Dockerfile 
 
 ## 1. 说明
 
 这是 NNI 项目的 Dockerfile 文件。 其中包含了 NNI 以及多个流行的深度学习框架。 在 `Ubuntu 16.04 LTS` 上进行过测试：
 
-    CUDA 9.0, CuDNN 7.0
-    numpy 1.14.3,scipy 1.1.0
-    TensorFlow-gpu 1.10.0
-    Keras 2.1.6
-    PyTorch 0.4.1
-    scikit-learn 0.20.0
+    CUDA 9.0
+    CuDNN 7.0
+    numpy 1.14.3
+    scipy 1.1.0
+    tensorflow-gpu 1.15.0
+    keras 2.1.6
+    torch 1.4.0
+    scikit-learn 0.23.2
     pandas 0.23.4
     lightgbm 2.2.2
-    NNI v0.7
+    nni
 
 
 此 Dockerfile 可作为定制的参考。
@@ -47,4 +49,4 @@
 
 使用下列命令从 docker Hub 中拉取 NNI docker 映像。
 
-    docker pull msranni/nni:latest
+    docker pull msranni/nni:latest
diff --git a/docs/en_US/Compressor/DependencyAware.md b/docs/en_US/Compressor/DependencyAware.md
@@ -0,0 +1,55 @@
+# Dependency-aware Mode for Filter Pruning
+
+Currently, we have several filter pruning algorithms for convolutional layers: FPGM Pruner, L1Filter Pruner, L2Filter Pruner, Activation APoZ Rank Filter Pruner, Activation Mean Rank Filter Pruner, and Taylor FO On Weight Pruner. In these filter pruning algorithms, the pruner prunes each convolutional layer separately. While pruning a convolutional layer, the algorithm quantifies the importance of each filter based on a specific rule (such as the L1 norm) and prunes the less important filters.
+
+As the [dependency analysis utils](./CompressionUtils.md) show, if the output channels of two convolutional layers (conv1, conv2) are added together, then these two layers have a channel dependency on each other (for more details, see [Compression Utils](./CompressionUtils.md)). Take the following figure as an example.
+![](../../img/mask_conflict.jpg)
+
+Suppose we prune the first 50% of the output channels (filters) of conv1 and the last 50% of the output channels of conv2. Although both layers have had 50% of their filters pruned, the speedup module still needs to add zeros to align the output channels. In this case, we cannot harvest any speed benefit from the model pruning.
+
+
+To better gain the speed benefit of model pruning, we add a dependency-aware mode for the filter pruners. In the dependency-aware mode, the pruner prunes the model based not only on the L1 norm of each filter, but also on the topology of the whole network architecture.
+
+In the dependency-aware mode (`dependency_aware` is set to `True`), the pruner tries to prune the same output channels for layers that have channel dependencies on each other, as shown in the following figure.
+
+![](../../img/dependency-aware.jpg)
+
+Take the dependency-aware mode of L1Filter Pruner as an example. For each channel, the pruner calculates the sum of the L1 norms over all the layers in the dependency set. The number of channels of this dependency set that can actually be pruned in the end is determined by the minimum sparsity of the layers in the set (denoted by `min_sparsity`). According to the L1 norm sum of each channel, the pruner prunes the same `min_sparsity` fraction of channels for all the layers. Next, the pruner additionally prunes a `sparsity` - `min_sparsity` fraction of channels for each convolutional layer based on that layer's own per-channel L1 norm. For example, suppose the output channels of `conv1` and `conv2` are added together and the configured sparsities of `conv1` and `conv2` are 0.3 and 0.2 respectively. In this case, the dependency-aware pruner will:
+
+- First, prune the same 20% of channels for `conv1` and `conv2` according to the L1 norm sum of `conv1` and `conv2`.
+- Second, additionally prune 10% of the channels of `conv1` according to the L1 norm of each channel of `conv1`.
+
+In addition, for convolutional layers that have more than one filter group, the dependency-aware pruner also tries to prune the same number of channels for each filter group. Overall, this pruner prunes the model according to the L1 norm of each filter and tries to meet the topological constraints (channel dependency, etc.) to improve the final speed gain after the speedup process.
+
+Compared with pruning each layer independently, the dependency-aware mode lets the pruner obtain a better speed gain from model pruning.
+
+## Usage
+In this section, we show how to enable the dependency-aware mode for the filter pruners. Currently, only the one-shot pruners, such as FPGM Pruner, L1Filter Pruner, L2Filter Pruner, Activation APoZ Rank Filter Pruner, Activation Mean Rank Filter Pruner, and Taylor FO On Weight Pruner, support the dependency-aware mode.
+
+To enable the dependency-aware mode for `L1FilterPruner`:
+```python
+import torch
+from nni.compression.torch import L1FilterPruner
+config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
+# dummy_input is necessary for the dependency_aware mode
+dummy_input = torch.ones(1, 3, 224, 224).cuda()
+pruner = L1FilterPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input)
+# for L2FilterPruner
+# pruner = L2FilterPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input)
+# for FPGMPruner
+# pruner = FPGMPruner(model, config_list, dependency_aware=True, dummy_input=dummy_input)
+# for ActivationAPoZRankFilterPruner
+# pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1, dependency_aware=True, dummy_input=dummy_input)
+# for ActivationMeanRankFilterPruner
+# pruner = ActivationMeanRankFilterPruner(model, config_list, statistics_batch_num=1, dependency_aware=True, dummy_input=dummy_input)
+# for TaylorFOWeightFilterPruner
+# pruner = TaylorFOWeightFilterPruner(model, config_list, statistics_batch_num=1, dependency_aware=True, dummy_input=dummy_input)
+
+pruner.compress()
+```
+
+## Evaluation
+In order to compare the performance of the pruner with and without the dependency-aware mode, we use L1FilterPruner to prune Mobilenet_v2 with the dependency-aware mode turned on and off, respectively. To simplify the experiment, we use uniform pruning, which means we allocate the same sparsity to all convolutional layers in the model.
+We trained a Mobilenet_v2 model on the CIFAR-10 dataset and pruned the model based on this pretrained checkpoint. The following figure shows the accuracy and FLOPs of the model pruned by the different pruners.
+![](../../img/mobilev2_l1_cifar.jpg)
+
+In the figure, `Dependency-aware` represents the L1FilterPruner with the dependency-aware mode enabled, `L1 Filter` is the normal `L1FilterPruner` without the dependency-aware mode, and `No-Dependency` means the pruner only prunes the layers that have no channel dependency on other layers. As the figure shows, when the dependency-aware mode is enabled, the pruner achieves higher accuracy under the same FLOPs.
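
The two-step selection rule described in `DependencyAware.md` above can be illustrated with a small, framework-free sketch. It assumes a single dependency set whose layers all have the same number of output channels. The function name `dependency_aware_prune`, the dictionary inputs, and the norm values are hypothetical; this is not NNI's implementation, which works on real `Conv2d` weights.

```python
def dependency_aware_prune(l1_norms, sparsities):
    """Pick channels to prune for one dependency set (illustrative only).

    l1_norms:   {layer_name: [L1 norm of each output channel]}
    sparsities: {layer_name: configured sparsity in [0, 1]}
    Returns     {layer_name: sorted list of pruned channel indices}
    """
    layers = list(l1_norms)
    n_channels = len(l1_norms[layers[0]])

    # Step 1: every layer in the set prunes the same channels, chosen by
    # the per-channel L1 norm summed over all layers in the set.
    min_sparsity = min(sparsities[name] for name in layers)
    n_shared = round(n_channels * min_sparsity)
    norm_sum = [sum(l1_norms[name][c] for name in layers)
                for c in range(n_channels)]
    by_sum = sorted(range(n_channels), key=lambda c: norm_sum[c])
    shared = set(by_sum[:n_shared])

    # Step 2: each layer prunes its remaining quota using its own norms.
    pruned = {}
    for name in layers:
        n_total = round(n_channels * sparsities[name])
        remaining = [c for c in range(n_channels) if c not in shared]
        remaining.sort(key=lambda c: l1_norms[name][c])
        extra = remaining[:max(n_total - n_shared, 0)]
        pruned[name] = sorted(shared | set(extra))
    return pruned


# Made-up per-channel L1 norms for two layers whose outputs are added.
l1_norms = {
    "conv1": [5, 1, 9, 2, 8, 7, 3, 6, 4, 10],
    "conv2": [1, 2, 9, 8, 7, 6, 5, 4, 3, 10],
}
sparsities = {"conv1": 0.3, "conv2": 0.2}

pruned = dependency_aware_prune(l1_norms, sparsities)
print(pruned)  # {'conv1': [0, 1, 3], 'conv2': [0, 1]}
```

With sparsities 0.3 and 0.2, both layers prune the two channels with the smallest summed norms, and `conv1` prunes one more channel based on its own norms, which matches the 20% plus 10% split in the example above.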
diff --git a/docs/en_US/Compressor/Pruner.md b/docs/en_US/Compressor/Pruner.md
@@ -114,7 +114,9 @@ FPGMPruner prune filters with the smallest geometric median.
 
  ![](../../img/fpgm_fig1.png)
 
->Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance.
+>Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance. 
+
+We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
 
 ### Usage
 
@@ -154,6 +156,8 @@ This is an one-shot pruner, In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https:
 > 4. A new kernel matrix is created for both the ![](http://latex.codecogs.com/gif.latex?i)th and ![](http://latex.codecogs.com/gif.latex?i+1)th layers, and the remaining kernel
 >      weights are copied to the new model.
 
+We also provide a dependency-aware mode for the L1FilterPruner. For more details about the dependency-aware mode, please refer to [dependency-aware mode](./DependencyAware.md).
+
 ### Usage
 
 PyTorch code
@@ -189,6 +193,8 @@ The experiments code can be found at [examples/model_compress]( https://github.c
 
 This is a structured pruning algorithm that prunes the filters with the smallest L2 norm of the weights. It is implemented as a one-shot pruner.
 
+We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
+
 ### Usage
 
 PyTorch code
@@ -200,6 +206,7 @@ pruner = L2FilterPruner(model, config_list)
 pruner.compress()
 ```
 
+
 ### User configuration for L2Filter Pruner
 
 ##### PyTorch
@@ -208,6 +215,7 @@ pruner.compress()
 ```
 ***
 
+
 ## ActivationAPoZRankFilter Pruner
 
 ActivationAPoZRankFilter Pruner is a pruner which prunes the filters with the smallest importance criterion `APoZ` calculated from the output activations of convolution layers to achieve a preset level of network sparsity. The pruning criterion `APoZ` is explained in the paper [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250).
@@ -216,6 +224,8 @@ The APoZ is defined as:
 
 ![](../../img/apoz.png)
 
+We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
+
 ### Usage
 
 PyTorch code
@@ -234,6 +244,8 @@ Note: ActivationAPoZRankFilterPruner is used to prune convolutional layers withi
 
 You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/model_prune_torch.py) for more information.
 
+
+
 ### User configuration for ActivationAPoZRankFilter Pruner
 
 ##### PyTorch
@@ -247,6 +259,8 @@ You can view [example](https://github.com/microsoft/nni/blob/master/examples/mod
 
 ActivationMeanRankFilterPruner is a pruner which prunes the filters with the smallest importance criterion `mean activation` calculated from the output activations of convolution layers to achieve a preset level of network sparsity. The pruning criterion `mean activation` is explained in section 2.2 of the paper [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440). Other pruning criteria mentioned in this paper will be supported in a future release.
 
+We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
+
 ### Usage
 
 PyTorch code
@@ -265,6 +279,7 @@ Note: ActivationMeanRankFilterPruner is used to prune convolutional layers withi
 
 You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/model_prune_torch.py) for more information.
 
+
 ### User configuration for ActivationMeanRankFilterPruner
 
 ##### PyTorch
@@ -273,6 +288,7 @@ You can view [example](https://github.com/microsoft/nni/blob/master/examples/mod
 ```
 ***
 
+
 ## TaylorFOWeightFilter Pruner
 
 TaylorFOWeightFilter Pruner is a pruner which prunes convolutional layers based on estimated importance calculated from the first-order Taylor expansion on weights to achieve a preset level of network sparsity. The estimated importance of filters is defined as in the paper [Importance Estimation for Neural Network Pruning](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf). Other pruning criteria mentioned in this paper will be supported in a future release.
@@ -281,6 +297,8 @@ TaylorFOWeightFilter Pruner is a pruner which prunes convolutional layers based
 
 ![](../../img/importance_estimation_sum.png)
 
+We also provide a dependency-aware mode for this pruner to get better speedup from the pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
+
 ### Usage
 
 PyTorch code

diff --git a/docs/en_US/TrainingService/RemoteMachineMode.md b/docs/en_US/TrainingService/RemoteMachineMode.md
@@ -107,3 +107,79 @@ Files in `codeDir` will be uploaded to remote machines automatically. You can ru
 ```bash
 nnictl create --config examples/trials/mnist-annotation/config_remote.yml
 ```
+
+### Configure Python environment
+
+By default, commands and scripts are executed in the default environment on the remote machine. If there are multiple Python virtual environments on your remote machine and you want to run experiments in a specific one, use __preCommand__ to specify that Python environment.
+
+Use `examples/trials/mnist-tfv2` as an example. Below is the content of `examples/trials/mnist-tfv2/config_remote.yml`:
+
+```yaml
+authorName: default
+experimentName: example_mnist
+trialConcurrency: 1
+maxExecDuration: 1h
+maxTrialNum: 10
+#choice: local, remote, pai
+trainingServicePlatform: remote
+searchSpacePath: search_space.json
+#choice: true, false
+useAnnotation: false
+tuner:
+  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner
+  #SMAC (SMAC should be installed through nnictl)
+  builtinTunerName: TPE
+  classArgs:
+    #choice: maximize, minimize
+    optimize_mode: maximize
+trial:
+  command: python3 mnist.py
+  codeDir: .
+  gpuNum: 0
+#machineList can be empty if the platform is local
+machineList:
+  - ip: ${replace_to_your_remote_machine_ip}
+    username: ${replace_to_your_remote_machine_username}
+    sshKeyPath: ${replace_to_your_remote_machine_sshKeyPath}
+    # Pre-command will be executed before the remote machine executes other commands.
+    # Below is an example of specifying python environment.
+    # If you want to execute multiple commands, please use "&&" to connect them.
+    # preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate
+    # preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name}
+    preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH
+```
+
+The __preCommand__ will be executed before the remote machine executes other commands, so you can configure the Python environment path like this:
+
+```yaml
+# Linux remote machine
+preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH
+# Windows remote machine
+preCommand: set path=${replace_to_python_environment_path_in_your_remote_machine};%path%
+```
+
+Or if you want to activate the `virtualenv` environment:
+
+```yaml
+# Linux remote machine
+preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate
+# Windows remote machine
+preCommand: ${replace_to_absolute_path_recommended_here}\\scripts\\activate
+```
+
+Or if you want to activate the `conda` environment:
+
+```yaml
+# Linux remote machine
+preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name}
+# Windows remote machine
+preCommand: call activate ${replace_to_conda_env_name}
+```
+
+If you want multiple commands to be executed, you can use `&&` to connect these commands:
+
+```yaml
+preCommand: command1 && command2 && command3
+```
+
+__Note__: Because __preCommand__ is executed before other commands every time, it is strongly recommended not to set a __preCommand__ that makes changes to the system, e.g. `mkdir` or `touch`.
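
As a rough illustration of the ordering described above, the sketch below prefixes a trial command with a pre-command using `&&`. The helper `with_pre_command` and the path `/opt/venv` are hypothetical. How NNI actually assembles the remote command line is not shown in this diff and may differ.

```python
def with_pre_command(pre_command, command):
    """Return a shell line that runs pre_command first, then command.

    With `&&`, command only runs if pre_command exits successfully.
    """
    pre_command = (pre_command or "").strip()
    if not pre_command:
        # No pre-command configured: run the command unchanged.
        return command
    return f"{pre_command} && {command}"


# Hypothetical virtualenv path, used only for illustration.
line = with_pre_command("source /opt/venv/bin/activate", "python3 mnist.py")
print(line)
```

In this sketch, a step such as `mkdir` (without `-p`) would fail the second time it runs, and under `&&` the command after it would then not start. That is one reason to keep the pre-command free of side effects.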
diff --git a/docs/en_US/TrialExample/SklearnExamples.md b/docs/en_US/TrialExample/SklearnExamples.md
@@ -67,7 +67,7 @@ It is easy to use NNI in your scikit-learn code, there are only a few steps.
     "kernel": {"_type":"choice","_value":["linear", "rbf", "poly", "sigmoid"]},
     "degree": {"_type":"choice","_value":[1, 2, 3, 4]},
     "gamma": {"_type":"uniform","_value":[0.01, 0.1]},
-    "coef0 ": {"_type":"uniform","_value":[0.01, 0.1]}
+    "coef0": {"_type":"uniform","_value":[0.01, 0.1]}
   }
   ```
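
The `coef0` change above removes a trailing space from a search-space key. A quick check like the following sketch could catch such keys before an experiment starts. `keys_with_stray_whitespace` is a hypothetical helper, not part of NNI, and the JSON is abbreviated from the example in the diff.

```python
import json


def keys_with_stray_whitespace(search_space):
    """Return the parameter names that have leading or trailing whitespace."""
    return [key for key in search_space if key != key.strip()]


# Abbreviated search space containing the key as it was before the fix.
raw = '''
{
  "gamma": {"_type": "uniform", "_value": [0.01, 0.1]},
  "coef0 ": {"_type": "uniform", "_value": [0.01, 0.1]}
}
'''

bad_keys = keys_with_stray_whitespace(json.loads(raw))
print(bad_keys)
```

A key such as `"coef0 "` is a different string from `"coef0"`, so a lookup of `coef0` in the sampled parameters would not find it.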