SparkSnail · SparkSnail · Feb 21, 2020 · Jan 16, 2020 · Jan 19, 2020 · Feb 5, 2020
diff --git a/README.md b/README.md
@@ -25,7 +25,7 @@ The tool manages automated machine learning (AutoML) experiments, **dispatches a
 * Researchers and data scientists who want to easily **implement and experiement new AutoML algorithms**, may it be: hyperparameter tuning algorithm, neural architect search algorithm or model compression algorithm.
 * ML Platform owners who want to **support AutoML in their platform**.
 
-### **NNI v1.3 has been released! &nbsp;<a href="#nni-released-reminder"><img width="48" src="docs/img/release_icon.png"></a>**
+### **NNI v1.4 has been released! &nbsp;<a href="#nni-released-reminder"><img width="48" src="docs/img/release_icon.png"></a>**
 
 ## **NNI capabilities in a glance**
 NNI provides CommandLine Tool as well as an user friendly WebUI to manage training experiements. With the extensible API, you can customize your own AutoML algorithms and training services. To make it easy for new users, NNI also provides a set of build-in stat-of-the-art AutoML algorithms and out of box support for popular training platforms. 
@@ -177,9 +177,9 @@ Within the following table, we summarized the current NNI capabilities, we are g
       </td>
      <td style="border-top:#FF0000 solid 0px;">
       <ul>
-        <li><a href="docs/en_US/sdk_reference.rst">Python API</a></li>
+        <li><a href="https://nni.readthedocs.io/en/latest/autotune_ref.html#trial">Python API</a></li>
         <li><a href="docs/en_US/Tutorial/AnnotationSpec.md">NNI Annotation</a></li>
-         <li><a href="docs/en_US/Tutorial/Installation.md">Supported OS</a></li>
+         <li><a href="https://nni.readthedocs.io/en/latest/installation.html">Supported OS</a></li>
       </ul>
       </td>
        <td style="border-top:#FF0000 solid 0px;">
@@ -216,9 +216,9 @@ Windows
 python -m pip install --upgrade nni
 ```
 
-If you want to try latest code, please [install NNI](docs/en_US/Tutorial/Installation.md) from source code.
+If you want to try the latest code, please [install NNI](https://nni.readthedocs.io/en/latest/installation.html) from source code.
 
-For detail system requirements of NNI, please refer to [here](docs/en_US/Tutorial/Installation.md#system-requirements).
+For detailed system requirements of NNI, please refer to [here](https://nni.readthedocs.io/en/latest/Tutorial/InstallationLinux.html#system-requirements) for Linux and macOS, and [here](https://nni.readthedocs.io/en/latest/Tutorial/InstallationWin.html#system-requirements) for Windows.
 
 Note:
 
@@ -233,7 +233,7 @@ The following example is built on TensorFlow 1.x. Make sure **TensorFlow 1.x is
 * Download the examples via clone the source code.
 
   ```bash
-  git clone -b v1.3 https://github.com/Microsoft/nni.git
+  git clone -b v1.4 https://github.com/Microsoft/nni.git
   ```
 
 * Run the MNIST example.

diff --git a/README_zh_CN.md b/README_zh_CN.md
@@ -172,9 +172,9 @@ NNI 提供命令行工具以及友好的 WebUI 来管理训练的 Experiment。
       </td>
      <td style="border-top:#FF0000 solid 0px;">
       <ul>
-        <li><a href="docs/zh_CN/sdk_reference.rst">Python API</a></li>
+        <li><a href="https://nni.readthedocs.io/zh/latest/autotune_ref.html#trial">Python API</a></li>
         <li><a href="docs/zh_CN/Tutorial/AnnotationSpec.md">NNI Annotation</a></li>
-         <li><a href="docs/zh_CN/Tutorial/Installation.md">支持的操作系统</a></li>
+         <li><a href="https://nni.readthedocs.io/zh/latest/installation.html">支持的操作系统</a></li>
       </ul>
       </td>
        <td style="border-top:#FF0000 solid 0px;">
@@ -211,9 +211,9 @@ Windows
 python -m pip install --upgrade nni
 ```
 
-如果想要尝试最新代码，可通过源代码[安装 NNI](docs/zh_CN/Tutorial/Installation.md)。
+如果想试试最新代码，可参考从源代码[安装 NNI](https://nni.readthedocs.io/zh/latest/installation.html)。
 
-有关 NNI 的详细系统要求，参考[这里](docs/zh_CN/Tutorial/Installation.md#system-requirements)。
+Linux 和 macOS 下 NNI 系统需求[参考这里](https://nni.readthedocs.io/zh/latest/Tutorial/InstallationLinux.html#system-requirements) ，Windows [参考这里](https://nni.readthedocs.io/zh/latest/Tutorial/InstallationWin.html#system-requirements)。
 
 注意：
 

diff --git a/azure-pipelines.yml b/azure-pipelines.yml
@@ -26,8 +26,8 @@ jobs:
       yarn eslint
     displayName: 'Run eslint'
   - script: |
-      python3 -m pip install torch==0.4.1 --user
-      python3 -m pip install torchvision==0.2.1 --user
+      python3 -m pip install torch==1.2.0 --user
+      python3 -m pip install torchvision==0.4.0 --user
       python3 -m pip install tensorflow==1.13.1 --user
       python3 -m pip install keras==2.1.6 --user
       python3 -m pip install gym onnx --user
@@ -91,8 +91,8 @@ jobs:
       echo "##vso[task.setvariable variable=PATH]${HOME}/Library/Python/3.7/bin:${PATH}"
     displayName: 'Install nni toolkit via source code'
   - script: |
-      python3 -m pip install torch==0.4.1 --user
-      python3 -m pip install torchvision==0.2.1 --user
+      python3 -m pip install torch==1.2.0 --user
+      python3 -m pip install torchvision==0.4.0 --user
       python3 -m pip install tensorflow==1.13.1 --user
       ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" < /dev/null 2> /dev/null
       brew install swig@3
@@ -131,7 +131,7 @@ jobs:
   - script: |
       python -m pip install scikit-learn==0.20.0 --user
       python -m pip install keras==2.1.6 --user
-      python -m pip install https://download.pytorch.org/whl/cu90/torch-0.4.1-cp36-cp36m-win_amd64.whl --user
+      python -m pip install torch===1.2.0 torchvision===0.4.1 -f https://download.pytorch.org/whl/torch_stable.html --user
       python -m pip install torchvision --user
       python -m pip install tensorflow==1.13.1 --user
     displayName: 'Install dependencies'

diff --git a/deployment/docker/Dockerfile b/deployment/docker/Dockerfile
@@ -52,7 +52,7 @@ RUN python3 -m pip --no-cache-dir install Keras==2.1.6
 # PyTorch
 #
 RUN python3 -m pip --no-cache-dir install torch==1.2.0
-RUN python3 -m pip install torchvision==0.4.0
+RUN python3 -m pip install torchvision==0.5.0
 
 #
 # sklearn 0.20.0

diff --git a/docs/en_US/Compressor/ModelSpeedup.md b/docs/en_US/Compressor/ModelSpeedup.md
@@ -0,0 +1,105 @@
+# Speed up Masked Model
+
+*This feature is still in Alpha version.*
+
+## Introduction
+
+Pruning algorithms usually use weight masks to simulate real pruning. Masks can be used
+to check the performance of a model at a specific pruning level (or sparsity), but they give no real speedup.
+Since speedup is the ultimate goal of model pruning, we provide a tool that converts
+a model into a smaller one based on user-provided masks (the masks come from the
+pruning algorithms).
+
+There are two types of pruning. Fine-grained pruning does not change the shape of weights or of input/output tensors, so a sparse kernel is required to speed up a fine-grained pruned layer. Coarse-grained pruning (e.g., pruning channels) usually changes the shape of weights and input/output tensors; speeding it up needs no sparse kernel, only replacing the pruned layer with a smaller one. Since community support for sparse kernels is limited, we currently support speedup only for coarse-grained pruning and leave fine-grained pruning for future work.
+
+## Design and Implementation
+
+To speed up a model, the pruned layers should be replaced, either with a smaller layer for a coarse-grained mask or with a sparse kernel for a fine-grained mask. A coarse-grained mask usually changes the shape of weights or input/output tensors, so we do shape inference to check whether other, unpruned layers must also be replaced because of the shape change. Therefore, our design has two main steps: first, do shape inference to find all the modules that should be replaced; second, replace those modules. The first step requires the topology (i.e., connections) of the model; for PyTorch we use `jit.trace` to obtain the model graph.
+
+For each module, we prepare four functions: three for shape inference and one for module replacement. The three shape inference functions are: given the weight shape, infer the input/output shape; given the input shape, infer the weight/output shape; and given the output shape, infer the weight/input shape. The module replacement function returns a newly created, smaller module.
+
+## Usage
+
+```python
+import time
+from nni.compression.speedup.torch import ModelSpeedup
+# model: the model you want to speed up
+# dummy_input: dummy input of the model, given to `jit.trace`
+# masks_file: the mask file created by pruning algorithms
+dummy_input = dummy_input.to(device)
+m_speedup = ModelSpeedup(model, dummy_input, masks_file)
+m_speedup.speedup_model()
+# time one forward pass of the sped-up model
+# (on a GPU, synchronize before reading the clock, since CUDA kernels run asynchronously)
+start = time.time()
+out = model(dummy_input)
+print('elapsed time: ', time.time() - start)
+```
+For complete examples, please refer to [the code](https://github.com/microsoft/nni/tree/master/examples/model_compress/model_speedup.py).
+
+NOTE: The current implementation only works with torch 1.3.1 and torchvision 0.4.2.
+
+## Limitations
+
+Every module requires four functions for shape inference and module replacement, which is a large amount of work, so we have only implemented the ones required by the examples. If you want to speed up a model that the current implementation does not support, you are welcome to contribute.
+
+For PyTorch, we can only replace modules; if a function called in `forward` needs to be replaced, the current implementation does not work. One workaround is to wrap the function in a PyTorch module.
+
+## Speedup Results of Examples
+
+The code of these experiments can be found [here](https://github.com/microsoft/nni/tree/master/examples/model_compress/model_speedup.py).
+
+### slim pruner example
+
+on one V100 GPU,
+input tensor: `torch.randn(64, 3, 32, 32)`
+
+|Times| Mask Latency| Speedup Latency |
+|---|---|---|
+| 1 | 0.01197 | 0.005107 |
+| 2 | 0.02019 | 0.008769 |
+| 4 | 0.02733 | 0.014809 |
+| 8 | 0.04310 | 0.027441 |
+| 16 | 0.07731 | 0.05008 |
+| 32 | 0.14464 | 0.10027 |
+
+### fpgm pruner example
+
+on CPU,
+input tensor: `torch.randn(64, 1, 28, 28)`,
+note that the variance of these measurements is too large
+
+|Times| Mask Latency| Speedup Latency |
+|---|---|---|
+| 1 | 0.01383 | 0.01839 |
+| 2 | 0.01167 | 0.003558 |
+| 4 | 0.01636 | 0.01088 |
+| 40 | 0.14412 | 0.08268 |
+| 40 | 1.29385 | 0.14408 |
+| 40 | 0.41035 | 0.46162 |
+| 400 | 6.29020 | 5.82143 |
+
+### l1filter pruner example
+
+on one V100 GPU,
+input tensor: `torch.randn(64, 3, 32, 32)`
+
+|Times| Mask Latency| Speedup Latency |
+|---|---|---|
+| 1 | 0.01026 | 0.003677 |
+| 2 | 0.01657 | 0.008161 |
+| 4 | 0.02458 | 0.020018 |
+| 8 | 0.03498 | 0.025504 |
+| 16 | 0.06757 | 0.047523 |
+| 32 | 0.10487 | 0.086442 |
+
+### APoZ pruner example
+
+on one V100 GPU,
+input tensor: `torch.randn(64, 3, 32, 32)`
+
+|Times| Mask Latency| Speedup Latency |
+|---|---|---|
+| 1 | 0.01389 | 0.004208 |
+| 2 | 0.01628 | 0.008310 |
+| 4 | 0.02521 | 0.014008 |
+| 8 | 0.03386 | 0.023923 |
+| 16 | 0.06042 | 0.046183 |
+| 32 | 0.12421 | 0.087113 |
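
The two-step design described in ModelSpeedup.md (shape inference, then module replacement) can be illustrated with a small sketch. It is not NNI's implementation: `ConvShape` and `propagate_channel_mask` are hypothetical names, the model is assumed to be a plain sequential chain of convolutions, and only channel counts are tracked. It shows why a layer that was not pruned itself still has to be replaced once its producer loses output channels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvShape:
    """Hypothetical stand-in for a conv layer: only channel counts are tracked."""
    name: str
    in_channels: int
    out_channels: int


def propagate_channel_mask(layers: List[ConvShape], pruned_layer: int,
                           out_mask: List[bool]) -> List[ConvShape]:
    """Return new layer shapes after keeping only the masked-in output channels.

    Assumes a sequential chain: the output of layer i feeds layer i + 1.
    """
    target = layers[pruned_layer]
    if len(out_mask) != target.out_channels:
        raise ValueError("mask length must equal the layer's out_channels")
    kept = sum(out_mask)  # number of channels that survive the mask

    new_layers = list(layers)
    # The pruned layer's weight loses output channels.
    new_layers[pruned_layer] = ConvShape(target.name, target.in_channels, kept)
    # Shape inference: the consumer's input shrinks too, so it must also be
    # replaced even though no mask was given for it.
    nxt = pruned_layer + 1
    if nxt < len(layers):
        consumer = layers[nxt]
        new_layers[nxt] = ConvShape(consumer.name, kept, consumer.out_channels)
    return new_layers


layers = [ConvShape("conv1", 3, 8), ConvShape("conv2", 8, 16), ConvShape("conv3", 16, 32)]
mask = [True, False, True, True, False, False, True, True]  # keep 5 of 8 channels
for layer in propagate_channel_mask(layers, 0, mask):
    print(layer)
```

In this sketch, masking `conv1` changes both `conv1` and `conv2`, while `conv3` keeps its shape. A real model graph has branches and non-convolution modules, which is why the document obtains the topology with `jit.trace`.
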
diff --git a/docs/en_US/Compressor/Overview.md b/docs/en_US/Compressor/Overview.md
@@ -1,7 +1,7 @@
 # Model Compression with NNI
 As larger neural networks with more layers and nodes are considered, reducing their storage and computational cost becomes critical, especially for some real-time applications. Model compression can be used to address this problem. 
 
-We are glad to announce the alpha release for model compression toolkit on top of NNI, it's still in the experiment phase which might evolve based on usage feedback. We'd like to invite you to use, feedback and even contribute.
+We are glad to introduce the model compression toolkit built on top of NNI. It is still in the experimental phase and might evolve based on usage feedback. We invite you to use it, give feedback, and even contribute.
 
 NNI provides an easy-to-use toolkit to help user design and use compression algorithms. It currently supports PyTorch with unified interface. For users to compress their models, they only need to add several lines in their code. There are some popular model compression algorithms built-in in NNI. Users could further use NNI's auto tuning power to find the best compressed model, which is detailed in [Auto Model Compression](./AutoCompression.md). On the other hand, users could easily customize their new compression algorithms using NNI's interface, refer to the tutorial [here](#customize-new-compression-algorithms).
 
@@ -335,9 +335,9 @@ class YourQuantizer(Quantizer):
 If you do not customize `QuantGrad`, the default backward is Straight-Through Estimator. 
 _Coming Soon_ ...
 
-## **Reference and Feedback**
+## Reference and Feedback
 * To [report a bug](https://github.com/microsoft/nni/issues/new?template=bug-report.md) for this feature in GitHub;
 * To [file a feature or improvement request](https://github.com/microsoft/nni/issues/new?template=enhancement.md) for this feature in GitHub;
-* To know more about [Feature Engineering with NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/FeatureEngineering/Overview.md);
-* To know more about [NAS with NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/NAS/Overview.md);
-* To know more about [Hyperparameter Tuning with NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/Tuner/BuiltinTuner.md);
+* To know more about [Feature Engineering with NNI](../FeatureEngineering/Overview.md);
+* To know more about [NAS with NNI](../NAS/Overview.md);
+* To know more about [Hyperparameter Tuning with NNI](../Tuner/BuiltinTuner.md);
diff --git a/docs/en_US/Compressor/QuickStart.md b/docs/en_US/Compressor/QuickStart.md
@@ -0,0 +1,46 @@
+# Quick Start to Compress a Model
+
+NNI provides very simple APIs for compressing a model. Compression includes pruning algorithms and quantization algorithms. They are used in the same way, so here we use the slim pruner as an example. The complete code of this example can be found [here](https://github.com/microsoft/nni/blob/master/examples/model_compress/slim_torch_cifar10.py).
+
+## Write configuration
+
+Write a configuration to specify the layers that you want to prune. The following configuration means pruning all the `BatchNorm2d`s to sparsity 0.7 while keeping other layers unpruned.
+
+```python
+configure_list = [{
+    'sparsity': 0.7,
+    'op_types': ['BatchNorm2d'],
+}]
+```
+
+The specification of the configuration can be found [here](Overview.md#user-configuration-for-a-compression-algorithm). Note that different pruners may define their own fields in the configuration, for example `start_epoch` in the AGP pruner. Please refer to each pruner's [usage](Overview.md#supported-algorithms) for details, and adjust the configuration accordingly.
+
+## Choose a compression algorithm
+
+Choose a pruner to prune your model. First instantiate the chosen pruner with your model and configuration as arguments, then invoke `compress()` to compress your model.
+
+```python
+from nni.compression.torch import SlimPruner
+pruner = SlimPruner(model, configure_list)
+model = pruner.compress()
+```
+
+Then, you can train your model with a traditional training approach (e.g., SGD); pruning is applied transparently during training. Some pruners prune once at the beginning, so the subsequent training can be seen as fine-tuning. Other pruners prune your model iteratively, adjusting the masks epoch by epoch during training.
+
+## Export compression result
+
+After training, you can evaluate the accuracy of the pruned model. You can export the model weights to a file, and the generated masks to a file as well. Exporting an ONNX model is also supported.
+
+```python
+pruner.export_model(model_path='pruned_vgg19_cifar10.pth', mask_path='mask_vgg19_cifar10.pth')
+```
+
+## Speed up the model
+
+Masks do not provide a real speedup of your model. The model has to be sped up based on the exported masks, so we provide an API to speed up your model, as shown below. After invoking `apply_compression_results` on your model, it becomes a smaller one with shorter inference latency.
+
+```python
+from nni.compression.torch import apply_compression_results
+apply_compression_results(model, 'mask_vgg19_cifar10.pth')
+```
+
+Please refer to [here](ModelSpeedup.md) for a detailed description.
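
The `'sparsity': 0.7` setting from the QuickStart configuration can be made concrete with a small sketch. It assumes the slim pruner follows the network-slimming idea of ranking `BatchNorm2d` channels by the magnitude of their scaling factors across all layers and masking the smallest ones. The function `slim_masks` and the plain-list data are hypothetical and only illustrate the selection rule; they are not NNI's code.

```python
from typing import Dict, List


def slim_masks(scales: Dict[str, List[float]], sparsity: float) -> Dict[str, List[int]]:
    """Mask the `sparsity` fraction of channels with the smallest |scale|, ranked globally."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    # Rank every channel of every layer together by |scaling factor|.
    ranked = sorted(
        (abs(value), name, index)
        for name, values in scales.items()
        for index, value in enumerate(values)
    )
    # The small epsilon guards against float rounding in the product.
    n_prune = int(len(ranked) * sparsity + 1e-9)
    pruned = {(name, index) for _, name, index in ranked[:n_prune]}
    # 1 keeps a channel, 0 masks it out.
    return {
        name: [0 if (name, index) in pruned else 1 for index in range(len(values))]
        for name, values in scales.items()
    }


bn_scales = {
    "bn1": [0.9, 0.05, 0.4, 0.01],
    "bn2": [0.3, 0.8, 0.02, 0.6, 0.07, 0.5],
}
masks = slim_masks(bn_scales, sparsity=0.7)
for name, mask in masks.items():
    print(name, mask)
```

Because the ranking is global, layers need not lose the same fraction of channels: here 7 of the 10 channels are masked overall, 3 of 4 in `bn1` and 4 of 6 in `bn2`.
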
diff --git a/docs/en_US/FeatureEngineering/Overview.md b/docs/en_US/FeatureEngineering/Overview.md
@@ -6,19 +6,25 @@ For now, we support the following feature selector:
 - [GradientFeatureSelector](./GradientFeatureSelector.md)
 - [GBDTSelector](./GBDTSelector.md)
 
+These selectors are suitable for tabular data (that is, data that does not include images, speech, or text).
 
-# How to use?
+In addition, these selectors only perform feature selection. If you want to:
+1) generate high-order combined features on NNI while doing feature selection;
+2) leverage your distributed resources;
+you could try this [example](https://github.com/microsoft/nni/tree/master/examples/feature_engineering/auto-feature-engineering).
+
+## How to use?
 
 ```python
-from nni.feature_engineering.gradient_selector import GradientFeatureSelector
+from nni.feature_engineering.gradient_selector import FeatureGradientSelector
 # from nni.feature_engineering.gbdt_selector import GBDTSelector
 
 # load data
 ...
 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
 
 # initlize a selector
-fgs = GradientFeatureSelector(...)
+fgs = FeatureGradientSelector(...)
 # fit data
 fgs.fit(X_train, y_train)
 # get improtant features
@@ -30,7 +36,7 @@ print(fgs.get_selected_features(...))
 
 When using the built-in Selector, you first need to `import` a feature selector, and `initialize` it. You could call the function `fit` in the selector to pass the data to the selector. After that, you could use `get_seleteced_features` to get important features. The function parameters in different selectors might be different, so you need to check the docs before using it. 
 
-# How to customize?
+## How to customize?
 
 NNI provides _state-of-the-art_ feature selector algorithm in the builtin-selector. NNI also supports to build a feature selector by yourself.
 
@@ -239,7 +245,7 @@ print("Pipeline Score: ", pipeline.score(X_train, y_train))
 
 ```
 
-# Benchmark
+## Benchmark
 
 `Baseline` means without any feature selection, we directly pass the data to LogisticRegression. For this benchmark, we only use 10% data from the train as test data. For the GradientFeatureSelector, we only take the top20 features. The metric is the mean accuracy on the given test data and labels.
 
@@ -257,7 +263,7 @@ The dataset of benchmark could be download in [here](https://www.csie.ntu.edu.tw
 
 The code could be refenrence `/examples/feature_engineering/gradient_feature_selector/benchmark_test.py`.
 
-## **Reference and Feedback**
+## Reference and Feedback
 * To [report a bug](https://github.com/microsoft/nni/issues/new?template=bug-report.md) for this feature in GitHub;
 * To [file a feature or improvement request](https://github.com/microsoft/nni/issues/new?template=enhancement.md) for this feature in GitHub;
 * To know more about [Neural Architecture Search with NNI](https://github.com/microsoft/nni/blob/master/docs/en_US/NAS/Overview.md);
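
The selector workflow described in FeatureEngineering/Overview.md (initialize, `fit`, then `get_selected_features`) can be sketched with a toy selector. `CorrelationSelector` is a hypothetical class written for illustration only: it does not inherit from any NNI base class and ranks columns by absolute Pearson correlation with the label, which is a much simpler criterion than the built-in selectors use.

```python
from statistics import mean
from typing import List, Sequence


class CorrelationSelector:
    """Hypothetical selector: ranks columns by |Pearson correlation| with the label."""

    def __init__(self, n_features: int):
        self.n_features = n_features
        self.selected_features_: List[int] = []

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "CorrelationSelector":
        y_mean = mean(y)
        y_var = sum((b - y_mean) ** 2 for b in y)
        scores = []
        for col in range(len(X[0])):
            xs = [row[col] for row in X]
            x_mean = mean(xs)
            cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(xs, y))
            x_var = sum((a - x_mean) ** 2 for a in xs)
            denom = (x_var * y_var) ** 0.5
            # A constant column has zero variance; give it the lowest score.
            scores.append((abs(cov / denom) if denom else 0.0, col))
        top = sorted(scores, reverse=True)[: self.n_features]
        self.selected_features_ = sorted(col for _, col in top)
        return self

    def get_selected_features(self) -> List[int]:
        """Return the indices of the selected columns."""
        return self.selected_features_


X = [[1.0, 5.0, 0.3], [2.0, 5.0, 0.1], [3.0, 5.0, 0.4], [4.0, 5.0, 0.2]]
y = [1.1, 2.0, 2.9, 4.2]
selector = CorrelationSelector(n_features=1).fit(X, y)
print(selector.get_selected_features())
```

The sketch mirrors the call order of the built-in selectors, so swapping in `FeatureGradientSelector` or `GBDTSelector` should mainly change the constructor and method arguments, which differ per selector.
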