This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Request for Integrating the new NAS algorithm: Cream #2705

Merged
merged 74 commits into from
Nov 27, 2020
Changes from all commits
Commits
74 commits
a426e1a
integrate CREAM NAS algorithm
penghouwen Jul 20, 2020
57a2c40
Update README.md
penghouwen Jul 20, 2020
c15832f
Update Cream.md
penghouwen Jul 31, 2020
806937c
Update Cream.md
penghouwen Jul 31, 2020
b13fed0
Update Cream.md
penghouwen Jul 31, 2020
d7c3217
Update requirements.txt
penghouwen Jul 31, 2020
1951db0
Update Cream.md
penghouwen Jul 31, 2020
bce9cf2
Update requirements.txt
penghouwen Jul 31, 2020
cda252b
Update Cream.md
penghouwen Jul 31, 2020
29b6d5d
Update Cream.md
penghouwen Jul 31, 2020
e548cbc
Update Cream.md
penghouwen Jul 31, 2020
d5c95c6
Update Cream.md
penghouwen Jul 31, 2020
c73b95c
Update trainer.py
penghouwen Aug 1, 2020
0adaf7c
Update mutator.py
penghouwen Aug 1, 2020
85f01e9
Update Cream.md
penghouwen Aug 3, 2020
047fd86
Update Cream.md
penghouwen Aug 3, 2020
b22f92c
Update Cream.md
penghouwen Aug 3, 2020
3ee7591
Update Cream.md
penghouwen Aug 3, 2020
be81d53
Update Cream.md
penghouwen Aug 3, 2020
9535466
Fix pipeline for merging into NNI
ultmaster Aug 4, 2020
0892e66
Fix typo
ultmaster Aug 4, 2020
999d18c
Merge pull request #1 from ultmaster/fix-cream-before-merge
penghouwen Aug 4, 2020
2e13a23
Fix pipeline
ultmaster Aug 5, 2020
22a3f46
Add files via upload
penghouwen Aug 7, 2020
b1777f0
Update Cream.md
penghouwen Aug 7, 2020
4fcbaa9
Update CDARTS.md
penghouwen Aug 7, 2020
2205433
Update Cream.md
penghouwen Aug 7, 2020
b277b96
Update CDARTS.md
penghouwen Aug 7, 2020
8d413bf
Update CDARTS.md
penghouwen Aug 7, 2020
cc9f336
Update distributed_train.sh
penghouwen Sep 3, 2020
9494493
Update distributed_test.sh
penghouwen Sep 3, 2020
6a332ff
Update Cream.md
penghouwen Sep 3, 2020
b35ccac
init
Sep 27, 2020
c872739
Merge pull request #2 from mapleam/master
penghouwen Sep 27, 2020
9289614
Update supernet.py
Z7zuqer Sep 27, 2020
ab9d398
1)remove timm
Sep 27, 2020
82eee8d
Merge pull request #3 from mapleam/master
penghouwen Sep 27, 2020
2559697
Delete cream.jpg
penghouwen Oct 22, 2020
e48d293
Add files via upload
penghouwen Oct 22, 2020
a71563b
Update Cream.md
penghouwen Oct 22, 2020
c00c58e
version 1.0
Z7zuqer Nov 18, 2020
4d72a70
version 2.0
Z7zuqer Nov 21, 2020
60e5197
Merge pull request #4 from mapleam/master
penghouwen Nov 23, 2020
37518fa
Update Cream.md
penghouwen Nov 23, 2020
e04200c
Update Cream.md
penghouwen Nov 23, 2020
47dce8c
Update Cream.md
penghouwen Nov 23, 2020
5231d3b
Update Cream.md
penghouwen Nov 23, 2020
ce698c3
Update Cream.md
penghouwen Nov 23, 2020
0d63ceb
Update Cream.md
penghouwen Nov 23, 2020
274fb23
version 3.0
Z7zuqer Nov 23, 2020
931c47b
Merge branch 'master' into master
Z7zuqer Nov 23, 2020
59b1339
Merge pull request #5 from mapleam/master
penghouwen Nov 23, 2020
c162f39
Update Cream.md
penghouwen Nov 23, 2020
de8c261
Update Cream.md
penghouwen Nov 23, 2020
ae45787
Update Cream.md
Z7zuqer Nov 23, 2020
43101c1
Update retrain.py
Z7zuqer Nov 23, 2020
36ddeaf
Update test.py
Z7zuqer Nov 23, 2020
97451af
Update retrain.py
Z7zuqer Nov 23, 2020
96cfb17
Merge branch 'master' into master
Z7zuqer Nov 23, 2020
d735a25
Merge pull request #6 from mapleam/master
penghouwen Nov 23, 2020
8d24833
version 4.0
Z7zuqer Nov 23, 2020
a53cc5f
Merge remote-tracking branch 'origin/master'
Z7zuqer Nov 23, 2020
cce57e5
version 4.0
Z7zuqer Nov 23, 2020
d9cfd2f
Merge pull request #7 from mapleam/master
penghouwen Nov 24, 2020
0f8f8bf
Update Cream.md
penghouwen Nov 24, 2020
879bfeb
Update Cream.md
penghouwen Nov 24, 2020
85b17b4
Merge branch 'master' into master
penghouwen Nov 24, 2020
0cf817b
Move code dir
ultmaster Nov 24, 2020
fdeb0b9
Fix trainer and retrain optimizer
ultmaster Nov 25, 2020
d11e4cf
Update Cream.md
penghouwen Nov 25, 2020
06af2cb
Fix syntax warning
ultmaster Nov 26, 2020
6cb3b97
Fix syntax warning (again)
ultmaster Nov 26, 2020
9996098
Fix docs build warnings
ultmaster Nov 26, 2020
02b8e72
Merge branch 'master' of github.com:penghouwen/nni into cream-master
ultmaster Nov 26, 2020
1 change: 1 addition & 0 deletions README.md
@@ -135,6 +135,7 @@ Within the following table, we summarized the current NNI capabilities, we are g
<li><a href="docs/en_US/NAS/Proxylessnas.md">ProxylessNAS</a></li>
<li><a href="docs/en_US/Tuner/BuiltinTuner.md#NetworkMorphism">Network Morphism</a></li>
<li><a href="docs/en_US/NAS/TextNAS.md">TextNAS</a></li>
<li><a href="docs/en_US/NAS/Cream.md">Cream</a></li>
</ul>
</ul>
<a href="docs/en_US/Compression/Overview.md">Model Compression</a>
10 changes: 6 additions & 4 deletions docs/en_US/NAS/CDARTS.md
@@ -1,16 +1,17 @@

# CDARTS

## Introduction

CDARTS builds a cyclic feedback mechanism between the search and evaluation networks. First, the search network generates an initial topology for evaluation, so that the weights of the evaluation network can be optimized. Second, the architecture topology in the search network is further optimized by the label supervision in classification, as well as the regularization from the evaluation network through feature distillation. Repeating the above cycle results in a joint optimization of the search and evaluation networks, and thus enables the evolution of the topology to fit the final evaluation network.
[CDARTS](https://arxiv.org/pdf/2006.10724.pdf) builds a cyclic feedback mechanism between the search and evaluation networks. First, the search network generates an initial topology for evaluation, so that the weights of the evaluation network can be optimized. Second, the architecture topology in the search network is further optimized by the label supervision in classification, as well as the regularization from the evaluation network through feature distillation. Repeating the above cycle results in a joint optimization of the search and evaluation networks, and thus enables the evolution of the topology to fit the final evaluation network.

In implementation of `CdartsTrainer`, it first instantiates two models and two mutators (one for each). The first model is the so-called "search network", which is mutated with a `RegularizedDartsMutator` -- a mutator with subtle differences with `DartsMutator`. The second model is the "evaluation network", which is mutated with a discrete mutator that leverages the previous search network mutator, to sample a single path each time. Trainers train models and mutators alternatively. Users can refer to [references](#reference) if they are interested in more details on these trainers and mutators.
In implementation of `CdartsTrainer`, it first instantiates two models and two mutators (one for each). The first model is the so-called "search network", which is mutated with a `RegularizedDartsMutator` -- a mutator with subtle differences with `DartsMutator`. The second model is the "evaluation network", which is mutated with a discrete mutator that leverages the previous search network mutator, to sample a single path each time. Trainers train models and mutators alternatively. Users can refer to [paper](https://arxiv.org/pdf/2006.10724.pdf) if they are interested in more details on these trainers and mutators.

## Reproduction Results

This is CDARTS based on the NNI platform, which currently supports CIFAR10 search and retrain. ImageNet search and retrain should also be supported, and we provide corresponding interfaces. Our reproduced results on NNI are slightly lower than the paper, but much higher than the original DARTS. Here we show the results of three independent experiments on CIFAR10.

| Runs | Paper | NNI |
| Runs | Paper | NNI |
| ---- |:-------------:| :-----:|
| 1 | 97.52 | 97.44 |
| 2 | 97.53 | 97.48 |
@@ -19,7 +20,7 @@ This is CDARTS based on the NNI platform, which currently supports CIFAR10 searc

## Examples

[Example code](https://github.com/microsoft/nni/tree/v1.9/examples/nas/cdarts)
[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/cdarts)

```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
@@ -55,3 +56,4 @@ bash run_retrain_cifar.sh
.. autoclass:: nni.algorithms.nas.pytorch.cdarts.RegularizedMutatorParallel
:members:
```

127 changes: 127 additions & 0 deletions docs/en_US/NAS/Cream.md
@@ -0,0 +1,127 @@
# Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search

**[[Paper]](https://papers.nips.cc/paper/2020/file/d072677d210ac4c03ba046120f0802ec-Paper.pdf) [[Models-Google Drive]](https://drive.google.com/drive/folders/1NLGAbBF9bA1IUAxKlk2VjgRXhr6RHvRW?usp=sharing) [[Models-Baidu Disk (PWD: wqw6)]](https://pan.baidu.com/s/1TqQNm2s14oEdyNPimw3T9g) [[BibTex]](https://scholar.googleusercontent.com/scholar.bib?q=info:ICWVXc_SsKAJ:scholar.google.com/&output=citation&scisdr=CgUmooXfEMfTi0cV5aU:AAGBfm0AAAAAX7sQ_aXoamdKRaBI12tAVN8REq1VKNwM&scisig=AAGBfm0AAAAAX7sQ_RdYtp6BSro3zgbXVJU2MCgsG730&scisf=4&ct=citation&cd=-1&hl=ja)** <br/>

In this work, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, aiming to boost the convergence of individual models. We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training. Distilling knowledge from the prioritized paths is able to boost the training of subnetworks. Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop. The discovered architectures achieve superior performance compared to the recent [MobileNetV3](https://arxiv.org/abs/1905.02244) and [EfficientNet](https://arxiv.org/abs/1905.11946) families under aligned settings.

<div >
<img src="https://github.com/microsoft/Cream/blob/main/demo/intro.jpg" width="800"/>
</div>
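
To make the prioritized-path idea above concrete, here is a minimal, self-contained PyTorch sketch of the mechanism: a weight-sharing supernet samples paths, keeps a small pool of the best-performing paths seen so far, and distills from that pool while training newly sampled paths. Everything here (`ToySupernet`, `update_pool`, the batch-accuracy scoring rule, the pool size) is an illustrative assumption for exposition, not the NNI implementation.

```python
# Toy sketch of "prioritized paths": keep a pool of the best sub-architectures
# seen so far and let them teach newly sampled paths via distillation.
# Illustrative only -- not the NNI Cream implementation.
import random

import torch
import torch.nn as nn
import torch.nn.functional as F

POOL_SIZE = 10
prioritized_pool = []  # (score, arch) pairs: the best-performing paths so far


def update_pool(arch, score):
    """Insert a path and keep only the POOL_SIZE highest-scoring ones."""
    prioritized_pool.append((score, arch))
    prioritized_pool.sort(key=lambda p: p[0], reverse=True)
    del prioritized_pool[POOL_SIZE:]


def distillation_loss(student_logits, teacher_logits, T=1.0):
    """Soft-label KD loss: a prioritized path teaches the sampled path."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T


class ToySupernet(nn.Module):
    """Tiny weight-sharing supernet: `arch` picks one of three branches."""
    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleList([nn.Linear(8, 4) for _ in range(3)])
        self.head = nn.Linear(4, 2)

    def forward(self, x, arch):
        return self.head(torch.relu(self.branches[arch](x)))


net = ToySupernet()
opt = torch.optim.SGD(net.parameters(), lr=0.1)
x, y = torch.randn(16, 8), torch.randint(0, 2, (16,))

for step in range(20):
    arch = random.randrange(3)             # sample a path from the supernet
    logits = net(x, arch)
    loss = F.cross_entropy(logits, y)
    if prioritized_pool:                   # distill from the best stored path
        teacher_arch = prioritized_pool[0][1]
        with torch.no_grad():
            teacher_logits = net(x, teacher_arch)
        loss = loss + distillation_loss(logits, teacher_logits)
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                  # score the path (here: batch accuracy)
        score = (net(x, arch).argmax(dim=1) == y).float().mean().item()
    update_pool(arch, score)
```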


## Reproduced Results
Top-1 accuracy on ImageNet. The top-1 accuracy of the Cream search algorithm surpasses MobileNetV3 and EfficientNet-B0/B1 on ImageNet.
Training with 16 GPUs gives slightly better results than training with 8 GPUs, as shown below.

| Model (M FLOPs) | 8 GPUs | 16 GPUs |
| ---- |:-------------:| :-----:|
| 14M | 53.7 | 53.8 |
| 43M | 65.8 | 66.5 |
| 114M | 72.1 | 72.8 |
| 287M | 76.7 | 77.6 |
| 481M | 78.9 | 79.2 |
| 604M | 79.4 | 80.0 |

<table style="border: none">
<th><img src="./../../img/cream_flops100.jpg" alt="drawing" width="400"/></th>
<th><img src="./../../img/cream_flops600.jpg" alt="drawing" width="400"/></th>
</table>

## Examples

[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/cream)

Please run the following scripts in the example folder.

## Data Preparation

You first need to download [ImageNet-2012](http://www.image-net.org/) to the folder `./data/imagenet` and move the validation set to the subfolder `./data/imagenet/val`. To move the validation set, you could use the following script: <https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh>

Put the ImageNet data in `./data`. It should look like the following:

```
./data/imagenet/train
./data/imagenet/val
...
```
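
A hedged sketch of how `valprep.sh` can be used to arrange the validation images into class subfolders; it assumes the validation JPEGs have already been extracted flat into `./data/imagenet/val`:

```bash
# Assumption: the validation JPEGs are already extracted flat into ./data/imagenet/val
cd ./data/imagenet/val
wget https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
bash valprep.sh   # moves each image into its ILSVRC-2012 class subfolder
cd -
```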

## Quick Start

### I. Search

First, set up the environment for searching.

```
pip install -r ./requirements.txt

git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cpp_ext --cuda_ext
```

To search for an architecture, you need to configure the parameters `FLOPS_MINIMUM` and `FLOPS_MAXIMUM` to specify the desired FLOPs range of the model, such as [0, 600] MFLOPs. You can specify the FLOPs interval by changing these two parameters in `./configs/train.yaml`:

```
FLOPS_MINIMUM: 0 # Minimum FLOPs of the architecture
FLOPS_MAXIMUM: 600 # Maximum FLOPs of the architecture
```

For example, if you expect to search for an architecture with model FLOPs <= 200M, set `FLOPS_MINIMUM` and `FLOPS_MAXIMUM` to `0` and `200`, respectively.
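
In `./configs/train.yaml`, that corresponds to the following excerpt:

```
FLOPS_MINIMUM: 0   # Minimum FLOPs of the architecture
FLOPS_MAXIMUM: 200 # Search only architectures with at most 200 MFLOPs
```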

After specifying the FLOPs range of the architectures you would like to search, you can start the search by running:

```
python -m torch.distributed.launch --nproc_per_node=8 ./train.py --cfg ./configs/train.yaml
```

The searched architectures need to be retrained to obtain the final model, which is saved in `.pth.tar` format. Retraining is described in the next section.

### II. Retrain

To retrain the searched architectures, you need to configure the parameter `MODEL_SELECTION` to specify the model FLOPs. To specify which model to train, add `MODEL_SELECTION` in `./configs/retrain.yaml`. You can select one from [14, 43, 114, 287, 481, 604], which correspond to the model FLOPs in millions.

```
MODEL_SELECTION: 43 # Retrain 43m model
MODEL_SELECTION: 481 # Retrain 481m model
......
```

To train random architectures, you need to set `MODEL_SELECTION` to `-1` and configure the parameter `INPUT_ARCH`:

```
MODEL_SELECTION: -1 # Train random architectures
INPUT_ARCH: [[0], [3], [3, 3], [3, 1, 3], [3, 3, 3, 3], [3, 3, 3], [0]] # Random Architectures
......
```

After adding `MODEL_SELECTION` in `./configs/retrain.yaml`, run the following command to train the model.

```
python -m torch.distributed.launch --nproc_per_node=8 ./retrain.py --cfg ./configs/retrain.yaml
```

### III. Test

To test the trained models, you need to set `MODEL_SELECTION` in `./configs/test.yaml` to specify which model to test.

```
MODEL_SELECTION: 43 # test 43m model
MODEL_SELECTION: 481 # test 481m model
......
```

After specifying the FLOPs of the model, you need to set the path to the resume checkpoint (`RESUME_PATH`) in `./configs/test.yaml`.

```
RESUME_PATH: './43.pth.tar'
RESUME_PATH: './481.pth.tar'
......
```

We provide 14M/43M/114M/287M/481M/604M pretrained models on [Google Drive](https://drive.google.com/drive/folders/1CQjyBryZ4F20Rutj7coF8HWFcedApUn2) or [Baidu Disk (password: wqw6)](https://pan.baidu.com/s/1TqQNm2s14oEdyNPimw3T9g).
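
Putting the two settings together, a minimal excerpt of `./configs/test.yaml` for the 43M model might look like the following; the checkpoint path is an assumption and depends on where you saved the downloaded file:

```
MODEL_SELECTION: 43           # test 43m model
RESUME_PATH: './43.pth.tar'   # path to the downloaded checkpoint
```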

After downloading the pretrained models and adding `MODEL_SELECTION` and `RESUME_PATH` in `./configs/test.yaml`, run the following command to test the model.

```
python -m torch.distributed.launch --nproc_per_node=8 ./test.py --cfg ./configs/test.yaml
```
3 changes: 2 additions & 1 deletion docs/en_US/NAS/one_shot_nas.rst
@@ -14,4 +14,5 @@ One-shot NAS algorithms leverage weight sharing among models in neural architect
SPOS <SPOS>
CDARTS <CDARTS>
ProxylessNAS <Proxylessnas>
TextNAS <TextNAS>
TextNAS <TextNAS>
Cream <Cream>
Binary file added docs/img/cream.png
Binary file added docs/img/cream_flops100.jpg
Binary file added docs/img/cream_flops600.jpg
Empty file added examples/__init__.py
Empty file.
Empty file added examples/nas/__init__.py
Empty file.
1 change: 1 addition & 0 deletions examples/nas/cream/Cream.md
@@ -0,0 +1 @@
[Documentation](https://nni.readthedocs.io/en/latest/NAS/Cream.html)
Empty file added examples/nas/cream/__init__.py
Empty file.
52 changes: 52 additions & 0 deletions examples/nas/cream/configs/retrain.yaml
@@ -0,0 +1,52 @@
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: '604m_retrain'
RESUME_PATH: './experiments/workspace/retrain/resume.pth.tar'
SAVE_PATH: './'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 4
NUM_GPU: 2
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'random' # Image resize interpolation type
BATCH_SIZE: 32 # batch size
NO_PREFECHTER: False

NET:
GP: 'avg'
DROPOUT_RATE: 0.0
SELECTION: 42

EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9998

OPT: 'sgd'
OPT_EPS: 1e-2
MOMENTUM: 0.9
DECAY_RATE: 0.1

SCHED: 'sgd'
LR_NOISE: None
LR_NOISE_PCT: 0.67
LR_NOISE_STD: 1.0
WARMUP_LR: 1e-4
MIN_LR: 1e-5
EPOCHS: 200
START_EPOCH: None
DECAY_EPOCHS: 30.0
WARMUP_EPOCHS: 3
COOLDOWN_EPOCHS: 10
PATIENCE_EPOCHS: 10
LR: 1e-2
37 changes: 37 additions & 0 deletions examples/nas/cream/configs/test.yaml
@@ -0,0 +1,37 @@
AUTO_RESUME: True
DATA_DIR: './data/imagenet'
MODEL: 'Childnet_Testing'
RESUME_PATH: './experiments/workspace/ckps/42.pth.tar'
SAVE_PATH: './'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 4
NUM_GPU: 2
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'bilinear' # Image resize interpolation type
BATCH_SIZE: 32 # batch size
NO_PREFECHTER: False

NET:
GP: 'avg'
DROPOUT_RATE: 0.0
SELECTION: 42

EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9998

OPTIMIZER:
MOMENTUM: 0.9
WEIGHT_DECAY: 1e-3
53 changes: 53 additions & 0 deletions examples/nas/cream/configs/train.yaml
@@ -0,0 +1,53 @@
AUTO_RESUME: False
DATA_DIR: './data/imagenet'
MODEL: 'Supernet_Training'
RESUME_PATH: './experiments/workspace/train/resume.pth.tar'
SAVE_PATH: './'
SEED: 42
LOG_INTERVAL: 50
RECOVERY_INTERVAL: 0
WORKERS: 8
NUM_GPU: 8
SAVE_IMAGES: False
AMP: False
OUTPUT: 'None'
EVAL_METRICS: 'prec1'
TTA: 0
LOCAL_RANK: 0

DATASET:
NUM_CLASSES: 1000
IMAGE_SIZE: 224 # image patch size
INTERPOLATION: 'bilinear' # Image resize interpolation type
BATCH_SIZE: 128 # batch size

NET:
GP: 'avg'
DROPOUT_RATE: 0.0

EMA:
USE: True
FORCE_CPU: False # force model ema to be tracked on CPU
DECAY: 0.9998

OPT: 'sgd'
LR: 1.0
EPOCHS: 120
META_LR: 1e-4

BATCHNORM:
SYNC_BN: False

SUPERNET:
UPDATE_ITER: 200
SLICE: 4
POOL_SIZE: 10
RESUNIT: False
DIL_CONV: False
UPDATE_2ND: True
FLOPS_MINIMUM: 0
FLOPS_MAXIMUM: 600
PICK_METHOD: 'meta'
META_STA_EPOCH: 20
HOW_TO_PROB: 'pre_prob'
PRE_PROB: (0.05,0.2,0.05,0.5,0.05,0.15)