From bd7c0f0f645fc77512c94e05f0d02f56e1148d05 Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 5 Nov 2019 09:57:22 +0800 Subject: [PATCH 01/60] update doc --- docs/en_US/NAS/Overview.md | 55 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) create mode 100644 docs/en_US/NAS/Overview.md diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md new file mode 100644 index 0000000000..dabfe1a4cd --- /dev/null +++ b/docs/en_US/NAS/Overview.md @@ -0,0 +1,55 @@ +# NNI Programming Interface for Neural Architecture Search (NAS) + +*This is an experimental feature, programming APIs are almost done, NAS trainers are under intensive deveropment.* + +Automatic neural architecture search is taking an increasingly important role on finding better models. Recent research works have proved the feasibility of automatic NAS, and also found some models that could beat manually designed and tuned models. Some of representative works are [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5]. There are new innovations keeping emerging. However, it takes great efforts to implement those algorithms, and it is hard to reuse code base of one algorithm for implementing another. + +To facilitate NAS innovations (e.g., design/implement new NAS models, compare different NAS models side-by-side), an easy-to-use and flexible programming interface is crucial. + +## Programming interface + +A new programming interface for designing and searching for a model is often demanded in two scenarios. + + 1. When designing a neural network, the designer may have multiple choices for a layer, sub-model, or connection, and not sure which one or a combination performs the best. It would be appealing to have an easy way to express the candidate layers/sub-models they want to try. + 2. For the researchers who are working on automatic NAS, they want to have an unified way to express the search space of neural architectures. And making unchanged trial code adapted to different searching algorithms. + +For expressing neural architecture search space, we provide two APIs: + +```python +# choose one ``op`` from ``ops``, for pytorch this is a module. +# ops: for pytorch ``ops`` is a list of modules, for tensorflow it is a list of keras layers. +# key: the name of this ``LayerChoice`` instance +nni.nas.LayerChoice(ops, key) +# choose ``n_selected`` from ``n_candidates`` inputs. +# n_candidates: the number of candidate inputs +# n_selected: the number of chosen inputs +# reduction: reduction operation for the chosen inputs +# key: the name of this ``InputChoice`` instance +nni.nas.InputChoice(n_candidates, n_selected, reduction, key) +``` + +After writing your model with search space embedded in the model using the above two APIs, the next step is finding the best model from the search space. Similar to optimizers of deep learning models, the procedure of finding the best model from search space can be viewed as a type of optimizing process, we call it `NAS trainer`. There have been several NAS trainers, for example, `DartsTrainer` which uses SGD to train architecture weights and model weights iteratively, `ENASTrainer` which uses a controller to train the model. New and more efficient NAS trainers keep emerging in research community. 
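To make the two APIs concrete, here is a minimal sketch of a PyTorch module that embeds a search space. The candidate ops, the `Cell` wrapper, and calling `InputChoice` on a list of tensors are illustrative assumptions, not part of the documented API:

```python
import torch.nn as nn
import nni.nas

class Cell(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # the NAS trainer decides which of these three candidates is used
        self.op = nni.nas.LayerChoice([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.MaxPool2d(3, stride=1, padding=1),
        ], key="cell_op")
        # choose 1 of 2 candidate inputs and reduce them with a sum
        self.input_switch = nni.nas.InputChoice(
            n_candidates=2, n_selected=1, reduction="sum", key="cell_input")

    def forward(self, x, prev_x):
        return self.op(self.input_switch([x, prev_x]))
```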
+ +NNI provides some popular NAS trainers, to use a NAS trainer, users could initialize a trainer after the model is defined: + +```python +# create a DartsTrainer +trainer = DartsTrainer(model, + loss=criterion, + metrics=lambda output, target: accuracy(output, target, topk=(1,)), + model_optim=optim, + lr_scheduler=lr_scheduler, + num_epochs=50, + dataset_train=dataset_train, + dataset_valid=dataset_valid, + batch_size=args.batch_size, + log_frequency=args.log_frequency) +# finding the best model from search space +trainer.train() +# export the best found model +trainer.export_model() +``` + +Different trainers could have different input arguments depending on their algorithms. After training, users could export the best one of the found models through `trainer.export_model()`. + +[Here](https://github.com/microsoft/nni/blob/dev-nas-refactor/examples/nas/darts/main.py) is a trial example using DartsTrainer. \ No newline at end of file From d9f3afb4960391948c8dcba22684a26be7bc81cf Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 5 Nov 2019 10:00:37 +0800 Subject: [PATCH 02/60] update --- docs/en_US/NAS/Overview.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md index dabfe1a4cd..8731b69678 100644 --- a/docs/en_US/NAS/Overview.md +++ b/docs/en_US/NAS/Overview.md @@ -52,4 +52,10 @@ trainer.export_model() Different trainers could have different input arguments depending on their algorithms. After training, users could export the best one of the found models through `trainer.export_model()`. -[Here](https://github.com/microsoft/nni/blob/dev-nas-refactor/examples/nas/darts/main.py) is a trial example using DartsTrainer. \ No newline at end of file +[Here](https://github.com/microsoft/nni/blob/dev-nas-refactor/examples/nas/darts/main.py) is a trial example using DartsTrainer. + +[1]: https://arxiv.org/abs/1802.03268 +[2]: https://arxiv.org/abs/1707.07012 +[3]: https://arxiv.org/abs/1806.09055 +[4]: https://arxiv.org/abs/1806.10282 +[5]: https://arxiv.org/abs/1703.01041 \ No newline at end of file From b5c295c8aa1814d3d5d55e5ce69019d0c94f7ccb Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 5 Nov 2019 10:15:46 +0800 Subject: [PATCH 03/60] update --- docs/en_US/NAS/Overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md index 8731b69678..bc6b216bfc 100644 --- a/docs/en_US/NAS/Overview.md +++ b/docs/en_US/NAS/Overview.md @@ -1,6 +1,6 @@ # NNI Programming Interface for Neural Architecture Search (NAS) -*This is an experimental feature, programming APIs are almost done, NAS trainers are under intensive deveropment.* +*This is an experimental feature, programming APIs are almost done, NAS trainers are under intensive deveropment. ([NAS annotation](../AdvancedFeature/GeneralNasInterfaces.md) will become deprecated in future)* Automatic neural architecture search is taking an increasingly important role on finding better models. Recent research works have proved the feasibility of automatic NAS, and also found some models that could beat manually designed and tuned models. Some of representative works are [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5]. There are new innovations keeping emerging. However, it takes great efforts to implement those algorithms, and it is hard to reuse code base of one algorithm for implementing another. 
From 8f9c7bc7cf40dc000d5965c51ba5c051dd88b481 Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 5 Nov 2019 10:19:38 +0800 Subject: [PATCH 04/60] update --- docs/en_US/NAS/Overview.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md index bc6b216bfc..48eab76e57 100644 --- a/docs/en_US/NAS/Overview.md +++ b/docs/en_US/NAS/Overview.md @@ -17,7 +17,12 @@ For expressing neural architecture search space, we provide two APIs: ```python # choose one ``op`` from ``ops``, for pytorch this is a module. -# ops: for pytorch ``ops`` is a list of modules, for tensorflow it is a list of keras layers. +# ops: for pytorch ``ops`` is a list of modules, for tensorflow it is a list of keras layers. An example in pytroch: +# ops = [PoolBN('max', channels, 3, stride, 1, affine=False), +# PoolBN('avg', channels, 3, stride, 1, affine=False), +# FactorizedReduce(channels, channels, affine=False), +# SepConv(channels, channels, 3, stride, 1, affine=False), +# DilConv(channels, channels, 3, stride, 2, 2, affine=False)] # key: the name of this ``LayerChoice`` instance nni.nas.LayerChoice(ops, key) # choose ``n_selected`` from ``n_candidates`` inputs. From 0e7f6b961c6411d38455b83d5c4ea71ac9a97ba8 Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 5 Nov 2019 10:22:00 +0800 Subject: [PATCH 05/60] update --- docs/en_US/NAS/Overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md index 48eab76e57..bedf503b79 100644 --- a/docs/en_US/NAS/Overview.md +++ b/docs/en_US/NAS/Overview.md @@ -1,6 +1,6 @@ # NNI Programming Interface for Neural Architecture Search (NAS) -*This is an experimental feature, programming APIs are almost done, NAS trainers are under intensive deveropment. ([NAS annotation](../AdvancedFeature/GeneralNasInterfaces.md) will become deprecated in future)* +*This is an experimental feature, programming APIs are almost done, NAS trainers are under intensive development. ([NAS annotation](../AdvancedFeature/GeneralNasInterfaces.md) will become deprecated in future)* Automatic neural architecture search is taking an increasingly important role on finding better models. Recent research works have proved the feasibility of automatic NAS, and also found some models that could beat manually designed and tuned models. Some of representative works are [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5]. There are new innovations keeping emerging. However, it takes great efforts to implement those algorithms, and it is hard to reuse code base of one algorithm for implementing another. 
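Tying the patched comment above together, the DARTS-style ops list could be handed to `LayerChoice` as below. This is a sketch: `PoolBN`, `FactorizedReduce`, `SepConv`, and `DilConv` are modules from the DARTS example, and `channels`/`stride` are assumed to be in scope:

```python
# mirrors the candidate list in the docstring comment above
ops = [
    PoolBN('max', channels, 3, stride, 1, affine=False),
    PoolBN('avg', channels, 3, stride, 1, affine=False),
    FactorizedReduce(channels, channels, affine=False),
    SepConv(channels, channels, 3, stride, 1, affine=False),
    DilConv(channels, channels, 3, stride, 2, 2, affine=False),
]
edge = nni.nas.LayerChoice(ops, key="normal_cell_edge_0")
```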
From bccb536d1b09169ddf26d05ca4b1271cb2d71ba6 Mon Sep 17 00:00:00 2001 From: quanlu Date: Wed, 13 Nov 2019 19:55:55 +0800 Subject: [PATCH 06/60] init commit --- examples/nas/proxylessnas/datasets.py | 203 ++++++ examples/nas/proxylessnas/model.py | 150 ++++ examples/nas/proxylessnas/ops.py | 680 ++++++++++++++++++ examples/nas/proxylessnas/search.py | 114 +++ examples/nas/proxylessnas/utils.py | 62 ++ .../nni/nas/pytorch/proxylessnas/__init__.py | 2 + .../nni/nas/pytorch/proxylessnas/mutator.py | 38 + .../nni/nas/pytorch/proxylessnas/trainer.py | 101 +++ 8 files changed, 1350 insertions(+) create mode 100644 examples/nas/proxylessnas/datasets.py create mode 100644 examples/nas/proxylessnas/model.py create mode 100644 examples/nas/proxylessnas/ops.py create mode 100644 examples/nas/proxylessnas/search.py create mode 100644 examples/nas/proxylessnas/utils.py create mode 100644 src/sdk/pynni/nni/nas/pytorch/proxylessnas/__init__.py create mode 100644 src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py create mode 100644 src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py diff --git a/examples/nas/proxylessnas/datasets.py b/examples/nas/proxylessnas/datasets.py new file mode 100644 index 0000000000..4052298305 --- /dev/null +++ b/examples/nas/proxylessnas/datasets.py @@ -0,0 +1,203 @@ +# Copyright (c) Microsoft Corporation +# All rights reserved. +# +# MIT License +# +# Permission is hereby granted, free of charge, +# to any person obtaining a copy of this software and associated +# documentation files (the "Software"), to deal in the Software without restriction, +# including without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and +# to permit persons to whom the Software is furnished to do so, subject to the following conditions: +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING +# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ + +import numpy as np +import torch.utils.data +import torchvision.transforms as transforms +import torchvision.datasets as datasets + + +class DataProvider: + VALID_SEED = 0 # random seed for the validation set + + @staticmethod + def name(): + """ Return name of the dataset """ + raise NotImplementedError + + @property + def data_shape(self): + """ Return shape as python list of one data entry """ + raise NotImplementedError + + @property + def n_classes(self): + """ Return `int` of num classes """ + raise NotImplementedError + + @property + def save_path(self): + """ local path to save the data """ + raise NotImplementedError + + @property + def data_url(self): + """ link to download the data """ + raise NotImplementedError + + @staticmethod + def random_sample_valid_set(train_labels, valid_size, n_classes): + train_size = len(train_labels) + assert train_size > valid_size + + g = torch.Generator() + g.manual_seed(DataProvider.VALID_SEED) # set random seed before sampling validation set + rand_indexes = torch.randperm(train_size, generator=g).tolist() + + train_indexes, valid_indexes = [], [] + per_class_remain = get_split_list(valid_size, n_classes) + + for idx in rand_indexes: + label = train_labels[idx] + if isinstance(label, float): + label = int(label) + elif isinstance(label, np.ndarray): + label = np.argmax(label) + else: + assert isinstance(label, int) + if per_class_remain[label] > 0: + valid_indexes.append(idx) + per_class_remain[label] -= 1 + else: + train_indexes.append(idx) + return train_indexes, valid_indexes + + +class ImagenetDataProvider(DataProvider): + + def __init__(self, save_path=None, train_batch_size=256, test_batch_size=512, valid_size=None, + n_worker=32, resize_scale=0.08, distort_color=None): + + self._save_path = save_path + train_transforms = self.build_train_transform(distort_color, resize_scale) + train_dataset = datasets.ImageFolder(self.train_path, train_transforms) + + if valid_size is not None: + if isinstance(valid_size, float): + valid_size = int(valid_size * len(train_dataset)) + else: + assert isinstance(valid_size, int), 'invalid valid_size: %s' % valid_size + train_indexes, valid_indexes = self.random_sample_valid_set( + [cls for _, cls in train_dataset.samples], valid_size, self.n_classes, + ) + train_sampler = torch.utils.data.sampler.SubsetRandomSampler(train_indexes) + valid_sampler = torch.utils.data.sampler.SubsetRandomSampler(valid_indexes) + + valid_dataset = datasets.ImageFolder(self.train_path, transforms.Compose([ + transforms.Resize(self.resize_value), + transforms.CenterCrop(self.image_size), + transforms.ToTensor(), + self.normalize, + ])) + + self.train = torch.utils.data.DataLoader( + train_dataset, batch_size=train_batch_size, sampler=train_sampler, + num_workers=n_worker, pin_memory=True, + ) + self.valid = torch.utils.data.DataLoader( + valid_dataset, batch_size=test_batch_size, sampler=valid_sampler, + num_workers=n_worker, pin_memory=True, + ) + else: + self.train = torch.utils.data.DataLoader( + train_dataset, batch_size=train_batch_size, shuffle=True, + num_workers=n_worker, pin_memory=True, + ) + self.valid = None + + self.test = torch.utils.data.DataLoader( + datasets.ImageFolder(self.valid_path, transforms.Compose([ + transforms.Resize(self.resize_value), + transforms.CenterCrop(self.image_size), + transforms.ToTensor(), + self.normalize, + ])), batch_size=test_batch_size, shuffle=False, num_workers=n_worker, pin_memory=True, + ) + + if self.valid is None: + self.valid = self.test + + @staticmethod + def name(): + 
return 'imagenet' + + @property + def data_shape(self): + return 3, self.image_size, self.image_size # C, H, W + + @property + def n_classes(self): + return 1000 + + @property + def save_path(self): + if self._save_path is None: + self._save_path = '/dataset/imagenet' + return self._save_path + + @property + def data_url(self): + raise ValueError('unable to download ImageNet') + + @property + def train_path(self): + return os.path.join(self.save_path, 'train') + + @property + def valid_path(self): + return os.path.join(self._save_path, 'val') + + @property + def normalize(self): + return transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) + + def build_train_transform(self, distort_color, resize_scale): + print('Color jitter: %s' % distort_color) + if distort_color == 'strong': + color_transform = transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1) + elif distort_color == 'normal': + color_transform = transforms.ColorJitter(brightness=32. / 255., saturation=0.5) + else: + color_transform = None + if color_transform is None: + train_transforms = transforms.Compose([ + transforms.RandomResizedCrop(self.image_size, scale=(resize_scale, 1.0)), + transforms.RandomHorizontalFlip(), + transforms.ToTensor(), + self.normalize, + ]) + else: + train_transforms = transforms.Compose([ + transforms.RandomResizedCrop(self.image_size, scale=(resize_scale, 1.0)), + transforms.RandomHorizontalFlip(), + color_transform, + transforms.ToTensor(), + self.normalize, + ]) + return train_transforms + + @property + def resize_value(self): + return 256 + + @property + def image_size(self): + return 224 \ No newline at end of file diff --git a/examples/nas/proxylessnas/model.py b/examples/nas/proxylessnas/model.py new file mode 100644 index 0000000000..93c629dd66 --- /dev/null +++ b/examples/nas/proxylessnas/model.py @@ -0,0 +1,150 @@ +# Copyright (c) Microsoft Corporation +# All rights reserved. +# +# MIT License +# +# Permission is hereby granted, free of charge, +# to any person obtaining a copy of this software and associated +# documentation files (the "Software"), to deal in the Software without restriction, +# including without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and +# to permit persons to whom the Software is furnished to do so, subject to the following conditions: +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING +# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
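Before the model definition that follows, a short sketch of how the `ImagenetDataProvider` above is meant to be used. Batch sizes and `valid_size` are arbitrary here; note that this snapshot calls `os.path.join` and `get_split_list` without importing or defining them, so those are assumed available:

```python
# with valid_size set, a class-balanced subset of the training folder is
# sampled (seeded by DataProvider.VALID_SEED) and served by provider.valid;
# with valid_size=None, provider.valid falls back to the test loader
provider = ImagenetDataProvider(save_path='/dataset/imagenet',
                                train_batch_size=256, test_batch_size=512,
                                valid_size=10000, n_worker=16)
for images, labels in provider.valid:
    ...  # architecture updates typically consume this held-out split
```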
+ +import torch +import torch.nn as nn + +import ops +import utils +from nni.nas import pytorch as nas + +class SearchMobileNet(nn.Module): + def __init__(self, + width_stages=[24,40,80,96,192,320], + n_cell_stages=[4,4,4,4,4,1], + stride_stages=[2,2,2,1,2,1], + width_mult=1, n_classes=1000, + dropout_rate=0, bn_param=(0.1, 1e-3)): + """ + Parameters + ---------- + width_stages: str + width (output channels) of each cell stage in the block + n_cell_stages: str + number of cells in each cell stage + stride_strages: str + stride of each cell stage in the block + width_mult : int + the scale factor of width + """ + input_channel = utils.make_devisible(32 * width_mult, 8) + first_cell_width = utils.make_devisible(16 * width_mult, 8) + for i in range(len(width_stages)): + width_stages[i] = utils.make_devisible(width_stages[i] * width_mult, 8) + # first conv + first_conv = ConvLayer(3, input_channel, kernel_size=3, stride=2, use_bn=True, act_func='relu6', ops_order='weight_bn_act') + # first block + first_block_conv = ops.OPS['3x3_MBConv1'](input_channel, first_cell_width, 1) + first_block = MobileInvertedResidualBlock(first_block_conv, None) + + input_channel = first_cell_width + + blocks = [first_block] + + stage_cnt = 0 + for width, n_cell, s in zip(width_stages, n_cell_stages, stride_stages): + for i in range(n_cell): + if i == 0: + stride = s + else: + stride = 1 + if stride == 1 and input_channel == width: + # if it is not the first one + conv_op = nas.mutables.LayerChoice([ops.OPS['3x3_MBConv3'](input_channel, width, stride), + ops.OPS['3x3_MBConv6'](input_channel, width, stride), + ops.OPS['5x5_MBConv3'](input_channel, width, stride), + ops.OPS['5x5_MBConv6'](input_channel, width, stride), + ops.OPS['7x7_MBConv3'](input_channel, width, stride), + ops.OPS['7x7_MBConv6'](input_channel, width, stride), + ops.OPS['Zero'](input_channel, width, stride)], + key="s{}_c{}".format(stage_cnt, i)) + else: + conv_op = nas.mutables.LayerChoice([ops.OPS['3x3_MBConv3'](input_channel, width, stride), + ops.OPS['3x3_MBConv6'](input_channel, width, stride), + ops.OPS['5x5_MBConv3'](input_channel, width, stride), + ops.OPS['5x5_MBConv6'](input_channel, width, stride), + ops.OPS['7x7_MBConv3'](input_channel, width, stride), + ops.OPS['7x7_MBConv6'](input_channel, width, stride)], + key="s{}_c{}".format(stage_cnt, i)) + # shortcut + if stride == 1 and input_channel == width: + # if not first cell + shortcut = IndentityLayer(input_channel, input_channel) + else: + shortcut = None + inverted_residual_block = MobileInvertedResidualBlock(conv_op, shortcut) + blocks.append(inverted_residual_block) + input_channel = width + stage_cnt += 1 + + # feature mix layer + last_channel = utils.make_devisible(1280 * width_mult, 8) if width_mult > 1.0 else 1280 + feature_mix_layer = ConvLayer(input_channel, last_channel, kernel_size=1, use_bn=True, act_func='relu6', ops_order='weight_bn_act', ) + classifier = LinearLayer(last_channel, n_classes, dropout_rate=dropout_rate) + + self.first_conv = first_conv + self.blocks = nn.ModuleList(blocks) + self.feature_mix_layer = feature_mix_layer + self.global_avg_pooling = nn.AdaptiveAvgPool2d(1) + self.classifier = classifier + + # set bn param + self.set_bn_param(momentum=bn_param[0], eps=bn_param[1]) + + def forward(self, x): + x = self.first_conv(x) + for block in self.blocks: + x = block(x) + x = self.feature_mix_layer(x) + x = self.global_avg_pooling(x) + x = x.view(x.size(0), -1) + x = self.classifier(x) + return x + + def set_bn_param(self, momentum, eps): + for m in 
self.modules(): + if isinstance(m, nn.BatchNorm2d) or isinstance(m, nn.BatchNorm1d): + m.momentum = momentum + m.eps = eps + return + + def init_model(self, model_init='he_fout', init_div_groups=False): + for m in self.modules(): + if isinstance(m, nn.Conv2d): + if model_init == 'he_fout': + n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels + if init_dev_groups: + n /= m.groups + m.weight.data.normal_(0, math, sqrt(2. / n)) + elif model_init == 'he_fin': + n = m.kernel_size[0] * m.kernel_size[1] * m.in_channels + if init_dev_groups: + n /= m.groups + m.weight.data.normal_(0, math, sqrt(2. / n)) + else: + raise NotImplementedError + elif isinstance(m, nn.BatchNorm2d) or isinstance(m, nn.BatchNorm1d): + m.weight.data.fill_(1) + m.bias.data.zero_() + elif isinstance(m, nn.Linear): + stdv = 1. / math.sqrt(m.weight.size(1)) + m.weight.data.uniform_(-stdv, stdv) + if m.bias is not None: + m.bias.data.zero_() diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py new file mode 100644 index 0000000000..143ab73c2d --- /dev/null +++ b/examples/nas/proxylessnas/ops.py @@ -0,0 +1,680 @@ +# Copyright (c) Microsoft Corporation +# All rights reserved. +# +# MIT License +# +# Permission is hereby granted, free of charge, +# to any person obtaining a copy of this software and associated +# documentation files (the "Software"), to deal in the Software without restriction, +# including without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and +# to permit persons to whom the Software is furnished to do so, subject to the following conditions: +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING +# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
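Before the operator definitions that follow, a sketch of instantiating the supernet above. With the default stage widths `[24,40,80,96,192,320]` and cell counts `[4,4,4,4,4,1]` there are 21 searched cells, each carrying one `LayerChoice`; stride-1, equal-width cells get a seventh `Zero` candidate. A minimal sketch, assuming the imports at the top of model.py resolve:

```python
model = SearchMobileNet(width_mult=1.0, n_classes=1000, dropout_rate=0)
model.init_model(model_init='he_fout')
n_choices = sum(1 for m in model.modules()
                if type(m).__name__ == 'LayerChoice')
print(n_choices)  # expected: 21, one mutable op per searched cell
```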
+ +from utils import * +from collections import OrderedDict +import torch.nn as nn + + +OPS = { + 'Identity': lambda in_C, out_C, stride: IdentityLayer(in_C, out_C, ops_order='weight_bn_act'), + 'Zero': lambda in_C, out_C, stride: ZeroLayer(stride=stride), + '3x3_MBConv1': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 3, stride, 1), + '3x3_MBConv2': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 3, stride, 2), + '3x3_MBConv3': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 3, stride, 3), + '3x3_MBConv4': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 3, stride, 4), + '3x3_MBConv5': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 3, stride, 5), + '3x3_MBConv6': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 3, stride, 6), + '5x5_MBConv1': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 5, stride, 1), + '5x5_MBConv2': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 5, stride, 2), + '5x5_MBConv3': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 5, stride, 3), + '5x5_MBConv4': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 5, stride, 4), + '5x5_MBConv5': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 5, stride, 5), + '5x5_MBConv6': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 5, stride, 6), + '7x7_MBConv1': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 7, stride, 1), + '7x7_MBConv2': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 7, stride, 2), + '7x7_MBConv3': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 7, stride, 3), + '7x7_MBConv4': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 7, stride, 4), + '7x7_MBConv5': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 7, stride, 5), + '7x7_MBConv6': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 7, stride, 6) +} + +#======================================== + +class MobileInvertedResidualBlock(MyModule): + + def __init__(self, mobile_inverted_conv, shortcut): + super(MobileInvertedResidualBlock, self).__init__() + + self.mobile_inverted_conv = mobile_inverted_conv + self.shortcut = shortcut + + def forward(self, x): + if self.mobile_inverted_conv.is_zero_layer(): + res = x + elif self.shortcut is None or self.shortcut.is_zero_layer(): + res = self.mobile_inverted_conv(x) + else: + conv_x = self.mobile_inverted_conv(x) + skip_x = self.shortcut(x) + res = skip_x + conv_x + return res + + @property + def module_str(self): + return '(%s, %s)' % ( + self.mobile_inverted_conv.module_str, self.shortcut.module_str if self.shortcut is not None else None + ) + + @property + def config(self): + return { + 'name': MobileInvertedResidualBlock.__name__, + 'mobile_inverted_conv': self.mobile_inverted_conv.config, + 'shortcut': self.shortcut.config if self.shortcut is not None else None, + } + + @staticmethod + def build_from_config(config): + mobile_inverted_conv = set_layer_from_config(config['mobile_inverted_conv']) + shortcut = set_layer_from_config(config['shortcut']) + return MobileInvertedResidualBlock(mobile_inverted_conv, shortcut) + + def get_flops(self, x): + flops1, conv_x = self.mobile_inverted_conv.get_flops(x) + if self.shortcut: + flops2, _ = self.shortcut.get_flops(x) + else: + flops2 = 0 + + return flops1 + flops2, self.forward(x) + +#======================================== + +def count_conv_flop(layer, x): + out_h = int(x.size()[2] / layer.stride[0]) + out_w = int(x.size()[3] / layer.stride[1]) + 
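    # one multiply-accumulate per (input channel / groups, output channel,
    # kernel position) at each output location, i.e.
    # in_C * out_C * kH * kW * out_H * out_W / groups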
delta_ops = layer.in_channels * layer.out_channels * layer.kernel_size[0] * layer.kernel_size[1] * \ + out_h * out_w / layer.groups + return delta_ops + +class ShuffleLayer(nn.Module): + def __init__(self, groups): + super(ShuffleLayer, self).__init__() + self.groups = groups + + def forward(self, x): + batchsize, num_channels, height, width = x.size() + channels_per_group = num_channels // self.groups + # reshape + x = x.view(batchsize, self.groups, channels_per_group, height, width) + # noinspection PyUnresolvedReferences + x = torch.transpose(x, 1, 2).contiguous() + # flatten + x = x.view(batchsize, -1, height, width) + return x + +class MyModule(nn.Module): + + def forward(self, x): + raise NotImplementedError + + @property + def module_str(self): + raise NotImplementedError + + @property + def config(self): + raise NotImplementedError + + @staticmethod + def build_from_config(config): + raise NotImplementedError + + def get_flops(self, x): + raise NotImplementedError + +class My2DLayer(MyModule): + + def __init__(self, in_channels, out_channels, + use_bn=True, act_func='relu', dropout_rate=0, ops_order='weight_bn_act'): + super(My2DLayer, self).__init__() + self.in_channels = in_channels + self.out_channels = out_channels + + self.use_bn = use_bn + self.act_func = act_func + self.dropout_rate = dropout_rate + self.ops_order = ops_order + + """ modules """ + modules = {} + # batch norm + if self.use_bn: + if self.bn_before_weight: + modules['bn'] = nn.BatchNorm2d(in_channels) + else: + modules['bn'] = nn.BatchNorm2d(out_channels) + else: + modules['bn'] = None + # activation + modules['act'] = build_activation(self.act_func, self.ops_list[0] != 'act') + # dropout + if self.dropout_rate > 0: + modules['dropout'] = nn.Dropout2d(self.dropout_rate, inplace=True) + else: + modules['dropout'] = None + # weight + modules['weight'] = self.weight_op() + + # add modules + for op in self.ops_list: + if modules[op] is None: + continue + elif op == 'weight': + if modules['dropout'] is not None: + self.add_module('dropout', modules['dropout']) + for key in modules['weight']: + self.add_module(key, modules['weight'][key]) + else: + self.add_module(op, modules[op]) + + @property + def ops_list(self): + return self.ops_order.split('_') + + @property + def bn_before_weight(self): + for op in self.ops_list: + if op == 'bn': + return True + elif op == 'weight': + return False + raise ValueError('Invalid ops_order: %s' % self.ops_order) + + def weight_op(self): + raise NotImplementedError + + """ Methods defined in MyModule """ + + def forward(self, x): + for module in self._modules.values(): + x = module(x) + return x + + @property + def module_str(self): + raise NotImplementedError + + @property + def config(self): + return { + 'in_channels': self.in_channels, + 'out_channels': self.out_channels, + 'use_bn': self.use_bn, + 'act_func': self.act_func, + 'dropout_rate': self.dropout_rate, + 'ops_order': self.ops_order, + } + + @staticmethod + def build_from_config(config): + raise NotImplementedError + + def get_flops(self, x): + raise NotImplementedError + + @staticmethod + def is_zero_layer(): + return False + + +class ConvLayer(My2DLayer): + + def __init__(self, in_channels, out_channels, + kernel_size=3, stride=1, dilation=1, groups=1, bias=False, has_shuffle=False, + use_bn=True, act_func='relu', dropout_rate=0, ops_order='weight_bn_act'): + self.kernel_size = kernel_size + self.stride = stride + self.dilation = dilation + self.groups = groups + self.bias = bias + self.has_shuffle = has_shuffle + + 
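        # My2DLayer.__init__ assembles bn / activation / dropout around the
        # conv returned by weight_op() below, in the order given by ops_order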
super(ConvLayer, self).__init__(in_channels, out_channels, use_bn, act_func, dropout_rate, ops_order) + + def weight_op(self): + padding = get_same_padding(self.kernel_size) + if isinstance(padding, int): + padding *= self.dilation + else: + padding[0] *= self.dilation + padding[1] *= self.dilation + + weight_dict = OrderedDict() + weight_dict['conv'] = nn.Conv2d( + self.in_channels, self.out_channels, kernel_size=self.kernel_size, stride=self.stride, padding=padding, + dilation=self.dilation, groups=self.groups, bias=self.bias + ) + if self.has_shuffle and self.groups > 1: + weight_dict['shuffle'] = ShuffleLayer(self.groups) + + return weight_dict + + @property + def module_str(self): + if isinstance(self.kernel_size, int): + kernel_size = (self.kernel_size, self.kernel_size) + else: + kernel_size = self.kernel_size + if self.groups == 1: + if self.dilation > 1: + return '%dx%d_DilatedConv' % (kernel_size[0], kernel_size[1]) + else: + return '%dx%d_Conv' % (kernel_size[0], kernel_size[1]) + else: + if self.dilation > 1: + return '%dx%d_DilatedGroupConv' % (kernel_size[0], kernel_size[1]) + else: + return '%dx%d_GroupConv' % (kernel_size[0], kernel_size[1]) + + @property + def config(self): + return { + 'name': ConvLayer.__name__, + 'kernel_size': self.kernel_size, + 'stride': self.stride, + 'dilation': self.dilation, + 'groups': self.groups, + 'bias': self.bias, + 'has_shuffle': self.has_shuffle, + **super(ConvLayer, self).config, + } + + @staticmethod + def build_from_config(config): + return ConvLayer(**config) + + def get_flops(self, x): + return count_conv_flop(self.conv, x), self.forward(x) + + +class DepthConvLayer(My2DLayer): + + def __init__(self, in_channels, out_channels, + kernel_size=3, stride=1, dilation=1, groups=1, bias=False, has_shuffle=False, + use_bn=True, act_func='relu', dropout_rate=0, ops_order='weight_bn_act'): + self.kernel_size = kernel_size + self.stride = stride + self.dilation = dilation + self.groups = groups + self.bias = bias + self.has_shuffle = has_shuffle + + super(DepthConvLayer, self).__init__( + in_channels, out_channels, use_bn, act_func, dropout_rate, ops_order + ) + + def weight_op(self): + padding = get_same_padding(self.kernel_size) + if isinstance(padding, int): + padding *= self.dilation + else: + padding[0] *= self.dilation + padding[1] *= self.dilation + + weight_dict = OrderedDict() + weight_dict['depth_conv'] = nn.Conv2d( + self.in_channels, self.in_channels, kernel_size=self.kernel_size, stride=self.stride, padding=padding, + dilation=self.dilation, groups=self.in_channels, bias=False + ) + weight_dict['point_conv'] = nn.Conv2d( + self.in_channels, self.out_channels, kernel_size=1, groups=self.groups, bias=self.bias + ) + if self.has_shuffle and self.groups > 1: + weight_dict['shuffle'] = ShuffleLayer(self.groups) + return weight_dict + + @property + def module_str(self): + if isinstance(self.kernel_size, int): + kernel_size = (self.kernel_size, self.kernel_size) + else: + kernel_size = self.kernel_size + if self.dilation > 1: + return '%dx%d_DilatedDepthConv' % (kernel_size[0], kernel_size[1]) + else: + return '%dx%d_DepthConv' % (kernel_size[0], kernel_size[1]) + + @property + def config(self): + return { + 'name': DepthConvLayer.__name__, + 'kernel_size': self.kernel_size, + 'stride': self.stride, + 'dilation': self.dilation, + 'groups': self.groups, + 'bias': self.bias, + 'has_shuffle': self.has_shuffle, + **super(DepthConvLayer, self).config, + } + + @staticmethod + def build_from_config(config): + return DepthConvLayer(**config) + + 
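    # get_flops below runs the depthwise stage before counting the pointwise
    # one, so each conv's cost is measured on the tensor it actually receives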
def get_flops(self, x): + depth_flop = count_conv_flop(self.depth_conv, x) + x = self.depth_conv(x) + point_flop = count_conv_flop(self.point_conv, x) + x = self.point_conv(x) + return depth_flop + point_flop, self.forward(x) + + +class PoolingLayer(My2DLayer): + + def __init__(self, in_channels, out_channels, + pool_type, kernel_size=2, stride=2, + use_bn=False, act_func=None, dropout_rate=0, ops_order='weight_bn_act'): + self.pool_type = pool_type + self.kernel_size = kernel_size + self.stride = stride + + super(PoolingLayer, self).__init__(in_channels, out_channels, use_bn, act_func, dropout_rate, ops_order) + + def weight_op(self): + if self.stride == 1: + # same padding if `stride == 1` + padding = get_same_padding(self.kernel_size) + else: + padding = 0 + + weight_dict = OrderedDict() + if self.pool_type == 'avg': + weight_dict['pool'] = nn.AvgPool2d( + self.kernel_size, stride=self.stride, padding=padding, count_include_pad=False + ) + elif self.pool_type == 'max': + weight_dict['pool'] = nn.MaxPool2d(self.kernel_size, stride=self.stride, padding=padding) + else: + raise NotImplementedError + return weight_dict + + @property + def module_str(self): + if isinstance(self.kernel_size, int): + kernel_size = (self.kernel_size, self.kernel_size) + else: + kernel_size = self.kernel_size + return '%dx%d_%sPool' % (kernel_size[0], kernel_size[1], self.pool_type.upper()) + + @property + def config(self): + return { + 'name': PoolingLayer.__name__, + 'pool_type': self.pool_type, + 'kernel_size': self.kernel_size, + 'stride': self.stride, + **super(PoolingLayer, self).config + } + + @staticmethod + def build_from_config(config): + return PoolingLayer(**config) + + def get_flops(self, x): + return 0, self.forward(x) + + +class IdentityLayer(My2DLayer): + + def __init__(self, in_channels, out_channels, + use_bn=False, act_func=None, dropout_rate=0, ops_order='weight_bn_act'): + super(IdentityLayer, self).__init__(in_channels, out_channels, use_bn, act_func, dropout_rate, ops_order) + + def weight_op(self): + return None + + @property + def module_str(self): + return 'Identity' + + @property + def config(self): + return { + 'name': IdentityLayer.__name__, + **super(IdentityLayer, self).config, + } + + @staticmethod + def build_from_config(config): + return IdentityLayer(**config) + + def get_flops(self, x): + return 0, self.forward(x) + + +class LinearLayer(MyModule): + + def __init__(self, in_features, out_features, bias=True, + use_bn=False, act_func=None, dropout_rate=0, ops_order='weight_bn_act'): + super(LinearLayer, self).__init__() + + self.in_features = in_features + self.out_features = out_features + self.bias = bias + + self.use_bn = use_bn + self.act_func = act_func + self.dropout_rate = dropout_rate + self.ops_order = ops_order + + """ modules """ + modules = {} + # batch norm + if self.use_bn: + if self.bn_before_weight: + modules['bn'] = nn.BatchNorm1d(in_features) + else: + modules['bn'] = nn.BatchNorm1d(out_features) + else: + modules['bn'] = None + # activation + modules['act'] = build_activation(self.act_func, self.ops_list[0] != 'act') + # dropout + if self.dropout_rate > 0: + modules['dropout'] = nn.Dropout(self.dropout_rate, inplace=True) + else: + modules['dropout'] = None + # linear + modules['weight'] = {'linear': nn.Linear(self.in_features, self.out_features, self.bias)} + + # add modules + for op in self.ops_list: + if modules[op] is None: + continue + elif op == 'weight': + if modules['dropout'] is not None: + self.add_module('dropout', modules['dropout']) + for key in 
modules['weight']: + self.add_module(key, modules['weight'][key]) + else: + self.add_module(op, modules[op]) + + @property + def ops_list(self): + return self.ops_order.split('_') + + @property + def bn_before_weight(self): + for op in self.ops_list: + if op == 'bn': + return True + elif op == 'weight': + return False + raise ValueError('Invalid ops_order: %s' % self.ops_order) + + def forward(self, x): + for module in self._modules.values(): + x = module(x) + return x + + @property + def module_str(self): + return '%dx%d_Linear' % (self.in_features, self.out_features) + + @property + def config(self): + return { + 'name': LinearLayer.__name__, + 'in_features': self.in_features, + 'out_features': self.out_features, + 'bias': self.bias, + 'use_bn': self.use_bn, + 'act_func': self.act_func, + 'dropout_rate': self.dropout_rate, + 'ops_order': self.ops_order, + } + + @staticmethod + def build_from_config(config): + return LinearLayer(**config) + + def get_flops(self, x): + return self.linear.weight.numel(), self.forward(x) + + @staticmethod + def is_zero_layer(): + return False + + +class MBInvertedConvLayer(MyModule): + + def __init__(self, in_channels, out_channels, + kernel_size=3, stride=1, expand_ratio=6, mid_channels=None): + super(MBInvertedConvLayer, self).__init__() + + self.in_channels = in_channels + self.out_channels = out_channels + + self.kernel_size = kernel_size + self.stride = stride + self.expand_ratio = expand_ratio + self.mid_channels = mid_channels + + if self.mid_channels is None: + feature_dim = round(self.in_channels * self.expand_ratio) + else: + feature_dim = self.mid_channels + + if self.expand_ratio == 1: + self.inverted_bottleneck = None + else: + self.inverted_bottleneck = nn.Sequential(OrderedDict([ + ('conv', nn.Conv2d(self.in_channels, feature_dim, 1, 1, 0, bias=False)), + ('bn', nn.BatchNorm2d(feature_dim)), + ('act', nn.ReLU6(inplace=True)), + ])) + + pad = get_same_padding(self.kernel_size) + self.depth_conv = nn.Sequential(OrderedDict([ + ('conv', nn.Conv2d(feature_dim, feature_dim, kernel_size, stride, pad, groups=feature_dim, bias=False)), + ('bn', nn.BatchNorm2d(feature_dim)), + ('act', nn.ReLU6(inplace=True)), + ])) + + self.point_linear = nn.Sequential(OrderedDict([ + ('conv', nn.Conv2d(feature_dim, out_channels, 1, 1, 0, bias=False)), + ('bn', nn.BatchNorm2d(out_channels)), + ])) + + def forward(self, x): + if self.inverted_bottleneck: + x = self.inverted_bottleneck(x) + x = self.depth_conv(x) + x = self.point_linear(x) + return x + + @property + def module_str(self): + return '%dx%d_MBConv%d' % (self.kernel_size, self.kernel_size, self.expand_ratio) + + @property + def config(self): + return { + 'name': MBInvertedConvLayer.__name__, + 'in_channels': self.in_channels, + 'out_channels': self.out_channels, + 'kernel_size': self.kernel_size, + 'stride': self.stride, + 'expand_ratio': self.expand_ratio, + 'mid_channels': self.mid_channels, + } + + @staticmethod + def build_from_config(config): + return MBInvertedConvLayer(**config) + + def get_flops(self, x): + if self.inverted_bottleneck: + flop1 = count_conv_flop(self.inverted_bottleneck.conv, x) + x = self.inverted_bottleneck(x) + else: + flop1 = 0 + + flop2 = count_conv_flop(self.depth_conv.conv, x) + x = self.depth_conv(x) + + flop3 = count_conv_flop(self.point_linear.conv, x) + x = self.point_linear(x) + + return flop1 + flop2 + flop3, x + + @staticmethod + def is_zero_layer(): + return False + + +class ZeroLayer(MyModule): + + def __init__(self, stride): + super(ZeroLayer, self).__init__() + 
self.stride = stride + + def forward(self, x): + n, c, h, w = x.size() + h //= self.stride + w //= self.stride + device = x.get_device() if x.is_cuda else torch.device('cpu') + # noinspection PyUnresolvedReferences + padding = torch.zeros(n, c, h, w, device=device, requires_grad=False) + return padding + + @property + def module_str(self): + return 'Zero' + + @property + def config(self): + return { + 'name': ZeroLayer.__name__, + 'stride': self.stride, + } + + @staticmethod + def build_from_config(config): + return ZeroLayer(**config) + + def get_flops(self, x): + return 0, self.forward(x) + + @staticmethod + def is_zero_layer(): + return True \ No newline at end of file diff --git a/examples/nas/proxylessnas/search.py b/examples/nas/proxylessnas/search.py new file mode 100644 index 0000000000..c7b3c1bb5b --- /dev/null +++ b/examples/nas/proxylessnas/search.py @@ -0,0 +1,114 @@ +# Copyright (c) Microsoft Corporation +# All rights reserved. +# +# MIT License +# +# Permission is hereby granted, free of charge, +# to any person obtaining a copy of this software and associated +# documentation files (the "Software"), to deal in the Software without restriction, +# including without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and +# to permit persons to whom the Software is furnished to do so, subject to the following conditions: +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING +# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
+ +from argparse import ArgumentParser + +import datasets +import torch +import torch.nn as nn + +from model import * +from nni.nas.pytorch.darts import ProxylessNasTrainer +from utils import * + +def get_parameters(keys=None, mode='include'): + if keys is None: + for name, param in self.named_parameters(): + yield param + elif mode == 'include': + for name, param in self.named_parameters(): + flag = False + for key in keys: + if key in name: + flag = True + break + if flag: + yield param + elif mode == 'exclude': + for name, param in self.named_parameters(): + flag = True + for key in keys: + if key in name: + flag = False + break + if flag: + yield param + else: + raise ValueError('do not support: %s' % mode) + + +if __name__ == "__main__": + parser = ArgumentParser("proxylessnas") + parser.add_argument("--layers", default=4, type=int) + parser.add_argument("--nodes", default=2, type=int) + parser.add_argument("--batch-size", default=128, type=int) + parser.add_argument("--log-frequency", default=1, type=int) + args = parser.parse_args() + + #dataset_train, dataset_valid = datasets.get_dataset("cifar10") + + model = SearchMobileNet() + model.init_model() + + # move network to GPU if available + if torch.cuda.is_available(): + device = torch.device('cuda:0') + #self.net = torch.nn.DataParallel(self.net) + model.to(device) + cudnn.benchmark = True + else: + raise ValueError + # self.device = torch.device('cpu') + + # TODO: net info + + criterion = nn.CrossEntropyLoss() + + # TODO: removed decay_key + no_decay_keys = True + if no_decay_keys: + keys = ['bn'] + momentum, nesterov = 0.9, True + optimizer = torch.optim.SGD([ + {'params': get_parameters(keys, mode='exclude'), 'weight_decay': 4e-5}, + {'params': get_parameters(keys, mode='include'), 'weight_decay': 0}, + ], lr=0.05, momentum=momentum, nesterov=nesterov) + else: + optimizer = torch.optim.SGD(get_parameters(), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5) + + #n_epochs = 50 + #lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, n_epochs, eta_min=0.001) + + # TODO: + data_provider = ImagenetDataProvider(train_batch_size=256, + test_batch_size=500, + valid_size=None, + n_worker=32, + resize_scale=0.08, + distort_color='normal') + train_loader = data_provider.train + + trainer = ProxylessNasTrainer(model, + model_optim=optimizer, + train_loader=train_loader, + device=device) + + trainer.train() + trainer.export() diff --git a/examples/nas/proxylessnas/utils.py b/examples/nas/proxylessnas/utils.py new file mode 100644 index 0000000000..0244da48c7 --- /dev/null +++ b/examples/nas/proxylessnas/utils.py @@ -0,0 +1,62 @@ +# Copyright (c) Microsoft Corporation +# All rights reserved. +# +# MIT License +# +# Permission is hereby granted, free of charge, +# to any person obtaining a copy of this software and associated +# documentation files (the "Software"), to deal in the Software without restriction, +# including without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and +# to permit persons to whom the Software is furnished to do so, subject to the following conditions: +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING +# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + +def make_divisible(v, divisor, min_val=None): + """ + This function is taken from the original tf repo. + It ensures that all layers have a channel number that is divisible by 8 + It can be seen here: + https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py + :param v: + :param divisor: + :param min_val: + :return: + """ + if min_val is None: + min_val = divisor + new_v = max(min_val, int(v + divisor / 2) // divisor * divisor) + # Make sure that round down does not go down by more than 10%. + if new_v < 0.9 * v: + new_v += divisor + return new_v + +class AverageMeter(object): + """ + Computes and stores the average and current value + Copied from: https://github.com/pytorch/examples/blob/master/imagenet/main.py + """ + + def __init__(self): + self.val = 0 + self.avg = 0 + self.sum = 0 + self.count = 0 + + def reset(self): + self.val = 0 + self.avg = 0 + self.sum = 0 + self.count = 0 + + def update(self, val, n=1): + self.val = val + self.sum += val * n + self.count += n + self.avg = self.sum / self.count diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/__init__.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/__init__.py new file mode 100644 index 0000000000..26feedba7d --- /dev/null +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/__init__.py @@ -0,0 +1,2 @@ +from .mutator import ProxylessNasMutator +from .trainer import ProxylessNasTrainer diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py new file mode 100644 index 0000000000..873dec12f4 --- /dev/null +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -0,0 +1,38 @@ +# Copyright (c) Microsoft Corporation +# All rights reserved. +# +# MIT License +# +# Permission is hereby granted, free of charge, +# to any person obtaining a copy of this software and associated +# documentation files (the "Software"), to deal in the Software without restriction, +# including without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and +# to permit persons to whom the Software is furnished to do so, subject to the following conditions: +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING +# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
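A quick worked example of `make_divisible` from `utils.py` above, before the mutator code that follows (values picked for illustration):

```python
make_divisible(37, 8)    # int(37 + 4) // 8 * 8 = 40; 40 >= 0.9 * 37, kept
make_divisible(11.2, 8)  # rounds down to 8; 8 < 0.9 * 11.2, bumped to 16
```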
+ +import torch +from torch import nn as nn +from torch.nn import functional as F + +from nni.nas.pytorch.mutables import LayerChoice +from nni.nas.pytorch.mutator import PyTorchMutator + + +class ProxylessNasMutator(PyTorchMutator): + + def before_build(self, model): + self.choices = nn.ParameterDict() + + def on_init_layer_choice(self, mutable: LayerChoice): + self.choices[mutable.key] = nn.Parameter(1.0E-3 * torch.randn(mutable.length)) + + def on_calc_layer_choice_mask(self, mutable: LayerChoice): + return F.softmax(self.choices[mutable.key], dim=-1) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py new file mode 100644 index 0000000000..5ada24a5aa --- /dev/null +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -0,0 +1,101 @@ +# Copyright (c) Microsoft Corporation +# All rights reserved. +# +# MIT License +# +# Permission is hereby granted, free of charge, +# to any person obtaining a copy of this software and associated +# documentation files (the "Software"), to deal in the Software without restriction, +# including without limitation the rights to use, copy, modify, merge, publish, +# distribute, sublicense, and/or sell copies of the Software, and +# to permit persons to whom the Software is furnished to do so, subject to the following conditions: +# The above copyright notice and this permission notice shall be included +# in all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING +# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, +# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
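The mutator above keeps one small learnable vector per `LayerChoice` and exposes a softmax over it as the layer's mask. A self-contained sketch of that computation; the length 7 matches the largest candidate list in the example model:

```python
import torch
import torch.nn.functional as F

alpha = 1.0e-3 * torch.randn(7)   # mirrors on_init_layer_choice
mask = F.softmax(alpha, dim=-1)   # mirrors on_calc_layer_choice_mask
assert abs(mask.sum().item() - 1.0) < 1e-6  # soft weights over candidates
```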
+ +import copy +import math + +import torch +from torch import nn as nn + +from nni.nas.pytorch.trainer import Trainer +from nni.nas.utils import AverageMeterGroup, auto_device +from .mutator import ProxylessNasMutator + + +class ProxylessNasTrainer(Trainer): + def __init__(self, model, model_optim, train_loader, device): + self.model = model + self.model_optim = model_optim + self.train_loader = train_loader + self.device = device + + # TODO: arch search configs + + self._init_arch_params() + + # build architecture optimizer + self.arch_optimizer = torch.optim.Adam(self._architecture_parameters(), 1e-3, weight_decay=0) + + self.warmup = True + self.warmup_epoch = 0 + + def _architecture_parameters(self): + for name, param in self.named_parameters(): + if 'AP_path_alpha' in name: + yield param + + def _init_arch_params(self, init_type='normal', init_ratio=1e-3): + for param in self._architecture_parameters(): + if init_type == 'normal': + param.data.normal_(0, init_ratio) + elif init_type == 'uniform': + param.data.uniform_(-init_ratio, init_ratio) + else: + raise NotImplementedError + + def _warm_up(self, warmup_epochs=25): + lr_max = 0.05 + data_loader = self.train_loader + nBatch = len(data_loader) + T_total = warmup_epochs * nBatch # total num of batches + + for epoch in range(self.warmup_epoch, warmup_epochs): + print('\n', '-' * 30, 'Warmup epoch: %d' % (epoch + 1), '-' * 30, '\n') + batch_time = AverageMeter() + data_time = AverageMeter() + losses = AverageMeter() + top1 = AverageMeter() + top5 = AverageMeter() + # switch to train mode + self.model.train() + + end = time.time() + for i, (images, labels) in enumerate(data_loader): + data_time.update(time.time() - end) + # lr + T_cur = epoch * nBatch + i + warmup_lr = 0.5 * lr_max * (1 + math.cos(math.pi * T_cur / T_total)) + for param_group in self.model_optim.param_groups: + param_group['lr'] = warmup_lr + images, labels = images.to(self.device), labels.to(self.device) + # compute output + self._reset_binary_gates() # random sample binary gates + # TODO: + #self._unused_modules_off() # remove unused module for speedup + output = self.model(images) + + def _reset_binary_gates(self): + for m in self. 
+ + def train(self): + pass + + def export(self): + pass From 5647dd04ee655cb8bfd998e6980b0c579dbf24ff Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 14 Nov 2019 16:10:08 +0800 Subject: [PATCH 07/60] update --- examples/nas/proxylessnas/model.py | 31 +-- examples/nas/proxylessnas/ops.py | 49 +++- .../nas/proxylessnas/{utils.py => putils.py} | 1 + examples/nas/proxylessnas/search.py | 5 +- .../nni/nas/pytorch/proxylessnas/mutator.py | 154 +++++++++++- .../nni/nas/pytorch/proxylessnas/trainer.py | 237 +++++++++++++++++- 6 files changed, 448 insertions(+), 29 deletions(-) rename examples/nas/proxylessnas/{utils.py => putils.py} (99%) diff --git a/examples/nas/proxylessnas/model.py b/examples/nas/proxylessnas/model.py index 93c629dd66..d7275641ad 100644 --- a/examples/nas/proxylessnas/model.py +++ b/examples/nas/proxylessnas/model.py @@ -20,9 +20,10 @@ import torch import torch.nn as nn +import math import ops -import utils +import putils from nni.nas import pytorch as nas class SearchMobileNet(nn.Module): @@ -44,15 +45,17 @@ def __init__(self, width_mult : int the scale factor of width """ - input_channel = utils.make_devisible(32 * width_mult, 8) - first_cell_width = utils.make_devisible(16 * width_mult, 8) + super(SearchMobileNet, self).__init__() + + input_channel = putils.make_divisible(32 * width_mult, 8) + first_cell_width = putils.make_divisible(16 * width_mult, 8) for i in range(len(width_stages)): - width_stages[i] = utils.make_devisible(width_stages[i] * width_mult, 8) + width_stages[i] = putils.make_divisible(width_stages[i] * width_mult, 8) # first conv - first_conv = ConvLayer(3, input_channel, kernel_size=3, stride=2, use_bn=True, act_func='relu6', ops_order='weight_bn_act') + first_conv = ops.ConvLayer(3, input_channel, kernel_size=3, stride=2, use_bn=True, act_func='relu6', ops_order='weight_bn_act') # first block first_block_conv = ops.OPS['3x3_MBConv1'](input_channel, first_cell_width, 1) - first_block = MobileInvertedResidualBlock(first_block_conv, None) + first_block = ops.MobileInvertedResidualBlock(first_block_conv, None) input_channel = first_cell_width @@ -86,18 +89,18 @@ def __init__(self, # shortcut if stride == 1 and input_channel == width: # if not first cell - shortcut = IndentityLayer(input_channel, input_channel) + shortcut = ops.IdentityLayer(input_channel, input_channel) else: shortcut = None - inverted_residual_block = MobileInvertedResidualBlock(conv_op, shortcut) + inverted_residual_block = ops.MobileInvertedResidualBlock(conv_op, shortcut) blocks.append(inverted_residual_block) input_channel = width stage_cnt += 1 # feature mix layer last_channel = utils.make_devisible(1280 * width_mult, 8) if width_mult > 1.0 else 1280 - feature_mix_layer = ConvLayer(input_channel, last_channel, kernel_size=1, use_bn=True, act_func='relu6', ops_order='weight_bn_act', ) - classifier = LinearLayer(last_channel, n_classes, dropout_rate=dropout_rate) + feature_mix_layer = ops.ConvLayer(input_channel, last_channel, kernel_size=1, use_bn=True, act_func='relu6', ops_order='weight_bn_act', ) + classifier = ops.LinearLayer(last_channel, n_classes, dropout_rate=dropout_rate) self.first_conv = first_conv self.blocks = nn.ModuleList(blocks) @@ -130,14 +133,14 @@ def init_model(self, model_init='he_fout', init_div_groups=False): if isinstance(m, nn.Conv2d): if model_init == 'he_fout': n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels - if init_dev_groups: + if init_div_groups: n /= m.groups - m.weight.data.normal_(0, math, sqrt(2. / n)) + m.weight.data.normal_(0, math.sqrt(2. 
/ n)) elif model_init == 'he_fin': n = m.kernel_size[0] * m.kernel_size[1] * m.in_channels - if init_dev_groups: + if init_div_groups: n /= m.groups - m.weight.data.normal_(0, math, sqrt(2. / n)) + m.weight.data.normal_(0, math.sqrt(2. / n)) else: raise NotImplementedError elif isinstance(m, nn.BatchNorm2d) or isinstance(m, nn.BatchNorm1d): diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py index 143ab73c2d..5538577909 100644 --- a/examples/nas/proxylessnas/ops.py +++ b/examples/nas/proxylessnas/ops.py @@ -18,7 +18,6 @@ # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -from utils import * from collections import OrderedDict import torch.nn as nn @@ -48,6 +47,52 @@ #======================================== +def get_same_padding(kernel_size): + if isinstance(kernel_size, tuple): + assert len(kernel_size) == 2, 'invalid kernel size: %s' % kernel_size + p1 = get_same_padding(kernel_size[0]) + p2 = get_same_padding(kernel_size[1]) + return p1, p2 + assert isinstance(kernel_size, int), 'kernel size should be either `int` or `tuple`' + assert kernel_size % 2 > 0, 'kernel size should be odd number' + return kernel_size // 2 + +def build_activation(act_func, inplace=True): + if act_func == 'relu': + return nn.ReLU(inplace=inplace) + elif act_func == 'relu6': + return nn.ReLU6(inplace=inplace) + elif act_func == 'tanh': + return nn.Tanh() + elif act_func == 'sigmoid': + return nn.Sigmoid() + elif act_func is None: + return None + else: + raise ValueError('do not support: %s' % act_func) + +#======================================== + +class MyModule(nn.Module): + + def forward(self, x): + raise NotImplementedError + + @property + def module_str(self): + raise NotImplementedError + + @property + def config(self): + raise NotImplementedError + + @staticmethod + def build_from_config(config): + raise NotImplementedError + + def get_flops(self, x): + raise NotImplementedError + class MobileInvertedResidualBlock(MyModule): def __init__(self, mobile_inverted_conv, shortcut): @@ -677,4 +722,4 @@ def get_flops(self, x): @staticmethod def is_zero_layer(): - return True \ No newline at end of file + return True diff --git a/examples/nas/proxylessnas/utils.py b/examples/nas/proxylessnas/putils.py similarity index 99% rename from examples/nas/proxylessnas/utils.py rename to examples/nas/proxylessnas/putils.py index 0244da48c7..5c1d47d1f3 100644 --- a/examples/nas/proxylessnas/utils.py +++ b/examples/nas/proxylessnas/putils.py @@ -18,6 +18,7 @@ # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + def make_divisible(v, divisor, min_val=None): """ This function is taken from the original tf repo. 
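The hunk above shows only the head of `make_divisible`; its body lies outside the diff context. Since the docstring credits the original tf repo, a minimal reference sketch of that widely used helper (an assumption about the elided body, not a verbatim copy of putils.py) is:

    def make_divisible(v, divisor, min_val=None):
        # Round v to the nearest multiple of divisor, never dropping below min_val.
        if min_val is None:
            min_val = divisor
        new_v = max(min_val, int(v + divisor / 2) // divisor * divisor)
        # Make sure rounding down does not remove more than 10% of the value.
        if new_v < 0.9 * v:
            new_v += divisor
        return new_v

    # e.g. make_divisible(32 * 1.0, 8) == 32, make_divisible(44, 8) == 48

This is the helper model.py calls above to keep every stage width divisible by 8 after scaling by width_mult.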
diff --git a/examples/nas/proxylessnas/search.py b/examples/nas/proxylessnas/search.py index c7b3c1bb5b..5e2dc2eb58 100644 --- a/examples/nas/proxylessnas/search.py +++ b/examples/nas/proxylessnas/search.py @@ -25,8 +25,7 @@ import torch.nn as nn from model import * -from nni.nas.pytorch.darts import ProxylessNasTrainer -from utils import * +from nni.nas.pytorch.proxylessnas import ProxylessNasTrainer def get_parameters(keys=None, mode='include'): if keys is None: @@ -104,10 +103,12 @@ def get_parameters(keys=None, mode='include'): resize_scale=0.08, distort_color='normal') train_loader = data_provider.train + valid_loader = data_provider.valid trainer = ProxylessNasTrainer(model, model_optim=optimizer, train_loader=train_loader, + valid_loader=valid_loader, device=device) trainer.train() diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 873dec12f4..6c90bee75a 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -26,13 +26,159 @@ from nni.nas.pytorch.mutator import PyTorchMutator +class ArchGradientFunction(torch.autograd.Function): + + @staticmethod + def forward(ctx, x, binary_gates, run_func, backward_func): + ctx.run_func = run_func + ctx.backward_func = backward_func + + detached_x = detach_variable(x) + with torch.enable_grad(): + output = run_func(detached_x) + ctx.save_for_backward(detached_x, output) + return output.data + + @staticmethod + def backward(ctx, grad_output): + detached_x, output = ctx.saved_tensors + + grad_x = torch.autograd.grad(output, detached_x, grad_output, only_inputs=True) + # compute gradients w.r.t. binary_gates + binary_grads = ctx.backward_func(detached_x.data, output.data, grad_output.data) + + return grad_x[0], binary_grads, None, None + +class MixedOp(nn.Module): + def __init__(self, mutable): + self.mutable = mutable + self.AP_path_alpha = nn.Parameter(torch.Tensor(mutable.length)) + self.AP_path_wb = nn.Parameter(torch.Tensor(mutable.length)) + self.active_index = [0] + self.inactive_index = None + self.log_prob = None + self.current_prob_over_ops = None + + def forward(self, x): + # only full_v2 + def run_function(candidate_ops, active_id): + def forward(_x): + return candidate_ops[active_id](_x) + return forward + + def backward_function(candidate_ops, active_id, binary_gates): + def backward(_x, _output, grad_output): + binary_grads = torch.zeros_like(binary_gates.data) + with torch.no_grad(): + for k in range(len(candidate_ops)): + if k != active_id: + out_k = candidate_ops[k](_x.data) + else: + out_k = _output.data + grad_k = torch.sum(out_k * grad_output) + binary_grads[k] = grad_k + return binary_grads + return backward + output = ArchGradientFunction.apply( + x, self.AP_path_wb, run_function(self.mutable.choices, self.active_index[0]), + backward_function(self.mutable.choices, self.active_index[0], self.AP_path_wb)) + return output + + @property + def probs_over_ops(self): + probs = F.softmax(self.AP_path_alpha, dim=0) # softmax to probability + return probs + + @property + def chosen_index(self): + probs = self.probs_over_ops.data.cpu().numpy() + index = int(np.argmax(probs)) + return index, probs[index] + + @property + def active_op(self): + """ assume only one path is active """ + return self.mutable.choices[self.active_index[0]] + + def set_chosen_op_active(self): + chosen_idx, _ = self.chosen_index + self.active_index = [chosen_idx] + self.inactive_index = [_i for _i in range(0, 
chosen_idx)] + \
+                              [_i for _i in range(chosen_idx + 1, len(self.mutable.choices))]
+
+    def binarize(self):
+        self.log_prob = None
+        # reset binary gates
+        self.AP_path_wb.data.zero_()
+        probs = self.probs_over_ops
+        sample = torch.multinomial(probs.data, 1)[0].item()
+        self.active_index = [sample]
+        self.inactive_index = [_i for _i in range(0, sample)] + \
+                              [_i for _i in range(sample + 1, len(self.mutable.choices))]
+        self.log_prob = torch.log(probs[sample])
+        self.current_prob_over_ops = probs
+        self.AP_path_wb.data[sample] = 1.0
+        # avoid over-regularization
+        for choice in self.mutable.choices:
+            for _, param in choice.named_parameters():
+                param.grad = None
+
+    def _delta_ij(i, j):
+        if i == j:
+            return 1
+        else:
+            return 0
+
+    def set_arch_param_grad(self):
+        binary_grads = self.AP_path_wb.grad.data
+        if self.active_op.is_zero_layer():
+            self.AP_path_alpha.grad = None
+            return
+        if self.AP_path_alpha.grad is None:
+            self.AP_path_alpha.grad = torch.zeros_like(self.AP_path_alpha.data)
+        probs = self.probs_over_ops.data
+        for i in range(len(self.mutable.choices)):
+            for j in range(len(self.mutable.choices)):
+                self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i])
+
 class ProxylessNasMutator(PyTorchMutator):
     def before_build(self, model):
-        self.choices = nn.ParameterDict()
+        self.mixed_ops = {}
 
     def on_init_layer_choice(self, mutable: LayerChoice):
-        self.choices[mutable.key] = nn.Parameter(1.0E-3 * torch.randn(mutable.length))
+        self.mixed_ops[mutable.key] = MixedOp(mutable)
+
+    def on_forward_layer_choice(self, mutable, *inputs):
+        """
+        Callback of layer choice forward. Override if you are an advanced user.
+        By default, this method calls :meth:`on_calc_layer_choice_mask` to get a mask on how to choose between layers
+        (either by switch or by weights), then it will reduce the list of all tensor outputs with the policy specified
+        in `mutable.reduction`. It will also cache the mask with corresponding `mutable.key`.
+ + Parameters + ---------- + mutable: LayerChoice + inputs: list of torch.Tensor + + Returns + ------- + torch.Tensor + """ + return self.mixed_ops[mutable.key].forward(*inputs) + + def reset_binary_gates(self): + for k in self.mixed_ops.keys(): + self.mixed_ops[k].binarize() + + def set_chosen_op_active(self): + for k in self.mixed_ops.keys(): + self.mixed_ops[k].set_chosen_op_active() + + def num_arch_params(self): + return len(self.mixed_ops) - def on_calc_layer_choice_mask(self, mutable: LayerChoice): - return F.softmax(self.choices[mutable.key], dim=-1) + def set_arch_param_grad(self): + for k in self.mixed_ops.keys(): + self.mixed_ops[k].set_arch_param_grad() diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 5ada24a5aa..90079e0278 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -29,12 +29,42 @@ from .mutator import ProxylessNasMutator +def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1): + logsoftmax = nn.LogSoftmax() + n_classes = pred.size(1) + # convert to one-hot + target = torch.unsqueeze(target, 1) + soft_target = torch.zeros_like(pred) + soft_target.scatter_(1, target, 1) + # label smoothing + soft_target = soft_target * (1 - label_smoothing) + label_smoothing / n_classes + return torch.mean(torch.sum(- soft_target * logsoftmax(pred), 1)) + +def accuracy(output, target, topk=(1,)): + """ Computes the precision@k for the specified values of k """ + maxk = max(topk) + batch_size = target.size(0) + + _, pred = output.topk(maxk, 1, True, True) + pred = pred.t() + correct = pred.eq(target.view(1, -1).expand_as(pred)) + + res = [] + for k in topk: + correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) + res.append(correct_k.mul_(100.0 / batch_size)) + return res + class ProxylessNasTrainer(Trainer): - def __init__(self, model, model_optim, train_loader, device): + def __init__(self, model, model_optim, train_loader, valid_loader, device): self.model = model self.model_optim = model_optim self.train_loader = train_loader + self.valid_loader = valid_loader self.device = device + # init mutator + self.mutator = ProxylessNasMutator(model) + self._valid_iter = None # TODO: arch search configs @@ -46,6 +76,8 @@ def __init__(self, model, model_optim, train_loader, device): self.warmup = True self.warmup_epoch = 0 + self.criterion = nn.CrossEntropyLoss() + def _architecture_parameters(self): for name, param in self.named_parameters(): if 'AP_path_alpha' in name: @@ -60,6 +92,42 @@ def _init_arch_params(self, init_type='normal', init_ratio=1e-3): else: raise NotImplementedError + def _validate(self): + self.valid_loader.batch_sampler.batch_size = 500 + self.valid_loader.batch_sampler.drop_last = False + + self.mutator.set_chosen_op_active() + # test on validation set under train mode + self.model.train() + batch_time = AverageMeter() + losses = AverageMeter() + top1 = AverageMeter() + top5 = AverageMeter() + end = time.time() + with torch.no_grad(): + for i, (images, labels) in enumerate(self.valid_loader): + images, labels = images.to(self.device), labels.to(self.device) + output = self.model(images) + loss = self.criterion(output, labels) + acc1, acc5 = accuracy(output, labels, topk=(1, 5)) + losses.update(loss, images.size(0)) + top1.update(acc1[0], images.size(0)) + top5.update(acc5[0], images.size(0)) + # measure elapsed time + batch_time.update(time.time() - end) + end = time.time() + + if i % 10 
== 0 or i + 1 == len(self.valid_loader):
+                    test_log = 'Valid' + ': [{0}/{1}]\t'\
+                               'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'\
+                               'Loss {loss.val:.4f} ({loss.avg:.4f})\t'\
+                               'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'.\
+                        format(i, len(self.valid_loader) - 1, batch_time=batch_time, loss=losses, top1=top1)
+                    test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5)
+                    print(test_log)
+        return losses.avg, top1.avg, top5.avg
+
     def _warm_up(self, warmup_epochs=25):
         lr_max = 0.05
         data_loader = self.train_loader
@@ -86,16 +154,171 @@ def _warm_up(self, warmup_epochs=25):
                 param_group['lr'] = warmup_lr
             images, labels = images.to(self.device), labels.to(self.device)
             # compute output
-            self._reset_binary_gates()  # random sample binary gates
-            # TODO:
-            #self._unused_modules_off()  # remove unused module for speedup
+            self.mutator.reset_binary_gates()  # random sample binary gates
             output = self.model(images)
+            label_smoothing = 0.1
+            if label_smoothing > 0:
+                loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing)
+            else:
+                loss = self.criterion(output, labels)
+            # measure accuracy and record loss
+            acc1, acc5 = accuracy(output, labels, topk=(1, 5))
+            losses.update(loss, images.size(0))
+            top1.update(acc1[0], images.size(0))
+            top5.update(acc5[0], images.size(0))
+            # compute gradient and do SGD step
+            self.model.zero_grad()
+            loss.backward()
+            self.model_optim.step()
+            # measure elapsed time
+            batch_time.update(time.time() - end)
+            end = time.time()
+
+            if i % 10 == 0 or i + 1 == nBatch:
+                batch_log = 'Warmup Train [{0}][{1}/{2}]\t' \
+                            'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \
+                            'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' \
+                            'Loss {losses.val:.4f} ({losses.avg:.4f})\t' \
+                            'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})\t' \
+                            'Top-5 acc {top5.val:.3f} ({top5.avg:.3f})\tlr {lr:.5f}'. \
+                    format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time,
+                           losses=losses, top1=top1, top5=top5, lr=warmup_lr)
+                print(batch_log)
+        valid_res, flops, latency = self._validate()
+        val_log = 'Warmup Valid [{0}/{1}]\tloss {2:.3f}\ttop-1 acc {3:.3f}\ttop-5 acc {4:.3f}\t' \
+                  'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}\tflops: {5:.1f}M'. \
+            format(epoch + 1, warmup_epochs, *valid_res, flops / 1e6, top1=top1, top5=top5)
+        print(val_log)
+
+    def _get_update_schedule(self, nBatch):
+        schedule = {}
+        grad_update_arch_param_every = 5
+        grad_update_steps = 1
+        for i in range(nBatch):
+            if (i + 1) % grad_update_arch_param_every == 0:
+                schedule[i] = grad_update_steps
+        return schedule
 
-    def _reset_binary_gates(self):
-        for m in self.
+ def _calc_learning_rate(self, epoch, batch=0, nBatch=None): + T_total = self.n_epochs * nBatch + T_cur = epoch * nBatch + batch + lr = 0.5 * self.init_lr * (1 + math.cos(math.pi * T_cur / T_total)) + + def _adjust_learning_rate(self, optimizer, epoch, batch=0, nBatch=None): + """ adjust learning of a given optimizer and return the new learning rate """ + new_lr = self._calc_learning_rate(epoch, batch, nBatch) + for param_group in optimizer.param_groups: + param_group['lr'] = new_lr + return new_lr + + def _train(self): + nBatch = len(self.train_loader) + arch_param_num = self.mutator.num_arch_params() + binary_gates_num = self.mutator.num_arch_params() + #weight_param_num = len(list(self.net.weight_parameters())) + print( + '#arch_params: %d\t#binary_gates: %d\t#weight_params: xx' % + (arch_param_num, binary_gates_num) + ) + + update_schedule = self._get_update_schedule(nBatch) + + for epoch in range(0, 120): + print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n') + batch_time = AverageMeter() + data_time = AverageMeter() + losses = AverageMeter() + top1 = AverageMeter() + top5 = AverageMeter() + entropy = AverageMeter() + # switch to train mode + self.model.train() + + end = time.time() + for i, (images, labels) in enumerate(self.train_loader): + data_time.update(time.time() - end) + lr = self._adjust_learning_rate(self.model_optim, epoch, batch=i, nBatch=nBatch) + # network entropy + #net_entropy = self.mutator.entropy() + #entropy.update(net_entropy.data.item() / arch_param_num, 1) + # train weight parameters + images, labels = images.to(self.device), labels.to(self.device) + self.mutator.reset_binary_gates() + output = self.model(images) + label_smoothing = 0.1 + if label_smoothing > 0: + loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing) + else: + loss = self.criterion(output, labels) + acc1, acc5 = accuracy(output, labels, topk=(1, 5)) + losses.update(loss, images.size(0)) + top1.update(acc1[0], images.size(0)) + top5.update(acc5[0], images.size(0)) + self.model.zero_grad() + loss.backward() + self.model_optim.step() + if epoch > 0: + for j in range(update_schedule.get(i, 0)): + start_time = time.time() + # GradientArchSearchConfig + arch_loss, exp_value = self._gradient_step() + used_time = time.time() - start_time + log_str = 'Architecture [%d-%d]\t Time %.4f\t Loss %.4f\t null %s' % \ + (epoch + 1, i, used_time, arch_loss, exp_value) + print(log_str) + batch_time.update(time.time() - end) + end = time.time() + # training log + if i % 10 == 0 or i + 1 == nBatch: + batch_log = 'Train [{0}][{1}/{2}]\t' \ + 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \ + 'Data Time {data_time.val:.3f} ({data_time.avg:.3f})\t' \ + 'Loss {losses.val:.4f} ({losses.avg:.4f})\t' \ + 'Entropy {entropy.val:.5f} ({entropy.avg:.5f})\t' \ + 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})\t' \ + 'Top-5 acc {top5.val:.3f} ({top5.avg:.3f})\tlr {lr:.5f}'. 
\ + format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time, + losses=losses, entropy=entropy, top1=top1, top5=top5, lr=lr) + print(batch_log) + # TODO: print current network architecture + # TODO: validate + # convert to normal network according to architecture parameters + + def _valid_next_batch(self): + if self._valid_iter is None: + self._valid_iter = iter(self.valid_loader) + try: + data = next(self._valid_iter) + except StopIteration: + self._valid_iter = iter(self.valid_loader) + data = next(self._valid_iter) + return data + + def _gradient_step(self): + self.valid_loader.batch_sampler.batch_size = 256 + self.valid_loader.batch_sampler.drop_last = True + self.model.train() + time1 = time.time() # time + # sample a batch of data from validation set + images, labels = self._valid_next_batch() + images, labels = images.to(self.device), labels.to(self.device) + time2 = time.time() # time + self.mutator.reset_binary_gates() + output = self.model(images) + time3 = time.time() + ce_loss = self.criterion(output, labels) + expected_value = None + loss = ce_loss + self.model.zero_grad() + loss.backward() + self.mutator.set_arch_param_grad() + self.arch_optimizer.step() + time4 = time.time() + print('(%.4f, %.4f, %.4f)' % (time2 - time1, time3 - time2, time4 - time3)) def train(self): - pass + self._warm_up() + self._train() def export(self): pass From 5b7cb4367348194f39f1027419eb49305c22a2f3 Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 14 Nov 2019 20:11:38 +0800 Subject: [PATCH 08/60] update --- examples/nas/proxylessnas/datasets.py | 2 +- examples/nas/proxylessnas/search.py | 29 +++++++++------ .../nni/nas/pytorch/proxylessnas/mutator.py | 16 +++++++- .../nni/nas/pytorch/proxylessnas/trainer.py | 37 +++++++++++++++---- 4 files changed, 62 insertions(+), 22 deletions(-) diff --git a/examples/nas/proxylessnas/datasets.py b/examples/nas/proxylessnas/datasets.py index 4052298305..ebd756045c 100644 --- a/examples/nas/proxylessnas/datasets.py +++ b/examples/nas/proxylessnas/datasets.py @@ -18,7 +18,7 @@ # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
- +import os import numpy as np import torch.utils.data import torchvision.transforms as transforms diff --git a/examples/nas/proxylessnas/search.py b/examples/nas/proxylessnas/search.py index 5e2dc2eb58..2c982f4bd4 100644 --- a/examples/nas/proxylessnas/search.py +++ b/examples/nas/proxylessnas/search.py @@ -27,12 +27,12 @@ from model import * from nni.nas.pytorch.proxylessnas import ProxylessNasTrainer -def get_parameters(keys=None, mode='include'): +def get_parameters(model, keys=None, mode='include'): if keys is None: - for name, param in self.named_parameters(): + for name, param in model.named_parameters(): yield param elif mode == 'include': - for name, param in self.named_parameters(): + for name, param in model.named_parameters(): flag = False for key in keys: if key in name: @@ -41,7 +41,7 @@ def get_parameters(keys=None, mode='include'): if flag: yield param elif mode == 'exclude': - for name, param in self.named_parameters(): + for name, param in model.named_parameters(): flag = True for key in keys: if key in name: @@ -64,52 +64,57 @@ def get_parameters(keys=None, mode='include'): #dataset_train, dataset_valid = datasets.get_dataset("cifar10") model = SearchMobileNet() + print('=============================================SearchMobileNet model create done') model.init_model() + print('=============================================SearchMobileNet model init done') # move network to GPU if available if torch.cuda.is_available(): device = torch.device('cuda:0') #self.net = torch.nn.DataParallel(self.net) model.to(device) - cudnn.benchmark = True + #cudnn.benchmark = True else: raise ValueError # self.device = torch.device('cpu') # TODO: net info - criterion = nn.CrossEntropyLoss() - # TODO: removed decay_key no_decay_keys = True if no_decay_keys: keys = ['bn'] momentum, nesterov = 0.9, True optimizer = torch.optim.SGD([ - {'params': get_parameters(keys, mode='exclude'), 'weight_decay': 4e-5}, - {'params': get_parameters(keys, mode='include'), 'weight_decay': 0}, + {'params': get_parameters(model, keys, mode='exclude'), 'weight_decay': 4e-5}, + {'params': get_parameters(model, keys, mode='include'), 'weight_decay': 0}, ], lr=0.05, momentum=momentum, nesterov=nesterov) else: - optimizer = torch.optim.SGD(get_parameters(), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5) + optimizer = torch.optim.SGD(model, get_parameters(), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5) #n_epochs = 50 #lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, n_epochs, eta_min=0.001) + print('=============================================Start to create data provider') # TODO: - data_provider = ImagenetDataProvider(train_batch_size=256, + data_provider = datasets.ImagenetDataProvider(save_path='/data/hdd3/yugzh/imagenet/', + train_batch_size=256, test_batch_size=500, valid_size=None, - n_worker=32, + n_worker=0, #32, resize_scale=0.08, distort_color='normal') + print('=============================================Finish to create data provider') train_loader = data_provider.train valid_loader = data_provider.valid + print('=============================================Start to create ProxylessNasTrainer') trainer = ProxylessNasTrainer(model, model_optim=optimizer, train_loader=train_loader, valid_loader=valid_loader, device=device) + print('=============================================Start to train ProxylessNasTrainer') trainer.train() trainer.export() diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py 
b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 6c90bee75a..7c6333170e 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -51,6 +51,7 @@ def backward(ctx, grad_output): class MixedOp(nn.Module): def __init__(self, mutable): + super(MixedOp, self).__init__() self.mutable = mutable self.AP_path_alpha = nn.Parameter(torch.Tensor(mutable.length)) self.AP_path_wb = nn.Parameter(torch.Tensor(mutable.length)) @@ -59,6 +60,9 @@ def __init__(self, mutable): self.log_prob = None self.current_prob_over_ops = None + def get_AP_path_alpha(self): + return self.AP_path_alpha + def forward(self, x): # only full_v2 def run_function(candidate_ops, active_id): @@ -111,7 +115,10 @@ def binarize(self): # reset binary gates self.AP_path_wb.data.zero_() probs = self.probs_over_ops - sample = torch.multinomial(probs.data, 1)[0].item() + print('probs: ', probs.data) + print('probs type: ', probs.type()) + sample = torch.multinomial(probs, 1)[0].item() + print('sample: ', sample) self.active_index = [sample] self.inactive_index = [_i for _i in range(0, sample)] + \ [_i for _i in range(sample + 1, len(self.mutable.choices))] @@ -166,10 +173,11 @@ def on_forward_layer_choice(self, mutable, *inputs): ------- torch.Tensor """ - return self.mixed_ops[mutable.key].forward(*inputs) + return self.mixed_ops[mutable.key].forward(*inputs), None def reset_binary_gates(self): for k in self.mixed_ops.keys(): + print('+++++++++++++++++++k: ', k) self.mixed_ops[k].binarize() def set_chosen_op_active(self): @@ -182,3 +190,7 @@ def num_arch_params(self): def set_arch_param_grad(self): for k in self.mixed_ops.keys(): self.mixed_ops[k].set_arch_param_grad() + + def get_architecture_parameters(self): + for k in self.mixed_ops.keys(): + yield self.mixed_ops[k].get_AP_path_alpha() diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 90079e0278..3538583714 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -20,6 +20,7 @@ import copy import math +import time import torch from torch import nn as nn @@ -29,6 +30,31 @@ from .mutator import ProxylessNasMutator +class AverageMeter(object): + """ + Computes and stores the average and current value + Copied from: https://github.com/pytorch/examples/blob/master/imagenet/main.py + """ + + def __init__(self): + self.val = 0 + self.avg = 0 + self.sum = 0 + self.count = 0 + + def reset(self): + self.val = 0 + self.avg = 0 + self.sum = 0 + self.count = 0 + + def update(self, val, n=1): + self.val = val + self.sum += val * n + self.count += n + self.avg = self.sum / self.count + + def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1): logsoftmax = nn.LogSoftmax() n_classes = pred.size(1) @@ -71,20 +97,15 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device): self._init_arch_params() # build architecture optimizer - self.arch_optimizer = torch.optim.Adam(self._architecture_parameters(), 1e-3, weight_decay=0) + self.arch_optimizer = torch.optim.Adam(self.mutator.get_architecture_parameters(), 1e-3, weight_decay=0) self.warmup = True self.warmup_epoch = 0 self.criterion = nn.CrossEntropyLoss() - def _architecture_parameters(self): - for name, param in self.named_parameters(): - if 'AP_path_alpha' in name: - yield param - def _init_arch_params(self, init_type='normal', init_ratio=1e-3): - for param in 
self._architecture_parameters(): + for param in self.mutator.get_architecture_parameters(): if init_type == 'normal': param.data.normal_(0, init_ratio) elif init_type == 'uniform': @@ -145,7 +166,9 @@ def _warm_up(self, warmup_epochs=25): self.model.train() end = time.time() + print('=====================_warm_up, epoch: ', epoch) for i, (images, labels) in enumerate(data_loader): + print('=====================_warm_up, minibatch i: ', i) data_time.update(time.time() - end) # lr T_cur = epoch * nBatch + i From 366b79314bf17c93f3b3a942dc4768f24a150ca7 Mon Sep 17 00:00:00 2001 From: quanlu Date: Sun, 17 Nov 2019 01:54:25 +0800 Subject: [PATCH 09/60] debug --- examples/nas/proxylessnas/model.py | 5 ++- examples/nas/proxylessnas/ops.py | 21 +++++++---- examples/nas/proxylessnas/search.py | 4 +- .../nni/nas/pytorch/proxylessnas/mutator.py | 37 ++++++++++++++++--- .../nni/nas/pytorch/proxylessnas/trainer.py | 21 ++++++++--- 5 files changed, 65 insertions(+), 23 deletions(-) diff --git a/examples/nas/proxylessnas/model.py b/examples/nas/proxylessnas/model.py index d7275641ad..f640e7a916 100644 --- a/examples/nas/proxylessnas/model.py +++ b/examples/nas/proxylessnas/model.py @@ -55,7 +55,8 @@ def __init__(self, first_conv = ops.ConvLayer(3, input_channel, kernel_size=3, stride=2, use_bn=True, act_func='relu6', ops_order='weight_bn_act') # first block first_block_conv = ops.OPS['3x3_MBConv1'](input_channel, first_cell_width, 1) - first_block = ops.MobileInvertedResidualBlock(first_block_conv, None) + #first_block = ops.MobileInvertedResidualBlock(first_block_conv, None, False) + first_block = first_block_conv input_channel = first_cell_width @@ -77,6 +78,7 @@ def __init__(self, ops.OPS['7x7_MBConv3'](input_channel, width, stride), ops.OPS['7x7_MBConv6'](input_channel, width, stride), ops.OPS['Zero'](input_channel, width, stride)], + return_mask=True, key="s{}_c{}".format(stage_cnt, i)) else: conv_op = nas.mutables.LayerChoice([ops.OPS['3x3_MBConv3'](input_channel, width, stride), @@ -85,6 +87,7 @@ def __init__(self, ops.OPS['5x5_MBConv6'](input_channel, width, stride), ops.OPS['7x7_MBConv3'](input_channel, width, stride), ops.OPS['7x7_MBConv6'](input_channel, width, stride)], + return_mask=True, key="s{}_c{}".format(stage_cnt, i)) # shortcut if stride == 1 and input_channel == width: diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py index 5538577909..8a67ca3988 100644 --- a/examples/nas/proxylessnas/ops.py +++ b/examples/nas/proxylessnas/ops.py @@ -19,6 +19,7 @@ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
from collections import OrderedDict +import torch import torch.nn as nn @@ -102,14 +103,17 @@ def __init__(self, mobile_inverted_conv, shortcut): self.shortcut = shortcut def forward(self, x): - if self.mobile_inverted_conv.is_zero_layer(): + out, idx = self.mobile_inverted_conv(x) + print('*****************************idx: ', idx) + if idx == 6: res = x - elif self.shortcut is None or self.shortcut.is_zero_layer(): - res = self.mobile_inverted_conv(x) + #res = out + elif self.shortcut is None: + res = out #self.mobile_inverted_conv(x) else: - conv_x = self.mobile_inverted_conv(x) - skip_x = self.shortcut(x) - res = skip_x + conv_x + conv_x = out #self.mobile_inverted_conv(x) + skip_x = self.shortcut(x) + res = skip_x + conv_x return res @property @@ -694,13 +698,14 @@ def __init__(self, stride): self.stride = stride def forward(self, x): - n, c, h, w = x.size() + '''n, c, h, w = x.size() h //= self.stride w //= self.stride device = x.get_device() if x.is_cuda else torch.device('cpu') # noinspection PyUnresolvedReferences padding = torch.zeros(n, c, h, w, device=device, requires_grad=False) - return padding + return padding''' + return x * 0 @property def module_str(self): diff --git a/examples/nas/proxylessnas/search.py b/examples/nas/proxylessnas/search.py index 2c982f4bd4..e1624c7304 100644 --- a/examples/nas/proxylessnas/search.py +++ b/examples/nas/proxylessnas/search.py @@ -98,8 +98,8 @@ def get_parameters(model, keys=None, mode='include'): print('=============================================Start to create data provider') # TODO: data_provider = datasets.ImagenetDataProvider(save_path='/data/hdd3/yugzh/imagenet/', - train_batch_size=256, - test_batch_size=500, + train_batch_size=2, #256, + test_batch_size=2, #500, valid_size=None, n_worker=0, #32, resize_scale=0.08, diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 7c6333170e..3afa8cbd0d 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -21,10 +21,18 @@ import torch from torch import nn as nn from torch.nn import functional as F +import numpy as np from nni.nas.pytorch.mutables import LayerChoice from nni.nas.pytorch.mutator import PyTorchMutator +def detach_variable(inputs): + if isinstance(inputs, tuple): + return tuple([detach_variable(x) for x in inputs]) + else: + x = inputs.detach() + x.requires_grad = inputs.requires_grad + return x class ArchGradientFunction(torch.autograd.Function): @@ -32,20 +40,26 @@ class ArchGradientFunction(torch.autograd.Function): def forward(ctx, x, binary_gates, run_func, backward_func): ctx.run_func = run_func ctx.backward_func = backward_func + #ctx.mutable_key = mutable_key detached_x = detach_variable(x) with torch.enable_grad(): output = run_func(detached_x) ctx.save_for_backward(detached_x, output) + print('ctx forward: ', ctx.__dict__) + #print('mutable key: ', ctx.mutable_key) return output.data @staticmethod def backward(ctx, grad_output): + print('ctx backward: ', ctx.__dict__) + #print('mutable key: ', ctx.mutable_key) detached_x, output = ctx.saved_tensors grad_x = torch.autograd.grad(output, detached_x, grad_output, only_inputs=True) # compute gradients w.r.t. 
binary_gates binary_grads = ctx.backward_func(detached_x.data, output.data, grad_output.data) + print('++++++++++++++++++++++++++++: ', binary_grads) return grad_x[0], binary_grads, None, None @@ -65,13 +79,15 @@ def get_AP_path_alpha(self): def forward(self, x): # only full_v2 - def run_function(candidate_ops, active_id): + def run_function(key, candidate_ops, active_id): def forward(_x): + print('key forward: ', key) return candidate_ops[active_id](_x) return forward - def backward_function(candidate_ops, active_id, binary_gates): + def backward_function(key, candidate_ops, active_id, binary_gates): def backward(_x, _output, grad_output): + print('key backward: ', key) binary_grads = torch.zeros_like(binary_gates.data) with torch.no_grad(): for k in range(len(candidate_ops)): @@ -84,8 +100,8 @@ def backward(_x, _output, grad_output): return binary_grads return backward output = ArchGradientFunction.apply( - x, self.AP_path_wb, run_function(self.mutable.choices, self.active_index[0]), - backward_function(self.mutable.choices, self.active_index[0], self.AP_path_wb)) + x, self.AP_path_wb, run_function(self.mutable.key, self.mutable.choices, self.active_index[0]), + backward_function(self.mutable.key, self.mutable.choices, self.active_index[0], self.AP_path_wb)) return output @property @@ -104,6 +120,10 @@ def active_op(self): """ assume only one path is active """ return self.mutable.choices[self.active_index[0]] + @property + def active_op_index(self): + return self.active_index[0] + def set_chosen_op_active(self): chosen_idx, _ = self.chosen_index self.active_index = [chosen_idx] @@ -119,6 +139,7 @@ def binarize(self): print('probs type: ', probs.type()) sample = torch.multinomial(probs, 1)[0].item() print('sample: ', sample) + print('mutable key: ', self.mutable.key) self.active_index = [sample] self.inactive_index = [_i for _i in range(0, sample)] + \ [_i for _i in range(sample + 1, len(self.mutable.choices))] @@ -129,14 +150,17 @@ def binarize(self): for choice in self.mutable.choices: for _, param in choice.named_parameters(): param.grad = None + print('binarize: ', self.AP_path_wb.grad) - def _delta_ij(i, j): + def _delta_ij(self, i, j): if i == j: return 1 else: return 0 def set_arch_param_grad(self): + print('mutable key: ', self.mutable.key) + print('set_arch_param_grad: ', self.AP_path_wb.grad) binary_grads = self.AP_path_wb.grad.data if self.active_op.is_zero_layer(): self.AP_path_alpha.grad = None @@ -173,7 +197,8 @@ def on_forward_layer_choice(self, mutable, *inputs): ------- torch.Tensor """ - return self.mixed_ops[mutable.key].forward(*inputs), None + idx = self.mixed_ops[mutable.key].active_op_index + return self.mixed_ops[mutable.key].forward(*inputs), idx def reset_binary_gates(self): for k in self.mixed_ops.keys(): diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 3538583714..913e94fb7c 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -88,6 +88,8 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device): self.train_loader = train_loader self.valid_loader = valid_loader self.device = device + self.n_epochs = 150 + self.init_lr = 0.05 # init mutator self.mutator = ProxylessNasMutator(model) self._valid_iter = None @@ -178,7 +180,8 @@ def _warm_up(self, warmup_epochs=25): images, labels = images.to(self.device), labels.to(self.device) # compute output self.mutator.reset_binary_gates() # random 
sample binary gates - output = self.model(images) + with self.mutator.forward_pass(): + output = self.model(images) label_smoothing = 0.1 if label_smoothing > 0: loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing) @@ -226,10 +229,12 @@ def _calc_learning_rate(self, epoch, batch=0, nBatch=None): T_total = self.n_epochs * nBatch T_cur = epoch * nBatch + batch lr = 0.5 * self.init_lr * (1 + math.cos(math.pi * T_cur / T_total)) + return lr def _adjust_learning_rate(self, optimizer, epoch, batch=0, nBatch=None): """ adjust learning of a given optimizer and return the new learning rate """ new_lr = self._calc_learning_rate(epoch, batch, nBatch) + print('-----------------------------: ', new_lr) for param_group in optimizer.param_groups: param_group['lr'] = new_lr return new_lr @@ -267,7 +272,8 @@ def _train(self): # train weight parameters images, labels = images.to(self.device), labels.to(self.device) self.mutator.reset_binary_gates() - output = self.model(images) + with self.mutator.forward_pass(): + output = self.model(images) label_smoothing = 0.1 if label_smoothing > 0: loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing) @@ -280,7 +286,8 @@ def _train(self): self.model.zero_grad() loss.backward() self.model_optim.step() - if epoch > 0: + #if epoch > 0: + if epoch >= 0: for j in range(update_schedule.get(i, 0)): start_time = time.time() # GradientArchSearchConfig @@ -318,7 +325,7 @@ def _valid_next_batch(self): return data def _gradient_step(self): - self.valid_loader.batch_sampler.batch_size = 256 + self.valid_loader.batch_sampler.batch_size = 2 #256 self.valid_loader.batch_sampler.drop_last = True self.model.train() time1 = time.time() # time @@ -327,7 +334,8 @@ def _gradient_step(self): images, labels = images.to(self.device), labels.to(self.device) time2 = time.time() # time self.mutator.reset_binary_gates() - output = self.model(images) + with self.mutator.forward_pass(): + output = self.model(images) time3 = time.time() ce_loss = self.criterion(output, labels) expected_value = None @@ -338,9 +346,10 @@ def _gradient_step(self): self.arch_optimizer.step() time4 = time.time() print('(%.4f, %.4f, %.4f)' % (time2 - time1, time3 - time2, time4 - time3)) + return loss.data.item(), expected_value.item() if expected_value is not None else None def train(self): - self._warm_up() + #self._warm_up() self._train() def export(self): From 088a56c6ae268e79f77b4061c496b769a7998cef Mon Sep 17 00:00:00 2001 From: quanlu Date: Sun, 17 Nov 2019 20:37:43 +0800 Subject: [PATCH 10/60] update --- examples/nas/proxylessnas/ops.py | 294 +------------------------------ 1 file changed, 5 insertions(+), 289 deletions(-) diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py index 8a67ca3988..f968f68f7c 100644 --- a/examples/nas/proxylessnas/ops.py +++ b/examples/nas/proxylessnas/ops.py @@ -74,27 +74,7 @@ def build_activation(act_func, inplace=True): #======================================== -class MyModule(nn.Module): - - def forward(self, x): - raise NotImplementedError - - @property - def module_str(self): - raise NotImplementedError - - @property - def config(self): - raise NotImplementedError - - @staticmethod - def build_from_config(config): - raise NotImplementedError - - def get_flops(self, x): - raise NotImplementedError - -class MobileInvertedResidualBlock(MyModule): +class MobileInvertedResidualBlock(nn.Module): def __init__(self, mobile_inverted_conv, shortcut): super(MobileInvertedResidualBlock, self).__init__() @@ -116,34 +96,6 @@ 
def forward(self, x): res = skip_x + conv_x return res - @property - def module_str(self): - return '(%s, %s)' % ( - self.mobile_inverted_conv.module_str, self.shortcut.module_str if self.shortcut is not None else None - ) - - @property - def config(self): - return { - 'name': MobileInvertedResidualBlock.__name__, - 'mobile_inverted_conv': self.mobile_inverted_conv.config, - 'shortcut': self.shortcut.config if self.shortcut is not None else None, - } - - @staticmethod - def build_from_config(config): - mobile_inverted_conv = set_layer_from_config(config['mobile_inverted_conv']) - shortcut = set_layer_from_config(config['shortcut']) - return MobileInvertedResidualBlock(mobile_inverted_conv, shortcut) - - def get_flops(self, x): - flops1, conv_x = self.mobile_inverted_conv.get_flops(x) - if self.shortcut: - flops2, _ = self.shortcut.get_flops(x) - else: - flops2 = 0 - - return flops1 + flops2, self.forward(x) #======================================== @@ -170,27 +122,7 @@ def forward(self, x): x = x.view(batchsize, -1, height, width) return x -class MyModule(nn.Module): - - def forward(self, x): - raise NotImplementedError - - @property - def module_str(self): - raise NotImplementedError - - @property - def config(self): - raise NotImplementedError - - @staticmethod - def build_from_config(config): - raise NotImplementedError - - def get_flops(self, x): - raise NotImplementedError - -class My2DLayer(MyModule): +class My2DLayer(nn.Module): def __init__(self, in_channels, out_channels, use_bn=True, act_func='relu', dropout_rate=0, ops_order='weight_bn_act'): @@ -251,35 +183,11 @@ def bn_before_weight(self): def weight_op(self): raise NotImplementedError - """ Methods defined in MyModule """ - def forward(self, x): for module in self._modules.values(): x = module(x) return x - @property - def module_str(self): - raise NotImplementedError - - @property - def config(self): - return { - 'in_channels': self.in_channels, - 'out_channels': self.out_channels, - 'use_bn': self.use_bn, - 'act_func': self.act_func, - 'dropout_rate': self.dropout_rate, - 'ops_order': self.ops_order, - } - - @staticmethod - def build_from_config(config): - raise NotImplementedError - - def get_flops(self, x): - raise NotImplementedError - @staticmethod def is_zero_layer(): return False @@ -317,43 +225,6 @@ def weight_op(self): return weight_dict - @property - def module_str(self): - if isinstance(self.kernel_size, int): - kernel_size = (self.kernel_size, self.kernel_size) - else: - kernel_size = self.kernel_size - if self.groups == 1: - if self.dilation > 1: - return '%dx%d_DilatedConv' % (kernel_size[0], kernel_size[1]) - else: - return '%dx%d_Conv' % (kernel_size[0], kernel_size[1]) - else: - if self.dilation > 1: - return '%dx%d_DilatedGroupConv' % (kernel_size[0], kernel_size[1]) - else: - return '%dx%d_GroupConv' % (kernel_size[0], kernel_size[1]) - - @property - def config(self): - return { - 'name': ConvLayer.__name__, - 'kernel_size': self.kernel_size, - 'stride': self.stride, - 'dilation': self.dilation, - 'groups': self.groups, - 'bias': self.bias, - 'has_shuffle': self.has_shuffle, - **super(ConvLayer, self).config, - } - - @staticmethod - def build_from_config(config): - return ConvLayer(**config) - - def get_flops(self, x): - return count_conv_flop(self.conv, x), self.forward(x) - class DepthConvLayer(My2DLayer): @@ -391,41 +262,6 @@ def weight_op(self): weight_dict['shuffle'] = ShuffleLayer(self.groups) return weight_dict - @property - def module_str(self): - if isinstance(self.kernel_size, int): - kernel_size 
= (self.kernel_size, self.kernel_size) - else: - kernel_size = self.kernel_size - if self.dilation > 1: - return '%dx%d_DilatedDepthConv' % (kernel_size[0], kernel_size[1]) - else: - return '%dx%d_DepthConv' % (kernel_size[0], kernel_size[1]) - - @property - def config(self): - return { - 'name': DepthConvLayer.__name__, - 'kernel_size': self.kernel_size, - 'stride': self.stride, - 'dilation': self.dilation, - 'groups': self.groups, - 'bias': self.bias, - 'has_shuffle': self.has_shuffle, - **super(DepthConvLayer, self).config, - } - - @staticmethod - def build_from_config(config): - return DepthConvLayer(**config) - - def get_flops(self, x): - depth_flop = count_conv_flop(self.depth_conv, x) - x = self.depth_conv(x) - point_flop = count_conv_flop(self.point_conv, x) - x = self.point_conv(x) - return depth_flop + point_flop, self.forward(x) - class PoolingLayer(My2DLayer): @@ -456,31 +292,6 @@ def weight_op(self): raise NotImplementedError return weight_dict - @property - def module_str(self): - if isinstance(self.kernel_size, int): - kernel_size = (self.kernel_size, self.kernel_size) - else: - kernel_size = self.kernel_size - return '%dx%d_%sPool' % (kernel_size[0], kernel_size[1], self.pool_type.upper()) - - @property - def config(self): - return { - 'name': PoolingLayer.__name__, - 'pool_type': self.pool_type, - 'kernel_size': self.kernel_size, - 'stride': self.stride, - **super(PoolingLayer, self).config - } - - @staticmethod - def build_from_config(config): - return PoolingLayer(**config) - - def get_flops(self, x): - return 0, self.forward(x) - class IdentityLayer(My2DLayer): @@ -491,26 +302,8 @@ def __init__(self, in_channels, out_channels, def weight_op(self): return None - @property - def module_str(self): - return 'Identity' - - @property - def config(self): - return { - 'name': IdentityLayer.__name__, - **super(IdentityLayer, self).config, - } - - @staticmethod - def build_from_config(config): - return IdentityLayer(**config) - def get_flops(self, x): - return 0, self.forward(x) - - -class LinearLayer(MyModule): +class LinearLayer(nn.Module): def __init__(self, in_features, out_features, bias=True, use_bn=False, act_func=None, dropout_rate=0, ops_order='weight_bn_act'): @@ -575,36 +368,12 @@ def forward(self, x): x = module(x) return x - @property - def module_str(self): - return '%dx%d_Linear' % (self.in_features, self.out_features) - - @property - def config(self): - return { - 'name': LinearLayer.__name__, - 'in_features': self.in_features, - 'out_features': self.out_features, - 'bias': self.bias, - 'use_bn': self.use_bn, - 'act_func': self.act_func, - 'dropout_rate': self.dropout_rate, - 'ops_order': self.ops_order, - } - - @staticmethod - def build_from_config(config): - return LinearLayer(**config) - - def get_flops(self, x): - return self.linear.weight.numel(), self.forward(x) - @staticmethod def is_zero_layer(): return False -class MBInvertedConvLayer(MyModule): +class MBInvertedConvLayer(nn.Module): def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, expand_ratio=6, mid_channels=None): @@ -651,47 +420,12 @@ def forward(self, x): x = self.point_linear(x) return x - @property - def module_str(self): - return '%dx%d_MBConv%d' % (self.kernel_size, self.kernel_size, self.expand_ratio) - - @property - def config(self): - return { - 'name': MBInvertedConvLayer.__name__, - 'in_channels': self.in_channels, - 'out_channels': self.out_channels, - 'kernel_size': self.kernel_size, - 'stride': self.stride, - 'expand_ratio': self.expand_ratio, - 'mid_channels': 
self.mid_channels, - } - - @staticmethod - def build_from_config(config): - return MBInvertedConvLayer(**config) - - def get_flops(self, x): - if self.inverted_bottleneck: - flop1 = count_conv_flop(self.inverted_bottleneck.conv, x) - x = self.inverted_bottleneck(x) - else: - flop1 = 0 - - flop2 = count_conv_flop(self.depth_conv.conv, x) - x = self.depth_conv(x) - - flop3 = count_conv_flop(self.point_linear.conv, x) - x = self.point_linear(x) - - return flop1 + flop2 + flop3, x - @staticmethod def is_zero_layer(): return False -class ZeroLayer(MyModule): +class ZeroLayer(nn.Module): def __init__(self, stride): super(ZeroLayer, self).__init__() @@ -707,24 +441,6 @@ def forward(self, x): return padding''' return x * 0 - @property - def module_str(self): - return 'Zero' - - @property - def config(self): - return { - 'name': ZeroLayer.__name__, - 'stride': self.stride, - } - - @staticmethod - def build_from_config(config): - return ZeroLayer(**config) - - def get_flops(self, x): - return 0, self.forward(x) - @staticmethod def is_zero_layer(): return True From 0a47184956c74431af725ca55d0175804dd000ec Mon Sep 17 00:00:00 2001 From: quanlu Date: Sun, 17 Nov 2019 20:52:19 +0800 Subject: [PATCH 11/60] update --- .../nas/proxylessnas/{search.py => main.py} | 0 examples/nas/proxylessnas/ops.py | 44 ++----------------- examples/nas/proxylessnas/putils.py | 27 ++++++++++++ 3 files changed, 31 insertions(+), 40 deletions(-) rename examples/nas/proxylessnas/{search.py => main.py} (100%) diff --git a/examples/nas/proxylessnas/search.py b/examples/nas/proxylessnas/main.py similarity index 100% rename from examples/nas/proxylessnas/search.py rename to examples/nas/proxylessnas/main.py diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py index f968f68f7c..a7c3bf1b44 100644 --- a/examples/nas/proxylessnas/ops.py +++ b/examples/nas/proxylessnas/ops.py @@ -22,6 +22,8 @@ import torch import torch.nn as nn +from putils import get_same_padding, build_activation + OPS = { 'Identity': lambda in_C, out_C, stride: IdentityLayer(in_C, out_C, ops_order='weight_bn_act'), @@ -46,33 +48,6 @@ '7x7_MBConv6': lambda in_C, out_C, stride: MBInvertedConvLayer(in_C, out_C, 7, stride, 6) } -#======================================== - -def get_same_padding(kernel_size): - if isinstance(kernel_size, tuple): - assert len(kernel_size) == 2, 'invalid kernel size: %s' % kernel_size - p1 = get_same_padding(kernel_size[0]) - p2 = get_same_padding(kernel_size[1]) - return p1, p2 - assert isinstance(kernel_size, int), 'kernel size should be either `int` or `tuple`' - assert kernel_size % 2 > 0, 'kernel size should be odd number' - return kernel_size // 2 - -def build_activation(act_func, inplace=True): - if act_func == 'relu': - return nn.ReLU(inplace=inplace) - elif act_func == 'relu6': - return nn.ReLU6(inplace=inplace) - elif act_func == 'tanh': - return nn.Tanh() - elif act_func == 'sigmoid': - return nn.Sigmoid() - elif act_func is None: - return None - else: - raise ValueError('do not support: %s' % act_func) - -#======================================== class MobileInvertedResidualBlock(nn.Module): @@ -84,28 +59,17 @@ def __init__(self, mobile_inverted_conv, shortcut): def forward(self, x): out, idx = self.mobile_inverted_conv(x) - print('*****************************idx: ', idx) if idx == 6: res = x - #res = out elif self.shortcut is None: - res = out #self.mobile_inverted_conv(x) + res = out else: - conv_x = out #self.mobile_inverted_conv(x) + conv_x = out skip_x = self.shortcut(x) res = skip_x + conv_x 
return res -#======================================== - -def count_conv_flop(layer, x): - out_h = int(x.size()[2] / layer.stride[0]) - out_w = int(x.size()[3] / layer.stride[1]) - delta_ops = layer.in_channels * layer.out_channels * layer.kernel_size[0] * layer.kernel_size[1] * \ - out_h * out_w / layer.groups - return delta_ops - class ShuffleLayer(nn.Module): def __init__(self, groups): super(ShuffleLayer, self).__init__() diff --git a/examples/nas/proxylessnas/putils.py b/examples/nas/proxylessnas/putils.py index 5c1d47d1f3..9e5bd6451d 100644 --- a/examples/nas/proxylessnas/putils.py +++ b/examples/nas/proxylessnas/putils.py @@ -18,6 +18,33 @@ # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. +import torch.nn as nn + + +def get_same_padding(kernel_size): + if isinstance(kernel_size, tuple): + assert len(kernel_size) == 2, 'invalid kernel size: %s' % kernel_size + p1 = get_same_padding(kernel_size[0]) + p2 = get_same_padding(kernel_size[1]) + return p1, p2 + assert isinstance(kernel_size, int), 'kernel size should be either `int` or `tuple`' + assert kernel_size % 2 > 0, 'kernel size should be odd number' + return kernel_size // 2 + +def build_activation(act_func, inplace=True): + if act_func == 'relu': + return nn.ReLU(inplace=inplace) + elif act_func == 'relu6': + return nn.ReLU6(inplace=inplace) + elif act_func == 'tanh': + return nn.Tanh() + elif act_func == 'sigmoid': + return nn.Sigmoid() + elif act_func is None: + return None + else: + raise ValueError('do not support: %s' % act_func) + def make_divisible(v, divisor, min_val=None): """ From 52dd7403b023b5fd7cbde6f1eef98f1fd5930bef Mon Sep 17 00:00:00 2001 From: quanlu Date: Mon, 18 Nov 2019 09:41:15 +0800 Subject: [PATCH 12/60] update --- examples/nas/proxylessnas/main.py | 34 +++++------------------------ examples/nas/proxylessnas/putils.py | 25 +++++++++++++++++++++ 2 files changed, 30 insertions(+), 29 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index e1624c7304..43a9c80aeb 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -24,40 +24,16 @@ import torch import torch.nn as nn -from model import * +from putils import get_parameters +from model import SearchMobileNet from nni.nas.pytorch.proxylessnas import ProxylessNasTrainer -def get_parameters(model, keys=None, mode='include'): - if keys is None: - for name, param in model.named_parameters(): - yield param - elif mode == 'include': - for name, param in model.named_parameters(): - flag = False - for key in keys: - if key in name: - flag = True - break - if flag: - yield param - elif mode == 'exclude': - for name, param in model.named_parameters(): - flag = True - for key in keys: - if key in name: - flag = False - break - if flag: - yield param - else: - raise ValueError('do not support: %s' % mode) - if __name__ == "__main__": parser = ArgumentParser("proxylessnas") - parser.add_argument("--layers", default=4, type=int) - parser.add_argument("--nodes", default=2, type=int) - parser.add_argument("--batch-size", default=128, type=int) + parser.add_argument("--n_cell_stages", default='4,4,4,4,4,1', type=str) + parser.add_argument("--stride_stages", default='2,2,2,1,2,1', type=str) + parser.add_argument("--width_stages", default='24,40,80,96,192,320', type=str) parser.add_argument("--log-frequency", default=1, type=int) args = parser.parse_args() diff --git 
a/examples/nas/proxylessnas/putils.py b/examples/nas/proxylessnas/putils.py index 9e5bd6451d..cf2b23d6b5 100644 --- a/examples/nas/proxylessnas/putils.py +++ b/examples/nas/proxylessnas/putils.py @@ -20,6 +20,31 @@ import torch.nn as nn +def get_parameters(model, keys=None, mode='include'): + if keys is None: + for name, param in model.named_parameters(): + yield param + elif mode == 'include': + for name, param in model.named_parameters(): + flag = False + for key in keys: + if key in name: + flag = True + break + if flag: + yield param + elif mode == 'exclude': + for name, param in model.named_parameters(): + flag = True + for key in keys: + if key in name: + flag = False + break + if flag: + yield param + else: + raise ValueError('do not support: %s' % mode) + def get_same_padding(kernel_size): if isinstance(kernel_size, tuple): From 95b1974a407b702bebe42693626c69f1bf34392c Mon Sep 17 00:00:00 2001 From: quanlu Date: Mon, 18 Nov 2019 18:31:55 +0800 Subject: [PATCH 13/60] update --- examples/nas/proxylessnas/main.py | 52 ++++---- .../nni/nas/pytorch/proxylessnas/mutator.py | 3 - .../nni/nas/pytorch/proxylessnas/trainer.py | 114 ++++++++++++------ 3 files changed, 110 insertions(+), 59 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 43a9c80aeb..f46844a097 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -31,35 +31,46 @@ if __name__ == "__main__": parser = ArgumentParser("proxylessnas") + # configurations of the model parser.add_argument("--n_cell_stages", default='4,4,4,4,4,1', type=str) parser.add_argument("--stride_stages", default='2,2,2,1,2,1', type=str) parser.add_argument("--width_stages", default='24,40,80,96,192,320', type=str) - parser.add_argument("--log-frequency", default=1, type=int) + parser.add_argument("--bn_momentum", default=0.1, type=float) + parser.add_argument("--bn_eps", default=1e-3, type=float) + parser.add_argument("--dropout_rate", default=0, type=float) + parser.add_argument("--no_decay_keys", default='bn', type=str, choices=[None, 'bn', 'bn#bias']) + # configurations of imagenet dataset + parser.add_argument("--data_path", default='/data/hdd3/yugzh/imagenet/', type=str) + parser.add_argument("--train_batch_size", default=2, type=int) + parser.add_argument("--test_batch_size", default=2, type=int) + parser.add_argument("--n_worker", default=0, type=int) + parser.add_argument("--resize_scale", default=0.08, type=float) + parser.add_argument("--distort_color", default='normal', type=str, choices=['normal', 'strong', 'None']) + #parser.add_argument("--log-frequency", default=1, type=int) args = parser.parse_args() - #dataset_train, dataset_valid = datasets.get_dataset("cifar10") - - model = SearchMobileNet() + model = SearchMobileNet(width_stages=[int(i) for i in args.width_stages.split(',')], + n_cell_stages=[int(i) for i in args.n_cell_stages.split(',')], + stride_stages=[int(i) for i in args.stride_stages.split(',')], + n_classes=1000, + dropout_rate=args.dropout_rate, + bn_param=(args.bn_momentum, args.bn_eps)) print('=============================================SearchMobileNet model create done') model.init_model() print('=============================================SearchMobileNet model init done') # move network to GPU if available + # data parallelism not supported yet if torch.cuda.is_available(): device = torch.device('cuda:0') - #self.net = torch.nn.DataParallel(self.net) model.to(device) - #cudnn.benchmark = True else: - raise ValueError - # self.device = 
torch.device('cpu')

     # TODO: net info

-    # TODO: removed decay_key
-    no_decay_keys = True
-    if no_decay_keys:
-        keys = ['bn']
+    if args.no_decay_keys:
+        keys = args.no_decay_keys
         momentum, nesterov = 0.9, True
         optimizer = torch.optim.SGD([
             {'params': get_parameters(model, keys, mode='exclude'), 'weight_decay': 4e-5},
@@ -68,18 +79,15 @@
     else:
        # note: SGD expects an iterable of parameters, and get_parameters takes the model as its first argument
        optimizer = torch.optim.SGD(get_parameters(model), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5)
-    #n_epochs = 50
-    #lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, n_epochs, eta_min=0.001)

     print('=============================================Start creating data provider')
     # TODO:
-    data_provider = datasets.ImagenetDataProvider(save_path='/data/hdd3/yugzh/imagenet/',
-                                                  train_batch_size=2, #256,
-                                                  test_batch_size=2, #500,
-                                                  valid_size=None,
-                                                  n_worker=0, #32,
-                                                  resize_scale=0.08,
-                                                  distort_color='normal')
+    data_provider = datasets.ImagenetDataProvider(save_path=args.data_path,
+                                                  train_batch_size=args.train_batch_size, #256,
+                                                  test_batch_size=args.test_batch_size, #500,
+                                                  valid_size=None,
+                                                  n_worker=args.n_worker, #32,
+                                                  resize_scale=args.resize_scale,
+                                                  distort_color=args.distort_color)
     print('=============================================Finished creating data provider')
     train_loader = data_provider.train
     valid_loader = data_provider.valid
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 3afa8cbd0d..3e9ba93de4 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -40,20 +40,17 @@ class ArchGradientFunction(torch.autograd.Function):
     def forward(ctx, x, binary_gates, run_func, backward_func):
         ctx.run_func = run_func
         ctx.backward_func = backward_func
-        #ctx.mutable_key = mutable_key

         detached_x = detach_variable(x)
         with torch.enable_grad():
             output = run_func(detached_x)
         ctx.save_for_backward(detached_x, output)
         print('ctx forward: ', ctx.__dict__)
-        #print('mutable key: ', ctx.mutable_key)
         return output.data

     @staticmethod
     def backward(ctx, grad_output):
         print('ctx backward: ', ctx.__dict__)
-        #print('mutable key: ', ctx.mutable_key)
         detached_x, output = ctx.saved_tensors
         grad_x = torch.autograd.grad(output, detached_x, grad_output, only_inputs=True)
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 913e94fb7c..39d27f09c8 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -82,27 +82,67 @@ def accuracy(output, target, topk=(1,)):
     return res

 class ProxylessNasTrainer(Trainer):
-    def __init__(self, model, model_optim, train_loader, valid_loader, device):
+    def __init__(self, model, model_optim, train_loader, valid_loader, device,
+                 n_epochs=150, init_lr=0.05, arch_init_type='normal', arch_init_ratio=1e-3,
+                 arch_optim_lr=1e-3, arch_weight_decay=0, warmup=True, warmup_epochs=25,
+                 arch_valid_frequency=1):
+        """
+        Parameters
+        ----------
+        model : pytorch model
+            the user model with the search space (e.g., LayerChoice) embedded in it
+        model_optim : pytorch optimizer
+            the optimizer for the model (weight) parameters
+        train_loader : pytorch data loader
+            data loader for the training set
+        valid_loader : pytorch data loader
+            data loader for the validation set
+        device : device
+            the device on which training runs
+        n_epochs : int
+            the total number of training epochs
+        init_lr : float
+            initial learning rate for training the model
+        arch_init_type : str
+            the way to initialize architecture parameters
+        arch_init_ratio : float
+            the ratio used when initializing architecture parameters
+        arch_optim_lr : float
+            learning rate of the architecture parameters optimizer
arch_weight_decay : float + weight decay of the architecture parameters optimizer + warmup : bool + whether to do warmup + warmup_epochs : int + the number of epochs to do in warmup + """ self.model = model self.model_optim = model_optim self.train_loader = train_loader self.valid_loader = valid_loader self.device = device - self.n_epochs = 150 - self.init_lr = 0.05 + self.n_epochs = n_epochs + self.init_lr = init_lr + self.warmup = warmup + self.warmup_epochs = warmup_epochs + self.arch_valid_frequency = arch_valid_frequency + + self.train_epochs = 120 + self.lr_max = 0.05 + self.label_smoothing = 0.1 + self.valid_batch_size = 500 + self.arch_grad_valid_batch_size = 2 # 256 + # update architecture parameters every this number of minibatches + self.grad_update_arch_param_every = 5 + # the number of steps per architecture parameter update + self.grad_update_steps = 1 + # init mutator self.mutator = ProxylessNasMutator(model) self._valid_iter = None # TODO: arch search configs - self._init_arch_params() + self._init_arch_params(arch_init_type, arch_init_ratio) # build architecture optimizer - self.arch_optimizer = torch.optim.Adam(self.mutator.get_architecture_parameters(), 1e-3, weight_decay=0) - - self.warmup = True - self.warmup_epoch = 0 + self.arch_optimizer = torch.optim.Adam(self.mutator.get_architecture_parameters(), + arch_optim_lr, + weight_decay=arch_weight_decay) self.criterion = nn.CrossEntropyLoss() @@ -116,7 +156,7 @@ def _init_arch_params(self, init_type='normal', init_ratio=1e-3): raise NotImplementedError def _validate(self): - self.valid_loader.batch_sampler.batch_size = 500 + self.valid_loader.batch_sampler.batch_size = self.valid_batch_size self.valid_loader.batch_sampler.drop_last = False self.mutator.set_chosen_op_active() @@ -151,13 +191,12 @@ def _validate(self): print(test_log) return losses.avg, top1.avg, top5.avg - def _warm_up(self, warmup_epochs=25): - lr_max = 0.05 + def _warm_up(self): data_loader = self.train_loader nBatch = len(data_loader) - T_total = warmup_epochs * nBatch # total num of batches + T_total = self.warmup_epochs * nBatch # total num of batches - for epoch in range(self.warmup_epoch, warmup_epochs): + for epoch in range(self.warmup_epochs): print('\n', '-' * 30, 'Warmup epoch: %d' % (epoch + 1), '-' * 30, '\n') batch_time = AverageMeter() data_time = AverageMeter() @@ -174,7 +213,7 @@ def _warm_up(self, warmup_epochs=25): data_time.update(time.time() - end) # lr T_cur = epoch * nBatch + i - warmup_lr = 0.5 * lr_max * (1 + math.cos(math.pi * T_cur / T_total)) + warmup_lr = 0.5 * self.lr_max * (1 + math.cos(math.pi * T_cur / T_total)) for param_group in self.model_optim.param_groups: param_group['lr'] = warmup_lr images, labels = images.to(self.device), labels.to(self.device) @@ -182,9 +221,8 @@ def _warm_up(self, warmup_epochs=25): self.mutator.reset_binary_gates() # random sample binary gates with self.mutator.forward_pass(): output = self.model(images) - label_smoothing = 0.1 - if label_smoothing > 0: - loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing) + if self.label_smoothing > 0: + loss = cross_entropy_with_label_smoothing(output, labels, self.label_smoothing) else: loss = self.criterion(output, labels) # measure accuracy and record loss @@ -210,19 +248,17 @@ def _warm_up(self, warmup_epochs=25): format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time, losses=losses, top1=top1, top5=top5, lr=warmup_lr) print(batch_log) - valid_res, flops, latency = self._validate() + val_loss, val_top1, val_top5 
= self._validate()
             val_log = 'Warmup Valid [{0}/{1}]\tloss {2:.3f}\ttop-1 acc {3:.3f}\ttop-5 acc {4:.3f}\t' \
-                      'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}\tflops: {5:.1f}M'. \
-                format(epoch + 1, warmup_epochs, *valid_res, flops / 1e6, top1=top1, top5=top5)
+                      'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}'. \
+                format(epoch + 1, self.warmup_epochs, val_loss, val_top1, val_top5, top1=top1, top5=top5)
             print(val_log)

     def _get_update_schedule(self, nBatch):
         schedule = {}
-        grad_update_arch_param_every = 5
-        grad_update_steps = 1
         for i in range(nBatch):
-            if (i + 1) % grad_update_arch_param_every == 0:
-                schedule[i] = grad_update_steps
+            if (i + 1) % self.grad_update_arch_param_every == 0:
+                schedule[i] = self.grad_update_steps
         return schedule

     def _calc_learning_rate(self, epoch, batch=0, nBatch=None):
@@ -232,7 +268,9 @@ def _calc_learning_rate(self, epoch, batch=0, nBatch=None):
         return lr

     def _adjust_learning_rate(self, optimizer, epoch, batch=0, nBatch=None):
-        """ adjust learning of a given optimizer and return the new learning rate """
+        """
+        Adjust the learning rate of a given optimizer and return the new learning rate
+        """
         new_lr = self._calc_learning_rate(epoch, batch, nBatch)
         print('-----------------------------: ', new_lr)
         for param_group in optimizer.param_groups:
             param_group['lr'] = new_lr
@@ -251,7 +289,7 @@ def _train(self):

         update_schedule = self._get_update_schedule(nBatch)

-        for epoch in range(0, 120):
+        for epoch in range(self.train_epochs):
             print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n')
             batch_time = AverageMeter()
             data_time = AverageMeter()
@@ -274,9 +312,8 @@ def _train(self):
                 self.mutator.reset_binary_gates()
                 with self.mutator.forward_pass():
                     output = self.model(images)
-                label_smoothing = 0.1
-                if label_smoothing > 0:
-                    loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing)
+                if self.label_smoothing > 0:
+                    loss = cross_entropy_with_label_smoothing(output, labels, self.label_smoothing)
                 else:
                     loss = self.criterion(output, labels)
                 acc1, acc5 = accuracy(output, labels, topk=(1, 5))
@@ -286,7 +323,7 @@ def _train(self):
                 self.model.zero_grad()
                 loss.backward()
                 self.model_optim.step()
-                #if epoch > 0:
+                # TODO: if epoch > 0:
                 if epoch >= 0:
                     for j in range(update_schedule.get(i, 0)):
                         start_time = time.time()
                         # GradientArchSearchConfig
                         arch_loss, exp_value = self._gradient_step()
@@ -310,8 +347,16 @@ def _train(self):
                            format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time,
                                   losses=losses, entropy=entropy, top1=top1, top5=top5, lr=lr)
                     print(batch_log)
-            # TODO: print current network architecture
-            # TODO: validate
+            # TODO: print current network architecture
+            # validate
+            if (epoch + 1) % self.arch_valid_frequency == 0:
+                val_loss, val_top1, val_top5 = self._validate()
+                val_log = 'Valid [{0}]\tloss {2:.3f}\ttop-1 acc {3:.3f} \ttop-5 acc {5:.3f}\t' \
+                          'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}\t' \
+                          'Entropy {entropy.val:.5f}'. \
+                    format(epoch + 1, val_loss, val_top1,
+                           val_top5, entropy=entropy, top1=top1, top5=top5)
+                print(val_log)
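For reference, a minimal standalone sketch of the schedule that `_get_update_schedule` builds above — the dict maps a minibatch index to the number of architecture-optimizer steps to run after that minibatch. The names mirror the trainer fields `grad_update_arch_param_every` and `grad_update_steps`, but the snippet is purely illustrative and not part of the patch:

```python
# Sketch of the architecture-update schedule used by _train():
# arch params are updated every `update_every`-th minibatch.
def get_update_schedule(n_batch, update_every=5, update_steps=1):
    return {i: update_steps for i in range(n_batch) if (i + 1) % update_every == 0}

# with 12 minibatches per epoch, updates happen after batches 4 and 9 (0-based)
assert get_update_schedule(12) == {4: 1, 9: 1}
```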
    # convert to normal network according to architecture parameters
     def _valid_next_batch(self):
@@ -325,7 +370,7 @@ def _valid_next_batch(self):
         return data

     def _gradient_step(self):
-        self.valid_loader.batch_sampler.batch_size = 2 #256
+        self.valid_loader.batch_sampler.batch_size = self.arch_grad_valid_batch_size
         self.valid_loader.batch_sampler.drop_last = True
         self.model.train()
         time1 = time.time()  # time
@@ -349,7 +394,8 @@ def _gradient_step(self):
         return loss.data.item(), expected_value.item() if expected_value is not None else None

     def train(self):
-        #self._warm_up()
+        if self.warmup:
+            self._warm_up()
         self._train()

     def export(self):
         pass

From 44145e4b3a7c548321db6142da00393a4c77cf44 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Mon, 18 Nov 2019 20:54:48 +0800
Subject: [PATCH 14/60] update

---
 examples/nas/proxylessnas/main.py |  3 ++-
 examples/nas/proxylessnas/ops.py  | 67 -------------------------------
 2 files changed, 2 insertions(+), 68 deletions(-)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index f46844a097..977781df28 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -97,7 +97,8 @@
                                  model_optim=optimizer,
                                  train_loader=train_loader,
                                  valid_loader=valid_loader,
-                                 device=device)
+                                 device=device,
+                                 warmup=False)
     print('=============================================Start to train ProxylessNasTrainer')
     trainer.train()
diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py
index a7c3bf1b44..efe9aa6468 100644
--- a/examples/nas/proxylessnas/ops.py
+++ b/examples/nas/proxylessnas/ops.py
@@ -190,73 +190,6 @@
         return weight_dict

-class DepthConvLayer(My2DLayer):
-
-    def __init__(self, in_channels, out_channels,
-                 kernel_size=3, stride=1, dilation=1, groups=1, bias=False, has_shuffle=False,
-                 use_bn=True, act_func='relu', dropout_rate=0, ops_order='weight_bn_act'):
-        self.kernel_size = kernel_size
-        self.stride = stride
-        self.dilation = dilation
-        self.groups = groups
-        self.bias = bias
-        self.has_shuffle = has_shuffle
-
-        super(DepthConvLayer, self).__init__(
-            in_channels, out_channels, use_bn, act_func, dropout_rate, ops_order
-        )
-
-    def weight_op(self):
-        padding = get_same_padding(self.kernel_size)
-        if isinstance(padding, int):
-            padding *= self.dilation
-        else:
-            padding[0] *= self.dilation
-            padding[1] *= self.dilation
-
-        weight_dict = OrderedDict()
-        weight_dict['depth_conv'] = nn.Conv2d(
-            self.in_channels, self.in_channels, kernel_size=self.kernel_size, stride=self.stride, padding=padding,
-            dilation=self.dilation, groups=self.in_channels, bias=False
-        )
-        weight_dict['point_conv'] = nn.Conv2d(
-            self.in_channels, self.out_channels, kernel_size=1, groups=self.groups, bias=self.bias
-        )
-        if self.has_shuffle and self.groups > 1:
-            weight_dict['shuffle'] = ShuffleLayer(self.groups)
-        return weight_dict
-
-
-class PoolingLayer(My2DLayer):
-
-    def __init__(self, in_channels, out_channels,
-                 pool_type, kernel_size=2, stride=2,
-                 use_bn=False, act_func=None, dropout_rate=0, ops_order='weight_bn_act'):
-        self.pool_type = pool_type
-        self.kernel_size = kernel_size
-        self.stride = stride
-
-        super(PoolingLayer, self).__init__(in_channels, out_channels, use_bn, act_func, dropout_rate, ops_order)
-
-    def weight_op(self):
-        if self.stride == 1:
-            # same padding if `stride == 1`
-            padding = get_same_padding(self.kernel_size)
-        else:
-            padding = 0
-
-        weight_dict =
OrderedDict() - if self.pool_type == 'avg': - weight_dict['pool'] = nn.AvgPool2d( - self.kernel_size, stride=self.stride, padding=padding, count_include_pad=False - ) - elif self.pool_type == 'max': - weight_dict['pool'] = nn.MaxPool2d(self.kernel_size, stride=self.stride, padding=padding) - else: - raise NotImplementedError - return weight_dict - - class IdentityLayer(My2DLayer): def __init__(self, in_channels, out_channels, From a0febf9932dfe590cafa92858a789848f611f318 Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 19 Nov 2019 09:20:30 +0800 Subject: [PATCH 15/60] update --- examples/nas/proxylessnas/model.py | 32 ++++----- examples/nas/proxylessnas/ops.py | 6 +- .../nni/nas/pytorch/proxylessnas/mutator.py | 4 +- .../nni/nas/pytorch/proxylessnas/trainer.py | 66 ++++++------------- 4 files changed, 40 insertions(+), 68 deletions(-) diff --git a/examples/nas/proxylessnas/model.py b/examples/nas/proxylessnas/model.py index f640e7a916..afabb0acc1 100644 --- a/examples/nas/proxylessnas/model.py +++ b/examples/nas/proxylessnas/model.py @@ -69,33 +69,29 @@ def __init__(self, stride = s else: stride = 1 + op_candidates = [ops.OPS['3x3_MBConv3'](input_channel, width, stride), + ops.OPS['3x3_MBConv6'](input_channel, width, stride), + ops.OPS['5x5_MBConv3'](input_channel, width, stride), + ops.OPS['5x5_MBConv6'](input_channel, width, stride), + ops.OPS['7x7_MBConv3'](input_channel, width, stride), + ops.OPS['7x7_MBConv6'](input_channel, width, stride)] if stride == 1 and input_channel == width: # if it is not the first one - conv_op = nas.mutables.LayerChoice([ops.OPS['3x3_MBConv3'](input_channel, width, stride), - ops.OPS['3x3_MBConv6'](input_channel, width, stride), - ops.OPS['5x5_MBConv3'](input_channel, width, stride), - ops.OPS['5x5_MBConv6'](input_channel, width, stride), - ops.OPS['7x7_MBConv3'](input_channel, width, stride), - ops.OPS['7x7_MBConv6'](input_channel, width, stride), - ops.OPS['Zero'](input_channel, width, stride)], - return_mask=True, - key="s{}_c{}".format(stage_cnt, i)) + op_candidates += [ops.OPS['Zero'](input_channel, width, stride)] + conv_op = nas.mutables.LayerChoice(op_candidates, + return_mask=True, + key="s{}_c{}".format(stage_cnt, i)) else: - conv_op = nas.mutables.LayerChoice([ops.OPS['3x3_MBConv3'](input_channel, width, stride), - ops.OPS['3x3_MBConv6'](input_channel, width, stride), - ops.OPS['5x5_MBConv3'](input_channel, width, stride), - ops.OPS['5x5_MBConv6'](input_channel, width, stride), - ops.OPS['7x7_MBConv3'](input_channel, width, stride), - ops.OPS['7x7_MBConv6'](input_channel, width, stride)], - return_mask=True, - key="s{}_c{}".format(stage_cnt, i)) + conv_op = nas.mutables.LayerChoice(op_candidates, + return_mask=True, + key="s{}_c{}".format(stage_cnt, i)) # shortcut if stride == 1 and input_channel == width: # if not first cell shortcut = ops.IdentityLayer(input_channel, input_channel) else: shortcut = None - inverted_residual_block = ops.MobileInvertedResidualBlock(conv_op, shortcut) + inverted_residual_block = ops.MobileInvertedResidualBlock(conv_op, shortcut, op_candidates) blocks.append(inverted_residual_block) input_channel = width stage_cnt += 1 diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py index efe9aa6468..8886650739 100644 --- a/examples/nas/proxylessnas/ops.py +++ b/examples/nas/proxylessnas/ops.py @@ -51,15 +51,17 @@ class MobileInvertedResidualBlock(nn.Module): - def __init__(self, mobile_inverted_conv, shortcut): + def __init__(self, mobile_inverted_conv, shortcut, op_candidates_list): 
super(MobileInvertedResidualBlock, self).__init__() self.mobile_inverted_conv = mobile_inverted_conv self.shortcut = shortcut + self.op_candidates_list = op_candidates_list def forward(self, x): out, idx = self.mobile_inverted_conv(x) - if idx == 6: + #if idx == 6: + if self.op_candidates_list[idx].is_zero_layer(): res = x elif self.shortcut is None: res = out diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 3e9ba93de4..2b1d619e99 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -35,7 +35,7 @@ def detach_variable(inputs): return x class ArchGradientFunction(torch.autograd.Function): - + @staticmethod def forward(ctx, x, binary_gates, run_func, backward_func): ctx.run_func = run_func @@ -70,7 +70,7 @@ def __init__(self, mutable): self.inactive_index = None self.log_prob = None self.current_prob_over_ops = None - + def get_AP_path_alpha(self): return self.AP_path_alpha diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 39d27f09c8..c912815066 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -18,7 +18,6 @@ # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -import copy import math import time @@ -26,35 +25,10 @@ from torch import nn as nn from nni.nas.pytorch.trainer import Trainer -from nni.nas.utils import AverageMeterGroup, auto_device +from nni.nas.utils import AverageMeter from .mutator import ProxylessNasMutator -class AverageMeter(object): - """ - Computes and stores the average and current value - Copied from: https://github.com/pytorch/examples/blob/master/imagenet/main.py - """ - - def __init__(self): - self.val = 0 - self.avg = 0 - self.sum = 0 - self.count = 0 - - def reset(self): - self.val = 0 - self.avg = 0 - self.sum = 0 - self.count = 0 - - def update(self, val, n=1): - self.val = val - self.sum += val * n - self.count += n - self.avg = self.sum / self.count - - def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1): logsoftmax = nn.LogSoftmax() n_classes = pred.size(1) @@ -162,10 +136,10 @@ def _validate(self): self.mutator.set_chosen_op_active() # test on validation set under train mode self.model.train() - batch_time = AverageMeter() - losses = AverageMeter() - top1 = AverageMeter() - top5 = AverageMeter() + batch_time = AverageMeter('batch_time') + losses = AverageMeter('losses') + top1 = AverageMeter('top1') + top5 = AverageMeter('top5') end = time.time() with torch.no_grad(): for i, (images, labels) in enumerate(self.valid_loader): @@ -185,9 +159,9 @@ def _validate(self): 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'\ 'Loss {loss.val:.4f} ({loss.avg:.4f})\t'\ 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'.\ - format(i, len(data_loader) - 1, batch_time=batch_time, loss=losses, top1=top1) - if return_top5: - test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) + format(i, len(self.valid_loader) - 1, batch_time=batch_time, loss=losses, top1=top1) + # return top5: + test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) print(test_log) return losses.avg, top1.avg, top5.avg @@ -198,11 +172,11 @@ def _warm_up(self): for epoch in range(self.warmup_epochs): 
print('\n', '-' * 30, 'Warmup epoch: %d' % (epoch + 1), '-' * 30, '\n') - batch_time = AverageMeter() - data_time = AverageMeter() - losses = AverageMeter() - top1 = AverageMeter() - top5 = AverageMeter() + batch_time = AverageMeter('batch_time') + data_time = AverageMeter('data_time') + losses = AverageMeter('losses') + top1 = AverageMeter('top1') + top5 = AverageMeter('top5') # switch to train mode self.model.train() @@ -291,12 +265,12 @@ def _train(self): for epoch in range(self.train_epochs): print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n') - batch_time = AverageMeter() - data_time = AverageMeter() - losses = AverageMeter() - top1 = AverageMeter() - top5 = AverageMeter() - entropy = AverageMeter() + batch_time = AverageMeter('batch_time') + data_time = AverageMeter('data_time') + losses = AverageMeter('losses') + top1 = AverageMeter('top1') + top5 = AverageMeter('top5') + entropy = AverageMeter('entropy') # switch to train mode self.model.train() @@ -325,7 +299,7 @@ def _train(self): self.model_optim.step() # TODO: if epoch > 0: if epoch >= 0: - for j in range(update_schedule.get(i, 0)): + for _ in range(update_schedule.get(i, 0)): start_time = time.time() # GradientArchSearchConfig arch_loss, exp_value = self._gradient_step() From cc8a1fb0ab262e965e3d0622fae8693fb0c38143 Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 19 Nov 2019 09:55:03 +0800 Subject: [PATCH 16/60] update --- .../nni/nas/pytorch/proxylessnas/mutator.py | 13 ++++++------- .../nni/nas/pytorch/proxylessnas/trainer.py | 19 +++++++++++-------- 2 files changed, 17 insertions(+), 15 deletions(-) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 2b1d619e99..3421cd394a 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -24,7 +24,7 @@ import numpy as np from nni.nas.pytorch.mutables import LayerChoice -from nni.nas.pytorch.mutator import PyTorchMutator +from nni.nas.pytorch.base_mutator import BaseMutator def detach_variable(inputs): if isinstance(inputs, tuple): @@ -170,13 +170,12 @@ def set_arch_param_grad(self): self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i]) -class ProxylessNasMutator(PyTorchMutator): - - def before_build(self, model): +class ProxylessNasMutator(BaseMutator): + def __init__(self, model): + super(ProxylessNasMutator, self).__init__(model) self.mixed_ops = {} - - def on_init_layer_choice(self, mutable: LayerChoice): - self.mixed_ops[mutable.key] = MixedOp(mutable) + for _, mutable, _ in self.named_mutables(distinct=False): + self.mixed_ops[mutable.key] = MixedOp(mutable) def on_forward_layer_choice(self, mutable, *inputs): """ diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index c912815066..3e93ed326a 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -24,7 +24,7 @@ import torch from torch import nn as nn -from nni.nas.pytorch.trainer import Trainer +from nni.nas.pytorch.base_trainer import BaseTrainer from nni.nas.utils import AverageMeter from .mutator import ProxylessNasMutator @@ -55,7 +55,7 @@ def accuracy(output, target, topk=(1,)): res.append(correct_k.mul_(100.0 / batch_size)) return res -class ProxylessNasTrainer(Trainer): +class ProxylessNasTrainer(BaseTrainer): def __init__(self, model, model_optim, train_loader, 
valid_loader, device, n_epochs=150, init_lr=0.05, arch_init_type='normal', arch_init_ratio=1e-3, arch_optim_lr=1e-3, arch_weight_decay=0, warmup=True, warmup_epochs=25, @@ -193,8 +193,7 @@ def _warm_up(self): images, labels = images.to(self.device), labels.to(self.device) # compute output self.mutator.reset_binary_gates() # random sample binary gates - with self.mutator.forward_pass(): - output = self.model(images) + output = self.model(images) if self.label_smoothing > 0: loss = cross_entropy_with_label_smoothing(output, labels, self.label_smoothing) else: @@ -284,8 +283,7 @@ def _train(self): # train weight parameters images, labels = images.to(self.device), labels.to(self.device) self.mutator.reset_binary_gates() - with self.mutator.forward_pass(): - output = self.model(images) + output = self.model(images) if self.label_smoothing > 0: loss = cross_entropy_with_label_smoothing(output, labels, self.label_smoothing) else: @@ -353,8 +351,7 @@ def _gradient_step(self): images, labels = images.to(self.device), labels.to(self.device) time2 = time.time() # time self.mutator.reset_binary_gates() - with self.mutator.forward_pass(): - output = self.model(images) + output = self.model(images) time3 = time.time() ce_loss = self.criterion(output, labels) expected_value = None @@ -374,3 +371,9 @@ def train(self): def export(self): pass + + def validate(self): + raise NotImplementedError + + def train_and_validate(self): + raise NotImplementedError From dacbdf727893f8336e530add300e53172ed78a51 Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 19 Nov 2019 17:58:10 +0800 Subject: [PATCH 17/60] update --- .../nni/nas/pytorch/proxylessnas/mutator.py | 68 +++++++++++++++---- .../nni/nas/pytorch/proxylessnas/trainer.py | 26 ++++++- 2 files changed, 78 insertions(+), 16 deletions(-) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 3421cd394a..3387838934 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -23,7 +23,6 @@ from torch.nn import functional as F import numpy as np -from nni.nas.pytorch.mutables import LayerChoice from nni.nas.pytorch.base_mutator import BaseMutator def detach_variable(inputs): @@ -45,23 +44,29 @@ def forward(ctx, x, binary_gates, run_func, backward_func): with torch.enable_grad(): output = run_func(detached_x) ctx.save_for_backward(detached_x, output) - print('ctx forward: ', ctx.__dict__) return output.data @staticmethod def backward(ctx, grad_output): - print('ctx backward: ', ctx.__dict__) detached_x, output = ctx.saved_tensors grad_x = torch.autograd.grad(output, detached_x, grad_output, only_inputs=True) # compute gradients w.r.t. 
binary_gates binary_grads = ctx.backward_func(detached_x.data, output.data, grad_output.data) - print('++++++++++++++++++++++++++++: ', binary_grads) return grad_x[0], binary_grads, None, None class MixedOp(nn.Module): + """ + This class is to instantiate and manage info of one LayerChoice + """ def __init__(self, mutable): + """ + Parameters + ---------- + mutable : LayerChoice + A LayerChoice in user model + """ super(MixedOp, self).__init__() self.mutable = mutable self.AP_path_alpha = nn.Parameter(torch.Tensor(mutable.length)) @@ -78,13 +83,11 @@ def forward(self, x): # only full_v2 def run_function(key, candidate_ops, active_id): def forward(_x): - print('key forward: ', key) return candidate_ops[active_id](_x) return forward def backward_function(key, candidate_ops, active_id, binary_gates): def backward(_x, _output, grad_output): - print('key backward: ', key) binary_grads = torch.zeros_like(binary_gates.data) with torch.no_grad(): for k in range(len(candidate_ops)): @@ -103,11 +106,20 @@ def backward(_x, _output, grad_output): @property def probs_over_ops(self): + """ + Apply softmax on alpha to generate probability distribution + + Returns + ------- + pytorch tensor + probability distribution + """ probs = F.softmax(self.AP_path_alpha, dim=0) # softmax to probability return probs @property def chosen_index(self): + """ choose the max one """ probs = self.probs_over_ops.data.cpu().numpy() index = int(np.argmax(probs)) return index, probs[index] @@ -119,24 +131,25 @@ def active_op(self): @property def active_op_index(self): + """ return active op's index """ return self.active_index[0] def set_chosen_op_active(self): + """ set chosen index, active and inactive indexes """ chosen_idx, _ = self.chosen_index self.active_index = [chosen_idx] self.inactive_index = [_i for _i in range(0, chosen_idx)] + \ [_i for _i in range(chosen_idx + 1, self.n_choices)] def binarize(self): + """ + Sample based on alpha, and set binary weights accordingly + """ self.log_prob = None # reset binary gates self.AP_path_wb.data.zero_() probs = self.probs_over_ops - print('probs: ', probs.data) - print('probs type: ', probs.type()) sample = torch.multinomial(probs, 1)[0].item() - print('sample: ', sample) - print('mutable key: ', self.mutable.key) self.active_index = [sample] self.inactive_index = [_i for _i in range(0, sample)] + \ [_i for _i in range(sample + 1, len(self.mutable.choices))] @@ -147,7 +160,6 @@ def binarize(self): for choice in self.mutable.choices: for _, param in choice.named_parameters(): param.grad = None - print('binarize: ', self.AP_path_wb.grad) def _delta_ij(self, i, j): if i == j: @@ -156,8 +168,9 @@ def _delta_ij(self, i, j): return 0 def set_arch_param_grad(self): - print('mutable key: ', self.mutable.key) - print('set_arch_param_grad: ', self.AP_path_wb.grad) + """ + Calculate alpha gradient for this LayerChoice + """ binary_grads = self.AP_path_wb.grad.data if self.active_op.is_zero_layer(): self.AP_path_alpha.grad = None @@ -172,6 +185,14 @@ def set_arch_param_grad(self): class ProxylessNasMutator(BaseMutator): def __init__(self, model): + """ + Init a MixedOp instance for each named mutable i.e., LayerChoice + + Parameters + ---------- + model : pytorch model + The model that users want to tune, it includes search space defined with nni nas apis + """ super(ProxylessNasMutator, self).__init__(model) self.mixed_ops = {} for _, mutable, _ in self.named_mutables(distinct=False): @@ -192,26 +213,45 @@ def on_forward_layer_choice(self, mutable, *inputs): Returns ------- torch.Tensor 
+            the output tensor of the active op, together with the index of the chosen op
+        """
+        # FIXME: return mask, to be consistent with other algorithms
         idx = self.mixed_ops[mutable.key].active_op_index
         return self.mixed_ops[mutable.key].forward(*inputs), idx

     def reset_binary_gates(self):
+        """
+        For each LayerChoice, binarize based on alpha to only activate one op
+        """
         for k in self.mixed_ops.keys():
-            print('+++++++++++++++++++k: ', k)
             self.mixed_ops[k].binarize()

     def set_chosen_op_active(self):
+        """
+        For each LayerChoice, set the op with the highest alpha as the chosen op.
+        Usually used for validation.
+        """
         for k in self.mixed_ops.keys():
             self.mixed_ops[k].set_chosen_op_active()

     def num_arch_params(self):
+        """
+        Returns
+        -------
+        int
+            the number of LayerChoice in the user model
+        """
         return len(self.mixed_ops)

     def set_arch_param_grad(self):
+        """
+        For each LayerChoice, calculate gradients for architecture weights, i.e., alpha
+        """
         for k in self.mixed_ops.keys():
             self.mixed_ops[k].set_arch_param_grad()

     def get_architecture_parameters(self):
+        """
+        Yield the architecture weights (alpha) of each LayerChoice, for the arch optimizer
+        """
         for k in self.mixed_ops.keys():
             yield self.mixed_ops[k].get_AP_path_alpha()
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 3e93ed326a..66ad841cef 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -30,6 +30,16 @@
 def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1):
+    """
+    Parameters
+    ----------
+    pred : torch.Tensor
+        predicted logits of shape (batch_size, n_classes)
+    target : torch.Tensor
+        ground-truth class indices of shape (batch_size,)
+    label_smoothing : float
+        the smoothing factor mixed into the one-hot target
+
+    Returns
+    -------
+    torch.Tensor
+        scalar cross entropy loss computed against the smoothed target
+    """
     logsoftmax = nn.LogSoftmax()
     n_classes = pred.size(1)
     # convert to one-hot
@@ -41,7 +51,18 @@
     return torch.mean(torch.sum(- soft_target * logsoftmax(pred), 1))

 def accuracy(output, target, topk=(1,)):
-    """ Computes the precision@k for the specified values of k """
+    """
+    Computes the precision@k for the specified values of k
+
+    Parameters
+    ----------
+    output : torch.Tensor
+        model output (logits) of shape (batch_size, n_classes)
+    target : torch.Tensor
+        ground-truth labels of shape (batch_size,)
+    topk : tuple of int
+        the values of k to compute precision at
+
+    Returns
+    -------
+    list of torch.Tensor
+        precision@k for each requested k
+    """
     maxk = max(topk)
     batch_size = target.size(0)
@@ -83,6 +104,8 @@
         whether to do warmup
     warmup_epochs : int
         the number of epochs to do in warmup
+    arch_valid_frequency : int
+        frequency (in epochs) of printing validation results
     """
     self.model = model
     self.model_optim = model_optim
@@ -245,7 +268,6 @@ def _adjust_learning_rate(self, optimizer, epoch, batch=0, nBatch=None):
         Adjust the learning rate of a given optimizer and return the new learning rate
         """
         new_lr = self._calc_learning_rate(epoch, batch, nBatch)
-        print('-----------------------------: ', new_lr)
         for param_group in optimizer.param_groups:
             param_group['lr'] = new_lr
         return new_lr

From 007e0434fd9047add20b1a284bba455c50e9d3fa Mon Sep 17 00:00:00 2001
From: quanlu
Date: Wed, 20 Nov 2019 10:52:15 +0800
Subject: [PATCH 18/60] update

---
 examples/nas/proxylessnas/datasets.py | 5 +++++
 examples/nas/proxylessnas/model.py    | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/examples/nas/proxylessnas/datasets.py b/examples/nas/proxylessnas/datasets.py
index ebd756045c..b0a9731429 100644
--- a/examples/nas/proxylessnas/datasets.py
+++ b/examples/nas/proxylessnas/datasets.py
@@ -24,6 +24,11 @@
 import torchvision.transforms as transforms
 import torchvision.datasets as datasets

+def get_split_list(in_dim, child_num):
+    in_dim_list = [in_dim // child_num] * child_num
+    for _i in range(in_dim % child_num):
+        in_dim_list[_i] += 1
+    return in_dim_list

 class DataProvider:
     VALID_SEED = 0  # random seed for the validation set
diff --git a/examples/nas/proxylessnas/model.py b/examples/nas/proxylessnas/model.py
index afabb0acc1..1b5483f4a3 100644
--- a/examples/nas/proxylessnas/model.py
+++ b/examples/nas/proxylessnas/model.py
@@ -97,7 +97,7 @@ def __init__(self,
             stage_cnt += 1

         # feature mix layer
-        last_channel = utils.make_devisible(1280 * width_mult, 8) if width_mult > 1.0 else 1280
+        last_channel = putils.make_divisible(1280 * width_mult, 8) if width_mult > 1.0 else 1280
         feature_mix_layer = ops.ConvLayer(input_channel, last_channel, kernel_size=1, use_bn=True, act_func='relu6', ops_order='weight_bn_act', )
         classifier = ops.LinearLayer(last_channel, n_classes, dropout_rate=dropout_rate)

From 098fe3d4f43d007f87e429aec5730962f3a0cb23 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Tue, 10 Dec 2019 19:04:25 +0800
Subject: [PATCH 19/60] fix bug

---
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 3387838934..e134cdd91f 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -75,6 +75,7 @@ def __init__(self, mutable):
         self.inactive_index = None
         self.log_prob = None
         self.current_prob_over_ops = None
+        self.n_choices = mutable.length

     def get_AP_path_alpha(self):
         return self.AP_path_alpha

From ca9ec6cc4a5b32a8f41961065af98860c17574e7 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Wed, 11 Dec 2019 09:01:46 +0800
Subject: [PATCH 20/60] update

---
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 66ad841cef..91f4820766 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -345,7 +345,7 @@ def _train(self):
             # validate
             if (epoch + 1) % self.arch_valid_frequency == 0:
                 val_loss, val_top1, val_top5 = self._validate()
-                val_log = 'Valid [{0}]\tloss {2:.3f}\ttop-1 acc {3:.3f} \ttop-5 acc {5:.3f}\t' \
+                val_log = 'Valid [{0}]\tloss {1:.3f}\ttop-1 acc {2:.3f} \ttop-5 acc {3:.3f}\t' \
                           'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}\t' \
                           'Entropy {entropy.val:.5f}'. \
                     format(epoch + 1, val_loss, val_top1,
                            val_top5, entropy=entropy, top1=top1, top5=top5)
                 print(val_log)
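Between these patches, a short illustrative aside: the `cross_entropy_with_label_smoothing` helper documented above mixes the one-hot target with a uniform distribution over the classes. A minimal self-contained sketch of the same computation (using the functional `F.log_softmax` rather than the module form; the snippet is an illustration, not part of the patch):

```python
import torch
import torch.nn.functional as F

def smoothed_ce(pred, target, eps=0.1):
    # one-hot target mixed with a uniform distribution over n_classes
    n_classes = pred.size(1)
    one_hot = torch.zeros_like(pred).scatter_(1, target.view(-1, 1), 1)
    soft_target = one_hot * (1 - eps) + eps / n_classes
    return torch.mean(torch.sum(-soft_target * F.log_softmax(pred, dim=1), dim=1))

logits = torch.randn(4, 10)
labels = torch.tensor([1, 0, 3, 9])
loss = smoothed_ce(logits, labels)  # scalar tensor
```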
From 3d2159e104b6aaf7d9879c41656a0475baacea68 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Wed, 11 Dec 2019 16:45:15 +0800
Subject: [PATCH 21/60] update

---
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 91f4820766..5f76773e84 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -206,7 +206,7 @@ def _warm_up(self):
         end = time.time()
         print('=====================_warm_up, epoch: ', epoch)
         for i, (images, labels) in enumerate(data_loader):
-            print('=====================_warm_up, minibatch i: ', i)
+            #print('=====================_warm_up, minibatch i: ', i)
             data_time.update(time.time() - end)
             # lr
             T_cur = epoch * nBatch + i

From 181f9c06f161ec4336d493d03f03e21af0900cfe Mon Sep 17 00:00:00 2001
From: quanlu
Date: Thu, 12 Dec 2019 15:08:23 +0800
Subject: [PATCH 22/60] update

---
 examples/nas/proxylessnas/main.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index 977781df28..8432d7ab30 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -63,6 +63,7 @@
     # data parallelism not supported yet
     if torch.cuda.is_available():
         device = torch.device('cuda:0')
+        model = torch.nn.DataParallel(model)
         model.to(device)
     else:
         device = torch.device('cpu')

From 5578542f8d92f73d4ba53a852a2d381931cd9c10 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Thu, 12 Dec 2019 15:17:52 +0800
Subject: [PATCH 23/60] update

---
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 5f76773e84..b22acd6409 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -214,6 +214,7 @@ def _warm_up(self):
                 for param_group in self.model_optim.param_groups:
                     param_group['lr'] = warmup_lr
                 images, labels = images.to(self.device), labels.to(self.device)
+                print(images, labels)
                 # compute output
                 self.mutator.reset_binary_gates()  # random sample binary gates
                 output = self.model(images)

From 55c75f5da7d7e2f8889bdd3a55166f29b6bd584f Mon Sep 17 00:00:00 2001
From: quanlu
Date: Thu, 12 Dec 2019 15:22:39 +0800
Subject: [PATCH 24/60] update

---
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index e134cdd91f..8c37449477 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -218,7 +218,7 @@ def on_forward_layer_choice(self, mutable, *inputs):
         """
         # FIXME: return mask, to be consistent with other algorithms
         idx = self.mixed_ops[mutable.key].active_op_index
-        return self.mixed_ops[mutable.key].forward(*inputs), idx
+        return self.mixed_ops[mutable.key](*inputs), idx

     def reset_binary_gates(self):
         """

From 5a403ec67c820397518a2f2357e15a4d8e3c3af0 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Thu, 12 Dec 2019 15:36:47 +0800
Subject: [PATCH 25/60] update

---
 examples/nas/proxylessnas/main.py                     | 4 ++--
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 3 +++
 2 files
changed, 5 insertions(+), 2 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 8432d7ab30..114478fa2d 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -63,8 +63,8 @@ # data parallelism not supported yet if torch.cuda.is_available(): device = torch.device('cuda:0') - model = torch.nn.DataParallel(model) - model.to(device) + #model = torch.nn.DataParallel(model) + #model.to(device) else: device = torch.device('cpu') diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index b22acd6409..6e143c416b 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -132,6 +132,9 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device, self.mutator = ProxylessNasMutator(model) self._valid_iter = None + self.model = torch.nn.DataParallel(self.model) + self.model.to(self.device) + # TODO: arch search configs self._init_arch_params(arch_init_type, arch_init_ratio) From 3e2ee564320a84e29e04491f74a59f34c37d27e2 Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 12 Dec 2019 15:41:48 +0800 Subject: [PATCH 26/60] update --- src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 6e143c416b..646dc97424 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -130,6 +130,8 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device, # init mutator self.mutator = ProxylessNasMutator(model) + self.mutator = torch.nn.DataParallel(self.mutator) + self.mutator.to(self.device) self._valid_iter = None self.model = torch.nn.DataParallel(self.model) From ed27d476a388e20eaba135cee0b5a9b18892d9b1 Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 12 Dec 2019 16:19:12 +0800 Subject: [PATCH 27/60] update --- examples/nas/proxylessnas/main.py | 4 ++-- src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py | 8 ++++---- src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 5 ----- 3 files changed, 6 insertions(+), 11 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 114478fa2d..8432d7ab30 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -63,8 +63,8 @@ # data parallelism not supported yet if torch.cuda.is_available(): device = torch.device('cuda:0') - #model = torch.nn.DataParallel(model) - #model.to(device) + model = torch.nn.DataParallel(model) + model.to(device) else: device = torch.device('cpu') diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 8c37449477..c3e5769695 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -80,7 +80,7 @@ def __init__(self, mutable): def get_AP_path_alpha(self): return self.AP_path_alpha - def forward(self, x): + def forward(self, mutable, x): # only full_v2 def run_function(key, candidate_ops, active_id): def forward(_x): @@ -101,8 +101,8 @@ def backward(_x, _output, grad_output): return binary_grads return backward output = ArchGradientFunction.apply( - x, self.AP_path_wb, run_function(self.mutable.key, self.mutable.choices, self.active_index[0]), - 
backward_function(self.mutable.key, self.mutable.choices, self.active_index[0], self.AP_path_wb)) + x, self.AP_path_wb, run_function(mutable.key, mutable.choices, self.active_index[0]), + backward_function(mutable.key, mutable.choices, self.active_index[0], self.AP_path_wb)) return output @property @@ -218,7 +218,7 @@ def on_forward_layer_choice(self, mutable, *inputs): """ # FIXME: return mask, to be consistent with other algorithms idx = self.mixed_ops[mutable.key].active_op_index - return self.mixed_ops[mutable.key](*inputs), idx + return self.mixed_ops[mutable.key](mutable, *inputs), idx def reset_binary_gates(self): """ diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 646dc97424..b22acd6409 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -130,13 +130,8 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device, # init mutator self.mutator = ProxylessNasMutator(model) - self.mutator = torch.nn.DataParallel(self.mutator) - self.mutator.to(self.device) self._valid_iter = None - self.model = torch.nn.DataParallel(self.model) - self.model.to(self.device) - # TODO: arch search configs self._init_arch_params(arch_init_type, arch_init_ratio) From 80eafc4b78bab62e1896a366c33fde3c10de4105 Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 12 Dec 2019 16:21:04 +0800 Subject: [PATCH 28/60] update --- src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 1 - 1 file changed, 1 deletion(-) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index b22acd6409..5f76773e84 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -214,7 +214,6 @@ def _warm_up(self): for param_group in self.model_optim.param_groups: param_group['lr'] = warmup_lr images, labels = images.to(self.device), labels.to(self.device) - print(images, labels) # compute output self.mutator.reset_binary_gates() # random sample binary gates output = self.model(images) From b8e29e8ca9a05705d5b3a237e724f504d3b26568 Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 12 Dec 2019 17:11:02 +0800 Subject: [PATCH 29/60] update --- .../nni/nas/pytorch/proxylessnas/mutator.py | 31 ++++++++++--------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index c3e5769695..7e4e5e93d8 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -68,7 +68,7 @@ def __init__(self, mutable): A LayerChoice in user model """ super(MixedOp, self).__init__() - self.mutable = mutable + #self.mutable = mutable self.AP_path_alpha = nn.Parameter(torch.Tensor(mutable.length)) self.AP_path_wb = nn.Parameter(torch.Tensor(mutable.length)) self.active_index = [0] @@ -125,10 +125,9 @@ def chosen_index(self): index = int(np.argmax(probs)) return index, probs[index] - @property - def active_op(self): + def active_op(self, mutable): """ assume only one path is active """ - return self.mutable.choices[self.active_index[0]] + return mutable.choices[self.active_index[0]] @property def active_op_index(self): @@ -142,7 +141,7 @@ def set_chosen_op_active(self): self.inactive_index = [_i for _i in range(0, chosen_idx)] + \ [_i for _i in range(chosen_idx + 1, self.n_choices)] 
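To make the sampling logic these patches keep reworking easier to follow, here is a minimal standalone sketch of single-path binarization as in `MixedOp.binarize`, together with the alpha-gradient estimate used by `set_arch_param_grad`. Everything below is illustrative (only plain PyTorch is assumed) and mirrors, rather than replaces, the code in this diff:

```python
import torch
import torch.nn.functional as F

n_ops = 6
alpha = torch.randn(n_ops, requires_grad=True)   # AP_path_alpha analogue

# binarize: sample one path from softmax(alpha), gate it to 1, others to 0
probs = F.softmax(alpha, dim=0)
sample = torch.multinomial(probs.data, 1)[0].item()
gates = torch.zeros(n_ops)                       # AP_path_wb analogue
gates[sample] = 1.0
active_index = [sample]
inactive_index = [i for i in range(n_ops) if i != sample]

# set_arch_param_grad: given g_j = d(loss)/d(gate_j), the estimator is
# d(loss)/d(alpha_i) = sum_j g_j * p_j * (delta_ij - p_i)
g = torch.randn(n_ops)                           # stand-in for AP_path_wb.grad
grad_alpha = torch.zeros(n_ops)
for i in range(n_ops):
    for j in range(n_ops):
        grad_alpha[i] += g[j] * probs[j].item() * ((1.0 if i == j else 0.0) - probs[i].item())

# sanity check: this equals the autograd gradient of sum_j g_j * p_j
(g * F.softmax(alpha, dim=0)).sum().backward()
assert torch.allclose(grad_alpha, alpha.grad, atol=1e-5)
```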
- def binarize(self): + def binarize(self, mutable): """ Sample based on alpha, and set binary weights accordingly """ @@ -153,12 +152,12 @@ def binarize(self): sample = torch.multinomial(probs, 1)[0].item() self.active_index = [sample] self.inactive_index = [_i for _i in range(0, sample)] + \ - [_i for _i in range(sample + 1, len(self.mutable.choices))] + [_i for _i in range(sample + 1, len(mutable.choices))] self.log_prob = torch.log(probs[sample]) self.current_prob_over_ops = probs self.AP_path_wb.data[sample] = 1.0 # avoid over-regularization - for choice in self.mutable.choices: + for choice in mutable.choices: for _, param in choice.named_parameters(): param.grad = None @@ -168,19 +167,19 @@ def _delta_ij(self, i, j): else: return 0 - def set_arch_param_grad(self): + def set_arch_param_grad(self, mutable): """ Calculate alpha gradient for this LayerChoice """ binary_grads = self.AP_path_wb.grad.data - if self.active_op.is_zero_layer(): + if self.active_op(mutable).is_zero_layer(): self.AP_path_alpha.grad = None return if self.AP_path_alpha.grad is None: self.AP_path_alpha.grad = torch.zeros_like(self.AP_path_alpha.data) probs = self.probs_over_ops.data - for i in range(len(self.mutable.choices)): - for j in range(len(self.mutable.choices)): + for i in range(len(mutable.choices)): + for j in range(len(mutable.choices)): self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i]) @@ -224,8 +223,9 @@ def reset_binary_gates(self): """ For each LayerChoice, binarize based on alpha to only activate one op """ - for k in self.mixed_ops.keys(): - self.mixed_ops[k].binarize() + for _, mutable, _ in self.named_mutables(distinct=False): + k = mutable.key + self.mixed_ops[k].binarize(mutable) def set_chosen_op_active(self): """ @@ -247,8 +247,9 @@ def set_arch_param_grad(self): """ For each LayerChoice, calculate gradients for architecture weights, i.e., alpha """ - for k in self.mixed_ops.keys(): - self.mixed_ops[k].set_arch_param_grad() + for _, mutable, _ in self.named_mutables(distinct=False): + k = mutable.key + self.mixed_ops[k].set_arch_param_grad(mutable) def get_architecture_parameters(self): """ From 135402568d76adec64509951429b9f5fd725a379 Mon Sep 17 00:00:00 2001 From: quanlu Date: Fri, 13 Dec 2019 09:42:17 +0800 Subject: [PATCH 30/60] update --- examples/nas/proxylessnas/main.py | 6 +- .../nni/nas/pytorch/proxylessnas/mutator.py | 173 ++++++++++++++---- .../nni/nas/pytorch/proxylessnas/trainer.py | 23 ++- 3 files changed, 162 insertions(+), 40 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 8432d7ab30..1156408765 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -41,9 +41,9 @@ parser.add_argument("--no_decay_keys", default='bn', type=str, choices=[None, 'bn', 'bn#bias']) # configurations of imagenet dataset parser.add_argument("--data_path", default='/data/hdd3/yugzh/imagenet/', type=str) - parser.add_argument("--train_batch_size", default=2, type=int) - parser.add_argument("--test_batch_size", default=2, type=int) - parser.add_argument("--n_worker", default=0, type=int) + parser.add_argument("--train_batch_size", default=256, type=int) + parser.add_argument("--test_batch_size", default=500, type=int) + parser.add_argument("--n_worker", default=32, type=int) parser.add_argument("--resize_scale", default=0.08, type=float) parser.add_argument("--distort_color", default='normal', type=str, choices=['normal', 'strong', 'None']) 
#parser.add_argument("--log-frequency", default=1, type=int)
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 7e4e5e93d8..80507429c8 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -60,7 +60,7 @@ class MixedOp(nn.Module):
     """
     This class is to instantiate and manage info of one LayerChoice
     """
-    def __init__(self, mutable):
+    def __init__(self, mutable, forward_mode=None):
         """
         Parameters
         ----------
@@ -76,33 +76,51 @@ def __init__(self, mutable):
         self.log_prob = None
         self.current_prob_over_ops = None
         self.n_choices = mutable.length
+        self.forward_mode = forward_mode

     def get_AP_path_alpha(self):
         return self.AP_path_alpha

+    def set_forward_mode(self, mode):
+        self.forward_mode = mode
+
+    def get_forward_mode():
+        return self.forward_mode
+
     def forward(self, mutable, x):
-        # only full_v2
-        def run_function(key, candidate_ops, active_id):
-            def forward(_x):
-                return candidate_ops[active_id](_x)
-            return forward
-
-        def backward_function(key, candidate_ops, active_id, binary_gates):
-            def backward(_x, _output, grad_output):
-                binary_grads = torch.zeros_like(binary_gates.data)
-                with torch.no_grad():
-                    for k in range(len(candidate_ops)):
-                        if k != active_id:
-                            out_k = candidate_ops[k](_x.data)
-                        else:
-                            out_k = _output.data
-                        grad_k = torch.sum(out_k * grad_output)
-                        binary_grads[k] = grad_k
-                return binary_grads
-            return backward
-        output = ArchGradientFunction.apply(
-            x, self.AP_path_wb, run_function(mutable.key, mutable.choices, self.active_index[0]),
-            backward_function(mutable.key, mutable.choices, self.active_index[0], self.AP_path_wb))
+        if self.forward_mode == 'full' or self.forward_mode == 'two':
+            output = 0
+            for _i in self.active_index:
+                # the candidate ops live on the mutable (LayerChoice), not on MixedOp
+                oi = mutable.choices[_i](x)
+                output = output + self.AP_path_wb[_i] * oi
+            for _i in self.inactive_index:
+                oi = mutable.choices[_i](x)
+                output = output + self.AP_path_wb[_i] * oi.detach()
+        elif self.forward_mode == 'full_v2':
+            # does not work in DataParallel, possible memory leak
+            def run_function(key, candidate_ops, active_id):
+                def forward(_x):
+                    return candidate_ops[active_id](_x)
+                return forward
+
+            def backward_function(key, candidate_ops, active_id, binary_gates):
+                def backward(_x, _output, grad_output):
+                    binary_grads = torch.zeros_like(binary_gates.data)
+                    with torch.no_grad():
+                        for k in range(len(candidate_ops)):
+                            if k != active_id:
+                                out_k = candidate_ops[k](_x.data)
+                            else:
+                                out_k = _output.data
+                            grad_k = torch.sum(out_k * grad_output)
+                            binary_grads[k] = grad_k
+                    return binary_grads
+                return backward
+            output = ArchGradientFunction.apply(
+                x, self.AP_path_wb, run_function(mutable.key, mutable.choices, self.active_index[0]),
+                backward_function(mutable.key, mutable.choices, self.active_index[0], self.AP_path_wb))
+        else:
+            output = self.active_op(mutable)(x)
         return output

     @property
@@ -149,13 +167,31 @@ def binarize(self, mutable):
         # reset binary gates
         self.AP_path_wb.data.zero_()
         probs = self.probs_over_ops
-        sample = torch.multinomial(probs, 1)[0].item()
-        self.active_index = [sample]
-        self.inactive_index = [_i for _i in range(0, sample)] + \
-                              [_i for _i in range(sample + 1, len(mutable.choices))]
-        self.log_prob = torch.log(probs[sample])
-        self.current_prob_over_ops = probs
-        self.AP_path_wb.data[sample] = 1.0
+        if self.forward_mode == 'two':
+            # sample two ops according to probs
+            sample_op = torch.multinomial(probs.data, 2, replacement=False)
+            probs_slice = F.softmax(torch.stack([
+                self.AP_path_alpha[idx] for idx in sample_op
+            ]), dim=0)
+            self.current_prob_over_ops = torch.zeros_like(probs)
+            for i, idx in enumerate(sample_op):
+                self.current_prob_over_ops[idx] = probs_slice[i]
+            # choose one to be active and the other to be inactive according to probs_slice
+            c = torch.multinomial(probs_slice.data, 1)[0]  # 0 or 1
+            active_op = sample_op[c].item()
+            inactive_op = sample_op[1-c].item()
+            self.active_index = [active_op]
+            self.inactive_index = [inactive_op]
+            # set binary gate
+            self.AP_path_wb.data[active_op] = 1.0
+        else:
+            sample = torch.multinomial(probs, 1)[0].item()
+            self.active_index = [sample]
+            self.inactive_index = [_i for _i in range(0, sample)] + \
+                                  [_i for _i in range(sample + 1, len(mutable.choices))]
+            self.log_prob = torch.log(probs[sample])
+            self.current_prob_over_ops = probs
+            self.AP_path_wb.data[sample] = 1.0
         # avoid over-regularization
         for choice in mutable.choices:
             for _, param in choice.named_parameters():
                 param.grad = None
@@ -177,10 +213,42 @@ def set_arch_param_grad(self, mutable):
             return
         if self.AP_path_alpha.grad is None:
             self.AP_path_alpha.grad = torch.zeros_like(self.AP_path_alpha.data)
-        probs = self.probs_over_ops.data
-        for i in range(len(mutable.choices)):
-            for j in range(len(mutable.choices)):
-                self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i])
+        if self.forward_mode == 'two':
+            involved_idx = self.active_index + self.inactive_index
+            probs_slice = F.softmax(torch.stack([
+                self.AP_path_alpha[idx] for idx in involved_idx
+            ]), dim=0).data
+            for i in range(2):
+                for j in range(2):
+                    origin_i = involved_idx[i]
+                    origin_j = involved_idx[j]
+                    self.AP_path_alpha.grad.data[origin_i] += \
+                        binary_grads[origin_j] * probs_slice[j] * (self._delta_ij(i, j) - probs_slice[i])
+            for _i, idx in enumerate(self.active_index):
+                self.active_index[_i] = (idx, self.AP_path_alpha.data[idx].item())
+            for _i, idx in enumerate(self.inactive_index):
+                self.inactive_index[_i] = (idx, self.AP_path_alpha.data[idx].item())
+        else:
+            probs = self.probs_over_ops.data
+            for i in range(self.n_choices):
+                for j in range(self.n_choices):
+                    self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i])
+        return
+
+    def rescale_updated_arch_param(self, mutable):
+        # note: active_op is a method that needs the mutable; this also relies on
+        # `import math` at the top of this module
+        if not isinstance(self.active_index[0], tuple):
+            assert self.active_op(mutable).is_zero_layer()
+            return
+        involved_idx = [idx for idx, _ in (self.active_index + self.inactive_index)]
+        old_alphas = [alpha for _, alpha in (self.active_index + self.inactive_index)]
+        new_alphas = [self.AP_path_alpha.data[idx] for idx in involved_idx]
+
+        offset = math.log(
+            sum([math.exp(alpha) for alpha in new_alphas]) / sum([math.exp(alpha) for alpha in old_alphas])
+        )
+
+        for idx in involved_idx:
+            self.AP_path_alpha.data[idx] -= offset


 class ProxylessNasMutator(BaseMutator):
@@ -194,6 +262,7 @@ def __init__(self, model):
             The model that users want to tune, it includes search space defined with nni nas apis
         """
         super(ProxylessNasMutator, self).__init__(model)
+        self._unused_modules = None
         self.mixed_ops = {}
         for _, mutable, _ in self.named_mutables(distinct=False):
             self.mixed_ops[mutable.key] = MixedOp(mutable)
@@ -257,3 +326,39 @@ def get_architecture_parameters(self):
         """
         for k in self.mixed_ops.keys():
             yield self.mixed_ops[k].get_AP_path_alpha()
+
+    def change_forward_mode(self, mode):
+        for k in self.mixed_ops.keys():
+            self.mixed_ops[k].set_forward_mode(mode)
+
+    def get_forward_mode(self):
+        # all mixed ops share the same forward mode, so the first one suffices
+        for k in self.mixed_ops.keys():
+            return self.mixed_ops[k].get_forward_mode()
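A brief illustrative aside on the rescaling trick in `rescale_updated_arch_param` above: after the optimizer moves only the two sampled alphas, a common offset is subtracted so that the total exponential mass of the involved paths is unchanged, which keeps the softmax over all paths consistent. A pure-Python sketch with made-up values:

```python
import math

old_alphas = [0.20, -0.50]   # the two involved alphas before the arch step
new_alphas = [0.90, -0.10]   # the same alphas after the arch optimizer step

# offset = log( sum(exp(new)) / sum(exp(old)) )
offset = math.log(sum(math.exp(a) for a in new_alphas) /
                  sum(math.exp(a) for a in old_alphas))
rescaled = [a - offset for a in new_alphas]

# the exponential mass of the involved paths is restored
assert abs(sum(math.exp(a) for a in rescaled) -
           sum(math.exp(a) for a in old_alphas)) < 1e-9
```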
@@ -177,10 +213,42 @@ def set_arch_param_grad(self, mutable):
             return
         if self.AP_path_alpha.grad is None:
             self.AP_path_alpha.grad = torch.zeros_like(self.AP_path_alpha.data)
-        probs = self.probs_over_ops.data
-        for i in range(len(mutable.choices)):
-            for j in range(len(mutable.choices)):
-                self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i])
+        if self.forward_mode == 'two':
+            involved_idx = self.active_index + self.inactive_index
+            probs_slice = F.softmax(torch.stack([
+                self.AP_path_alpha[idx] for idx in involved_idx
+            ]), dim=0).data
+            for i in range(2):
+                for j in range(2):
+                    origin_i = involved_idx[i]
+                    origin_j = involved_idx[j]
+                    self.AP_path_alpha.grad.data[origin_i] += \
+                        binary_grads[origin_j] * probs_slice[j] * (self._delta_ij(i, j) - probs_slice[i])
+            for _i, idx in enumerate(self.active_index):
+                self.active_index[_i] = (idx, self.AP_path_alpha.data[idx].item())
+            for _i, idx in enumerate(self.inactive_index):
+                self.inactive_index[_i] = (idx, self.AP_path_alpha.data[idx].item())
+        else:
+            probs = self.probs_over_ops.data
+            for i in range(self.n_choices):
+                for j in range(self.n_choices):
+                    self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i])
+        return
+
+    def rescale_updated_arch_param(self):
+        if not isinstance(self.active_index[0], tuple):
+            assert self.active_op.is_zero_layer()
+            return
+        involved_idx = [idx for idx, _ in (self.active_index + self.inactive_index)]
+        old_alphas = [alpha for _, alpha in (self.active_index + self.inactive_index)]
+        new_alphas = [self.AP_path_alpha.data[idx] for idx in involved_idx]
+
+        offset = math.log(
+            sum([math.exp(alpha) for alpha in new_alphas]) / sum([math.exp(alpha) for alpha in old_alphas])
+        )
+
+        for idx in involved_idx:
+            self.AP_path_alpha.data[idx] -= offset


 class ProxylessNasMutator(BaseMutator):
@@ -194,6 +262,7 @@ def __init__(self, model):
             The model that users want to tune, it includes search space defined with nni nas apis
         """
         super(ProxylessNasMutator, self).__init__(model)
+        self._unused_modules = None
         self.mixed_ops = {}
         for _, mutable, _ in self.named_mutables(distinct=False):
             self.mixed_ops[mutable.key] = MixedOp(mutable)
@@ -257,3 +326,39 @@ def get_architecture_parameters(self):
         """
         for k in self.mixed_ops.keys():
             yield self.mixed_ops[k].get_AP_path_alpha()
+
+    def change_forward_mode(self, mode):
+        for k in self.mixed_ops.keys():
+            self.mixed_ops[k].set_forward_mode(mode)
+
+    def get_forward_mode(self):
+        for k in self.mixed_ops.keys():
+            return self.mixed_ops[k].get_forward_mode()
+
+    def rescale_updated_arch_param(self):
+        for k in self.mixed_ops.keys():
+            self.mixed_ops[k].rescale_updated_arch_param()
+
+    def unused_modules_off(self):
+        self._unused_modules = []
+        for _, mutable, _ in self.named_mutables(distinct=False):
+            k = mutable.key
+            mixed_op = self.mixed_ops[k]
+            unused = {}
+            if self.get_forward_mode() in ['full', 'two', 'full_v2']:
+                involved_index = mixed_op.active_index + mixed_op.inactive_index
+            else:
+                involved_index = mixed_op.active_index
+            for i in range(mixed_op.n_choices):
+                if i not in involved_index:
+                    unused[i] = mutable.choices[i]
+                    mutable.choices[i] = None
+            self._unused_modules.append(unused)
+
+    def unused_modules_back(self):
+        if self._unused_modules is None:
+            return
+        for m, unused in zip(self.named_mutables(distinct=False), self._unused_modules):
+            for i in unused:
+                m.choices[i] = unused[i]
+        self._unused_modules = None
\ No newline at end of file
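The `rescale_updated_arch_param` method above relies on a softmax invariance: after the arch optimizer moves only the two sampled alphas, subtracting a common offset `log(sum(exp(new)) / sum(exp(old)))` restores the pair's combined probability mass under the full softmax, keeping them comparable with the untouched alphas. A minimal numerical check of that identity (plain PyTorch, names are illustrative):

```python
import math
import torch
import torch.nn.functional as F

# alphas over 4 candidate ops; ops 1 and 3 were sampled and updated
old_alpha = torch.tensor([0.1, 0.5, -0.2, 0.3])
new_alpha = old_alpha.clone()
new_alpha[1] += 0.4   # pretend the arch optimizer moved the two involved alphas
new_alpha[3] -= 0.1

involved = [1, 3]
old_sum = sum(math.exp(old_alpha[i]) for i in involved)
new_sum = sum(math.exp(new_alpha[i]) for i in involved)
offset = math.log(new_sum / old_sum)
for i in involved:
    new_alpha[i] -= offset

# the relative weight between the two involved ops reflects the update,
# but their total probability mass under the full softmax is unchanged
print(F.softmax(old_alpha, dim=0)[involved].sum(),
      F.softmax(new_alpha, dim=0)[involved].sum())
```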
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 5f76773e84..5ff4a93b4f 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -122,11 +122,12 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device,
         self.lr_max = 0.05
         self.label_smoothing = 0.1
         self.valid_batch_size = 500
-        self.arch_grad_valid_batch_size = 2 # 256
+        self.arch_grad_train_batch_size = 256
         # update architecture parameters every this number of minibatches
         self.grad_update_arch_param_every = 5
         # the number of steps per architecture parameter update
         self.grad_update_steps = 1
+        self.binary_mode = 'full_v2'
 
         # init mutator
         self.mutator = ProxylessNasMutator(model)
@@ -157,6 +158,8 @@ def _validate(self):
         self.valid_loader.batch_sampler.drop_last = False
 
         self.mutator.set_chosen_op_active()
+        # remove unused modules to save memory
+        self.mutator.unused_modules_off()
         # test on validation set under train mode
         self.model.train()
         batch_time = AverageMeter('batch_time')
@@ -186,6 +189,7 @@ def _validate(self):
                     # return top5:
                     test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5)
                     print(test_log)
+        self.mutator.unused_modules_back()
         return losses.avg, top1.avg, top5.avg
 
     def _warm_up(self):
@@ -216,6 +220,8 @@ def _warm_up(self):
                 images, labels = images.to(self.device), labels.to(self.device)
                 # compute output
                 self.mutator.reset_binary_gates() # random sample binary gates
+                # remove unused module for speedup
+                self.mutator.unused_modules_off()
                 output = self.model(images)
                 if self.label_smoothing > 0:
                     loss = cross_entropy_with_label_smoothing(output, labels, self.label_smoothing)
@@ -230,6 +236,8 @@ def _warm_up(self):
                 self.model.zero_grad()
                 loss.backward()
                 self.model_optim.step()
+                # unused modules back
+                self.mutator.unused_modules_back()
                 # measure elapsed time
                 batch_time.update(time.time() - end)
                 end = time.time()
@@ -305,6 +313,8 @@ def _train(self):
                 # train weight parameters
                 images, labels = images.to(self.device), labels.to(self.device)
                 self.mutator.reset_binary_gates()
+                # TODO: remove unused module for speedup
+                self.mutator.unused_modules_off()
                 output = self.model(images)
                 if self.label_smoothing > 0:
                     loss = cross_entropy_with_label_smoothing(output, labels, self.label_smoothing)
@@ -317,8 +327,9 @@ def _train(self):
                 self.model.zero_grad()
                 loss.backward()
                 self.model_optim.step()
+                self.mutator.unused_modules_back()
                 # TODO: if epoch > 0:
-                if epoch >= 0:
+                if epoch > 0:
                     for _ in range(update_schedule.get(i, 0)):
                         start_time = time.time()
                         # GradientArchSearchConfig
@@ -364,15 +375,17 @@ def _valid_next_batch(self):
         return data
 
     def _gradient_step(self):
-        self.valid_loader.batch_sampler.batch_size = self.arch_grad_valid_batch_size
+        self.valid_loader.batch_sampler.batch_size = self.arch_grad_train_batch_size
         self.valid_loader.batch_sampler.drop_last = True
         self.model.train()
+        self.mutator.change_forward_mode(self.binary_mode)
         time1 = time.time()  # time
         # sample a batch of data from validation set
         images, labels = self._valid_next_batch()
         images, labels = images.to(self.device), labels.to(self.device)
         time2 = time.time()  # time
         self.mutator.reset_binary_gates()
+        self.mutator.unused_modules_off()
         output = self.model(images)
         time3 = time.time()
         ce_loss = self.criterion(output, labels)
@@ -382,6 +395,10 @@ def _gradient_step(self):
         loss.backward()
         self.mutator.set_arch_param_grad()
         self.arch_optimizer.step()
+        if self.mutator.get_forward_mode() == 'two':
+            self.mutator.rescale_updated_arch_param()
+        self.mutator.unused_modules_back()
+        self.mutator.change_forward_mode(None)
         time4 = time.time()
         print('(%.4f, %.4f, %.4f)' % (time2 - time1, time3 - time2, time4 - time3))
         return loss.data.item(), expected_value.item() if expected_value is not None else None

From 4b611dbbd9b097e6816381908b30914b59f73fa1 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 09:59:48 +0800
Subject: [PATCH 31/60] update

---
 examples/nas/proxylessnas/main.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index 1156408765..a66b085800 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -40,7 +40,8 @@
     parser.add_argument("--dropout_rate", default=0, type=float)
     parser.add_argument("--no_decay_keys", default='bn', type=str, choices=[None, 'bn', 'bn#bias'])
     # configurations of imagenet dataset
-    parser.add_argument("--data_path", default='/data/hdd3/yugzh/imagenet/', type=str)
+    #parser.add_argument("--data_path", default='/data/hdd3/yugzh/imagenet/', type=str)
+    parser.add_argument("--data_path", default='/mnt/v-yugzh/imagenet/', type=str)
     parser.add_argument("--train_batch_size", default=256, type=int)
     parser.add_argument("--test_batch_size", default=500, type=int)
     parser.add_argument("--n_worker", default=32, type=int)

From a624c12715de57bbda13908ae5091feb05571c4f Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 10:02:21 +0800
Subject: [PATCH 32/60] fix bug

---
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 80507429c8..9fd670de98 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -84,7 +84,7 @@ def get_AP_path_alpha(self):
     def set_forward_mode(self, mode):
         self.forward_mode = mode
 
-    def get_forward_mode():
+    def get_forward_mode(self):
         return self.forward_mode
 
     def forward(self, mutable, x):
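Taken together, the `_gradient_step` hunks above settle the per-update call order around one architecture step. Stripped of the timing instrumentation, the flow reads as follows; this is a paraphrase of the trainer code above for readability, not a separate API:

```python
# sketch of one architecture-parameter update, following the trainer logic above
def arch_update_step(mutator, model, arch_optimizer, criterion, images, labels, binary_mode):
    mutator.change_forward_mode(binary_mode)  # e.g. 'full_v2' or 'two'
    mutator.reset_binary_gates()              # sample the active path(s) for this step
    mutator.unused_modules_off()              # detach un-sampled ops to save memory
    loss = criterion(model(images), labels)
    loss.backward()
    mutator.set_arch_param_grad()             # turn binary-gate grads into alpha grads
    arch_optimizer.step()
    if mutator.get_forward_mode() == 'two':
        mutator.rescale_updated_arch_param()  # keep partially-updated alphas comparable
    mutator.unused_modules_back()
    mutator.change_forward_mode(None)
    return loss.item()
```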
From 393d8377dbc72f28f13fcb7a4ed51a18e518e73f Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 10:05:49 +0800
Subject: [PATCH 33/60] update

---
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 9fd670de98..8993fe5d63 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -264,8 +264,10 @@ def __init__(self, model):
         super(ProxylessNasMutator, self).__init__(model)
         self._unused_modules = None
         self.mixed_ops = {}
+        self.mutable_list = []
         for _, mutable, _ in self.named_mutables(distinct=False):
             self.mixed_ops[mutable.key] = MixedOp(mutable)
+            self.mutable_list.append(mutable)
 
     def on_forward_layer_choice(self, mutable, *inputs):
         """
@@ -358,7 +360,7 @@ def unused_modules_back(self):
         if self._unused_modules is None:
             return
-        for m, unused in zip(self.named_mutables(distinct=False), self._unused_modules):
+        for m, unused in zip(self.mutable_list, self._unused_modules):
             for i in unused:
                 m.choices[i] = unused[i]
         self._unused_modules = None
\ No newline at end of file

From f768b5a123617e6db377e3b12d554d5dea30df26 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 10:14:20 +0800
Subject: [PATCH 34/60] update

---
 examples/nas/proxylessnas/main.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index a66b085800..7a6a3c4fa0 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -100,7 +100,7 @@
                                   train_loader=train_loader,
                                   valid_loader=valid_loader,
                                   device=device,
-                                  warmup=False)
+                                  warmup=True)
 
     print('=============================================Start to train ProxylessNasTrainer')
     trainer.train()

From 8bc69a8eeb29316639bc75b10fd86fef0cb82d90 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 13:44:29 +0800
Subject: [PATCH 35/60] update

---
 src/sdk/pynni/nni/nas/pytorch/mutables.py     |  1 +
 .../nni/nas/pytorch/proxylessnas/mutator.py   | 37 +++++++++++++++++--
 .../nni/nas/pytorch/proxylessnas/trainer.py   |  2 +
 3 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/mutables.py b/src/sdk/pynni/nni/nas/pytorch/mutables.py
index 16b73b903d..a1d448a646 100644
--- a/src/sdk/pynni/nni/nas/pytorch/mutables.py
+++ b/src/sdk/pynni/nni/nas/pytorch/mutables.py
@@ -92,6 +92,7 @@ def __init__(self, op_candidates, reduction="mean", return_mask=False, key=None)
         self.choices = nn.ModuleList(op_candidates)
         self.reduction = reduction
         self.return_mask = return_mask
+        self.registered_module = None
 
     def __len__(self):
         return len(self.choices)
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 8993fe5d63..75ba5f4dec 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -71,6 +71,8 @@ def __init__(self, mutable, forward_mode=None):
         #self.mutable = mutable
         self.AP_path_alpha = nn.Parameter(torch.Tensor(mutable.length))
         self.AP_path_wb = nn.Parameter(torch.Tensor(mutable.length))
+        self.AP_path_alpha.requires_grad = False
+        self.AP_path_wb.requires_grad = False
         self.active_index = [0]
         self.inactive_index = None
         self.log_prob = None
@@ -87,6 +89,14 @@ def set_forward_mode(self, mode):
     def get_forward_mode(self):
         return self.forward_mode
 
+    def to_requires_grad(self):
+        self.AP_path_alpha.requires_grad = True
+        self.AP_path_wb.requires_grad = True
+
+    def disable_grad(self):
+        self.AP_path_alpha.requires_grad = False
+        self.AP_path_wb.requires_grad = False
+
     def forward(self, mutable, x):
         if self.forward_mode == 'full' or self.forward_mode == 'two':
             output = 0
@@ -266,8 +276,10 @@ def __init__(self, model):
         self.mixed_ops = {}
         self.mutable_list = []
         for _, mutable, _ in self.named_mutables(distinct=False):
-            self.mixed_ops[mutable.key] = MixedOp(mutable)
+            mo = MixedOp(mutable)
+            self.mixed_ops[mutable.key] = mo
             self.mutable_list.append(mutable)
+            mutable.registered_module = mo
 
     def on_forward_layer_choice(self, mutable, *inputs):
         """
@@ -287,8 +299,10 @@ def on_forward_layer_choice(self, mutable, *inputs):
             index of the chosen op
         """
         # FIXME: return mask, to be consistent with other algorithms
-        idx = self.mixed_ops[mutable.key].active_op_index
-        return self.mixed_ops[mutable.key](mutable, *inputs), idx
+        #idx = self.mixed_ops[mutable.key].active_op_index
+        #return self.mixed_ops[mutable.key](mutable, *inputs), idx
+        idx = mutable.registered_module.active_op_index
+        return mutable.registered_module(mutable, *inputs), idx
 
     def reset_binary_gates(self):
         """
@@ -363,4 +377,19 @@ def unused_modules_back(self):
         for m, unused in zip(self.mutable_list, self._unused_modules):
             for i in unused:
                 m.choices[i] = unused[i]
-        self._unused_modules = None
\ No newline at end of file
+        self._unused_modules = None
+
+    def arch_requires_grad(self):
+        for _, mutable, _ in self.named_mutables(distinct=False):
+            mutable.registered_module.to_requires_grad()
+
+    def arch_disable_grad(self):
+        for _, mutable, _ in self.named_mutables(distinct=False):
+            mutable.registered_module.disable_grad()
+
+    '''def get_arch_parameters(self):
+        params = []
+        for _, mutable, _ in self.named_mutables(distinct=False):
+            par = mutable.registered_module.Parameters()
+            params = params + list(par)
+        return params'''
\ No newline at end of file
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 5ff4a93b4f..298f0474aa 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -333,7 +333,9 @@ def _train(self):
                     for _ in range(update_schedule.get(i, 0)):
                         start_time = time.time()
                         # GradientArchSearchConfig
+                        self.mutator.arch_requires_grad()
                         arch_loss, exp_value = self._gradient_step()
+                        self.mutator.arch_disable_grad()
                         used_time = time.time() - start_time
                         log_str = 'Architecture [%d-%d]\t Time %.4f\t Loss %.4f\t null %s' % \
                             (epoch + 1, i, used_time, arch_loss, exp_value)
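Patch 35 keeps `AP_path_alpha` and `AP_path_wb` frozen except while `_gradient_step` runs, so the weight optimizer never accumulates gradients into them. The `requires_grad` toggling pattern in isolation (a self-contained sketch with illustrative names, not the module's API):

```python
import torch
from torch import nn

alpha = nn.Parameter(torch.zeros(7), requires_grad=False)  # frozen by default

def arch_step(update_fn):
    alpha.requires_grad = True       # enable grads only for this step
    try:
        update_fn(alpha)
    finally:
        alpha.requires_grad = False  # frozen again for weight training

arch_step(lambda a: (a.sum() * 2).backward())
print(alpha.grad)  # grads were accumulated only inside arch_step
```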
From 640103d2778eb09aa13b0074123b87c979338fd6 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 13:52:23 +0800
Subject: [PATCH 36/60] update

---
 examples/nas/proxylessnas/main.py                     | 4 ++--
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index 7a6a3c4fa0..ec8465d6b7 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -64,8 +64,8 @@
     # data parallelism not supported yet
     if torch.cuda.is_available():
         device = torch.device('cuda:0')
-        model = torch.nn.DataParallel(model)
-        model.to(device)
+        #model = torch.nn.DataParallel(model)
+        #model.to(device)
     else:
         device = torch.device('cpu')
 
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 298f0474aa..f9d24aa52b 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -133,6 +133,9 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device,
         self.mutator = ProxylessNasMutator(model)
         self._valid_iter = None
 
+        self.model = torch.nn.DataParallel(self.model)
+        self.model.to(self.device)
+
         # TODO: arch search configs
 
         self._init_arch_params(arch_init_type, arch_init_ratio)

From 810ea958bd5e3fcea1f3b0e4d78d59f6eb9c3c69 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 19:34:34 +0800
Subject: [PATCH 37/60] update

---
 examples/nas/proxylessnas/main.py             |  2 +-
 .../nni/nas/pytorch/proxylessnas/mutator.py   | 32 +++++--------
 .../nni/nas/pytorch/proxylessnas/trainer.py   |  2 +-
 3 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index ec8465d6b7..efc1809355 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -100,7 +100,7 @@
                                   train_loader=train_loader,
                                   valid_loader=valid_loader,
                                   device=device,
-                                  warmup=True)
+                                  warmup=False)
 
     print('=============================================Start to train ProxylessNasTrainer')
     trainer.train()
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 75ba5f4dec..bc60ad1322 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -60,7 +60,8 @@ class MixedOp(nn.Module):
     """
     This class is to instantiate and manage info of one LayerChoice
     """
-    def __init__(self, mutable, forward_mode=None):
+    forward_mode = None
+    def __init__(self, mutable):
         """
         Parameters
         ----------
@@ -68,7 +69,6 @@ def __init__(self, mutable):
             A LayerChoice in user model
         """
         super(MixedOp, self).__init__()
-        #self.mutable = mutable
         self.AP_path_alpha = nn.Parameter(torch.Tensor(mutable.length))
         self.AP_path_wb = nn.Parameter(torch.Tensor(mutable.length))
@@ -78,17 +78,10 @@ def __init__(self, mutable):
         self.log_prob = None
         self.current_prob_over_ops = None
         self.n_choices = mutable.length
-        self.forward_mode = forward_mode
 
     def get_AP_path_alpha(self):
         return self.AP_path_alpha
 
-    def set_forward_mode(self, mode):
-        self.forward_mode = mode
-
-    def get_forward_mode(self):
-        return self.forward_mode
-
     def to_requires_grad(self):
         self.AP_path_alpha.requires_grad = True
         self.AP_path_wb.requires_grad = True
@@ -98,7 +91,7 @@ def disable_grad(self):
         self.AP_path_wb.requires_grad = False
 
     def forward(self, mutable, x):
-        if self.forward_mode == 'full' or self.forward_mode == 'two':
+        if MixedOp.forward_mode == 'full' or MixedOp.forward_mode == 'two':
             output = 0
             for _i in self.active_index:
                 oi = self.candidate_ops[_i](x)
                 output = output + self.AP_path_wb[_i] * oi
             for _i in self.inactive_index:
                 oi = self.candidate_ops[_i](x)
                 output = output + self.AP_path_wb[_i] * oi.detach()
-        elif self.forward_mode == 'full_v2':
+        elif MixedOp.forward_mode == 'full_v2':
             # does not work in DataParallel, possible memory leak
             def run_function(key, candidate_ops, active_id):
                 def forward(_x):
@@ -177,7 +170,7 @@ def binarize(self, mutable):
         # reset binary gates
         self.AP_path_wb.data.zero_()
         probs = self.probs_over_ops
-        if self.forward_mode == 'two':
+        if MixedOp.forward_mode == 'two':
             # sample two ops according to probs
             sample_op = torch.multinomial(probs.data, 2, replacement=False)
             probs_slice = F.softmax(torch.stack([
@@ -223,7 +216,7 @@ def set_arch_param_grad(self, mutable):
             return
         if self.AP_path_alpha.grad is None:
             self.AP_path_alpha.grad = torch.zeros_like(self.AP_path_alpha.data)
-        if self.forward_mode == 'two':
+        if MixedOp.forward_mode == 'two':
             involved_idx = self.active_index + self.inactive_index
             probs_slice = F.softmax(torch.stack([
                 self.AP_path_alpha[idx] for idx in involved_idx
@@ -344,12 +337,10 @@ def get_architecture_parameters(self):
             yield self.mixed_ops[k].get_AP_path_alpha()
 
     def change_forward_mode(self, mode):
-        for k in self.mixed_ops.keys():
-            self.mixed_ops[k].set_forward_mode(mode)
+        MixedOp.forward_mode = mode
 
     def get_forward_mode(self):
-        for k in self.mixed_ops.keys():
-            return self.mixed_ops[k].get_forward_mode()
+        return MixedOp.forward_mode
 
     def rescale_updated_arch_param(self):
         for k in self.mixed_ops.keys():
             self.mixed_ops[k].rescale_updated_arch_param()
@@ -386,10 +377,3 @@ def arch_requires_grad(self):
     def arch_disable_grad(self):
         for _, mutable, _ in self.named_mutables(distinct=False):
             mutable.registered_module.disable_grad()
-
-    '''def get_arch_parameters(self):
-        params = []
-        for _, mutable, _ in self.named_mutables(distinct=False):
-            par = mutable.registered_module.Parameters()
-            params = params + list(par)
-        return params'''
\ No newline at end of file
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index f9d24aa52b..971ab55ef6 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -332,7 +332,7 @@ def _train(self):
                 self.model_optim.step()
                 self.mutator.unused_modules_back()
                 # TODO: if epoch > 0:
-                if epoch > 0:
+                if epoch >= 0:
                     for _ in range(update_schedule.get(i, 0)):

From b890fced1b1c99da2b415489ef3526dbeb77c582 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 19:42:02 +0800
Subject: [PATCH 38/60] update

---
 .../nni/nas/pytorch/proxylessnas/mutator.py   | 27 +++++++------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index bc60ad1322..188a5b8d27 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -266,11 +266,9 @@ def __init__(self, model):
         """
         super(ProxylessNasMutator, self).__init__(model)
         self._unused_modules = None
-        self.mixed_ops = {}
         self.mutable_list = []
         for _, mutable, _ in self.named_mutables(distinct=False):
             mo = MixedOp(mutable)
-            self.mixed_ops[mutable.key] = mo
             self.mutable_list.append(mutable)
             mutable.registered_module = mo
 
     def on_forward_layer_choice(self, mutable, *inputs):
         """
@@ -292,8 +290,6 @@ def on_forward_layer_choice(self, mutable, *inputs):
         # FIXME: return mask, to be consistent with other algorithms
-        #idx = self.mixed_ops[mutable.key].active_op_index
-        #return self.mixed_ops[mutable.key](mutable, *inputs), idx
         idx = mutable.registered_module.active_op_index
         return mutable.registered_module(mutable, *inputs), idx
 
@@ -302,16 +298,15 @@ def reset_binary_gates(self):
         For each LayerChoice, binarize based on alpha to only activate one op
         """
         for _, mutable, _ in self.named_mutables(distinct=False):
-            k = mutable.key
-            self.mixed_ops[k].binarize(mutable)
+            mutable.registered_module.binarize(mutable)
 
     def set_chosen_op_active(self):
         """
         For each LayerChoice, set the op with highest alpha as the chosen op
         Usually used for validation.
         """
-        for k in self.mixed_ops.keys():
-            self.mixed_ops[k].set_chosen_op_active()
+        for _, mutable, _ in self.named_mutables(distinct=False):
+            mutable.registered_module.set_chosen_op_active()
 
     def num_arch_params(self):
         """
@@ -319,22 +314,21 @@ def num_arch_params(self):
         -------
         The number of LayerChoice in user model
         """
-        return len(self.mixed_ops)
+        return len(self.mutable_list)
 
     def set_arch_param_grad(self):
         """
         For each LayerChoice, calculate gradients for architecture weights, i.e., alpha
         """
         for _, mutable, _ in self.named_mutables(distinct=False):
-            k = mutable.key
-            self.mixed_ops[k].set_arch_param_grad(mutable)
+            mutable.registered_module.set_arch_param_grad(mutable)
 
     def get_architecture_parameters(self):
         """
         Return architecture weights of each LayerChoice, for arch optimizer
         """
-        for k in self.mixed_ops.keys():
-            yield self.mixed_ops[k].get_AP_path_alpha()
+        for _, mutable, _ in self.named_mutables(distinct=False):
+            yield mutable.registered_module.get_AP_path_alpha()
 
     def change_forward_mode(self, mode):
         MixedOp.forward_mode = mode
@@ -343,14 +337,13 @@ def get_forward_mode(self):
         return MixedOp.forward_mode
 
     def rescale_updated_arch_param(self):
-        for k in self.mixed_ops.keys():
-            self.mixed_ops[k].rescale_updated_arch_param()
+        for _, mutable, _ in self.named_mutables(distinct=False):
+            mutable.registered_module.rescale_updated_arch_param()
 
     def unused_modules_off(self):
         self._unused_modules = []
         for _, mutable, _ in self.named_mutables(distinct=False):
-            k = mutable.key
-            mixed_op = self.mixed_ops[k]
+            mixed_op = mutable.registered_module
             unused = {}

From 5996d4fd4960391948c8dcba22684a26be7bc81cf Mon Sep 17 00:00:00 2001
From: quanlu
Date: Fri, 13 Dec 2019 19:50:20 +0800
Subject: [PATCH 39/60] update

---
 examples/nas/proxylessnas/main.py                     | 2 +-
 src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index efc1809355..ec8465d6b7 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -100,7 +100,7 @@
                                   train_loader=train_loader,
                                   valid_loader=valid_loader,
                                   device=device,
-                                  warmup=False)
+                                  warmup=True)
 
     print('=============================================Start to train ProxylessNasTrainer')
     trainer.train()
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 971ab55ef6..f9d24aa52b 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -332,7 +332,7 @@ def _train(self):
                 self.model_optim.step()
                 self.mutator.unused_modules_back()
                 # TODO: if epoch > 0:
-                if epoch >= 0:
+                if epoch > 0:
                     for _ in range(update_schedule.get(i, 0)):
                         start_time = time.time()
                         # GradientArchSearchConfig
From 51128bb32afb03b67f3ea32875e20b0a2f16560e Mon Sep 17 00:00:00 2001
From: quanlu
Date: Mon, 16 Dec 2019 09:44:50 +0800
Subject: [PATCH 40/60] update

---
 .../pynni/nni/nas/pytorch/proxylessnas/trainer.py | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index f9d24aa52b..8713fa963a 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -78,7 +78,7 @@ def accuracy(output, target, topk=(1,)):
 class ProxylessNasTrainer(BaseTrainer):
     def __init__(self, model, model_optim, train_loader, valid_loader, device,
-                 n_epochs=150, init_lr=0.05, arch_init_type='normal', arch_init_ratio=1e-3,
+                 n_epochs=120, init_lr=0.025, arch_init_type='normal', arch_init_ratio=1e-3,
                  arch_optim_lr=1e-3, arch_weight_decay=0, warmup=True, warmup_epochs=25,
                  arch_valid_frequency=1):
         """
@@ -117,10 +117,8 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device,
         self.warmup = warmup
         self.warmup_epochs = warmup_epochs
         self.arch_valid_frequency = arch_valid_frequency
-
-        self.train_epochs = 120
-        self.lr_max = 0.05
         self.label_smoothing = 0.1
+
         self.valid_batch_size = 500
         self.arch_grad_train_batch_size = 256
         # update architecture parameters every this number of minibatches
@@ -143,7 +141,9 @@ def __init__(self, model, model_optim, train_loader, valid_loader, device,
         # build architecture optimizer
         self.arch_optimizer = torch.optim.Adam(self.mutator.get_architecture_parameters(),
                                                arch_optim_lr,
-                                               weight_decay=arch_weight_decay)
+                                               weight_decay=arch_weight_decay,
+                                               betas=(0, 0.999),
+                                               eps=1e-8)
 
         self.criterion = nn.CrossEntropyLoss()
 
@@ -196,6 +196,7 @@ def _warm_up(self):
+        lr_max = 0.05
         data_loader = self.train_loader
         nBatch = len(data_loader)
         T_total = self.warmup_epochs * nBatch  # total num of batches
@@ -217,7 +218,7 @@ def _warm_up(self):
                 data_time.update(time.time() - end)
                 # lr
                 T_cur = epoch * nBatch + i
-                warmup_lr = 0.5 * self.lr_max * (1 + math.cos(math.pi * T_cur / T_total))
+                warmup_lr = 0.5 * lr_max * (1 + math.cos(math.pi * T_cur / T_total))
                 for param_group in self.model_optim.param_groups:
                     param_group['lr'] = warmup_lr
                 images, labels = images.to(self.device), labels.to(self.device)
@@ -295,7 +296,7 @@ def _train(self):
 
         update_schedule = self._get_update_schedule(nBatch)
 
-        for epoch in range(self.train_epochs):
+        for epoch in range(self.n_epochs):
             print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n')
             batch_time = AverageMeter('batch_time')
             data_time = AverageMeter('data_time')
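Patch 40 scopes `lr_max` to `_warm_up` while keeping the cosine schedule `0.5 * lr_max * (1 + cos(pi * T_cur / T_total))`: the warmup learning rate starts at `lr_max` and decays to zero over the warmup epochs. A quick sanity check of the schedule's shape, using the constants from the patch (the batch count is an assumption for illustration):

```python
import math

lr_max, warmup_epochs, n_batch = 0.05, 25, 100
T_total = warmup_epochs * n_batch

def warmup_lr(epoch, batch):
    T_cur = epoch * n_batch + batch
    return 0.5 * lr_max * (1 + math.cos(math.pi * T_cur / T_total))

print(warmup_lr(0, 0))    # 0.05   (starts at lr_max)
print(warmup_lr(12, 50))  # 0.025  (exactly half way through warmup)
print(warmup_lr(24, 99))  # ~0.0   (decays to zero at the end)
```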
From 3b3aba4bffe0574833cfa928812c9a1dade9a47f Mon Sep 17 00:00:00 2001
From: quanlu
Date: Mon, 16 Dec 2019 14:47:14 +0800
Subject: [PATCH 41/60] update

---
 examples/nas/proxylessnas/main.py             |  14 +-
 src/sdk/pynni/nni/nas/pytorch/mutables.py     |   1 -
 .../nni/nas/pytorch/proxylessnas/mutator.py   |   9 +-
 .../nni/nas/pytorch/proxylessnas/trainer.py   | 122 ++++++++----------
 4 files changed, 61 insertions(+), 85 deletions(-)

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index ec8465d6b7..277faba192 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -40,8 +40,8 @@
     parser.add_argument("--dropout_rate", default=0, type=float)
     parser.add_argument("--no_decay_keys", default='bn', type=str, choices=[None, 'bn', 'bn#bias'])
     # configurations of imagenet dataset
-    #parser.add_argument("--data_path", default='/data/hdd3/yugzh/imagenet/', type=str)
-    parser.add_argument("--data_path", default='/mnt/v-yugzh/imagenet/', type=str)
+    parser.add_argument("--data_path", default='/data/hdd3/yugzh/imagenet/', type=str)
+    #parser.add_argument("--data_path", default='/mnt/v-yugzh/imagenet/', type=str)
     parser.add_argument("--train_batch_size", default=256, type=int)
     parser.add_argument("--test_batch_size", default=500, type=int)
     parser.add_argument("--n_worker", default=32, type=int)
@@ -61,11 +61,8 @@
     print('=============================================SearchMobileNet model init done')
 
     # move network to GPU if available
-    # data parallelism not supported yet
     if torch.cuda.is_available():
         device = torch.device('cuda:0')
-        #model = torch.nn.DataParallel(model)
-        #model.to(device)
     else:
         device = torch.device('cpu')
 
@@ -82,12 +79,11 @@
         optimizer = torch.optim.SGD(model, get_parameters(), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5)
 
     print('=============================================Start to create data provider')
-    # TODO:
     data_provider = datasets.ImagenetDataProvider(save_path=args.data_path,
-                                                  train_batch_size=args.train_batch_size, #256,
-                                                  test_batch_size=args.test_batch_size, #500,
+                                                  train_batch_size=args.train_batch_size,
+                                                  test_batch_size=args.test_batch_size,
                                                   valid_size=None,
-                                                  n_worker=args.n_worker, #32,
+                                                  n_worker=args.n_worker,
                                                   resize_scale=args.resize_scale,
                                                   distort_color=args.distort_color)
     print('=============================================Finish to create data provider')
diff --git a/src/sdk/pynni/nni/nas/pytorch/mutables.py b/src/sdk/pynni/nni/nas/pytorch/mutables.py
index a1d448a646..16b73b903d 100644
--- a/src/sdk/pynni/nni/nas/pytorch/mutables.py
+++ b/src/sdk/pynni/nni/nas/pytorch/mutables.py
@@ -92,7 +92,6 @@ def __init__(self, op_candidates, reduction="mean", return_mask=False, key=None)
         self.choices = nn.ModuleList(op_candidates)
         self.reduction = reduction
         self.return_mask = return_mask
-        self.registered_module = None
 
     def __len__(self):
         return len(self.choices)
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index 188a5b8d27..0b590265e1 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -24,14 +24,7 @@
 import numpy as np
 
 from nni.nas.pytorch.base_mutator import BaseMutator
-
-def detach_variable(inputs):
-    if isinstance(inputs, tuple):
-        return tuple([detach_variable(x) for x in inputs])
-    else:
-        x = inputs.detach()
-        x.requires_grad = inputs.requires_grad
-        return x
+from .utils import detach_variable
 
 
 class ArchGradientFunction(torch.autograd.Function):
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index 8713fa963a..4ab88df41c 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -27,60 +27,18 @@
 from nni.nas.pytorch.base_trainer import BaseTrainer
 from nni.nas.utils import AverageMeter
 from .mutator import ProxylessNasMutator
-
-
-def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1):
-    """
-    Parameters
-    ----------
-    pred :
-    target :
-    label_smoothing :
-
-    Returns
-    -------
-    """
-    logsoftmax = nn.LogSoftmax()
-    n_classes = pred.size(1)
-    # convert to one-hot
-    target = torch.unsqueeze(target, 1)
-    soft_target = torch.zeros_like(pred)
-    soft_target.scatter_(1, target, 1)
-    # label smoothing
-    soft_target = soft_target * (1 - label_smoothing) + label_smoothing / n_classes
-    return torch.mean(torch.sum(- soft_target * logsoftmax(pred), 1))
-
-def accuracy(output, target, topk=(1,)):
-    """
-    Computes the precision@k for the specified values of k
-
-    Parameters
-    ----------
-    output :
-    target :
-    topk :
-
-    Returns
-    -------
-    """
-    maxk = max(topk)
-    batch_size = target.size(0)
-
-    _, pred = output.topk(maxk, 1, True, True)
-    pred = pred.t()
-    correct = pred.eq(target.view(1, -1).expand_as(pred))
-
-    res = []
-    for k in topk:
-        correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
-        res.append(correct_k.mul_(100.0 / batch_size))
-    return res
+from .utils import cross_entropy_with_label_smoothing, accuracy
 
 
 class ProxylessNasTrainer(BaseTrainer):
-    def __init__(self, model, model_optim, train_loader, valid_loader, device,
-                 n_epochs=120, init_lr=0.025, arch_init_type='normal', arch_init_ratio=1e-3,
-                 arch_optim_lr=1e-3, arch_weight_decay=0, warmup=True, warmup_epochs=25,
-                 arch_valid_frequency=1):
+    def __init__(self, model, model_optim, device,
+                 train_loader, valid_loader, label_smoothing=0.1,
+                 n_epochs=120, init_lr=0.025, binary_mode='full_v2',
+                 arch_init_type='normal', arch_init_ratio=1e-3,
+                 arch_optim_lr=1e-3, arch_weight_decay=0,
+                 grad_update_arch_param_every=5, grad_update_steps=1,
+                 warmup=True, warmup_epochs=25,
+                 arch_valid_frequency=1,
+                 load_ckpt=False, ckpt_path=None):
         """
         Parameters
         ----------
@@ -117,27 +75,30 @@ def __init__(self, model, model_optim, device,
         self.warmup = warmup
         self.warmup_epochs = warmup_epochs
         self.arch_valid_frequency = arch_valid_frequency
-        self.label_smoothing = 0.1
+        self.label_smoothing = label_smoothing
 
-        self.valid_batch_size = 500
-        self.arch_grad_train_batch_size = 256
+        self.train_batch_size = train_loader.batch_sampler.batch_size
+        self.valid_batch_size = valid_loader.batch_sampler.batch_size
         # update architecture parameters every this number of minibatches
-        self.grad_update_arch_param_every = 5
+        self.grad_update_arch_param_every = grad_update_arch_param_every
         # the number of steps per architecture parameter update
-        self.grad_update_steps = 1
-        self.binary_mode = 'full_v2'
+        self.grad_update_steps = grad_update_steps
+        self.binary_mode = binary_mode
+
+        self.load_ckpt = load_ckpt
+        self.ckpt_path = ckpt_path
 
         # init mutator
         self.mutator = ProxylessNasMutator(model)
-        self._valid_iter = None
 
+        # DataParallel should be put behind the init of mutator
         self.model = torch.nn.DataParallel(self.model)
         self.model.to(self.device)
 
-        # TODO: arch search configs
-
+        # iter of valid dataset for training architecture weights
+        self._valid_iter = None
+        # init architecture weights
         self._init_arch_params(arch_init_type, arch_init_ratio)
+        # build architecture optimizer
         self.arch_optimizer = torch.optim.Adam(self.mutator.get_architecture_parameters(),
                                                arch_optim_lr,
@@ -146,6 +107,8 @@ def __init__(self, model, model_optim, device,
                                                eps=1e-8)
 
         self.criterion = nn.CrossEntropyLoss()
+        self.warmup_curr_epoch = 0
+        self.train_curr_epoch = 0
 
     def _init_arch_params(self, init_type='normal', init_ratio=1e-3):
         for param in self.mutator.get_architecture_parameters():
@@ -201,7 +164,7 @@ def _warm_up(self):
         nBatch = len(data_loader)
         T_total = self.warmup_epochs * nBatch  # total num of batches
 
-        for epoch in range(self.warmup_epochs):
+        for epoch in range(self.warmup_curr_epoch, self.warmup_epochs):
             print('\n', '-' * 30, 'Warmup epoch: %d' % (epoch + 1), '-' * 30, '\n')
             batch_time = AverageMeter('batch_time')
             data_time = AverageMeter('data_time')
@@ -261,6 +224,8 @@ def _warm_up(self):
                       'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}M'. \
                       format(epoch + 1, self.warmup_epochs, val_loss, val_top1, val_top5, top1=top1, top5=top5)
             print(val_log)
+            self.save_checkpoint()
+            self.warmup_curr_epoch += 1
 
     def _get_update_schedule(self, nBatch):
         schedule = {}
@@ -296,7 +261,7 @@ def _train(self):
 
         update_schedule = self._get_update_schedule(nBatch)
 
-        for epoch in range(self.n_epochs):
+        for epoch in range(self.train_curr_epoch, self.n_epochs):
             print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n')
             batch_time = AverageMeter('batch_time')
             data_time = AverageMeter('data_time')
@@ -317,7 +282,6 @@ def _train(self):
                 # train weight parameters
                 images, labels = images.to(self.device), labels.to(self.device)
                 self.mutator.reset_binary_gates()
-                # TODO: remove unused module for speedup
                 self.mutator.unused_modules_off()
                 output = self.model(images)
                 if self.label_smoothing > 0:
@@ -332,7 +296,6 @@ def _train(self):
                 loss.backward()
                 self.model_optim.step()
                 self.mutator.unused_modules_back()
-                # TODO: if epoch > 0:
                 if epoch > 0:
                     for _ in range(update_schedule.get(i, 0)):
                         start_time = time.time()
@@ -368,6 +331,8 @@ def _train(self):
                       format(epoch + 1, val_loss, val_top1, val_top5,
                              entropy=entropy, top1=top1, top5=top5)
             print(val_log)
+            self.save_checkpoint()
+            self.train_curr_epoch += 1
         # convert to normal network according to architecture parameters
 
     def _valid_next_batch(self):
@@ -381,7 +346,8 @@ def _valid_next_batch(self):
         return data
 
     def _gradient_step(self):
-        self.valid_loader.batch_sampler.batch_size = self.arch_grad_train_batch_size
+        # use the same batch size as train batch size for architecture weights
+        self.valid_loader.batch_sampler.batch_size = self.train_batch_size
         self.valid_loader.batch_sampler.drop_last = True
         self.model.train()
         self.mutator.change_forward_mode(self.binary_mode)
@@ -409,7 +375,29 @@ def _gradient_step(self):
         print('(%.4f, %.4f, %.4f)' % (time2 - time1, time3 - time2, time4 - time3))
         return loss.data.item(), expected_value.item() if expected_value is not None else None
 
+    def save_checkpoint(self):
+        if self.ckpt_path:
+            state = {
+                'warmup_curr_epoch': self.warmup_curr_epoch,
+                'train_curr_epoch': self.train_curr_epoch,
+                'model': self.model.state_dict(),
+                'optim': self.model_optim.state_dict(),
+                'arch_optim': self.arch_optimizer.state_dict()
+            }
+            torch.save(state, self.ckpt_path)
+
+    def load_checkpoint(self):
+        assert self.ckpt_path is not None, "If load_ckpt is not None, ckpt_path should not be None"
+        ckpt = torch.load(self.ckpt_path)
+        self.warmup_curr_epoch = ckpt['warmup_curr_epoch']
+        self.train_curr_epoch = ckpt['train_curr_epoch']
+        self.model.load_state_dict(ckpt['model'])
+        self.model_optim.load_state_dict(ckpt['optim'])
+        self.arch_optimizer.load_state_dict(ckpt['arch_optim'])
+
     def train(self):
+        if self.load_ckpt:
+            self.load_checkpoint()
         if self.warmup:
             self._warm_up()
         self._train()
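With patch 41 a search run can be resumed: `save_checkpoint` bundles the two epoch counters with the model, weight-optimizer, and arch-optimizer state dicts, and `load_checkpoint` restores them. A minimal round-trip of the same dictionary layout (placeholder path and module, for illustration only):

```python
import torch
from torch import nn

model = nn.Linear(8, 2)
optim = torch.optim.SGD(model.parameters(), lr=0.05)

state = {
    'warmup_curr_epoch': 3,
    'train_curr_epoch': 0,
    'model': model.state_dict(),
    'optim': optim.state_dict(),
}
torch.save(state, '/tmp/search_ckpt.pt')

ckpt = torch.load('/tmp/search_ckpt.pt')
model.load_state_dict(ckpt['model'])
optim.load_state_dict(ckpt['optim'])
print(ckpt['warmup_curr_epoch'])  # resume warmup from epoch 3
```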
From 14f3f1da607380c2cdee32a873302e9ad05d6e87 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Mon, 16 Dec 2019 17:23:15 +0800
Subject: [PATCH 42/60] add retrain

---
 examples/nas/proxylessnas/main.py             |  77 ++++----
 examples/nas/proxylessnas/retrain.py          | 177 ++++++++++++++++++
 .../nni/nas/pytorch/proxylessnas/mutator.py   |  21 +--
 .../nni/nas/pytorch/proxylessnas/trainer.py   |  21 +--
 4 files changed, 217 insertions(+), 79 deletions(-)
 create mode 100644 examples/nas/proxylessnas/retrain.py

diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py
index 277faba192..532b053cec 100644
--- a/examples/nas/proxylessnas/main.py
+++ b/examples/nas/proxylessnas/main.py
@@ -1,23 +1,7 @@
-# Copyright (c) Microsoft Corporation
-# All rights reserved.
-#
-# MIT License
-#
-# Permission is hereby granted, free of charge,
-# to any person obtaining a copy of this software and associated
-# documentation files (the "Software"), to deal in the Software without restriction,
-# including without limitation the rights to use, copy, modify, merge, publish,
-# distribute, sublicense, and/or sell copies of the Software, and
-# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
-# The above copyright notice and this permission notice shall be included
-# in all copies or substantial portions of the Software.
-#
-# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
-# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
-# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
 
+import os
 from argparse import ArgumentParser
 
 import datasets
@@ -27,6 +11,7 @@
 from putils import get_parameters
 from model import SearchMobileNet
 from nni.nas.pytorch.proxylessnas import ProxylessNasTrainer
+from .retrain import retrain
 
 
 if __name__ == "__main__":
@@ -47,7 +32,9 @@
     parser.add_argument("--n_worker", default=32, type=int)
     parser.add_argument("--resize_scale", default=0.08, type=float)
     parser.add_argument("--distort_color", default='normal', type=str, choices=['normal', 'strong', 'None'])
-    #parser.add_argument("--log-frequency", default=1, type=int)
+    # configurations for retrain
+    parser.add_argument("--retrain", default=False, type=bool)
+    parser.add_argument("--exported_arch_path", default=None, type=str)
 
     args = parser.parse_args()
     model = SearchMobileNet(width_stages=[int(i) for i in args.width_stages.split(',')],
@@ -67,17 +54,6 @@
         device = torch.device('cpu')
 
     # TODO: net info
-
-    if args.no_decay_keys:
-        keys = args.no_decay_keys
-        momentum, nesterov = 0.9, True
-        optimizer = torch.optim.SGD([
-            {'params': get_parameters(model, keys, mode='exclude'), 'weight_decay': 4e-5},
-            {'params': get_parameters(model, keys, mode='include'), 'weight_decay': 0},
-        ], lr=0.05, momentum=momentum, nesterov=nesterov)
-    else:
-        optimizer = torch.optim.SGD(model, get_parameters(), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5)
-
     print('=============================================Start to create data provider')
     data_provider = datasets.ImagenetDataProvider(save_path=args.data_path,
                                                   train_batch_size=args.train_batch_size,
@@ -90,14 +66,33 @@
     train_loader = data_provider.train
     valid_loader = data_provider.valid
 
+    if args.no_decay_keys:
+        keys = args.no_decay_keys
+        momentum, nesterov = 0.9, True
+        optimizer = torch.optim.SGD([
+            {'params': get_parameters(model, keys, mode='exclude'), 'weight_decay': 4e-5},
+            {'params': get_parameters(model, keys, mode='include'), 'weight_decay': 0},
+        ], lr=0.05, momentum=momentum, nesterov=nesterov)
+    else:
+        optimizer = torch.optim.SGD(model, get_parameters(), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5)
 
-    print('=============================================Start to create ProxylessNasTrainer')
-    trainer = ProxylessNasTrainer(model,
-                                  model_optim=optimizer,
-                                  train_loader=train_loader,
-                                  valid_loader=valid_loader,
-                                  device=device,
-                                  warmup=True)
-
-    print('=============================================Start to train ProxylessNasTrainer')
-    trainer.train()
-    trainer.export()
+    if not args.retrain:
+        # this is architecture search
+        print('=============================================Start to create ProxylessNasTrainer')
+        trainer = ProxylessNasTrainer(model,
+                                      model_optim=optimizer,
+                                      train_loader=train_loader,
+                                      valid_loader=valid_loader,
+                                      device=device,
+                                      warmup=True)
+
+        print('=============================================Start to train ProxylessNasTrainer')
+        trainer.train()
+        trainer.export()
+    else:
+        # this is retrain
+        from nni.nas.pytorch.fixed import apply_fixed_architecture
+        assert os.path.isfile(args.exported_arch_path), \
+            "exported_arch_path {} should be a file.".format(args.exported_arch_path)
+        apply_fixed_architecture(model, args.exported_arch_path, device=device)
+        retrain(model, optimizer, device, data_provider, n_epochs=300)
diff --git a/examples/nas/proxylessnas/retrain.py b/examples/nas/proxylessnas/retrain.py
new file mode 100644
index 0000000000..67c9b0ec90
--- /dev/null
+++ b/examples/nas/proxylessnas/retrain.py
@@ -0,0 +1,177 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT license.
+
+import time
+from datetime import timedelta
+import torch
+from torch import nn as nn
+from nni.nas.utils import AverageMeter
+
+criterion = nn.CrossEntropyLoss()
+
+def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1):
+    logsoftmax = nn.LogSoftmax()
+    n_classes = pred.size(1)
+    # convert to one-hot
+    target = torch.unsqueeze(target, 1)
+    soft_target = torch.zeros_like(pred)
+    soft_target.scatter_(1, target, 1)
+    # label smoothing
+    soft_target = soft_target * (1 - label_smoothing) + label_smoothing / n_classes
+    return torch.mean(torch.sum(- soft_target * logsoftmax(pred), 1))
+
+def accuracy(output, target, topk=(1,)):
+    maxk = max(topk)
+    batch_size = target.size(0)
+
+    _, pred = output.topk(maxk, 1, True, True)
+    pred = pred.t()
+    correct = pred.eq(target.view(1, -1).expand_as(pred))
+
+    res = []
+    for k in topk:
+        correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
+        res.append(correct_k.mul_(100.0 / batch_size))
+    return res
+
+def validate(model, device, valid_loader, test_loader, is_test=True):
+    if is_test:
+        data_loader = test_loader
+    else:
+        data_loader = valid_loader
+    model.eval()
+    batch_time = AverageMeter()
+    losses = AverageMeter()
+    top1 = AverageMeter()
+    top5 = AverageMeter()
+
+    end = time.time()
+    with torch.no_grad():
+        for i, (images, labels) in enumerate(data_loader):
+            images, labels = images.to(device), labels.to(device)
+            # compute output
+            output = model(images)
+            loss = criterion(output, labels)
+            # measure accuracy and record loss
+            acc1, acc5 = accuracy(output, labels, topk=(1, 5))
+            losses.update(loss, images.size(0))
+            top1.update(acc1[0], images.size(0))
+            top5.update(acc5[0], images.size(0))
+            # measure elapsed time
+            batch_time.update(time.time() - end)
+            end = time.time()
+
+            if i % 10 == 0 or i + 1 == len(data_loader):
+                if is_test:
+                    prefix = 'Test'
+                else:
+                    prefix = 'Valid'
+                test_log = prefix + ': [{0}/{1}]\t'\
+                    'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'\
+                    'Loss {loss.val:.4f} ({loss.avg:.4f})\t'\
+                    'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'.\
+                    format(i, len(data_loader) - 1, batch_time=batch_time, loss=losses, top1=top1)
+                test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5)
+                print(test_log)
+    return losses.avg, top1.avg, top5.avg
+
+def train_one_epoch(model, train_loader, device, optimizer, adjust_lr_func, train_log_func, label_smoothing=0.1):
+    batch_time = AverageMeter()
+    data_time = AverageMeter()
+    losses = AverageMeter()
+    top1 = AverageMeter()
+    top5 = AverageMeter()
+    model.train()
+    end = time.time()
+    for i, (images, labels) in enumerate(train_loader):
+        data_time.update(time.time() - end)
+        new_lr = adjust_lr_func(i)
+        images, labels = images.to(device), labels.to(device)
+        output = model(images)
+        if label_smoothing > 0:
+            loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing)
+        else:
+            loss = criterion(output, labels)
+        acc1, acc5 = accuracy(output, labels, topk=(1, 5))
+        losses.update(loss, images.size(0))
+        top1.update(acc1[0], images.size(0))
+        top5.update(acc5[0], images.size(0))
+
+        # compute gradient and do SGD step
+        model.zero_grad()  # or optimizer.zero_grad()
+        loss.backward()
+        optimizer.step()
+
+        # measure elapsed time
+        batch_time.update(time.time() - end)
+        end = time.time()
+
+        if i % 10 == 0 or i + 1 == len(train_loader):
+            batch_log = train_log_func(i, batch_time, data_time, losses, top1, top5, new_lr)
+            print(batch_log)
+    return top1, top5
+
+def train(model, optimizer, device, train_loader, valid_loader, test_loader, n_epochs, validation_frequency=1):
+    best_acc = 0
+    nBatch = len(train_loader)
+
+    def train_log_func(epoch_, i, batch_time, data_time, losses, top1, top5, lr):
+        batch_log = 'Train [{0}][{1}/{2}]\t' \
+                    'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \
+                    'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' \
+                    'Loss {losses.val:.4f} ({losses.avg:.4f})\t' \
+                    'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'. \
+                    format(epoch_ + 1, i, nBatch - 1,
+                           batch_time=batch_time, data_time=data_time, losses=losses, top1=top1)
+        batch_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5)
+        batch_log += '\tlr {lr:.5f}'.format(lr=lr)
+        return batch_log
+
+    def adjust_learning_rate(n_epochs, optimizer, epoch, batch=0, nBatch=None):
+        """ adjust learning of a given optimizer and return the new learning rate """
+        # cosine
+        T_total = n_epochs * nBatch
+        T_cur = epoch * nBatch + batch
+        # init_lr = 0.05
+        new_lr = 0.5 * 0.05 * (1 + math.cos(math.pi * T_cur / T_total))
+        for param_group in optimizer.param_groups:
+            param_group['lr'] = new_lr
+        return new_lr
+
+    for epoch in range(n_epochs):
+        print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n')
+        end = time.time()
+        train_top1, train_top5 = train_one_epoch(model, train_loader, device, optimizer
+            lambda i: adjust_learning_rate(n_epochs, optimizer, epoch, i, nBatch),
+            lambda i, batch_time, data_time, losses, top1, top5, new_lr:
+            train_log_func(epoch, i, batch_time, data_time, losses, top1, top5, new_lr),
+        )
+        time_per_epoch = time.time() - end
+        seconds_left = int((n_epochs - epoch - 1) * time_per_epoch)
+        print('Time per epoch: %s, Est. complete in: %s' % (
+            str(timedelta(seconds=time_per_epoch)),
+            str(timedelta(seconds=seconds_left))))
+
+        if (epoch + 1) % validation_frequency == 0:
+            val_loss, val_acc, val_acc5 = validate(model, device, valid_loader, test_loader, is_test=False)
+            is_best = val_acc > best_acc
+            best_acc = max(best_acc, val_acc)
+            val_log = 'Valid [{0}/{1}]\tloss {2:.3f}\ttop-1 acc {3:.3f} ({4:.3f})'.\
+                format(epoch + 1, n_epochs, val_loss, val_acc, best_acc)
+            val_log += '\ttop-5 acc {0:.3f}\tTrain top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}'.\
+                format(val_acc5, top1=train_top1, top5=train_top5)
+            print(val_log)
+        else:
+            is_best = False
+
+def retrain(model, optimizer, device, data_provider, n_epochs):
+    train_loader = data_provider.train
+    valid_loader = data_provider.valid
+    test_loader = data_provider.test
+    # train
+    train(model, optimizer, device, train_loader, valid_loader, test_loader, n_epochs)
+    # validate
+    validate(model, device, valid_loader, test_loader, is_test=False)
+    # test
+    validate(model, device, valid_loader, test_loader, is_test=True)
\ No newline at end of file
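`cross_entropy_with_label_smoothing` in the new retrain.py mixes the one-hot target with a uniform distribution, giving the true class probability `1 - eps + eps/C` and every other class `eps/C`. A quick numerical cross-check that this construction reduces to ordinary cross entropy at `eps=0` (standalone sketch, not the example's own code):

```python
import torch
from torch import nn

def smoothed_ce(pred, target, eps=0.1):
    # same construction as in retrain.py above (dim passed explicitly here)
    logsoftmax = nn.LogSoftmax(dim=1)
    n_classes = pred.size(1)
    soft = torch.zeros_like(pred).scatter_(1, target.unsqueeze(1), 1)
    soft = soft * (1 - eps) + eps / n_classes
    return torch.mean(torch.sum(-soft * logsoftmax(pred), 1))

pred = torch.randn(4, 10)
target = torch.randint(0, 10, (4,))
# eps=0 must reduce to the ordinary cross entropy
print(torch.allclose(smoothed_ce(pred, target, eps=0.0),
                     nn.functional.cross_entropy(pred, target)))  # True
```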
-# -# MIT License -# -# Permission is hereby granted, free of charge, -# to any person obtaining a copy of this software and associated -# documentation files (the "Software"), to deal in the Software without restriction, -# including without limitation the rights to use, copy, modify, merge, publish, -# distribute, sublicense, and/or sell copies of the Software, and -# to permit persons to whom the Software is furnished to do so, subject to the following conditions: -# The above copyright notice and this permission notice shall be included -# in all copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING -# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND -# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, -# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT license. import math import time From 346e5a476624f770c1148cb3169b0eaab3bb306f Mon Sep 17 00:00:00 2001 From: quanlu Date: Mon, 16 Dec 2019 20:19:57 +0800 Subject: [PATCH 43/60] update --- examples/nas/proxylessnas/main.py | 6 +++-- examples/nas/proxylessnas/retrain.py | 2 +- .../nni/nas/pytorch/proxylessnas/mutator.py | 11 +++++++- .../nni/nas/pytorch/proxylessnas/trainer.py | 26 ++++++++++++++++--- 4 files changed, 37 insertions(+), 8 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 532b053cec..845fa315be 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -11,7 +11,7 @@ from putils import get_parameters from model import SearchMobileNet from nni.nas.pytorch.proxylessnas import ProxylessNasTrainer -from .retrain import retrain +from retrain import retrain if __name__ == "__main__": @@ -84,7 +84,9 @@ train_loader=train_loader, valid_loader=valid_loader, device=device, - warmup=True) + warmup=True, + ckpt_path='./search_mobile_net.pt', + arch_path='./arch_path.pt') print('=============================================Start to train ProxylessNasTrainer') trainer.train() diff --git a/examples/nas/proxylessnas/retrain.py b/examples/nas/proxylessnas/retrain.py index 67c9b0ec90..5278f405f0 100644 --- a/examples/nas/proxylessnas/retrain.py +++ b/examples/nas/proxylessnas/retrain.py @@ -142,7 +142,7 @@ def adjust_learning_rate(n_epochs, optimizer, epoch, batch=0, nBatch=None): for epoch in range(n_epochs): print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n') end = time.time() - train_top1, train_top5 = train_one_epoch(model, train_loader, device, optimizer + train_top1, train_top5 = train_one_epoch(model, train_loader, device, optimizer, lambda i: adjust_learning_rate(n_epochs, optimizer, epoch, i, nBatch), lambda i, batch_time, data_time, losses, top1, top5, new_lr: train_log_func(epoch, i, batch_time, data_time, losses, top1, top5, new_lr), diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 9307ba175c..029ddd7fbb 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -6,8 +6,9 @@ from torch.nn import functional as F import numpy as np -from nni.nas.pytorch.base_mutator import BaseMutator from .utils import 
detach_variable +from nni.nas.pytorch.base_mutator import BaseMutator +from nni.nas.pytorch.mutables import LayerChoice class ArchGradientFunction(torch.autograd.Function): @@ -346,3 +347,11 @@ def arch_requires_grad(self): def arch_disable_grad(self): for _, mutable, _ in self.named_mutables(distinct=False): mutable.registered_module.disable_grad() + + def sample_final(self): + result = dict() + for _, mutable, _ in self.named_mutables(distinct=False): + assert isinstance(mutable, LayerChoice) + index, _ = mutable.registered_module.chosen_index + result[mutable.key] = F.one_hot(torch.tensor(index), num_classes=mutable.length).view(-1)#.bool() + return result \ No newline at end of file diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 30fc12de87..16bbaf0593 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -3,15 +3,27 @@ import math import time +import json import torch from torch import nn as nn from nni.nas.pytorch.base_trainer import BaseTrainer +#from nni.nas.pytorch.trainer import TorchTensorEncoder from nni.nas.utils import AverageMeter from .mutator import ProxylessNasMutator from .utils import cross_entropy_with_label_smoothing, accuracy +class TorchTensorEncoder(json.JSONEncoder): + def default(self, o): # pylint: disable=method-hidden + if isinstance(o, torch.Tensor): + olist = o.tolist() + if "bool" not in o.type().lower() and all(map(lambda d: d == 0 or d == 1, olist)): + print("Every element in %s is either 0 or 1. " + "You might consider convert it into bool.", olist) + return olist + return super().default(o) + class ProxylessNasTrainer(BaseTrainer): def __init__(self, model, model_optim, device, train_loader, valid_loader, label_smoothing=0.1, @@ -21,7 +33,7 @@ def __init__(self, model, model_optim, device, grad_update_arch_param_every=5, grad_update_steps=1, warmup=True, warmup_epochs=25, arch_valid_frequency=1, - load_ckpt=False, ckpt_path=None): + load_ckpt=False, ckpt_path=None, arch_path=None): """ Parameters ---------- @@ -70,6 +82,7 @@ def __init__(self, model, model_optim, device, self.load_ckpt = load_ckpt self.ckpt_path = ckpt_path + self.arch_path = arch_path # init mutator self.mutator = ProxylessNasMutator(model) @@ -202,12 +215,13 @@ def _warm_up(self): format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time, losses=losses, top1=top1, top5=top5, lr=warmup_lr) print(batch_log) + self.save_checkpoint() val_loss, val_top1, val_top5 = self._validate() val_log = 'Warmup Valid [{0}/{1}]\tloss {2:.3f}\ttop-1 acc {3:.3f}\ttop-5 acc {4:.3f}\t' \ 'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}M'. 
\ format(epoch + 1, self.warmup_epochs, val_loss, val_top1, val_top5, top1=top1, top5=top5) print(val_log) - self.save_checkpoint() + #self.save_checkpoint() self.warmup_curr_epoch += 1 def _get_update_schedule(self, nBatch): @@ -368,6 +382,8 @@ def save_checkpoint(self): 'arch_optim': self.arch_optimizer.state_dict() } torch.save(state, self.ckpt_path) + if self.arch_path: + self.export(self.arch_path) def load_checkpoint(self): assert self.ckpt_path is not None, "If load_ckpt is not None, ckpt_path should not be None" @@ -385,8 +401,10 @@ def train(self): self._warm_up() self._train() - def export(self): - pass + def export(self, file_name): + exported_arch = self.mutator.sample_final() + with open(file_name, 'w') as f: + json.dump(exported_arch, f, indent=2, sort_keys=True, cls=TorchTensorEncoder) def validate(self): raise NotImplementedError From 0eddd52fb12c48f670e348e227680de7a72d1d21 Mon Sep 17 00:00:00 2001 From: quanlu Date: Mon, 16 Dec 2019 20:54:11 +0800 Subject: [PATCH 44/60] update --- examples/nas/proxylessnas/ops.py | 4 +++- examples/nas/proxylessnas/retrain.py | 23 +++++++++++-------- .../nni/nas/pytorch/proxylessnas/trainer.py | 2 +- 3 files changed, 17 insertions(+), 12 deletions(-) diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py index 8886650739..3bfc66a8bd 100644 --- a/examples/nas/proxylessnas/ops.py +++ b/examples/nas/proxylessnas/ops.py @@ -60,7 +60,9 @@ def __init__(self, mobile_inverted_conv, shortcut, op_candidates_list): def forward(self, x): out, idx = self.mobile_inverted_conv(x) - #if idx == 6: + # TODO: unify idx format + if not isinstance(idx, int): + idx = (idx == 1).nonzero() if self.op_candidates_list[idx].is_zero_layer(): res = x elif self.shortcut is None: diff --git a/examples/nas/proxylessnas/retrain.py b/examples/nas/proxylessnas/retrain.py index 5278f405f0..5013b50a1c 100644 --- a/examples/nas/proxylessnas/retrain.py +++ b/examples/nas/proxylessnas/retrain.py @@ -2,10 +2,11 @@ # Licensed under the MIT license. 
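For reference, the `import math` being added to retrain.py below supports its cosine learning-rate schedule, which computes `new_lr = 0.5 * init_lr * (1 + cos(pi * T_cur / T_total))`. A standalone sketch of how that rate decays, assuming the file's `init_lr = 0.05` (illustrative only, not part of the patch series):

```python
import math

init_lr = 0.05  # initial learning rate used in retrain.py

def cosine_lr(progress):
    # progress = T_cur / T_total, i.e., the fraction of training completed
    return 0.5 * init_lr * (1 + math.cos(math.pi * progress))

assert abs(cosine_lr(0.0) - 0.05) < 1e-12   # starts at init_lr
assert abs(cosine_lr(0.5) - 0.025) < 1e-12  # half of init_lr midway
assert cosine_lr(1.0) < 1e-12               # decays to ~0 at the end
```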
import time +import math from datetime import timedelta import torch from torch import nn as nn -from nni.nas.utils import AverageMeter +from nni.nas.pytorch.utils import AverageMeter criterion = nn.CrossEntropyLoss() @@ -40,10 +41,10 @@ def validate(model, device, valid_loader, test_loader, is_test=True): else: data_loader = valid_loader model.eval() - batch_time = AverageMeter() - losses = AverageMeter() - top1 = AverageMeter() - top5 = AverageMeter() + batch_time = AverageMeter('batch_time') + losses = AverageMeter('losses') + top1 = AverageMeter('top1') + top5 = AverageMeter('top5') end = time.time() with torch.no_grad(): @@ -76,11 +77,11 @@ def validate(model, device, valid_loader, test_loader, is_test=True): return losses.avg, top1.avg, top5.avg def train_one_epoch(model, train_loader, device, optimizer, adjust_lr_func, train_log_func, label_smoothing=0.1): - batch_time = AverageMeter() - data_time = AverageMeter() - losses = AverageMeter() - top1 = AverageMeter() - top5 = AverageMeter() + batch_time = AverageMeter('batch_time') + data_time = AverageMeter('data_time') + losses = AverageMeter('losses') + top1 = AverageMeter('top1') + top5 = AverageMeter('top5') model.train() end = time.time() for i, (images, labels) in enumerate(train_loader): @@ -169,6 +170,8 @@ def retrain(model, optimizer, device, data_provider, n_epochs): train_loader = data_provider.train valid_loader = data_provider.valid test_loader = data_provider.test + model = torch.nn.DataParallel(model) + model.to(device) # train train(model, optimizer, device, train_loader, valid_loader, test_loader, n_epochs) # validate diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 16bbaf0593..5391d82637 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -10,7 +10,7 @@ from nni.nas.pytorch.base_trainer import BaseTrainer #from nni.nas.pytorch.trainer import TorchTensorEncoder -from nni.nas.utils import AverageMeter +from nni.nas.pytorch.utils import AverageMeter from .mutator import ProxylessNasMutator from .utils import cross_entropy_with_label_smoothing, accuracy From 8d499ec0b9d6f6112c476f195c50fb1d022942ee Mon Sep 17 00:00:00 2001 From: quanlu Date: Tue, 17 Dec 2019 16:06:20 +0800 Subject: [PATCH 45/60] retrain tested --- examples/nas/proxylessnas/main.py | 2 +- examples/nas/proxylessnas/retrain.py | 7 +++--- src/sdk/pynni/nni/nas/pytorch/base_mutator.py | 4 ++++ src/sdk/pynni/nni/nas/pytorch/fixed.py | 2 +- .../nni/nas/pytorch/proxylessnas/mutator.py | 22 +++++++++---------- .../nni/nas/pytorch/proxylessnas/trainer.py | 14 ++++-------- 6 files changed, 24 insertions(+), 27 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 845fa315be..6b601f261e 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -25,7 +25,7 @@ parser.add_argument("--dropout_rate", default=0, type=float) parser.add_argument("--no_decay_keys", default='bn', type=str, choices=[None, 'bn', 'bn#bias']) # configurations of imagenet dataset - parser.add_argument("--data_path", default='/data/hdd3/yugzh/imagenet/', type=str) + parser.add_argument("--data_path", default='/data/ssd1/v-yugzh/imagenet/', type=str) #parser.add_argument("--data_path", default='/mnt/v-yugzh/imagenet/', type=str) parser.add_argument("--train_batch_size", default=256, type=int) parser.add_argument("--test_batch_size", default=500, type=int) diff --git 
a/examples/nas/proxylessnas/retrain.py b/examples/nas/proxylessnas/retrain.py index 5013b50a1c..ef84b6634a 100644 --- a/examples/nas/proxylessnas/retrain.py +++ b/examples/nas/proxylessnas/retrain.py @@ -90,7 +90,7 @@ def train_one_epoch(model, train_loader, device, optimizer, adjust_lr_func, trai images, labels = images.to(device), labels.to(device) output = model(images) if label_smoothing > 0: - loss = cross_entropy_with_label_smoothing(output, labels, self.run_config.label_smoothing) + loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing) else: loss = criterion(output, labels) acc1, acc5 = accuracy(output, labels, topk=(1, 5)) @@ -124,8 +124,7 @@ def train_log_func(epoch_, i, batch_time, data_time, losses, top1, top5, lr): 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'. \ format(epoch_ + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time, losses=losses, top1=top1) - if print_top5: - batch_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) + batch_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) batch_log += '\tlr {lr:.5f}'.format(lr=lr) return batch_log @@ -173,7 +172,7 @@ def retrain(model, optimizer, device, data_provider, n_epochs): model = torch.nn.DataParallel(model) model.to(device) # train - train(model, optimizer, device, train_loader, valid_loader, test_loader, n_epochs) + #train(model, optimizer, device, train_loader, valid_loader, test_loader, n_epochs) # validate validate(model, device, valid_loader, test_loader, is_test=False) # test diff --git a/src/sdk/pynni/nni/nas/pytorch/base_mutator.py b/src/sdk/pynni/nni/nas/pytorch/base_mutator.py index be169fae4a..0a9105e4a0 100644 --- a/src/sdk/pynni/nni/nas/pytorch/base_mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/base_mutator.py @@ -54,6 +54,10 @@ def _parse_search_space(self, module, root=None, prefix="", memo=None, nested_de def mutables(self): return self._structured_mutables + @property + def undedup_mutables(self): + return self._structured_mutables.traverse(deduplicate=False) + def forward(self, *inputs): raise RuntimeError("Forward is undefined for mutators.") diff --git a/src/sdk/pynni/nni/nas/pytorch/fixed.py b/src/sdk/pynni/nni/nas/pytorch/fixed.py index 6840097579..125e848fb2 100644 --- a/src/sdk/pynni/nni/nas/pytorch/fixed.py +++ b/src/sdk/pynni/nni/nas/pytorch/fixed.py @@ -77,6 +77,6 @@ def apply_fixed_architecture(model, fixed_arc_path, device=None): fixed_arc = json.load(f) fixed_arc = _encode_tensor(fixed_arc, device) architecture = FixedArchitecture(model, fixed_arc) - architecture.to(device) + #architecture.to(device) architecture.reset() return architecture diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 029ddd7fbb..59744a382b 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -244,7 +244,7 @@ def __init__(self, model): super(ProxylessNasMutator, self).__init__(model) self._unused_modules = None self.mutable_list = [] - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mo = MixedOp(mutable) self.mutable_list.append(mutable) mutable.registered_module = mo @@ -274,7 +274,7 @@ def reset_binary_gates(self): """ For each LayerChoice, binarize based on alpha to only activate one op """ - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mutable.registered_module.binarize(mutable) def 
set_chosen_op_active(self): @@ -282,7 +282,7 @@ def set_chosen_op_active(self): For each LayerChoice, set the op with highest alpha as the chosen op Usually used for validation. """ - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mutable.registered_module.set_chosen_op_active() def num_arch_params(self): @@ -297,14 +297,14 @@ def set_arch_param_grad(self): """ For each LayerChoice, calculate gradients for architecture weights, i.e., alpha """ - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mutable.registered_module.set_arch_param_grad(mutable) def get_architecture_parameters(self): """ Return architecture weights of each LayerChoice, for arch optimizer """ - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: yield mutable.registered_module.get_AP_path_alpha() def change_forward_mode(self, mode): @@ -314,12 +314,12 @@ def get_forward_mode(self): return MixedOp.forward_mode def rescale_updated_arch_param(self): - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mutable.registered_module.rescale_updated_arch_param() def unused_modules_off(self): self._unused_modules = [] - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mixed_op = mutable.registered_module unused = {} if self.get_forward_mode() in ['full', 'two', 'full_v2']: @@ -341,17 +341,17 @@ def unused_modules_back(self): self._unused_modules = None def arch_requires_grad(self): - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mutable.registered_module.to_requires_grad() def arch_disable_grad(self): - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: mutable.registered_module.disable_grad() def sample_final(self): result = dict() - for _, mutable, _ in self.named_mutables(distinct=False): + for mutable in self.undedup_mutables: assert isinstance(mutable, LayerChoice) index, _ = mutable.registered_module.chosen_index - result[mutable.key] = F.one_hot(torch.tensor(index), num_classes=mutable.length).view(-1)#.bool() + result[mutable.key] = F.one_hot(torch.tensor(index), num_classes=mutable.length).view(-1).bool() return result \ No newline at end of file diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 5391d82637..43274dc15c 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -9,20 +9,11 @@ from torch import nn as nn from nni.nas.pytorch.base_trainer import BaseTrainer -#from nni.nas.pytorch.trainer import TorchTensorEncoder +from nni.nas.pytorch.trainer import TorchTensorEncoder from nni.nas.pytorch.utils import AverageMeter from .mutator import ProxylessNasMutator from .utils import cross_entropy_with_label_smoothing, accuracy -class TorchTensorEncoder(json.JSONEncoder): - def default(self, o): # pylint: disable=method-hidden - if isinstance(o, torch.Tensor): - olist = o.tolist() - if "bool" not in o.type().lower() and all(map(lambda d: d == 0 or d == 1, olist)): - print("Every element in %s is either 0 or 1. 
" - "You might consider convert it into bool.", olist) - return olist - return super().default(o) class ProxylessNasTrainer(BaseTrainer): def __init__(self, model, model_optim, device, @@ -411,3 +402,6 @@ def validate(self): def train_and_validate(self): raise NotImplementedError + + def checkpoint(self): + raise NotImplementedError From cb0c2e951eec3b12862e6bc1f44027ee1049d77d Mon Sep 17 00:00:00 2001 From: quanlu Date: Wed, 18 Dec 2019 16:26:19 +0800 Subject: [PATCH 46/60] update --- examples/nas/proxylessnas/main.py | 63 +++-- examples/nas/proxylessnas/retrain.py | 234 +++++++++--------- .../nni/nas/pytorch/proxylessnas/mutator.py | 6 +- .../nni/nas/pytorch/proxylessnas/trainer.py | 54 ++-- 4 files changed, 178 insertions(+), 179 deletions(-) diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 6b601f261e..33351f30fe 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -2,17 +2,18 @@ # Licensed under the MIT license. import os +import sys +import logging from argparse import ArgumentParser - -import datasets import torch -import torch.nn as nn +import datasets from putils import get_parameters from model import SearchMobileNet from nni.nas.pytorch.proxylessnas import ProxylessNasTrainer -from retrain import retrain +from retrain import Retrain +logger = logging.getLogger('nni_proxylessnas') if __name__ == "__main__": parser = ArgumentParser("proxylessnas") @@ -32,10 +33,18 @@ parser.add_argument("--n_worker", default=32, type=int) parser.add_argument("--resize_scale", default=0.08, type=float) parser.add_argument("--distort_color", default='normal', type=str, choices=['normal', 'strong', 'None']) - # configurations for retain - parser.add_argument("--retrain", default=False, type=bool) + # configurations for training mode + parser.add_argument("--train_mode", default='search', type=str, choices=['search', 'retrain']) + # configurations for search + parser.add_argument("--checkpoint_path", default='./search_mobile_net.pt', type=str) + parser.add_argument("--arch_path", default='./arch_path.pt', type=str) + # configurations for retrain parser.add_argument("--exported_arch_path", default=None, type=str) + args = parser.parse_args() + if args.train_mode == 'retrain' and args.exported_arch_path is None: + logger.error('When --train_mode is retrain, --exported_arch_path must be specified.') + sys.exit(-1) model = SearchMobileNet(width_stages=[int(i) for i in args.width_stages.split(',')], n_cell_stages=[int(i) for i in args.n_cell_stages.split(',')], @@ -43,9 +52,9 @@ n_classes=1000, dropout_rate=args.dropout_rate, bn_param=(args.bn_momentum, args.bn_eps)) - print('=============================================SearchMobileNet model create done') + logger.info('SearchMobileNet model create done') model.init_model() - print('=============================================SearchMobileNet model init done') + logger.info('SearchMobileNet model init done') # move network to GPU if available if torch.cuda.is_available(): @@ -53,8 +62,7 @@ else: device = torch.device('cpu') - # TODO: net info - print('=============================================Start to create data provider') + logger.info('Creating data provider...') data_provider = datasets.ImagenetDataProvider(save_path=args.data_path, train_batch_size=args.train_batch_size, test_batch_size=args.test_batch_size, @@ -62,9 +70,7 @@ n_worker=args.n_worker, resize_scale=args.resize_scale, distort_color=args.distort_color) - print('=============================================Finish 
to create data provider') - train_loader = data_provider.train - valid_loader = data_provider.valid + logger.info('Creating data provider done') if args.no_decay_keys: keys = args.no_decay_keys @@ -74,27 +80,30 @@ {'params': get_parameters(model, keys, mode='include'), 'weight_decay': 0}, ], lr=0.05, momentum=momentum, nesterov=nesterov) else: - optimizer = torch.optim.SGD(model, get_parameters(), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5) + optimizer = torch.optim.SGD(get_parameters(model), lr=0.05, momentum=momentum, nesterov=nesterov, weight_decay=4e-5) - if not args.retrain: + if args.train_mode == 'search': # this is architecture search - print('=============================================Start to create ProxylessNasTrainer') + logger.info('Creating ProxylessNasTrainer...') trainer = ProxylessNasTrainer(model, - model_optim=optimizer, - train_loader=train_loader, - valid_loader=valid_loader, - device=device, - warmup=True, - ckpt_path='./search_mobile_net.pt', - arch_path='./arch_path.pt') + model_optim=optimizer, + train_loader=data_provider.train, + valid_loader=data_provider.valid, + device=device, + warmup=True, + ckpt_path=args.checkpoint_path, + arch_path=args.arch_path) - print('=============================================Start to train ProxylessNasTrainer') + logger.info('Start to train with ProxylessNasTrainer...') trainer.train() - trainer.export() - else: + logger.info('Training done') + trainer.export(args.arch_path) + logger.info('Best architecture exported in %s', args.arch_path) + elif args.train_mode == 'retrain': # this is retrain from nni.nas.pytorch.fixed import apply_fixed_architecture assert os.path.isfile(args.exported_arch_path), \ "exported_arch_path {} should be a file.".format(args.exported_arch_path) apply_fixed_architecture(model, args.exported_arch_path, device=device) - retrain(model, optimizer, device, data_provider, n_epochs=300) + trainer = Retrain(model, optimizer, device, data_provider, n_epochs=300) + trainer.run() \ No newline at end of file diff --git a/examples/nas/proxylessnas/retrain.py b/examples/nas/proxylessnas/retrain.py index ef84b6634a..d501fbf53d 100644 --- a/examples/nas/proxylessnas/retrain.py +++ b/examples/nas/proxylessnas/retrain.py @@ -8,8 +8,6 @@ from torch import nn as nn from nni.nas.pytorch.utils import AverageMeter -criterion = nn.CrossEntropyLoss() - def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1): logsoftmax = nn.LogSoftmax() n_classes = pred.size(1) @@ -35,145 +33,153 @@ def accuracy(output, target, topk=(1,)): res.append(correct_k.mul_(100.0 / batch_size)) return res -def validate(model, device, valid_loader, test_loader, is_test=True): - if is_test: - data_loader = test_loader - else: - data_loader = valid_loader - model.eval() - batch_time = AverageMeter('batch_time') - losses = AverageMeter('losses') - top1 = AverageMeter('top1') - top5 = AverageMeter('top5') - - end = time.time() - with torch.no_grad(): - for i, (images, labels) in enumerate(data_loader): - images, labels = images.to(device), labels.to(device) - # compute output - output = model(images) - loss = criterion(output, labels) - # measure accuracy and record loss - acc1, acc5 = accuracy(output, labels, topk=(1, 5)) - losses.update(loss, images.size(0)) - top1.update(acc1[0], images.size(0)) - top5.update(acc5[0], images.size(0)) - # measure elapsed time - batch_time.update(time.time() - end) - end = time.time() - if i % 10 == 0 or i + 1 == len(data_loader): - if is_test: - prefix = 'Test' - else: - prefix = 
'Valid' - test_log = prefix + ': [{0}/{1}]\t'\ - 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'\ - 'Loss {loss.val:.4f} ({loss.avg:.4f})\t'\ - 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'.\ - format(i, len(data_loader) - 1, batch_time=batch_time, loss=losses, top1=top1) - test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) - print(test_log) - return losses.avg, top1.avg, top5.avg - -def train_one_epoch(model, train_loader, device, optimizer, adjust_lr_func, train_log_func, label_smoothing=0.1): +class Retrain: + def __init__(self, model, optimizer, device, data_provider, n_epochs): + self.model = model + self.optimizer = optimizer + self.device = device + self.train_loader = data_provider.train + self.valid_loader = data_provider.valid + self.test_loader = data_provider.test + self.criterion = nn.CrossEntropyLoss() + + def run(self): + self.model = torch.nn.DataParallel(self.model) + self.model.to(self.device) + # train + self.train() + # validate + self.validate(is_test=False) + # test + self.validate(is_test=True) + + def train_one_epoch(self, adjust_lr_func, train_log_func, label_smoothing=0.1): batch_time = AverageMeter('batch_time') data_time = AverageMeter('data_time') losses = AverageMeter('losses') top1 = AverageMeter('top1') top5 = AverageMeter('top5') - model.train() + self.model.train() end = time.time() - for i, (images, labels) in enumerate(train_loader): + for i, (images, labels) in enumerate(self.train_loader): data_time.update(time.time() - end) new_lr = adjust_lr_func(i) - images, labels = images.to(device), labels.to(device) - output = model(images) + images, labels = images.to(self.device), labels.to(self.device) + output = self.model(images) if label_smoothing > 0: loss = cross_entropy_with_label_smoothing(output, labels, label_smoothing) else: - loss = criterion(output, labels) + loss = self.criterion(output, labels) acc1, acc5 = accuracy(output, labels, topk=(1, 5)) losses.update(loss, images.size(0)) top1.update(acc1[0], images.size(0)) top5.update(acc5[0], images.size(0)) # compute gradient and do SGD step - model.zero_grad() # or self.optimizer.zero_grad() + self.model.zero_grad() # or self.optimizer.zero_grad() loss.backward() - optimizer.step() + self.optimizer.step() # measure elapsed time batch_time.update(time.time() - end) end = time.time() - if i % 10 == 0 or i + 1 == len(train_loader): + if i % 10 == 0 or i + 1 == len(self.train_loader): batch_log = train_log_func(i, batch_time, data_time, losses, top1, top5, new_lr) print(batch_log) return top1, top5 -def train(model, optimizer, device, train_loader, valid_loader, test_loader, n_epochs, validation_frequency=1): - best_acc = 0 - nBatch = len(train_loader) - - def train_log_func(epoch_, i, batch_time, data_time, losses, top1, top5, lr): - batch_log = 'Train [{0}][{1}/{2}]\t' \ - 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \ - 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' \ - 'Loss {losses.val:.4f} ({losses.avg:.4f})\t' \ - 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'. 
\ - format(epoch_ + 1, i, nBatch - 1, - batch_time=batch_time, data_time=data_time, losses=losses, top1=top1) - batch_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) - batch_log += '\tlr {lr:.5f}'.format(lr=lr) - return batch_log - - def adjust_learning_rate(n_epochs, optimizer, epoch, batch=0, nBatch=None): - """ adjust learning of a given optimizer and return the new learning rate """ - # cosine - T_total = n_epochs * nBatch - T_cur = epoch * nBatch + batch - # init_lr = 0.05 - new_lr = 0.5 * 0.05 * (1 + math.cos(math.pi * T_cur / T_total)) - for param_group in optimizer.param_groups: - param_group['lr'] = new_lr - return new_lr - - for epoch in range(n_epochs): - print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n') - end = time.time() - train_top1, train_top5 = train_one_epoch(model, train_loader, device, optimizer, - lambda i: adjust_learning_rate(n_epochs, optimizer, epoch, i, nBatch), - lambda i, batch_time, data_time, losses, top1, top5, new_lr: - train_log_func(epoch, i, batch_time, data_time, losses, top1, top5, new_lr), - ) - time_per_epoch = time.time() - end - seconds_left = int((n_epochs - epoch - 1) * time_per_epoch) - print('Time per epoch: %s, Est. complete in: %s' % ( - str(timedelta(seconds=time_per_epoch)), - str(timedelta(seconds=seconds_left)))) + def train(self, validation_frequency=1): + best_acc = 0 + nBatch = len(self.train_loader) + + def train_log_func(epoch_, i, batch_time, data_time, losses, top1, top5, lr): + batch_log = 'Train [{0}][{1}/{2}]\t' \ + 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \ + 'Data {data_time.val:.3f} ({data_time.avg:.3f})\t' \ + 'Loss {losses.val:.4f} ({losses.avg:.4f})\t' \ + 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'. \ + format(epoch_ + 1, i, nBatch - 1, + batch_time=batch_time, data_time=data_time, losses=losses, top1=top1) + batch_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) + batch_log += '\tlr {lr:.5f}'.format(lr=lr) + return batch_log - if (epoch + 1) % validation_frequency == 0: - val_loss, val_acc, val_acc5 = validate(model, device, valid_loader, test_loader, is_test=False) - is_best = val_acc > best_acc - best_acc = max(best_acc, val_acc) - val_log = 'Valid [{0}/{1}]\tloss {2:.3f}\ttop-1 acc {3:.3f} ({4:.3f})'.\ - format(epoch + 1, n_epochs, val_loss, val_acc, best_acc) - val_log += '\ttop-5 acc {0:.3f}\tTrain top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}'.\ - format(val_acc5, top1=train_top1, top5=train_top5) - print(val_log) + def adjust_learning_rate(n_epochs, optimizer, epoch, batch=0, nBatch=None): + """ adjust learning of a given optimizer and return the new learning rate """ + # cosine + T_total = n_epochs * nBatch + T_cur = epoch * nBatch + batch + # init_lr = 0.05 + new_lr = 0.5 * 0.05 * (1 + math.cos(math.pi * T_cur / T_total)) + for param_group in optimizer.param_groups: + param_group['lr'] = new_lr + return new_lr + + for epoch in range(self.n_epochs): + print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n') + end = time.time() + train_top1, train_top5 = self.train_one_epoch( + lambda i: adjust_learning_rate(self.n_epochs, self.optimizer, epoch, i, nBatch), + lambda i, batch_time, data_time, losses, top1, top5, new_lr: + train_log_func(epoch, i, batch_time, data_time, losses, top1, top5, new_lr), + ) + time_per_epoch = time.time() - end + seconds_left = int((self.n_epochs - epoch - 1) * time_per_epoch) + print('Time per epoch: %s, Est. 
complete in: %s' % ( + str(timedelta(seconds=time_per_epoch)), + str(timedelta(seconds=seconds_left)))) + + if (epoch + 1) % validation_frequency == 0: + val_loss, val_acc, val_acc5 = self.validate(is_test=False) + is_best = val_acc > best_acc + best_acc = max(best_acc, val_acc) + val_log = 'Valid [{0}/{1}]\tloss {2:.3f}\ttop-1 acc {3:.3f} ({4:.3f})'.\ + format(epoch + 1, self.n_epochs, val_loss, val_acc, best_acc) + val_log += '\ttop-5 acc {0:.3f}\tTrain top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}'.\ + format(val_acc5, top1=train_top1, top5=train_top5) + print(val_log) + else: + is_best = False + + def validate(self, is_test=True): + if is_test: + data_loader = self.test_loader else: - is_best = False - -def retrain(model, optimizer, device, data_provider, n_epochs): - train_loader = data_provider.train - valid_loader = data_provider.valid - test_loader = data_provider.test - model = torch.nn.DataParallel(model) - model.to(device) - # train - #train(model, optimizer, device, train_loader, valid_loader, test_loader, n_epochs) - # validate - validate(model, device, valid_loader, test_loader, is_test=False) - # test - validate(model, device, valid_loader, test_loader, is_test=True) \ No newline at end of file + data_loader = self.valid_loader + self.model.eval() + batch_time = AverageMeter('batch_time') + losses = AverageMeter('losses') + top1 = AverageMeter('top1') + top5 = AverageMeter('top5') + + end = time.time() + with torch.no_grad(): + for i, (images, labels) in enumerate(data_loader): + images, labels = images.to(self.device), labels.to(self.device) + # compute output + output = self.model(images) + loss = self.criterion(output, labels) + # measure accuracy and record loss + acc1, acc5 = accuracy(output, labels, topk=(1, 5)) + losses.update(loss, images.size(0)) + top1.update(acc1[0], images.size(0)) + top5.update(acc5[0], images.size(0)) + # measure elapsed time + batch_time.update(time.time() - end) + end = time.time() + + if i % 10 == 0 or i + 1 == len(data_loader): + if is_test: + prefix = 'Test' + else: + prefix = 'Valid' + test_log = prefix + ': [{0}/{1}]\t'\ + 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'\ + 'Loss {loss.val:.4f} ({loss.avg:.4f})\t'\ + 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})'.\ + format(i, len(data_loader) - 1, batch_time=batch_time, loss=losses, top1=top1) + test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) + print(test_log) + return losses.avg, top1.avg, top5.avg \ No newline at end of file diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 59744a382b..a287b1deed 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -1,14 +1,15 @@ # Copyright (c) Microsoft Corporation. # Licensed under the MIT license. 
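The mutator changes below register each `MixedOp` on its mutable; the core trick of this file is the binarization of path weights. Roughly, binarization samples one active path from the softmax of the architecture weights and sets its binary gate to one. A simplified standalone sketch (`alpha` stands in for `AP_path_alpha` and `wb` for `AP_path_wb`; names are local to this sketch):

```python
import torch
import torch.nn.functional as F

alpha = torch.tensor([0.5, 1.5, 0.0])    # architecture weights of one LayerChoice
probs = F.softmax(alpha, dim=0)          # probabilities over the candidate ops
active = torch.multinomial(probs, 1)[0]  # sample the single active op
wb = torch.zeros_like(alpha)             # binary gates, all paths off
wb[active] = 1.0                         # one-hot gate: only the sampled op runs
```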
+import math import torch from torch import nn as nn from torch.nn import functional as F import numpy as np -from .utils import detach_variable from nni.nas.pytorch.base_mutator import BaseMutator from nni.nas.pytorch.mutables import LayerChoice +from .utils import detach_variable class ArchGradientFunction(torch.autograd.Function): @@ -245,9 +246,8 @@ def __init__(self, model): self._unused_modules = None self.mutable_list = [] for mutable in self.undedup_mutables: - mo = MixedOp(mutable) self.mutable_list.append(mutable) - mutable.registered_module = mo + mutable.registered_module = MixedOp(mutable) def on_forward_layer_choice(self, mutable, *inputs): """ diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 43274dc15c..ac9e3cb5ed 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -4,6 +4,7 @@ import math import time import json +import logging import torch from torch import nn as nn @@ -14,6 +15,7 @@ from .mutator import ProxylessNasMutator from .utils import cross_entropy_with_label_smoothing, accuracy +logger = logging.getLogger(__name__) class ProxylessNasTrainer(BaseTrainer): def __init__(self, model, model_optim, device, @@ -141,7 +143,7 @@ def _validate(self): format(i, len(self.valid_loader) - 1, batch_time=batch_time, loss=losses, top1=top1) # return top5: test_log += '\tTop-5 acc {top5.val:.3f} ({top5.avg:.3f})'.format(top5=top5) - print(test_log) + logger.info(test_log) self.mutator.unused_modules_back() return losses.avg, top1.avg, top5.avg @@ -152,7 +154,7 @@ def _warm_up(self): T_total = self.warmup_epochs * nBatch # total num of batches for epoch in range(self.warmup_curr_epoch, self.warmup_epochs): - print('\n', '-' * 30, 'Warmup epoch: %d' % (epoch + 1), '-' * 30, '\n') + logger.info('\n--------Warmup epoch: %d--------\n', epoch + 1) batch_time = AverageMeter('batch_time') data_time = AverageMeter('data_time') losses = AverageMeter('losses') @@ -162,9 +164,8 @@ def _warm_up(self): self.model.train() end = time.time() - print('=====================_warm_up, epoch: ', epoch) + logger.info('warm_up epoch: %d', epoch) for i, (images, labels) in enumerate(data_loader): - #print('=====================_warm_up, minibatch i: ', i) data_time.update(time.time() - end) # lr T_cur = epoch * nBatch + i @@ -174,8 +175,7 @@ def _warm_up(self): images, labels = images.to(self.device), labels.to(self.device) # compute output self.mutator.reset_binary_gates() # random sample binary gates - # remove unused module for speedup - self.mutator.unused_modules_off() + self.mutator.unused_modules_off() # remove unused module for speedup output = self.model(images) if self.label_smoothing > 0: loss = cross_entropy_with_label_smoothing(output, labels, self.label_smoothing) @@ -205,14 +205,13 @@ def _warm_up(self): 'Top-5 acc {top5.val:.3f} ({top5.avg:.3f})\tlr {lr:.5f}'. \ format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time, losses=losses, top1=top1, top5=top5, lr=warmup_lr) - print(batch_log) - self.save_checkpoint() + logger.info(batch_log) val_loss, val_top1, val_top5 = self._validate() val_log = 'Warmup Valid [{0}/{1}]\tloss {2:.3f}\ttop-1 acc {3:.3f}\ttop-5 acc {4:.3f}\t' \ 'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}M'. 
\ format(epoch + 1, self.warmup_epochs, val_loss, val_top1, val_top5, top1=top1, top5=top5) - print(val_log) - #self.save_checkpoint() + logger.info(val_log) + self.save_checkpoint() self.warmup_curr_epoch += 1 def _get_update_schedule(self, nBatch): @@ -241,22 +240,17 @@ def _train(self): nBatch = len(self.train_loader) arch_param_num = self.mutator.num_arch_params() binary_gates_num = self.mutator.num_arch_params() - #weight_param_num = len(list(self.net.weight_parameters())) - print( - '#arch_params: %d\t#binary_gates: %d\t#weight_params: xx' % - (arch_param_num, binary_gates_num) - ) + logger.info('#arch_params: %d\t#binary_gates: %d', arch_param_num, binary_gates_num) update_schedule = self._get_update_schedule(nBatch) for epoch in range(self.train_curr_epoch, self.n_epochs): - print('\n', '-' * 30, 'Train epoch: %d' % (epoch + 1), '-' * 30, '\n') + logger.info('\n--------Train epoch: %d--------\n', epoch + 1) batch_time = AverageMeter('batch_time') data_time = AverageMeter('data_time') losses = AverageMeter('losses') top1 = AverageMeter('top1') top5 = AverageMeter('top5') - entropy = AverageMeter('entropy') # switch to train mode self.model.train() @@ -264,9 +258,6 @@ def _train(self): for i, (images, labels) in enumerate(self.train_loader): data_time.update(time.time() - end) lr = self._adjust_learning_rate(self.model_optim, epoch, batch=i, nBatch=nBatch) - # network entropy - #net_entropy = self.mutator.entropy() - #entropy.update(net_entropy.data.item() / arch_param_num, 1) # train weight parameters images, labels = images.to(self.device), labels.to(self.device) self.mutator.reset_binary_gates() @@ -294,7 +285,7 @@ def _train(self): used_time = time.time() - start_time log_str = 'Architecture [%d-%d]\t Time %.4f\t Loss %.4f\t null %s' % \ (epoch + 1, i, used_time, arch_loss, exp_value) - print(log_str) + logger.info(log_str) batch_time.update(time.time() - end) end = time.time() # training log @@ -303,25 +294,21 @@ def _train(self): 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \ 'Data Time {data_time.val:.3f} ({data_time.avg:.3f})\t' \ 'Loss {losses.val:.4f} ({losses.avg:.4f})\t' \ - 'Entropy {entropy.val:.5f} ({entropy.avg:.5f})\t' \ 'Top-1 acc {top1.val:.3f} ({top1.avg:.3f})\t' \ 'Top-5 acc {top5.val:.3f} ({top5.avg:.3f})\tlr {lr:.5f}'. \ format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time, - losses=losses, entropy=entropy, top1=top1, top5=top5, lr=lr) - print(batch_log) + losses=losses, top1=top1, top5=top5, lr=lr) + logger.info(batch_log) # TODO: print current network architecture # validate if (epoch + 1) % self.arch_valid_frequency == 0: val_loss, val_top1, val_top5 = self._validate() val_log = 'Valid [{0}]\tloss {1:.3f}\ttop-1 acc {2:.3f} \ttop-5 acc {3:.3f}\t' \ - 'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}\t' \ - 'Entropy {entropy.val:.5f}M'. \ - format(epoch + 1, val_loss, val_top1, - val_top5, entropy=entropy, top1=top1, top5=top5) - print(val_log) + 'Train top-1 {top1.avg:.3f}\ttop-5 {top5.avg:.3f}'. 
\ + format(epoch + 1, val_loss, val_top1, val_top5, top1=top1, top5=top5) + logger.info(val_log) self.save_checkpoint() self.train_curr_epoch += 1 - # convert to normal network according to architecture parameters def _valid_next_batch(self): if self._valid_iter is None: @@ -360,7 +347,7 @@ def _gradient_step(self): self.mutator.unused_modules_back() self.mutator.change_forward_mode(None) time4 = time.time() - print('(%.4f, %.4f, %.4f)' % (time2 - time1, time3 - time2, time4 - time3)) + logger.info('(%.4f, %.4f, %.4f)', time2 - time1, time3 - time2, time4 - time3) return loss.data.item(), expected_value.item() if expected_value is not None else None def save_checkpoint(self): @@ -387,7 +374,7 @@ def load_checkpoint(self): def train(self): if self.load_ckpt: - load_checkpoint() + self.load_checkpoint() if self.warmup: self._warm_up() self._train() @@ -400,8 +387,5 @@ def export(self, file_name): def validate(self): raise NotImplementedError - def train_and_validate(self): - raise NotImplementedError - def checkpoint(self): raise NotImplementedError From 38fab2d881c1e8b209e2cd12cac3bbd3674da5eb Mon Sep 17 00:00:00 2001 From: quanlu Date: Wed, 18 Dec 2019 20:41:04 +0800 Subject: [PATCH 47/60] update --- examples/nas/proxylessnas/retrain.py | 1 + 1 file changed, 1 insertion(+) diff --git a/examples/nas/proxylessnas/retrain.py b/examples/nas/proxylessnas/retrain.py index d501fbf53d..5fc707103c 100644 --- a/examples/nas/proxylessnas/retrain.py +++ b/examples/nas/proxylessnas/retrain.py @@ -42,6 +42,7 @@ def __init__(self, model, optimizer, device, data_provider, n_epochs): self.train_loader = data_provider.train self.valid_loader = data_provider.valid self.test_loader = data_provider.test + self.n_epochs = n_epochs self.criterion = nn.CrossEntropyLoss() def run(self): From eab6e224676d1c82fcfff06c35a6a1fcbddab08f Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 19 Dec 2019 13:43:38 +0800 Subject: [PATCH 48/60] update --- .../nni/nas/pytorch/proxylessnas/utils.py | 60 +++++++++++++++++++ 1 file changed, 60 insertions(+) create mode 100644 src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py new file mode 100644 index 0000000000..bfedfe56d6 --- /dev/null +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py @@ -0,0 +1,60 @@ +# Copyright (c) Microsoft Corporation. +# Licensed under the MIT license. 
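The new `utils.py` below implements label smoothing by mixing the one-hot target with a uniform distribution: `soft_target = one_hot * (1 - eps) + eps / n_classes`. A standalone worked example with `n_classes = 4` and `eps = 0.1`:

```python
import torch

hard = torch.tensor([0., 1., 0., 0.])   # one-hot target for class 1
soft = hard * (1 - 0.1) + 0.1 / 4       # smoothed target
assert torch.allclose(soft, torch.tensor([0.025, 0.925, 0.025, 0.025]))
assert abs(soft.sum().item() - 1.0) < 1e-6  # still a valid distribution
```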
+ +import torch +from torch import nn as nn + +def detach_variable(inputs): + if isinstance(inputs, tuple): + return tuple([detach_variable(x) for x in inputs]) + else: + x = inputs.detach() + x.requires_grad = inputs.requires_grad + return x + +def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1): + """ + Parameters + ---------- + pred : + target : + label_smoothing : + + Returns + ------- + """ + logsoftmax = nn.LogSoftmax() + n_classes = pred.size(1) + # convert to one-hot + target = torch.unsqueeze(target, 1) + soft_target = torch.zeros_like(pred) + soft_target.scatter_(1, target, 1) + # label smoothing + soft_target = soft_target * (1 - label_smoothing) + label_smoothing / n_classes + return torch.mean(torch.sum(- soft_target * logsoftmax(pred), 1)) + +def accuracy(output, target, topk=(1,)): + """ + Computes the precision@k for the specified values of k + + Parameters + ---------- + output : + target : + topk : + + Returns + ------- + """ + maxk = max(topk) + batch_size = target.size(0) + + _, pred = output.topk(maxk, 1, True, True) + pred = pred.t() + correct = pred.eq(target.view(1, -1).expand_as(pred)) + + res = [] + for k in topk: + correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) + res.append(correct_k.mul_(100.0 / batch_size)) + return res \ No newline at end of file From a7f59f02436a0cfce46ade393cd1307842c734d0 Mon Sep 17 00:00:00 2001 From: quanlu Date: Thu, 19 Dec 2019 13:45:47 +0800 Subject: [PATCH 49/60] update --- src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py | 1 + 1 file changed, 1 insertion(+) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index a287b1deed..cbbcc39dd2 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -353,5 +353,6 @@ def sample_final(self): for mutable in self.undedup_mutables: assert isinstance(mutable, LayerChoice) index, _ = mutable.registered_module.chosen_index + # pylint: disable=not-callable result[mutable.key] = F.one_hot(torch.tensor(index), num_classes=mutable.length).view(-1).bool() return result \ No newline at end of file From 8ef5f6de8c3a8cf55c806bd89c3078e00aaa5efe Mon Sep 17 00:00:00 2001 From: quanlu Date: Sun, 22 Dec 2019 12:11:22 +0800 Subject: [PATCH 50/60] add doc string --- .../nni/nas/pytorch/proxylessnas/mutator.py | 155 +++++++++++++++--- 1 file changed, 134 insertions(+), 21 deletions(-) diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index cbbcc39dd2..2934e08d39 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -12,7 +12,6 @@ from .utils import detach_variable class ArchGradientFunction(torch.autograd.Function): - @staticmethod def forward(ctx, x, binary_gates, run_func, backward_func): ctx.run_func = run_func @@ -36,8 +35,13 @@ def backward(ctx, grad_output): class MixedOp(nn.Module): """ - This class is to instantiate and manage info of one LayerChoice + This class is to instantiate and manage info of one LayerChoice. + It includes architecture weights, binary weights, and member functions + operating the weights. """ + # forward/backward mode for LayerChoice: None, two, full, and full_v2. + # For training architecture weights, we use full_v2 by default, and for training + # model weights, we use None. 
forward_mode = None
    def __init__(self, mutable):
        """
@@ -64,11 +68,26 @@ def to_requires_grad(self):
        self.AP_path_alpha.requires_grad = True
        self.AP_path_wb.requires_grad = True

-    def disable_grad(self):
+    def to_disable_grad(self):
        self.AP_path_alpha.requires_grad = False
        self.AP_path_wb.requires_grad = False

    def forward(self, mutable, x):
+        """
+        Define the forward of LayerChoice. For 'full_v2', backward is also defined.
+
+        Parameters
+        ----------
+        mutable : LayerChoice
+            this layer's mutable
+        x : tensor
+            input of this layer; only one input is supported
+
+        Returns
+        -------
+        output: tensor
+            output of this layer
+        """
        if MixedOp.forward_mode == 'full' or MixedOp.forward_mode == 'two':
            output = 0
            for _i in self.active_index:
@@ -78,7 +97,6 @@ def forward(self, mutable, x):
                oi = self.candidate_ops[_i](x)
                output = output + self.AP_path_wb[_i] * oi.detach()
        elif MixedOp.forward_mode == 'full_v2':
            def run_function(key, candidate_ops, active_id):
                def forward(_x):
                    return candidate_ops[active_id](_x)
@@ -119,22 +137,47 @@ def probs_over_ops(self):

    @property
    def chosen_index(self):
-        """ choose the max one """
+        """
+        choose the op with the max prob
+
+        Returns
+        -------
+        int
+            index of the chosen one
+        numpy.float32
+            prob of the chosen one
+        """
        probs = self.probs_over_ops.data.cpu().numpy()
        index = int(np.argmax(probs))
        return index, probs[index]

    def active_op(self, mutable):
-        """ assume only one path is active """
+        """
+        assume only one path is active
+
+        Returns
+        -------
+        PyTorch module
+            the chosen operation
+        """
        return mutable.choices[self.active_index[0]]

    @property
    def active_op_index(self):
-        """ return active op's index """
+        """
+        return the active op's index; the active op is the sampled one
+
+        Returns
+        -------
+        int
+            index of the active op
+        """
        return self.active_index[0]

    def set_chosen_op_active(self):
-        """ set chosen index, active and inactive indexes """
+        """
+        set the chosen index, and the active and inactive indexes
+        """
        chosen_idx, _ = self.chosen_index
        self.active_index = [chosen_idx]
        self.inactive_index = [_i for _i in range(0, chosen_idx)] + \
@@ -142,7 +185,13 @@ def binarize(self, mutable):
        """
-        Sample based on alpha, and set binary weights accordingly
+        Sample based on alpha, and set binary weights accordingly.
+        AP_path_wb is set in this function; this operation is called binarization.
+
+        Parameters
+        ----------
+        mutable : LayerChoice
+            this layer's mutable
        """
        self.log_prob = None
        # reset binary gates
@@ -186,7 +235,8 @@ def _delta_ij(self, i, j):

    def set_arch_param_grad(self, mutable):
        """
-        Calculate alpha gradient for this LayerChoice
+        Calculate the alpha gradient for this LayerChoice.
+        It is calculated from the gradients of the binary gates and the probs of the ops.
        """
        binary_grads = self.AP_path_wb.grad.data
        if self.active_op(mutable).is_zero_layer():
@@ -217,6 +267,9 @@ def set_arch_param_grad(self, mutable):
            return

    def rescale_updated_arch_param(self):
+        """
+        rescale architecture weights for the 'two' mode.
+        """
        if not isinstance(self.active_index[0], tuple):
            assert self.active_op.is_zero_layer()
            return
@@ -233,9 +286,19 @@ def rescale_updated_arch_param(self):

class ProxylessNasMutator(BaseMutator):
+    """
+    This mutator initializes and operates all the LayerChoices of the input model.
+    It is for the corresponding trainer to control the training process of the LayerChoices,
+    coordinating with the whole training process.
+    """
    def __init__(self, model):
        """
-        Init a MixedOp instance for each named mutable i.e., LayerChoice
+        Init a MixedOp instance for each mutable, i.e., LayerChoice,
+        and register the instantiated MixedOp in the corresponding LayerChoice.
+        If it is not registered in the LayerChoice, DataParallel does not work,
+        because the architecture weights would not be included in the DataParallel model.
+        When MixedOps are registered, we use ```requires_grad``` to control
+        whether to calculate gradients of the architecture weights.

        Parameters
        ----------
@@ -251,20 +314,23 @@ def __init__(self, model):

    def on_forward_layer_choice(self, mutable, *inputs):
        """
-        Callback of layer choice forward. Override if you are an advanced user.
-        On default, this method calls :meth:`on_calc_layer_choice_mask` to get a mask on how to choose between layers
-        (either by switch or by weights), then it will reduce the list of all tensor outputs with the policy speicified
-        in `mutable.reduction`. It will also cache the mask with corresponding `mutable.key`.
+        Callback of layer choice forward. This function defines the forward
+        logic of the input mutable. So the mutable is only an interface; its real
+        implementation is defined in the mutator.

        Parameters
        ----------
        mutable: LayerChoice
+            this layer's mutable
        inputs: list of torch.Tensor
+            inputs of this mutable

        Returns
        -------
        torch.Tensor
-            index of the chosen op
+            output of this mutable, i.e., LayerChoice
+        int
+            index of the chosen op
        """
        # FIXME: return mask, to be consistent with other algorithms
        idx = mutable.registered_module.active_op_index
@@ -272,14 +338,16 @@ def on_forward_layer_choice(self, mutable, *inputs):

    def reset_binary_gates(self):
        """
-        For each LayerChoice, binarize based on alpha to only activate one op
+        For each LayerChoice, binarize its binary weights
+        based on alpha to activate only one op.
+        It traverses all the mutables in the model to do this.
        """
        for mutable in self.undedup_mutables:
            mutable.registered_module.binarize(mutable)

    def set_chosen_op_active(self):
        """
-        For each LayerChoice, set the op with highest alpha as the chosen op
+        For each LayerChoice, set the op with the highest alpha as the chosen op.
        Usually used for validation.
        """
        for mutable in self.undedup_mutables:
@@ -287,9 +355,12 @@ def set_chosen_op_active(self):

    def num_arch_params(self):
        """
+        The number of mutables, i.e., LayerChoice instances
+
        Returns
        -------
-        The number of LayerChoice in user model
+        int
+            the number of LayerChoice instances in the user model
        """
        return len(self.mutable_list)

@@ -302,22 +373,46 @@ def set_arch_param_grad(self):

    def get_architecture_parameters(self):
        """
-        Return architecture weights of each LayerChoice, for arch optimizer
+        Get all the architecture parameters.
+
+        Yields
+        ------
+        PyTorch Parameter
+            the AP_path_alpha of each traversed mutable
        """
        for mutable in self.undedup_mutables:
            yield mutable.registered_module.get_AP_path_alpha()

    def change_forward_mode(self, mode):
+        """
+        Update the forward mode of the MixedOps, as training architecture weights and
+        model weights use different forward modes.
+        """
        MixedOp.forward_mode = mode

    def get_forward_mode(self):
+        """
+        Get the forward mode of MixedOp
+
+        Returns
+        -------
+        string
+            the current forward mode of MixedOp
+        """
        return MixedOp.forward_mode

    def rescale_updated_arch_param(self):
+        """
+        Rescale architecture weights in the 'two' mode.
+        """
        for mutable in self.undedup_mutables:
            mutable.registered_module.rescale_updated_arch_param()

    def unused_modules_off(self):
+        """
+        Remove unused modules for each mutable.
+        The removed modules are kept in ```self._unused_modules``` for resuming later.
+        """
        self._unused_modules = []
        for mutable in self.undedup_mutables:
            mixed_op = mutable.registered_module
            unused = {}
            if self.get_forward_mode() in ['full', 'two', 'full_v2']:
@@ -333,6 +428,9 @@ def unused_modules_off(self):
            self._unused_modules.append(unused)

    def unused_modules_back(self):
+        """
+        Restore the removed modules.
+        """
        if self._unused_modules is None:
            return
        for m, unused in zip(self.mutable_list, self._unused_modules):
@@ -341,14 +439,29 @@ def unused_modules_back(self):
        self._unused_modules = None

    def arch_requires_grad(self):
+        """
+        Make architecture weights require gradients
+        """
        for mutable in self.undedup_mutables:
            mutable.registered_module.to_requires_grad()

    def arch_disable_grad(self):
+        """
+        Disable gradients of architecture weights, i.e., do not
+        calculate gradients for them.
+        """
        for mutable in self.undedup_mutables:
-            mutable.registered_module.disable_grad()
+            mutable.registered_module.to_disable_grad()

    def sample_final(self):
+        """
+        Generate the final chosen architecture.
+
+        Returns
+        -------
+        dict
+            the choice of each mutable, i.e., LayerChoice
+        """
        result = dict()
        for mutable in self.undedup_mutables:
            assert isinstance(mutable, LayerChoice)
From 477af83f709447b849f008e1d9964c9f65ad0f81 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Sun, 22 Dec 2019 12:20:25 +0800
Subject: [PATCH 51/60] update
---
 .../nni/nas/pytorch/proxylessnas/trainer.py   | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index ac9e3cb5ed..e1cf13021d 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -31,13 +31,23 @@ def __init__(self, model, model_optim, device,
        Parameters
        ----------
        model : pytorch model
+            the user model, which has mutables
        model_optim : pytorch optimizer
+            the user defined optimizer
+        device : pytorch device
+            the devices to train/search the model
        train_loader : pytorch data loader
+            data loader for the training set
        valid_loader : pytorch data loader
-        device : device
+            data loader for the validation set
+        label_smoothing : float
+            for label smoothing
        n_epochs : int
+            number of epochs to train/search
        init_lr : float
            init learning rate for training the model
+        binary_mode : str
+            the forward/backward mode for the binary weights in mutator
        arch_init_type : str
            the way to init architecture parameters
        arch_init_ratio : float
@@ -46,12 +56,17 @@ def __init__(self, model, model_optim, device,
            learning rate of the architecture parameters optimizer
        arch_weight_decay : float
            weight decay of the architecture parameters optimizer
+        grad_update_arch_param_every : int
+        grad_update_steps : int
        warmup : bool
            whether to do warmup
        warmup_epochs : int
            the number of epochs to do in warmup
        arch_valid_frequency : int
            frequency of printing validation result
+        load_ckpt : bool
+        ckpt_path : str
+        arch_path : str
From aab28e2ec69798b55f06ed6c9e0da0d61e0466f1 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Mon, 23 Dec 2019 09:22:18 +0800
Subject: [PATCH 52/60] add docstring
---
 .../nni/nas/pytorch/proxylessnas/trainer.py   | 97 ++++++++++++++++++-
 1 file changed, 96 insertions(+), 1 deletion(-)
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
index e1cf13021d..0887107fb0 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
+++
b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py
@@ -57,16 +57,21 @@ def __init__(self, model, model_optim, device,
        arch_weight_decay : float
            weight decay of the architecture parameters optimizer
        grad_update_arch_param_every : int
+            update the architecture weights once every this many minibatches
        grad_update_steps : int
+            the number of training steps in each update of the architecture weights
        warmup : bool
            whether to do warmup
        warmup_epochs : int
-            the number of epochs to do in warmup
+            the number of epochs to do during warmup
        arch_valid_frequency : int
            frequency of printing validation result
        load_ckpt : bool
+            whether to load a checkpoint
        ckpt_path : str
+            checkpoint path; if load_ckpt is True, ckpt_path cannot be None
        arch_path : str
+            the path to store the chosen architecture
        """
        self.model = model
        self.model_optim = model_optim
@@ -115,6 +120,9 @@ def __init__(self, model, model_optim, device,
        self.train_curr_epoch = 0

    def _init_arch_params(self, init_type='normal', init_ratio=1e-3):
+        """
+        Initialize architecture weights
+        """
        for param in self.mutator.get_architecture_parameters():
            if init_type == 'normal':
                param.data.normal_(0, init_ratio)
@@ -124,6 +132,14 @@ def _init_arch_params(self, init_type='normal', init_ratio=1e-3):
                raise NotImplementedError

    def _validate(self):
+        """
+        Do validation. During validation, LayerChoices use the chosen active op.
+
+        Returns
+        -------
+        float, float, float
+            average loss, average top1 accuracy, average top5 accuracy
+        """
        self.valid_loader.batch_sampler.batch_size = self.valid_batch_size
        self.valid_loader.batch_sampler.drop_last = False
@@ -163,6 +179,9 @@ def _validate(self):
        return losses.avg, top1.avg, top5.avg

    def _warm_up(self):
+        """
+        Warm up the model; during warm-up, architecture weights are not trained.
+        """
        lr_max = 0.05
        data_loader = self.train_loader
        nBatch = len(data_loader)
        T_total = self.warmup_epochs * nBatch # total num of batches

        for epoch in range(self.warmup_curr_epoch, self.warmup_epochs):
@@ -230,6 +249,20 @@ def _warm_up(self):
        self.warmup_curr_epoch += 1

    def _get_update_schedule(self, nBatch):
+        """
+        Generate the schedule for training architecture weights. A key is the minibatch
+        index after which architecture weights are updated; its value is the number of
+        steps for that update.
+
+        Parameters
+        ----------
+        nBatch : int
+            the total number of minibatches in one epoch
+
+        Returns
+        -------
+        dict
+            the schedule for updating architecture weights
+        """
        schedule = {}
        for i in range(nBatch):
            if (i + 1) % self.grad_update_arch_param_every == 0:
@@ -237,6 +270,9 @@ def _get_update_schedule(self, nBatch):
        return schedule

    def _calc_learning_rate(self, epoch, batch=0, nBatch=None):
+        """
+        Calculate the learning rate according to the cosine schedule.
+        """
        T_total = self.n_epochs * nBatch
        T_cur = epoch * nBatch + batch
        lr = 0.5 * self.init_lr * (1 + math.cos(math.pi * T_cur / T_total))
@@ -245,6 +281,22 @@ def _calc_learning_rate(self, epoch, batch=0, nBatch=None):

    def _adjust_learning_rate(self, optimizer, epoch, batch=0, nBatch=None):
        """
        Adjust the learning rate of a given optimizer and return the new learning rate
+
+        Parameters
+        ----------
+        optimizer : pytorch optimizer
+            the used optimizer
+        epoch : int
+            the current epoch number
+        batch : int
+            the current minibatch index
+        nBatch : int
+            the total number of minibatches in one epoch
+
+        Returns
+        -------
+        float
+            the adjusted learning rate
        """
        new_lr = self._calc_learning_rate(epoch, batch, nBatch)
        for param_group in optimizer.param_groups:
@@ -252,6 +304,13 @@ def _adjust_learning_rate(self, optimizer, epoch, batch=0, nBatch=None):
        return new_lr

    def _train(self):
+        """
+        Train the model; it trains both model weights and architecture weights.
+        Architecture weights are trained according to the schedule.
+        Before updating architecture weights, ```requires_grad``` is enabled.
+        It is disabled again after the update, so that architecture weights
+        are not updated while training model weights.
+        """
        nBatch = len(self.train_loader)
        arch_param_num = self.mutator.num_arch_params()
        binary_gates_num = self.mutator.num_arch_params()
@@ -326,6 +385,14 @@ def _train(self):
        self.train_curr_epoch += 1

    def _valid_next_batch(self):
+        """
+        Get the next minibatch from the validation set
+
+        Returns
+        -------
+        (tensor, tensor)
+            the tuple of images and labels
+        """
        if self._valid_iter is None:
            self._valid_iter = iter(self.valid_loader)
        try:
@@ -336,6 +403,16 @@ def _valid_next_batch(self):
        return data

    def _gradient_step(self):
+        """
+        This gradient step updates the architecture weights.
+        The mutator is used intensively in this function to operate on
+        the architecture weights.
+
+        Returns
+        -------
+        float, float or None
+            the loss of the model, and the expected value (None if unavailable)
+        """
        # use the same batch size as train batch size for architecture weights
        self.valid_loader.batch_sampler.batch_size = self.train_batch_size
        self.valid_loader.batch_sampler.drop_last = True
@@ -366,7 +443,10 @@ def _gradient_step(self):
        return loss.data.item(), expected_value.item() if expected_value is not None else None

    def save_checkpoint(self):
+        """
+        Save a checkpoint of the whole model: model weights and architecture weights are saved
+        in ```ckpt_path```, and the currently chosen architecture is saved in ```arch_path```.
+        """
        if self.ckpt_path:
            state = {
                'warmup_curr_epoch': self.warmup_curr_epoch,
@@ -379,6 +460,9 @@ def save_checkpoint(self):
            self.export(self.arch_path)

    def load_checkpoint(self):
+        """
+        Load the checkpoint from ```ckpt_path```.
+        """
        assert self.ckpt_path is not None, "If load_ckpt is not None, ckpt_path should not be None"
        ckpt = torch.load(self.ckpt_path)
        self.warmup_curr_epoch = ckpt['warmup_curr_epoch']
@@ -388,6 +472,9 @@ def load_checkpoint(self):
        self.arch_optimizer.load_state_dict(ckpt['arch_optim'])

    def train(self):
+        """
+        Train the whole model.
@@ -388,6 +472,9 @@ def load_checkpoint(self):
         self.arch_optimizer.load_state_dict(ckpt['arch_optim'])

     def train(self):
+        """
+        Train the whole model.
+        """
         if self.load_ckpt:
             self.load_checkpoint()
         if self.warmup:
@@ -395,6 +482,14 @@
             self._warm_up()
         self._train()

     def export(self, file_name):
+        """
+        Export the chosen architecture to a file
+
+        Parameters
+        ----------
+        file_name : str
+            the path of the file that stores the exported architecture
+        """
         exported_arch = self.mutator.sample_final()
         with open(file_name, 'w') as f:
             json.dump(exported_arch, f, indent=2, sort_keys=True, cls=TorchTensorEncoder)

From d9a778d994d4fd95569d180ca70e767c009d3bd9 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Mon, 23 Dec 2019 09:29:23 +0800
Subject: [PATCH 53/60] update

---
 .../nni/nas/pytorch/proxylessnas/utils.py | 30 +++++++++++++++----
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py
index bfedfe56d6..e6f7b1533e 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py
@@ -5,6 +5,14 @@
 from torch import nn as nn

 def detach_variable(inputs):
+    """
+    Detach a tensor, or a tuple of tensors, from the computation graph
+
+    Parameters
+    ----------
+    inputs : pytorch tensor or tuple of pytorch tensors
+        the tensor(s) to be detached
+    """
     if isinstance(inputs, tuple):
         return tuple([detach_variable(x) for x in inputs])
     else:

@@ -16,12 +24,17 @@ def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1):
     """
+    Compute cross entropy against smoothed labels
+
     Parameters
     ----------
-    pred :
-    target :
-    label_smoothing :
+    pred : pytorch tensor
+        predicted logits
+    target : pytorch tensor
+        ground truth labels
+    label_smoothing : float
+        the degree of label smoothing

     Returns
     -------
+    pytorch tensor
+        the cross entropy loss computed against the smoothed labels
     """
     logsoftmax = nn.LogSoftmax()
     n_classes = pred.size(1)

@@ -39,12 +52,17 @@ def accuracy(output, target, topk=(1,)):
     Parameters
     ----------
-    output :
-    target :
-    topk :
+    output : pytorch tensor
+        predicted logits
+    target : pytorch tensor
+        ground truth labels
+    topk : tuple
+        the values of k for which top-k accuracy is computed, e.g., (1, 5)

     Returns
     -------
+    list
+        the top-k accuracy for each requested k
     """
     maxk = max(topk)
     batch_size = target.size(0)
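
To see the two helpers above in action, here is a small self-contained sketch. The hunk elides most of the loss function's body, so the label-smoothing formulation below is a reconstruction of the common variant (smoothing mass spread uniformly over all classes) and should be read as an assumption, not as the verbatim utils.py code; shapes and values are illustrative.

```python
import torch
import torch.nn as nn

def cross_entropy_with_label_smoothing(pred, target, label_smoothing=0.1):
    # Assumed formulation: (1 - eps) on the true class, plus eps / n_classes
    # spread uniformly over every class.
    logsoftmax = nn.LogSoftmax(dim=1)
    n_classes = pred.size(1)
    one_hot = torch.zeros_like(pred).scatter(1, target.unsqueeze(1), 1)
    soft_target = (1 - label_smoothing) * one_hot + label_smoothing / n_classes
    return torch.mean(torch.sum(-soft_target * logsoftmax(pred), dim=1))

logits = torch.randn(8, 10)                  # batch of 8, 10 classes
labels = torch.randint(0, 10, (8,))
loss = cross_entropy_with_label_smoothing(logits, labels)
top1 = (logits.argmax(dim=1) == labels).float().mean() * 100  # what accuracy(..., topk=(1,)) reports
```
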
From e9c7603d748910de89c604f162da799c63601fb2 Mon Sep 17 00:00:00 2001
From: quanlu
Date: Mon, 23 Dec 2019 12:24:28 +0800
Subject: [PATCH 54/60] add doc

---
 docs/en_US/NAS/Overview.md     | 21 +++++++++++
 docs/en_US/NAS/Proxylessnas.md | 63 +++++++++++++++++++++++++++++++++
 docs/en_US/nas.rst             |  1 +
 docs/img/proxylessnas.png      | Bin 0 -> 26933 bytes
 4 files changed, 85 insertions(+)
 create mode 100644 docs/en_US/NAS/Proxylessnas.md
 create mode 100644 docs/img/proxylessnas.png

diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md
index 3426673669..ffa0e5bcb2 100644
--- a/docs/en_US/NAS/Overview.md
+++ b/docs/en_US/NAS/Overview.md
@@ -21,6 +21,7 @@ NNI supports below NAS algorithms now and being adding more. User can reproduce
 | [ENAS](#enas) | Efficient Neural Architecture Search via Parameter Sharing [Reference Paper][1] |
 | [DARTS](#darts) | DARTS: Differentiable Architecture Search [Reference Paper][3] |
 | [P-DARTS](#p-darts) | Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation [Reference Paper](https://arxiv.org/abs/1904.12760)|
+| [ProxylessNAS](#proxylessnas) | ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware [Reference Paper](https://arxiv.org/pdf/1812.00332.pdf)|
 
 Note, these algorithms run **standalone without nnictl**, and supports PyTorch only. Tensorflow 2.0 will be supported in future release.
@@ -93,6 +94,26 @@ cd ../darts
 python3 retrain.py --arc-checkpoint ../pdarts/checkpoints/epoch_2.json
 ```
 
+### ProxylessNAS
+
+The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) removes proxy, it directly learn the architectures for large-scale target tasks and target hardware platforms. It addresses high memory consumption issue of differentiable NAS and reduces the computational cost to the same level of regular training while still allowing a large candidate set.
+
+#### Usage
+
+```bash
+# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
+git clone https://github.com/Microsoft/nni.git
+
+# search the best architecture
+cd examples/nas/proxylessnas
+python3 main.py
+
+# train the best architecture after you get the best architecture
+python3 main.py --train_mode='retrain' --exported_arch_path='your_arch_path'
+```
+
+Please refer to [here](Proxylessnas.md) for detailed usage and implementation of ProxylessNAS on NNI.
+
 ## Use NNI API
 
 NOTE, we are trying to support various NAS algorithms with unified programming interface, and it's in very experimental stage. It means the current programing interface may be updated in future.
diff --git a/docs/en_US/NAS/Proxylessnas.md b/docs/en_US/NAS/Proxylessnas.md
new file mode 100644
index 0000000000..3fe24d06b8
--- /dev/null
+++ b/docs/en_US/NAS/Proxylessnas.md
@@ -0,0 +1,63 @@
+# ProxylessNAS on NNI
+
+## Introduction
+
+The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) removes proxy, it directly learn the architectures for large-scale target tasks and target hardware platforms. They address high memory consumption issue of differentiable NAS and reduce the computational cost to the same level of regular training while still allowing a large candidate set. Please refer to the paper for the details.
+
+## Usage
+
+To use ProxylessNAS training/searching approach, users need to specify search space in their model using [NNI NAS interface](NasInterface.md), e.g., `LayerChoice`, `InputChoice`. After defining and instantiating the model, the following work can be leaved to ProxylessNasTrainer by instantiating the trainer and passing the model to it.
+```python
+trainer = ProxylessNasTrainer(model,
+                              model_optim=optimizer,
+                              train_loader=data_provider.train,
+                              valid_loader=data_provider.valid,
+                              device=device,
+                              warmup=True,
+                              ckpt_path=args.checkpoint_path,
+                              arch_path=args.arch_path)
+trainer.train()
+trainer.export(args.arch_path)
+```
+The complete example code can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/proxylessnas).
+
+**Input arguments of ProxylessNasTrainer** (a consolidated instantiation sketch follows this list)
+
+* **model** (*PyTorch model, required*) - The model that users want to tune/search. It contains mutables that specify the search space.
+* **model_optim** (*PyTorch optimizer, required*) - The optimizer used to train the model.
+* **device** (*device, required*) - The device(s) on which to run the training/search. The trainer applies data parallel on the model for users.
+* **train_loader** (*PyTorch data loader, required*) - The data loader for the training set.
+* **valid_loader** (*PyTorch data loader, required*) - The data loader for the validation set.
+* **label_smoothing** (*float, optional, default = 0.1*) - The degree of label smoothing.
+* **n_epochs** (*int, optional, default = 120*) - The number of epochs to train/search.
+* **init_lr** (*float, optional, default = 0.025*) - The initial learning rate for training the model.
+* **binary_mode** (*'two', 'full', or 'full_v2', optional, default = 'full_v2'*) - The forward/backward mode for the binary weights in mutator. 'full' forwards all the candidate ops, 'two' forwards only two sampled ops, and 'full_v2' recomputes the inactive ops during backward.
+* **arch_init_type** (*'normal' or 'uniform', optional, default = 'normal'*) - The way to initialize architecture parameters.
+* **arch_init_ratio** (*float, optional, default = 1e-3*) - The scale used to initialize architecture parameters (e.g., the standard deviation of the normal initialization).
+* **arch_optim_lr** (*float, optional, default = 1e-3*) - The learning rate of the architecture parameters optimizer.
+* **arch_weight_decay** (*float, optional, default = 0*) - Weight decay of the architecture parameters optimizer.
+* **grad_update_arch_param_every** (*int, optional, default = 5*) - Architecture weights are updated once every this number of minibatches.
+* **grad_update_steps** (*int, optional, default = 1*) - The number of steps to train architecture weights during each update.
+* **warmup** (*bool, optional, default = True*) - Whether to do warmup.
+* **warmup_epochs** (*int, optional, default = 25*) - The number of epochs to do during warmup.
+* **arch_valid_frequency** (*int, optional, default = 1*) - The frequency (in epochs) of printing validation results.
+* **load_ckpt** (*bool, optional, default = False*) - Whether to load a checkpoint.
+* **ckpt_path** (*str, optional, default = None*) - Checkpoint path; if load_ckpt is True, ckpt_path cannot be None.
+* **arch_path** (*str, optional, default = None*) - The path to store the chosen architecture.
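
As referenced above, here is a consolidated instantiation sketch showing how the optional arguments fit together. The values are simply the documented defaults; `model`, `optimizer`, `device`, and the data loaders are assumed to exist, and `'./chosen_arch.json'` is a hypothetical path.

```python
trainer = ProxylessNasTrainer(model,
                              model_optim=optimizer,
                              device=device,
                              train_loader=data_provider.train,
                              valid_loader=data_provider.valid,
                              label_smoothing=0.1,
                              n_epochs=120,
                              init_lr=0.025,
                              binary_mode='full_v2',
                              arch_init_type='normal',
                              arch_init_ratio=1e-3,
                              arch_optim_lr=1e-3,
                              arch_weight_decay=0,
                              grad_update_arch_param_every=5,
                              grad_update_steps=1,
                              warmup=True,
                              warmup_epochs=25,
                              arch_valid_frequency=1,
                              load_ckpt=False,
                              ckpt_path=None,
                              arch_path='./chosen_arch.json')  # hypothetical output path
trainer.train()
trainer.export('./chosen_arch.json')
```
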
+
+
+## Implementation
+
+The implementation on NNI is based on the [offical implementation](https://github.com/mit-han-lab/ProxylessNAS). The offical implementation supports two training approaches: gradient descent and RL based, and support different targeted hardwared, including 'mobile', 'cpu', 'gpu8', 'flops'. In our current implementation on NNI, gradient descent training approach is supported, but has not supported different hardwares. The complete support is ongoing.
+
+Below we will describe implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. For users to flexibily define their own search space and use built-in ProxylessNAS training approach, we put the specified search space in [example code](https://github.com/microsoft/nni/tree/master/examples/nas/proxylessnas) using [NNI NAS interface](NasInterface.md), and put the training approach in [SDK](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch/proxylessnas).
+
+![](../../img/proxylessnas.png)
+
+The ProxylessNAS training approach is composed of ProxylessNasMutator and ProxylessNasTrainer. ProxylessNasMutator instantiates a MixedOp for each mutable (i.e., LayerChoice) and manages the architecture weights in the MixedOp. **For DataParallel**, architecture weights should be included in the user model. Specifically, in the ProxylessNAS implementation, we add the MixedOp to the corresponding mutable (i.e., LayerChoice) as a member variable. The mutator also exposes two member functions, i.e., `arch_requires_grad` and `arch_disable_grad`, for the trainer to control the training of architecture weights; a sketch of a search-space declaration follows this section.
+
+ProxylessNasMutator also implements the forward logic of the mutables (i.e., LayerChoice).
+
+## Reproduce Results
+
+Ongoing...
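
To make the search-space half of that split concrete, here is a minimal sketch of how a model can embed a `LayerChoice`, as mentioned above. The module path follows NNI's v1.x NAS interface; the candidate ops and the key are illustrative, not the ones used in the actual example code.

```python
import torch.nn as nn
from nni.nas.pytorch.mutables import LayerChoice

class Block(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(Block, self).__init__()
        # Candidate ops for this position; during search, the ProxylessNAS mutator
        # attaches a MixedOp (holding the architecture weights) to this LayerChoice.
        self.op = LayerChoice([
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.Conv2d(in_ch, out_ch, 5, padding=2),
        ], key='block_op')

    def forward(self, x):
        return self.op(x)
```
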
diff --git a/docs/en_US/nas.rst b/docs/en_US/nas.rst
index 2228e52d76..89fdd48ba7 100644
--- a/docs/en_US/nas.rst
+++ b/docs/en_US/nas.rst
@@ -23,3 +23,4 @@ For details, please refer to the following tutorials:
     ENAS
     DARTS
     P-DARTS
+    ProxylessNAS
diff --git a/docs/img/proxylessnas.png b/docs/img/proxylessnas.png
new file mode 100644
index 0000000000000000000000000000000000000000..274e1dbd5b63e9142783baaf3b2ac7131047c6fb
GIT binary patch
literal 26933
[base85-encoded binary payload of docs/img/proxylessnas.png (26933 bytes, the ProxylessNAS implementation diagram referenced in Proxylessnas.md) omitted]

literal 0
HcmV?d00001

From 4f7c66238726fd599a7837907c17580da7a15f7e Mon Sep 17 00:00:00 2001
From: quzha
Date: Mon, 23 Dec 2019 19:16:47 +0800
Subject: [PATCH 55/60] resolve comments

---
 examples/nas/proxylessnas/datasets.py         | 20 -----
 examples/nas/proxylessnas/main.py             |  6 +-
 examples/nas/proxylessnas/model.py            | 21 -----
 examples/nas/proxylessnas/ops.py              | 34 ++------
 examples/nas/proxylessnas/putils.py           | 48 -----------
 examples/nas/proxylessnas/retrain.py          |  3 -
 src/sdk/pynni/nni/nas/pytorch/fixed.py        |  1 -
 .../nni/nas/pytorch/proxylessnas/mutator.py   | 81 ++++++++++---------
 .../nni/nas/pytorch/proxylessnas/trainer.py   |  1 -
 .../nni/nas/pytorch/proxylessnas/utils.py     |  4 +-
 10 files changed, 52 insertions(+), 167 deletions(-)

diff --git a/examples/nas/proxylessnas/datasets.py b/examples/nas/proxylessnas/datasets.py
index b0a9731429..b939005749 100644
--- a/examples/nas/proxylessnas/datasets.py
+++ b/examples/nas/proxylessnas/datasets.py
@@ -1,23 +1,3 @@
-# Copyright (c) Microsoft Corporation
-# All rights reserved.
-# -# MIT License -# -# Permission is hereby granted, free of charge, -# to any person obtaining a copy of this software and associated -# documentation files (the "Software"), to deal in the Software without restriction, -# including without limitation the rights to use, copy, modify, merge, publish, -# distribute, sublicense, and/or sell copies of the Software, and -# to permit persons to whom the Software is furnished to do so, subject to the following conditions: -# The above copyright notice and this permission notice shall be included -# in all copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING -# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND -# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, -# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. - import os import numpy as np import torch.utils.data diff --git a/examples/nas/proxylessnas/main.py b/examples/nas/proxylessnas/main.py index 33351f30fe..a675cc7231 100644 --- a/examples/nas/proxylessnas/main.py +++ b/examples/nas/proxylessnas/main.py @@ -1,6 +1,3 @@ -# Copyright (c) Microsoft Corporation. -# Licensed under the MIT license. - import os import sys import logging @@ -26,8 +23,7 @@ parser.add_argument("--dropout_rate", default=0, type=float) parser.add_argument("--no_decay_keys", default='bn', type=str, choices=[None, 'bn', 'bn#bias']) # configurations of imagenet dataset - parser.add_argument("--data_path", default='/data/ssd1/v-yugzh/imagenet/', type=str) - #parser.add_argument("--data_path", default='/mnt/v-yugzh/imagenet/', type=str) + parser.add_argument("--data_path", default='/data/imagenet/', type=str) parser.add_argument("--train_batch_size", default=256, type=int) parser.add_argument("--test_batch_size", default=500, type=int) parser.add_argument("--n_worker", default=32, type=int) diff --git a/examples/nas/proxylessnas/model.py b/examples/nas/proxylessnas/model.py index 1b5483f4a3..ee32970d7f 100644 --- a/examples/nas/proxylessnas/model.py +++ b/examples/nas/proxylessnas/model.py @@ -1,23 +1,3 @@ -# Copyright (c) Microsoft Corporation -# All rights reserved. -# -# MIT License -# -# Permission is hereby granted, free of charge, -# to any person obtaining a copy of this software and associated -# documentation files (the "Software"), to deal in the Software without restriction, -# including without limitation the rights to use, copy, modify, merge, publish, -# distribute, sublicense, and/or sell copies of the Software, and -# to permit persons to whom the Software is furnished to do so, subject to the following conditions: -# The above copyright notice and this permission notice shall be included -# in all copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING -# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND -# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, -# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
- import torch import torch.nn as nn import math @@ -55,7 +35,6 @@ def __init__(self, first_conv = ops.ConvLayer(3, input_channel, kernel_size=3, stride=2, use_bn=True, act_func='relu6', ops_order='weight_bn_act') # first block first_block_conv = ops.OPS['3x3_MBConv1'](input_channel, first_cell_width, 1) - #first_block = ops.MobileInvertedResidualBlock(first_block_conv, None, False) first_block = first_block_conv input_channel = first_cell_width diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py index 3bfc66a8bd..880f395f77 100644 --- a/examples/nas/proxylessnas/ops.py +++ b/examples/nas/proxylessnas/ops.py @@ -1,23 +1,3 @@ -# Copyright (c) Microsoft Corporation -# All rights reserved. -# -# MIT License -# -# Permission is hereby granted, free of charge, -# to any person obtaining a copy of this software and associated -# documentation files (the "Software"), to deal in the Software without restriction, -# including without limitation the rights to use, copy, modify, merge, publish, -# distribute, sublicense, and/or sell copies of the Software, and -# to permit persons to whom the Software is furnished to do so, subject to the following conditions: -# The above copyright notice and this permission notice shall be included -# in all copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING -# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND -# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, -# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. - from collections import OrderedDict import torch import torch.nn as nn @@ -68,9 +48,9 @@ def forward(self, x): elif self.shortcut is None: res = out else: - conv_x = out - skip_x = self.shortcut(x) - res = skip_x + conv_x + conv_x = out + skip_x = self.shortcut(x) + res = skip_x + conv_x return res @@ -90,11 +70,11 @@ def forward(self, x): x = x.view(batchsize, -1, height, width) return x -class My2DLayer(nn.Module): +class Base2DLayer(nn.Module): def __init__(self, in_channels, out_channels, use_bn=True, act_func='relu', dropout_rate=0, ops_order='weight_bn_act'): - super(My2DLayer, self).__init__() + super(Base2DLayer, self).__init__() self.in_channels = in_channels self.out_channels = out_channels @@ -161,7 +141,7 @@ def is_zero_layer(): return False -class ConvLayer(My2DLayer): +class ConvLayer(Base2DLayer): def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, dilation=1, groups=1, bias=False, has_shuffle=False, @@ -194,7 +174,7 @@ def weight_op(self): return weight_dict -class IdentityLayer(My2DLayer): +class IdentityLayer(Base2DLayer): def __init__(self, in_channels, out_channels, use_bn=False, act_func=None, dropout_rate=0, ops_order='weight_bn_act'): diff --git a/examples/nas/proxylessnas/putils.py b/examples/nas/proxylessnas/putils.py index cf2b23d6b5..c4900067a5 100644 --- a/examples/nas/proxylessnas/putils.py +++ b/examples/nas/proxylessnas/putils.py @@ -1,23 +1,3 @@ -# Copyright (c) Microsoft Corporation -# All rights reserved. 
-# -# MIT License -# -# Permission is hereby granted, free of charge, -# to any person obtaining a copy of this software and associated -# documentation files (the "Software"), to deal in the Software without restriction, -# including without limitation the rights to use, copy, modify, merge, publish, -# distribute, sublicense, and/or sell copies of the Software, and -# to permit persons to whom the Software is furnished to do so, subject to the following conditions: -# The above copyright notice and this permission notice shall be included -# in all copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING -# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND -# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, -# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. - import torch.nn as nn def get_parameters(model, keys=None, mode='include'): @@ -77,10 +57,6 @@ def make_divisible(v, divisor, min_val=None): It ensures that all layers have a channel number that is divisible by 8 It can be seen here: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py - :param v: - :param divisor: - :param min_val: - :return: """ if min_val is None: min_val = divisor @@ -89,27 +65,3 @@ def make_divisible(v, divisor, min_val=None): if new_v < 0.9 * v: new_v += divisor return new_v - -class AverageMeter(object): - """ - Computes and stores the average and current value - Copied from: https://github.com/pytorch/examples/blob/master/imagenet/main.py - """ - - def __init__(self): - self.val = 0 - self.avg = 0 - self.sum = 0 - self.count = 0 - - def reset(self): - self.val = 0 - self.avg = 0 - self.sum = 0 - self.count = 0 - - def update(self, val, n=1): - self.val = val - self.sum += val * n - self.count += n - self.avg = self.sum / self.count diff --git a/examples/nas/proxylessnas/retrain.py b/examples/nas/proxylessnas/retrain.py index 5fc707103c..a7afb62927 100644 --- a/examples/nas/proxylessnas/retrain.py +++ b/examples/nas/proxylessnas/retrain.py @@ -1,6 +1,3 @@ -# Copyright (c) Microsoft Corporation. -# Licensed under the MIT license. - import time import math from datetime import timedelta diff --git a/src/sdk/pynni/nni/nas/pytorch/fixed.py b/src/sdk/pynni/nni/nas/pytorch/fixed.py index 125e848fb2..bb49819c61 100644 --- a/src/sdk/pynni/nni/nas/pytorch/fixed.py +++ b/src/sdk/pynni/nni/nas/pytorch/fixed.py @@ -77,6 +77,5 @@ def apply_fixed_architecture(model, fixed_arc_path, device=None): fixed_arc = json.load(f) fixed_arc = _encode_tensor(fixed_arc, device) architecture = FixedArchitecture(model, fixed_arc) - #architecture.to(device) architecture.reset() return architecture diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py index 2934e08d39..a289fa5714 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py @@ -38,10 +38,12 @@ class MixedOp(nn.Module): This class is to instantiate and manage info of one LayerChoice. It includes architecture weights, binary weights, and member functions operating the weights. + + forward_mode: + forward/backward mode for LayerChoice: None, two, full, and full_v2. 
+ For training architecture weights, we use full_v2 by default, and for training + model weights, we use None. """ - # forward/backward mode for LayerChoice: None, two, full, and full_v2. - # For training architecture weights, we use full_v2 by default, and for training - # model weights, we use None. forward_mode = None def __init__(self, mutable): """ @@ -51,26 +53,26 @@ def __init__(self, mutable): A LayerChoice in user model """ super(MixedOp, self).__init__() - self.AP_path_alpha = nn.Parameter(torch.Tensor(mutable.length)) - self.AP_path_wb = nn.Parameter(torch.Tensor(mutable.length)) - self.AP_path_alpha.requires_grad = False - self.AP_path_wb.requires_grad = False + self.ap_path_alpha = nn.Parameter(torch.Tensor(mutable.length)) + self.ap_path_wb = nn.Parameter(torch.Tensor(mutable.length)) + self.ap_path_alpha.requires_grad = False + self.ap_path_wb.requires_grad = False self.active_index = [0] self.inactive_index = None self.log_prob = None self.current_prob_over_ops = None self.n_choices = mutable.length - def get_AP_path_alpha(self): - return self.AP_path_alpha + def get_ap_path_alpha(self): + return self.ap_path_alpha def to_requires_grad(self): - self.AP_path_alpha.requires_grad = True - self.AP_path_wb.requires_grad = True + self.ap_path_alpha.requires_grad = True + self.ap_path_wb.requires_grad = True def to_disable_grad(self): - self.AP_path_alpha.requires_grad = False - self.AP_path_wb.requires_grad = False + self.ap_path_alpha.requires_grad = False + self.ap_path_wb.requires_grad = False def forward(self, mutable, x): """ @@ -92,10 +94,10 @@ def forward(self, mutable, x): output = 0 for _i in self.active_index: oi = self.candidate_ops[_i](x) - output = output + self.AP_path_wb[_i] * oi + output = output + self.ap_path_wb[_i] * oi for _i in self.inactive_index: oi = self.candidate_ops[_i](x) - output = output + self.AP_path_wb[_i] * oi.detach() + output = output + self.ap_path_wb[_i] * oi.detach() elif MixedOp.forward_mode == 'full_v2': def run_function(key, candidate_ops, active_id): def forward(_x): @@ -116,8 +118,8 @@ def backward(_x, _output, grad_output): return binary_grads return backward output = ArchGradientFunction.apply( - x, self.AP_path_wb, run_function(mutable.key, mutable.choices, self.active_index[0]), - backward_function(mutable.key, mutable.choices, self.active_index[0], self.AP_path_wb)) + x, self.ap_path_wb, run_function(mutable.key, mutable.choices, self.active_index[0]), + backward_function(mutable.key, mutable.choices, self.active_index[0], self.ap_path_wb)) else: output = self.active_op(mutable)(x) return output @@ -132,7 +134,7 @@ def probs_over_ops(self): pytorch tensor probability distribution """ - probs = F.softmax(self.AP_path_alpha, dim=0) # softmax to probability + probs = F.softmax(self.ap_path_alpha, dim=0) # softmax to probability return probs @property @@ -186,7 +188,7 @@ def set_chosen_op_active(self): def binarize(self, mutable): """ Sample based on alpha, and set binary weights accordingly. - AP_path_wb is set in this function, which is called binarize. + ap_path_wb is set in this function, which is called binarize. 
Parameters ---------- @@ -195,13 +197,13 @@ def binarize(self, mutable): """ self.log_prob = None # reset binary gates - self.AP_path_wb.data.zero_() + self.ap_path_wb.data.zero_() probs = self.probs_over_ops if MixedOp.forward_mode == 'two': # sample two ops according to probs sample_op = torch.multinomial(probs.data, 2, replacement=False) probs_slice = F.softmax(torch.stack([ - self.AP_path_alpha[idx] for idx in sample_op + self.ap_path_alpha[idx] for idx in sample_op ]), dim=0) self.current_prob_over_ops = torch.zeros_like(probs) for i, idx in enumerate(sample_op): @@ -213,7 +215,7 @@ def binarize(self, mutable): self.active_index = [active_op] self.inactive_index = [inactive_op] # set binary gate - self.AP_path_wb.data[active_op] = 1.0 + self.ap_path_wb.data[active_op] = 1.0 else: sample = torch.multinomial(probs, 1)[0].item() self.active_index = [sample] @@ -221,13 +223,14 @@ def binarize(self, mutable): [_i for _i in range(sample + 1, len(mutable.choices))] self.log_prob = torch.log(probs[sample]) self.current_prob_over_ops = probs - self.AP_path_wb.data[sample] = 1.0 + self.ap_path_wb.data[sample] = 1.0 # avoid over-regularization for choice in mutable.choices: for _, param in choice.named_parameters(): param.grad = None - def _delta_ij(self, i, j): + @staticmethod + def delta_ij(i, j): if i == j: return 1 else: @@ -238,32 +241,32 @@ def set_arch_param_grad(self, mutable): Calculate alpha gradient for this LayerChoice. It is calculated using gradient of binary gate, probs of ops. """ - binary_grads = self.AP_path_wb.grad.data + binary_grads = self.ap_path_wb.grad.data if self.active_op(mutable).is_zero_layer(): - self.AP_path_alpha.grad = None + self.ap_path_alpha.grad = None return - if self.AP_path_alpha.grad is None: - self.AP_path_alpha.grad = torch.zeros_like(self.AP_path_alpha.data) + if self.ap_path_alpha.grad is None: + self.ap_path_alpha.grad = torch.zeros_like(self.ap_path_alpha.data) if MixedOp.forward_mode == 'two': involved_idx = self.active_index + self.inactive_index probs_slice = F.softmax(torch.stack([ - self.AP_path_alpha[idx] for idx in involved_idx + self.ap_path_alpha[idx] for idx in involved_idx ]), dim=0).data for i in range(2): for j in range(2): origin_i = involved_idx[i] origin_j = involved_idx[j] - self.AP_path_alpha.grad.data[origin_i] += \ - binary_grads[origin_j] * probs_slice[j] * (self._delta_ij(i, j) - probs_slice[i]) + self.ap_path_alpha.grad.data[origin_i] += \ + binary_grads[origin_j] * probs_slice[j] * (MixedOp.delta_ij(i, j) - probs_slice[i]) for _i, idx in enumerate(self.active_index): - self.active_index[_i] = (idx, self.AP_path_alpha.data[idx].item()) + self.active_index[_i] = (idx, self.ap_path_alpha.data[idx].item()) for _i, idx in enumerate(self.inactive_index): - self.inactive_index[_i] = (idx, self.AP_path_alpha.data[idx].item()) + self.inactive_index[_i] = (idx, self.ap_path_alpha.data[idx].item()) else: probs = self.probs_over_ops.data for i in range(self.n_choices): for j in range(self.n_choices): - self.AP_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (self._delta_ij(i, j) - probs[i]) + self.ap_path_alpha.grad.data[i] += binary_grads[j] * probs[j] * (MixedOp.delta_ij(i, j) - probs[i]) return def rescale_updated_arch_param(self): @@ -275,14 +278,14 @@ def rescale_updated_arch_param(self): return involved_idx = [idx for idx, _ in (self.active_index + self.inactive_index)] old_alphas = [alpha for _, alpha in (self.active_index + self.inactive_index)] - new_alphas = [self.AP_path_alpha.data[idx] for idx in involved_idx] + 
new_alphas = [self.ap_path_alpha.data[idx] for idx in involved_idx] offset = math.log( sum([math.exp(alpha) for alpha in new_alphas]) / sum([math.exp(alpha) for alpha in old_alphas]) ) for idx in involved_idx: - self.AP_path_alpha.data[idx] -= offset + self.ap_path_alpha.data[idx] -= offset class ProxylessNasMutator(BaseMutator): @@ -378,10 +381,10 @@ def get_architecture_parameters(self): yield ----- PyTorch Parameter - Return AP_path_alpha of the traversed mutable + Return ap_path_alpha of the traversed mutable """ for mutable in self.undedup_mutables: - yield mutable.registered_module.get_AP_path_alpha() + yield mutable.registered_module.get_ap_path_alpha() def change_forward_mode(self, mode): """ @@ -468,4 +471,4 @@ def sample_final(self): index, _ = mutable.registered_module.chosen_index # pylint: disable=not-callable result[mutable.key] = F.one_hot(torch.tensor(index), num_classes=mutable.length).view(-1).bool() - return result \ No newline at end of file + return result diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py index 0887107fb0..d9c86a6a9f 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/trainer.py @@ -373,7 +373,6 @@ def _train(self): format(epoch + 1, i, nBatch - 1, batch_time=batch_time, data_time=data_time, losses=losses, top1=top1, top5=top5, lr=lr) logger.info(batch_log) - # TODO: print current network architecture # validate if (epoch + 1) % self.arch_valid_frequency == 0: val_loss, val_top1, val_top5 = self._validate() diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py index e6f7b1533e..b703810d3b 100644 --- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py +++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/utils.py @@ -2,7 +2,7 @@ # Licensed under the MIT license. import torch -from torch import nn as nn +import torch.nn as nn def detach_variable(inputs): """ @@ -75,4 +75,4 @@ def accuracy(output, target, topk=(1,)): for k in topk: correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) res.append(correct_k.mul_(100.0 / batch_size)) - return res \ No newline at end of file + return res From 0b8cb1e2aca5891002041f20b9dc8bcc3783e107 Mon Sep 17 00:00:00 2001 From: quzha Date: Tue, 24 Dec 2019 11:25:24 +0800 Subject: [PATCH 56/60] update --- docs/en_US/NAS/Overview.md | 2 +- docs/en_US/NAS/Proxylessnas.md | 6 +++--- examples/nas/proxylessnas/ops.py | 4 +++- src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py | 2 ++ 4 files changed, 9 insertions(+), 5 deletions(-) diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md index ffa0e5bcb2..0788b73948 100644 --- a/docs/en_US/NAS/Overview.md +++ b/docs/en_US/NAS/Overview.md @@ -96,7 +96,7 @@ python3 retrain.py --arc-checkpoint ../pdarts/checkpoints/epoch_2.json ### ProxylessNAS -The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) removes proxy, it directly learn the architectures for large-scale target tasks and target hardware platforms. It addresses high memory consumption issue of differentiable NAS and reduces the computational cost to the same level of regular training while still allowing a large candidate set. 
+The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) removes the proxy and directly learns the architectures for large-scale target tasks and target hardware platforms. It addresses the high memory consumption issue of differentiable NAS and reduces the computational cost to the same level of regular training while still allowing a large candidate set.
 
 #### Usage
 
diff --git a/docs/en_US/NAS/Proxylessnas.md b/docs/en_US/NAS/Proxylessnas.md
index 3fe24d06b8..8f58f3306e 100644
--- a/docs/en_US/NAS/Proxylessnas.md
+++ b/docs/en_US/NAS/Proxylessnas.md
@@ -2,7 +2,7 @@
 
 ## Introduction
 
-The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) removes proxy, it directly learn the architectures for large-scale target tasks and target hardware platforms. They address high memory consumption issue of differentiable NAS and reduce the computational cost to the same level of regular training while still allowing a large candidate set. Please refer to the paper for the details.
+The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) removes the proxy and directly learns the architectures for large-scale target tasks and target hardware platforms. They address the high memory consumption issue of differentiable NAS and reduce the computational cost to the same level of regular training while still allowing a large candidate set. Please refer to the paper for the details.
 
 ## Usage
 
@@ -48,9 +48,9 @@ The complete example code can be found [here](https://github.com/microsoft/nni/t
 
 ## Implementation
 
-The implementation on NNI is based on the [offical implementation](https://github.com/mit-han-lab/ProxylessNAS). The offical implementation supports two training approaches: gradient descent and RL based, and support different targeted hardwared, including 'mobile', 'cpu', 'gpu8', 'flops'. In our current implementation on NNI, gradient descent training approach is supported, but has not supported different hardwares. The complete support is ongoing.
+The implementation on NNI is based on the [official implementation](https://github.com/mit-han-lab/ProxylessNAS). The official implementation supports two training approaches: gradient descent and RL based, and supports different target hardware, including 'mobile', 'cpu', 'gpu8', 'flops'. In our current implementation on NNI, the gradient descent training approach is supported, but different hardware targets are not supported yet. The complete support is ongoing.
 
-Below we will describe implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. For users to flexibily define their own search space and use built-in ProxylessNAS training approach, we put the specified search space in [example code](https://github.com/microsoft/nni/tree/master/examples/nas/proxylessnas) using [NNI NAS interface](NasInterface.md), and put the training approach in [SDK](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch/proxylessnas).
+Below we will describe implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. 
For users to flexibly define their own search space and use the built-in ProxylessNAS training approach, we put the specified search space in [example code](https://github.com/microsoft/nni/tree/master/examples/nas/proxylessnas) using [NNI NAS interface](NasInterface.md), and put the training approach in [SDK](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch/proxylessnas).
 
 ![](../../img/proxylessnas.png)
diff --git a/examples/nas/proxylessnas/ops.py b/examples/nas/proxylessnas/ops.py
index 880f395f77..6ff0bbf1cc 100644
--- a/examples/nas/proxylessnas/ops.py
+++ b/examples/nas/proxylessnas/ops.py
@@ -255,7 +255,9 @@
 
 class MBInvertedConvLayer(nn.Module):
-
+    """
+    The MB inverted convolution layer introduced in Section 4.2 of the paper: https://arxiv.org/pdf/1812.00332.pdf
+    """
     def __init__(self, in_channels, out_channels,
                  kernel_size=3, stride=1, expand_ratio=6, mid_channels=None):
         super(MBInvertedConvLayer, self).__init__()
diff --git a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
index a289fa5714..6e3c7a5b60 100644
--- a/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
+++ b/src/sdk/pynni/nni/nas/pytorch/proxylessnas/mutator.py
@@ -77,6 +77,8 @@ def to_disable_grad(self):
     def forward(self, mutable, x):
         """
         Define forward of LayerChoice. For 'full_v2', backward is also defined.
+        The 'two' mode is explained in Section 3.2.1 of the paper.
+        The 'full_v2' mode is explained in Appendix D of the paper.
 
         Parameters
         ----------
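
For readers who do not have the paper at hand, the layer that new docstring points to is the MobileNetV2-style inverted bottleneck: a pointwise expansion, a depthwise convolution, and a linear pointwise projection. Below is a minimal, runnable sketch of that structure; the real MBInvertedConvLayer in ops.py has more options (e.g., `mid_channels`), so treat this as illustrative.

```python
import torch
import torch.nn as nn

class TinyMBInvertedConv(nn.Module):
    # Pointwise expansion -> depthwise conv -> pointwise (linear) projection.
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, expand_ratio=6):
        super(TinyMBInvertedConv, self).__init__()
        mid_ch = in_ch * expand_ratio
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size, stride,
                      padding=kernel_size // 2, groups=mid_ch, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return self.block(x)

out = TinyMBInvertedConv(16, 24, kernel_size=5, stride=2)(torch.randn(1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 24, 16, 16])
```
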
From 927ab9e4e5430d680e4fbb0e486da9478bf8e5d7 Mon Sep 17 00:00:00 2001
From: QuanluZhang
Date: Mon, 10 Feb 2020 10:00:05 +0000
Subject: [PATCH 57/60] update doc

---
 docs/en_US/NAS/Overview.md | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md
index d824f0ad6e..5e63acc76b 100644
--- a/docs/en_US/NAS/Overview.md
+++ b/docs/en_US/NAS/Overview.md
@@ -39,26 +39,6 @@ Here are some common dependencies to run the examples. PyTorch needs to be above
 .. Note:: SPOS is a two-stage algorithm, whose first stage is one-shot and second stage is distributed, leveraging result of first stage as a checkpoint.
 ```
-### ProxylessNAS
-
-The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) removes the proxy and directly learns the architectures for large-scale target tasks and target hardware platforms. It addresses the high memory consumption issue of differentiable NAS and reduces the computational cost to the same level of regular training while still allowing a large candidate set.
-
-#### Usage
-
-```bash
-# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
-git clone https://github.com/Microsoft/nni.git
-
-# search the best architecture
-cd examples/nas/proxylessnas
-python3 main.py
-
-# train the best architecture after you get the best architecture
-python3 main.py --train_mode='retrain' --exported_arch_path='your_arch_path'
-```
-
-Please refer to [here](Proxylessnas.md) for detailed usage and implementation of ProxylessNAS on NNI.
-
 ## Use NNI API
 
 The programming interface of designing and searching a model is often demanded in two scenarios.

From e32bb72b6176b20a10c3e27b68bac578e4bbaeb2 Mon Sep 17 00:00:00 2001
From: QuanluZhang
Date: Mon, 10 Feb 2020 10:50:39 +0000
Subject: [PATCH 58/60] update doc

---
 docs/en_US/NAS/Overview.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/en_US/NAS/Overview.md b/docs/en_US/NAS/Overview.md
index 5e63acc76b..1a325d911f 100644
--- a/docs/en_US/NAS/Overview.md
+++ b/docs/en_US/NAS/Overview.md
@@ -19,6 +19,7 @@ NNI supports below NAS algorithms now and is adding more. User can reproduce an
 | [P-DARTS](PDARTS.md) | [Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation](https://arxiv.org/abs/1904.12760) is based on DARTS. It introduces an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure. |
 | [SPOS](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with an uniform path sampling method, and applies an evolutionary algorithm to efficiently search for the best-performing architectures. |
 | [CDARTS](CDARTS.md) | [Cyclic Differentiable Architecture Search](https://arxiv.org/abs/****) builds a cyclic feedback mechanism between the search and evaluation networks. It introduces a cyclic differentiable architecture search framework which integrates the two networks into a unified architecture.|
+| [ProxylessNAS](Proxylessnas.md) | [ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/abs/1812.00332).|
 
 One-shot algorithms run **standalone without nnictl**. Only PyTorch version has been implemented. Tensorflow 2.x will be supported in future release.

From 61d2944f117c678531c781878a762715cbf9fb97 Mon Sep 17 00:00:00 2001
From: QuanluZhang
Date: Mon, 10 Feb 2020 11:19:51 +0000
Subject: [PATCH 59/60] update toctree

---
 docs/en_US/nas.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/en_US/nas.rst b/docs/en_US/nas.rst
index b04f3a9e70..73b6aad0e5 100644
--- a/docs/en_US/nas.rst
+++ b/docs/en_US/nas.rst
@@ -24,4 +24,5 @@
     P-DARTS
     SPOS
     CDARTS
+    ProxylessNAS
     API Reference

From fba009e824fee3490a685b6c1b96e73a4e4ca3c8 Mon Sep 17 00:00:00 2001
From: QuanluZhang
Date: Mon, 10 Feb 2020 11:53:02 +0000
Subject: [PATCH 60/60] fix broken link

---
 docs/en_US/NAS/Proxylessnas.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/en_US/NAS/Proxylessnas.md b/docs/en_US/NAS/Proxylessnas.md
index 8f58f3306e..9c913203d8 100644
--- a/docs/en_US/NAS/Proxylessnas.md
+++ b/docs/en_US/NAS/Proxylessnas.md
@@ -6,7 +6,7 @@ The paper [ProxylessNAS: Direct Neural Architecture Search on Target Task and Ha
 
 ## Usage
 
-To use ProxylessNAS training/searching approach, users need to specify search space in their model using [NNI NAS interface](NasInterface.md), e.g., `LayerChoice`, `InputChoice`. After defining and instantiating the model, the following work can be leaved to ProxylessNasTrainer by instantiating the trainer and passing the model to it.
+To use ProxylessNAS training/searching approach, users need to specify search space in their model using [NNI NAS interface](NasGuide.md), e.g., `LayerChoice`, `InputChoice`. After defining and instantiating the model, the following work can be left to ProxylessNasTrainer by instantiating the trainer and passing the model to it.
```python
trainer = ProxylessNasTrainer(model,
                              model_optim=optimizer,
                              train_loader=data_provider.train,
                              valid_loader=data_provider.valid,
                              device=device,
                              warmup=True,
                              ckpt_path=args.checkpoint_path,
                              arch_path=args.arch_path)
trainer.train()
trainer.export(args.arch_path)
```
 The complete example code can be found [here](https://github.com/microsoft/nni/tree/master/examples/nas/proxylessnas).
 
 ## Implementation
 
 The implementation on NNI is based on the [official implementation](https://github.com/mit-han-lab/ProxylessNAS). The official implementation supports two training approaches: gradient descent and RL based, and supports different target hardware, including 'mobile', 'cpu', 'gpu8', 'flops'. In our current implementation on NNI, the gradient descent training approach is supported, but different hardware targets are not supported yet. The complete support is ongoing.
 
-Below we will describe implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. For users to flexibly define their own search space and use the built-in ProxylessNAS training approach, we put the specified search space in [example code](https://github.com/microsoft/nni/tree/master/examples/nas/proxylessnas) using [NNI NAS interface](NasInterface.md), and put the training approach in [SDK](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch/proxylessnas).
+Below we will describe implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. For users to flexibly define their own search space and use the built-in ProxylessNAS training approach, we put the specified search space in [example code](https://github.com/microsoft/nni/tree/master/examples/nas/proxylessnas) using [NNI NAS interface](NasGuide.md), and put the training approach in [SDK](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch/proxylessnas).
 
 ![](../../img/proxylessnas.png)
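
Finally, to make the trainer/mutator division of labor concrete, the sketch below is a runnable toy (not NNI code) showing the alternation that `arch_requires_grad`/`arch_disable_grad` enable: model weights train with the architecture parameter frozen, and the architecture parameter is periodically unfrozen and updated by its own optimizer, mirroring `grad_update_arch_param_every`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMixedOp(nn.Module):
    """Toy stand-in for a MixedOp: two candidate ops weighted by softmax(alpha)."""
    def __init__(self):
        super(ToyMixedOp, self).__init__()
        self.ops = nn.ModuleList([nn.Linear(4, 2), nn.Linear(4, 2)])
        self.alpha = nn.Parameter(torch.zeros(2), requires_grad=False)  # frozen by default

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=0)
        return sum(p * op(x) for p, op in zip(probs, self.ops))

model = ToyMixedOp()
model_optim = torch.optim.SGD([p for n, p in model.named_parameters() if n != 'alpha'], lr=0.1)
arch_optim = torch.optim.Adam([model.alpha], lr=1e-3)
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))

for step in range(10):
    # Model-weight step: alpha stays frozen, so only the candidate ops are updated.
    loss = F.cross_entropy(model(x), y)
    model_optim.zero_grad()
    loss.backward()
    model_optim.step()

    if (step + 1) % 5 == 0:                 # mirrors grad_update_arch_param_every
        model.alpha.requires_grad_(True)    # the effect of arch_requires_grad
        arch_loss = F.cross_entropy(model(x), y)
        arch_optim.zero_grad()
        arch_loss.backward()
        arch_optim.step()
        model.alpha.requires_grad_(False)   # the effect of arch_disable_grad
```
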