diff --git a/docs/en_US/NAS/Advanced.rst b/docs/en_US/NAS/Advanced.rst
deleted file mode 100644
index 96dc07596c..0000000000
--- a/docs/en_US/NAS/Advanced.rst
+++ /dev/null
@@ -1,136 +0,0 @@
-Customize a NAS Algorithm
-=========================
-
-Extend the Ability of One-Shot Trainers
----------------------------------------
-
-Users might want to do multiple things if they are using the trainers on real tasks, for example, distributed training, half-precision training, logging periodically, writing tensorboard, dumping checkpoints and so on. As mentioned previously, some trainers do have support for some of the items listed above; others might not. Generally, there are two recommended ways to add anything you want to an existing trainer: inherit an existing trainer and override, or copy an existing trainer and modify.
-
-Either way, you are walking into the scope of implementing a new trainer. Basically, implementing a one-shot trainer is no different from any traditional deep learning trainer, except that a new concept called mutator will reveal itself. So that the implementation will be different in at least two places:
-
-
-* Initialization
-
-.. code-block:: python
-
-   model = Model()
-   mutator = MyMutator(model)
-
-
-* Training
-
-.. code-block:: python
-
-   for _ in range(epochs):
-       for x, y in data_loader:
-           mutator.reset()  # reset all the choices in model
-           out = model(x)  # like traditional model
-           loss = criterion(out, y)
-           loss.backward()
-           # no difference below
-
-To demonstrate what mutators are for, we need to know how one-shot NAS normally works. Usually, one-shot NAS "co-optimize model weights and architecture weights". It repeatedly: sample an architecture or combination of several architectures from the supernet, train the chosen architectures like traditional deep learning model, update the trained parameters to the supernet, and use the metrics or loss as some signal to guide the architecture sampler. The mutator, is the architecture sampler here, often defined to be another deep-learning model. Therefore, you can treat it as any model, by defining parameters in it and optimizing it with optimizers. One mutator is initialized with exactly one model. Once a mutator is binded to a model, it cannot be rebinded to another model.
-
-``mutator.reset()`` is the core step. That's where all the choices in the model are finalized. The reset result will be always effective, until the next reset flushes the data. After the reset, the model can be seen as a traditional model to do forward-pass and backward-pass.
-
-Finally, mutators provide a method called ``mutator.export()`` that export a dict with architectures to the model. Note that currently this dict this a mapping from keys of mutables to tensors of selection. So in order to dump to json, users need to convert the tensors explicitly into python list.
-
-Meanwhile, NNI provides some useful tools so that users can implement trainers more easily. See `Trainers <./NasReference.rst>`__ for details.
-
-Implement New Mutators
-----------------------
-
-To start with, here is the pseudo-code that demonstrates what happens on ``mutator.reset()`` and ``mutator.export()``.
-
-.. code-block:: python
-
-   def reset(self):
-       self.apply_on_model(self.sample_search())
-
-.. code-block:: python
-
-   def export(self):
-       return self.sample_final()
-
-On reset, a new architecture is sampled with ``sample_search()`` and applied on the model. Then the model is trained for one or more steps in search phase. On export, a new architecture is sampled with ``sample_final()`` and **do nothing to the model**. This is either for checkpoint or exporting the final architecture.
-
-The requirements of return values of ``sample_search()`` and ``sample_final()`` are the same: a mapping from mutable keys to tensors. The tensor can be either a BoolTensor (true for selected, false for negative), or a FloatTensor which applies weight on each candidate. The selected branches will then be computed (in ``LayerChoice``\ , modules will be called; in ``InputChoice``\ , it's just tensors themselves), and reduce with the reduction operation specified in the choices. For most algorithms only worrying about the former part, here is an example of your mutator implementation.
-
-.. code-block:: python
-
-   class RandomMutator(Mutator):
-       def __init__(self, model):
-           super().__init__(model)  # don't forget to call super
-           # do something else
-
-       def sample_search(self):
-           result = dict()
-           for mutable in self.mutables:  # this is all the mutable modules in user model
-               # mutables share the same key will be de-duplicated
-               if isinstance(mutable, LayerChoice):
-                   # decided that this mutable should choose `gen_index`
-                   gen_index = np.random.randint(mutable.length)
-                   result[mutable.key] = torch.tensor([i == gen_index for i in range(mutable.length)], 
-                                                      dtype=torch.bool)
-               elif isinstance(mutable, InputChoice):
-                   if mutable.n_chosen is None:  # n_chosen is None, then choose any number
-                       result[mutable.key] = torch.randint(high=2, size=(mutable.n_candidates,)).view(-1).bool()
-                   # else do something else
-           return result
-
-       def sample_final(self):
-           return self.sample_search()  # use the same logic here. you can do something different
-
-The complete example of random mutator can be found :githublink:`here <nni/nas/pytorch/mutator.py>`.
-
-For advanced usages, e.g., users want to manipulate the way modules in ``LayerChoice`` are executed, they can inherit ``BaseMutator``\ , and overwrite ``on_forward_layer_choice`` and ``on_forward_input_choice``\ , which are the callback implementation of ``LayerChoice`` and ``InputChoice`` respectively. Users can still use property ``mutables`` to get all ``LayerChoice`` and ``InputChoice`` in the model code. For details, please refer to :githublink:`reference <nni/nas/pytorch/>` here to learn more.
-
-.. tip::
-    A useful application of random mutator is for debugging. Use
-
-    .. code-block:: python
-
-        mutator = RandomMutator(model)
-        mutator.reset()
-
-    will immediately set one possible candidate in the search space as the active one.
-
-Implemented a Distributed NAS Tuner
------------------------------------
-
-Before learning how to write a distributed NAS tuner, users should first learn how to write a general tuner. read `Customize Tuner <../Tuner/CustomizeTuner.rst>`__ for tutorials.
-
-When users call "\ `nnictl ss_gen <../Tutorial/Nnictl.rst>`__\ " to generate search space file, a search space file like this will be generated:
-
-.. code-block:: json
-
-   {
-       "key_name": {
-           "_type": "layer_choice",
-           "_value": ["op1_repr", "op2_repr", "op3_repr"]
-       },
-       "key_name": {
-           "_type": "input_choice",
-           "_value": {
-               "candidates": ["in1_key", "in2_key", "in3_key"],
-               "n_chosen": 1
-           }
-       }
-   }
-
-This is the exact search space tuners will receive in ``update_search_space``. It's then tuners' responsibility to interpret the search space and generate new candidates in ``generate_parameters``. A valid "parameters" will be in the following format:
-
-.. code-block:: json
-
-   {
-       "key_name": {
-           "_value": "op1_repr",
-           "_idx": 0
-       },
-       "key_name": {
-           "_value": ["in2_key"],
-           "_idex": [1]
-       }
-   }
-
-Send it through ``generate_parameters``\ , and the tuner would look like any HPO tuner. Refer to `SPOS <./SPOS.rst>`__ example code for an example.
diff --git a/docs/en_US/NAS/retiarii/ApiReference.rst b/docs/en_US/NAS/ApiReference.rst
similarity index 91%
rename from docs/en_US/NAS/retiarii/ApiReference.rst
rename to docs/en_US/NAS/ApiReference.rst
index 9d8cd03059..359fc5c3aa 100644
--- a/docs/en_US/NAS/retiarii/ApiReference.rst
+++ b/docs/en_US/NAS/ApiReference.rst
@@ -75,8 +75,8 @@ Oneshot Trainers
 ..  autoclass:: nni.retiarii.oneshot.pytorch.SinglePathTrainer
     :members:
 
-Strategies
-----------
+Exploration Strategies
+----------------------
 
 ..  autoclass:: nni.retiarii.strategy.Random
     :members:
@@ -90,6 +90,9 @@ Strategies
 ..  autoclass:: nni.retiarii.strategy.TPEStrategy
     :members:
 
+..  autoclass:: nni.retiarii.strategy.PolicyBasedRL
+    :members:
+
 Retiarii Experiments
 --------------------
 
@@ -98,3 +101,8 @@ Retiarii Experiments
 
 ..  autoclass:: nni.retiarii.experiment.pytorch.RetiariiExeConfig
     :members:
+
+Utilities
+---------
+
+..  autofunction:: nni.retiarii.serialize
\ No newline at end of file
diff --git a/docs/en_US/NAS/Benchmarks.rst b/docs/en_US/NAS/Benchmarks.rst
index 4749b46543..b0dd04c8d6 100644
--- a/docs/en_US/NAS/Benchmarks.rst
+++ b/docs/en_US/NAS/Benchmarks.rst
@@ -9,7 +9,7 @@ NAS Benchmarks
 Introduction
 ------------
 
-To imporve the reproducibility of NAS algorithms as well as reducing computing resource requirements, researchers proposed a series of NAS benchmarks such as `NAS-Bench-101 <https://arxiv.org/abs/1902.09635>`__\ , `NAS-Bench-201 <https://arxiv.org/abs/2001.00326>`__\ , `NDS <https://arxiv.org/abs/1905.13214>`__\ , etc. NNI provides a query interface for users to acquire these benchmarks. Within just a few lines of code, researcher are able to evaluate their NAS algorithms easily and fairly by utilizing these benchmarks.
+To improve the reproducibility of NAS algorithms and reduce computing resource requirements, researchers have proposed a series of NAS benchmarks such as `NAS-Bench-101 <https://arxiv.org/abs/1902.09635>`__\ , `NAS-Bench-201 <https://arxiv.org/abs/2001.00326>`__\ , `NDS <https://arxiv.org/abs/1905.13214>`__\ , etc. NNI provides a query interface for users to access these benchmarks. With just a few lines of code, researchers can evaluate their NAS algorithms easily and fairly using these benchmarks.
 
 Prerequisites
 -------------
diff --git a/docs/en_US/NAS/CDARTS.rst b/docs/en_US/NAS/CDARTS.rst
deleted file mode 100644
index 1504a56fa6..0000000000
--- a/docs/en_US/NAS/CDARTS.rst
+++ /dev/null
@@ -1,72 +0,0 @@
-CDARTS
-======
-
-Introduction
-------------
-
-`CDARTS <https://arxiv.org/pdf/2006.10724.pdf>`__ builds a cyclic feedback mechanism between the search and evaluation networks. First, the search network generates an initial topology for evaluation, so that the weights of the evaluation network can be optimized. Second, the architecture topology in the search network is further optimized by the label supervision in classification, as well as the regularization from the evaluation network through feature distillation. Repeating the above cycle results in a joint optimization of the search and evaluation networks, and thus enables the evolution of the topology to fit the final evaluation network.
-
-In implementation of ``CdartsTrainer``\ , it first instantiates two models and two mutators (one for each). The first model is the so-called "search network", which is mutated with a ``RegularizedDartsMutator`` -- a mutator with subtle differences with ``DartsMutator``. The second model is the "evaluation network", which is mutated with a discrete mutator that leverages the previous search network mutator, to sample a single path each time. Trainers train models and mutators alternatively. Users can refer to `paper <https://arxiv.org/pdf/2006.10724.pdf>`__ if they are interested in more details on these trainers and mutators.
-
-Reproduction Results
---------------------
-
-This is CDARTS based on the NNI platform, which currently supports CIFAR10 search and retrain. ImageNet search and retrain should also be supported, and we provide corresponding interfaces. Our reproduced results on NNI are slightly lower than the paper, but much higher than the original DARTS. Here we show the results of three independent experiments on CIFAR10.
-
-.. list-table::
-   :header-rows: 1
-   :widths: auto
-
-   * - Runs
-     - Paper
-     - NNI
-   * - 1
-     - 97.52
-     - 97.44
-   * - 2
-     - 97.53
-     - 97.48
-   * - 3
-     - 97.58
-     - 97.56
-
-
-Examples
---------
-
-`Example code <https://github.com/microsoft/nni/tree/master/examples/nas/legacy/cdarts>`__
-
-.. code-block:: bash
-
-   # In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
-   git clone https://github.com/Microsoft/nni.git
-
-   # install apex for distributed training.
-   git clone https://github.com/NVIDIA/apex
-   cd apex
-   python setup.py install --cpp_ext --cuda_ext
-
-   # search the best architecture
-   cd examples/nas/legacy/cdarts
-   bash run_search_cifar.sh
-
-   # train the best architecture.
-   bash run_retrain_cifar.sh
-
-Reference
----------
-
-PyTorch
-^^^^^^^
-
-..  autoclass:: nni.algorithms.nas.pytorch.cdarts.CdartsTrainer
-    :members:
-
-..  autoclass:: nni.algorithms.nas.pytorch.cdarts.RegularizedDartsMutator
-    :members:
-
-..  autoclass:: nni.algorithms.nas.pytorch.cdarts.DartsDiscreteMutator
-    :members:
-
-..  autoclass:: nni.algorithms.nas.pytorch.cdarts.RegularizedMutatorParallel
-    :members:
diff --git a/docs/en_US/NAS/ClassicNas.rst b/docs/en_US/NAS/ClassicNas.rst
deleted file mode 100644
index cec54d910f..0000000000
--- a/docs/en_US/NAS/ClassicNas.rst
+++ /dev/null
@@ -1,59 +0,0 @@
-.. role:: raw-html(raw)
-   :format: html
-
-
-Classic NAS Algorithms
-======================
-
-In classic NAS algorithms, each architecture is trained as a trial and the NAS algorithm acts as a tuner. Thus, this training mode naturally fits within the NNI hyper-parameter tuning framework, where Tuner generates new architecture for the next trial and trials run in the training service.
-
-Quick Start
------------
-
-The following example shows how to use classic NAS algorithms. You can see it is quite similar to NNI hyper-parameter tuning.
-
-.. code-block:: python
-
-   model = Net()
-
-   # get the chosen architecture from tuner and apply it on model
-   get_and_apply_next_architecture(model)
-   train(model)  # your code for training the model
-   acc = test(model)  # test the trained model
-   nni.report_final_result(acc)  # report the performance of the chosen architecture
-
-First, instantiate the model. Search space has been defined in this model through ``LayerChoice`` and ``InputChoice``. After that, user should invoke ``get_and_apply_next_architecture(model)`` to settle down to a specific architecture. This function receives the architecture from tuner (i.e., the classic NAS algorithm) and applies the architecture to ``model``. At this point, ``model`` becomes a specific architecture rather than a search space. Then users are free to train this model just like training a normal PyTorch model. After get the accuracy of this model, users should invoke ``nni.report_final_result(acc)`` to report the result to the tuner.
-
-At this point, trial code is ready. Then, we can prepare an NNI experiment, i.e., search space file and experiment config file. Different from NNI hyper-parameter tuning, search space file is automatically generated from the trial code by running the command (the detailed usage of this command can be found `here <../Tutorial/Nnictl.rst>`__\ ):
-
-``nnictl ss_gen --trial_command="the command for running your trial code"``
-
-A file named ``nni_auto_gen_search_space.json`` is generated by this command. Then put the path of the generated search space in the field ``searchSpacePath`` of the experiment config file. The other fields of the config file can be filled by referring `this tutorial <../Tutorial/QuickStart.rst>`__.
-
-Currently, we only support :githublink:`PPO Tuner <examples/tuners/random_nas_tuner>` for classic NAS. More classic NAS algorithms will be supported soon.
-
-The complete examples can be found :githublink:`here <examples/nas/legacy/classic_nas>` for PyTorch and :githublink:`here <examples/nas/legacy/classic_nas-tf>` for TensorFlow.
-
-Standalone mode for easy debugging
-----------------------------------
-
-We support a standalone mode for easy debugging, where you can directly run the trial command without launching an NNI experiment. This is for checking whether your trial code can correctly run. The first candidate(s) are chosen for ``LayerChoice`` and ``InputChoice`` in this standalone mode.
-
-:raw-html:`<a name="regulaized-evolution-tuner"></a>`
-
-Regularized Evolution Tuner
----------------------------
-
-This is a tuner geared for NNI’s Neural Architecture Search (NAS) interface. It uses the `evolution algorithm <https://arxiv.org/pdf/1802.01548.pdf>`__.
-
-The tuner first randomly initializes the number of ``population`` models and evaluates them. After that, every time to produce a new architecture, the tuner randomly chooses the number of ``sample`` architectures from ``population``\ , then mutates the best model in ``sample``\ , the parent model, to produce the child model. The mutation includes the hidden mutation and the op mutation. The hidden state mutation consists of replacing a hidden state with another hidden state from within the cell, subject to the constraint that no loops are formed. The op mutation behaves like the hidden state mutation as far as replacing one op with another op from the op set. Note that keeping the child model the same as its parent is not allowed. After evaluating the child model, it is added to the tail of the ``population``\ , then pops the front one.
-
-Note that **trial concurrency should be less than the population of the model**\ , otherwise NO_MORE_TRIAL exception will be raised.
-
-The whole procedure is summarized by the pseudocode below.
-
-
-.. image:: ../../img/EvoNasTuner.png
-   :target: ../../img/EvoNasTuner.png
-   :alt: 
-
diff --git a/docs/en_US/NAS/Cream.rst b/docs/en_US/NAS/Cream.rst
deleted file mode 100644
index 4aa9c9cf41..0000000000
--- a/docs/en_US/NAS/Cream.rst
+++ /dev/null
@@ -1,156 +0,0 @@
-Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search
-=======================================================================================
-
-* `Paper <https://papers.nips.cc/paper/2020/file/d072677d210ac4c03ba046120f0802ec-Paper.pdf>`__
-* `Models-Google Drive <https://drive.google.com/drive/folders/1NLGAbBF9bA1IUAxKlk2VjgRXhr6RHvRW?usp=sharing>`__
-* `Models-Baidu Disk (PWD: wqw6) <https://pan.baidu.com/s/1TqQNm2s14oEdyNPimw3T9g>`__
-* `BibTex <https://scholar.googleusercontent.com/scholar.bib?q=info:ICWVXc_SsKAJ:scholar.google.com/&output=citation&scisdr=CgUmooXfEMfTi0cV5aU:AAGBfm0AAAAAX7sQ_aXoamdKRaBI12tAVN8REq1VKNwM&scisig=AAGBfm0AAAAAX7sQ_RdYtp6BSro3zgbXVJU2MCgsG730&scisf=4&ct=citation&cd=-1&hl=ja>`__
-
-In this work, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, aiming to boost the convergence of individual models. We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training. Distilling knowledge from the prioritized paths is able to boost the training of subnetworks. Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop. The discovered architectures achieve superior performance compared to the recent `MobileNetV3 <https://arxiv.org/abs/1905.02244>`__ and `EfficientNet <https://arxiv.org/abs/1905.11946>`__ families under aligned settings.
-
-.. image:: https://raw.githubusercontent.com/microsoft/Cream/main/demo/intro.jpg
-
-Reproduced Results
-------------------
-
-Top-1 Accuracy on ImageNet. The top-1 accuracy of Cream search algorithm surpasses MobileNetV3 and EfficientNet-B0/B1 on ImageNet.
-The training with 16 Gpus is a little bit superior than 8 Gpus, as below.
-
-.. list-table::
-   :header-rows: 1
-   :widths: auto
-
-   * - Model (M Flops)
-     - 8Gpus
-     - 16Gpus
-   * - 14M
-     - 53.7
-     - 53.8
-   * - 43M
-     - 65.8
-     - 66.5
-   * - 114M
-     - 72.1
-     - 72.8
-   * - 287M
-     - 76.7
-     - 77.6
-   * - 481M
-     - 78.9
-     - 79.2
-   * - 604M
-     - 79.4
-     - 80.0
-
-
-
-.. image:: ../../img/cream_flops100.jpg
-   :scale: 50%
-
-.. image:: ../../img/cream_flops600.jpg
-   :scale: 50%
-
-Examples
---------
-
-`Example code <https://github.com/microsoft/nni/tree/master/examples/nas/legacy/cream>`__
-
-Please run the following scripts in the example folder.
-
-Data Preparation
-----------------
-
-You need to first download the `ImageNet-2012 <http://www.image-net.org/>`__ to the folder ``./data/imagenet`` and move the validation set to the subfolder ``./data/imagenet/val``. To move the validation set, you cloud use `the following script <https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh>`__ .
-
-Put the imagenet data in ``./data``. It should be like following:
-
-.. code-block:: bash
-
-   ./data/imagenet/train
-   ./data/imagenet/val
-   ...
-
-Quick Start
------------
-
-1. Search
-^^^^^^^^^
-
-First, build environments for searching.
-
-.. code-block:: bash
-
-   pip install -r ./requirements
-
-   git clone https://github.com/NVIDIA/apex.git
-   cd apex
-   python setup.py install --cpp_ext --cuda_ext
-
-To search for an architecture, you need to configure the parameters ``FLOPS_MINIMUM`` and ``FLOPS_MAXIMUM`` to specify the desired model flops, such as [0,600]MB flops. You can specify the flops interval by changing these two parameters in ``./configs/train.yaml``
-
-.. code-block:: bash
-
-   FLOPS_MINIMUM: 0 # Minimum Flops of Architecture
-   FLOPS_MAXIMUM: 600 # Maximum Flops of Architecture
-
-For example, if you expect to search an architecture with model flops <= 200M, please set the ``FLOPS_MINIMUM`` and ``FLOPS_MAXIMUM`` to be ``0`` and ``200``.
-
-After you specify the flops of the architectures you would like to search, you can search an architecture now by running:
-
-.. code-block:: bash
-
-   python -m torch.distributed.launch --nproc_per_node=8 ./train.py --cfg ./configs/train.yaml
-
-The searched architectures need to be retrained and obtain the final model. The final model is saved in ``.pth.tar`` format. Retraining code will be released soon.
-
-2. Retrain
-^^^^^^^^^^^
-
-To train searched architectures, you need to configure the parameter ``MODEL_SELECTION`` to specify the model Flops. To specify which model to train, you should add ``MODEL_SELECTION`` in ``./configs/retrain.yaml``. You can select one from [14,43,112,287,481,604], which stands for different Flops(MB).
-
-.. code-block:: bash
-
-   MODEL_SELECTION: 43 # Retrain 43m model
-   MODEL_SELECTION: 481 # Retrain 481m model
-   ......
-
-To train random architectures, you need specify ``MODEL_SELECTION`` to ``-1`` and configure the parameter ``INPUT_ARCH``\ :
-
-.. code-block:: bash
-
-   MODEL_SELECTION: -1 # Train random architectures
-   INPUT_ARCH: [[0], [3], [3, 3], [3, 1, 3], [3, 3, 3, 3], [3, 3, 3], [0]] # Random Architectures
-   ......
-
-After adding ``MODEL_SELECTION`` in ``./configs/retrain.yaml``\ , you need to use the following command to train the model.
-
-.. code-block:: bash
-
-   python -m torch.distributed.launch --nproc_per_node=8 ./retrain.py --cfg ./configs/retrain.yaml
-
-3. Test
-^^^^^^^^^
-
-To test our trained of models, you need to use ``MODEL_SELECTION`` in ``./configs/test.yaml`` to specify which model to test.
-
-.. code-block:: bash
-
-   MODEL_SELECTION: 43 # test 43m model
-   MODEL_SELECTION: 481 # test 470m model
-   ......
-
-After specifying the flops of the model, you need to write the path to the resume model in ``./test.sh``.
-
-.. code-block:: bash
-
-   RESUME_PATH: './43.pth.tar'
-   RESUME_PATH: './481.pth.tar'
-   ......
-
-We provide 14M/43M/114M/287M/481M/604M pretrained models in `google drive <https://drive.google.com/drive/folders/1CQjyBryZ4F20Rutj7coF8HWFcedApUn2>`__ or `[Models-Baidu Disk (password: wqw6)] <https://pan.baidu.com/s/1TqQNm2s14oEdyNPimw3T9g>`__ .
-
-After downloading the pretrained models and adding ``MODEL_SELECTION`` and ``RESUME_PATH`` in './configs/test.yaml', you need to use the following command to test the model.
-
-.. code-block:: bash
-
-   python -m torch.distributed.launch --nproc_per_node=8 ./test.py --cfg ./configs/test.yaml
diff --git a/docs/en_US/NAS/DARTS.rst b/docs/en_US/NAS/DARTS.rst
index fde97fe105..20dfef7ba4 100644
--- a/docs/en_US/NAS/DARTS.rst
+++ b/docs/en_US/NAS/DARTS.rst
@@ -56,11 +56,8 @@ Reference
 PyTorch
 ^^^^^^^
 
-..  autoclass:: nni.algorithms.nas.pytorch.darts.DartsTrainer
-    :members:
-
-..  autoclass:: nni.algorithms.nas.pytorch.darts.DartsMutator
-    :members:
+..  autoclass:: nni.retiarii.oneshot.pytorch.DartsTrainer
+    :noindex:
 
 Limitations
 -----------
diff --git a/docs/en_US/NAS/ENAS.rst b/docs/en_US/NAS/ENAS.rst
index 29048dd35a..60415e5450 100644
--- a/docs/en_US/NAS/ENAS.rst
+++ b/docs/en_US/NAS/ENAS.rst
@@ -39,8 +39,5 @@ Reference
 PyTorch
 ^^^^^^^
 
-.. autoclass:: nni.algorithms.nas.pytorch.enas.EnasTrainer
-    :members:
-
-.. autoclass:: nni.algorithms.nas.pytorch.enas.EnasMutator
-    :members:
+..  autoclass:: nni.retiarii.oneshot.pytorch.EnasTrainer
+    :noindex:
diff --git a/docs/en_US/NAS/ExecutionEngines.rst b/docs/en_US/NAS/ExecutionEngines.rst
new file mode 100644
index 0000000000..5be6525c78
--- /dev/null
+++ b/docs/en_US/NAS/ExecutionEngines.rst
@@ -0,0 +1,59 @@
+Execution Engines
+=================
+
+An execution engine runs a Retiarii experiment. NNI supports three execution engines; users can choose a specific engine according to how their model mutations are defined and whether they need cross-model optimizations.
+
+* **Pure-python execution engine** is the default engine. It supports model spaces expressed by `inline mutation APIs <./MutationPrimitives.rst>`__.
+
+* **Graph-based execution engine** supports model spaces expressed by `inline mutation APIs <./MutationPrimitives.rst>`__ as well as those represented by `mutators <./Mutators.rst>`__. It requires that the user's model can be parsed by `TorchScript <https://pytorch.org/docs/stable/jit.html>`__.
+
+* **CGO execution engine** has the same requirements and capabilities as the **Graph-based execution engine**, but further enables cross-model optimizations, which make model space exploration faster.
+
+Pure-python Execution Engine
+----------------------------
+
+The pure-python execution engine is the default engine, and we recommend that users who are new to NNI NAS keep using it. It works only within the scope of the inline mutation APIs and does not touch the rest of the user model, so it places minimal requirements on the user model.
+
+Only one step is needed to use this engine:
+
+1. Add the ``@nni.retiarii.model_wrapper`` decorator to the outermost PyTorch model class.
+
+.. note:: You should always use ``super().__init__()`` instead of ``super(MyNetwork, self).__init__()`` in the PyTorch model, because the latter has issues with the model wrapper.
+
+Graph-based Execution Engine
+----------------------------
+
+The graph-based execution engine converts the user-defined model into a graph representation (called graph IR) using `TorchScript <https://pytorch.org/docs/stable/jit.html>`__, where each instantiated module in the model is converted to a subgraph. Mutations are then applied to the graph to generate new graphs. Each new graph is converted back to PyTorch code and executed on the user-specified training service.
+
+Users may find ``@basic_unit`` helpful in some cases. Decorating a module with ``@basic_unit`` means the module is not converted to a subgraph; instead, it is converted to a single graph node as a basic unit.
+
+``@basic_unit`` is usually used in the following cases:
+
+* When users want to tune the initialization parameters of a module using ``ValueChoice``, the module should be decorated with ``@basic_unit``. For example, in ``self.conv = MyConv(kernel_size=nn.ValueChoice([1, 3, 5]))``, ``MyConv`` should be decorated.
+
+* When a module cannot be successfully parsed into a subgraph, decorate the module with ``@basic_unit``. The parse failure could be due to complex control flow. Currently, Retiarii does not support ad-hoc loops; if there is an ad-hoc loop in a module's ``forward``, the module class should be decorated with ``@basic_unit``. For example, the following ``MyModule`` should be decorated.
+
+  .. code-block:: python
+
+    @basic_unit
+    class MyModule(nn.Module):
+        def __init__(self):
+            super().__init__()
+        def forward(self, x):
+            for i in range(10):  # <- ad-hoc loop
+                ...
+
+* Some inline mutation APIs require the modules they handle to be decorated with ``@basic_unit``. For example, a user-defined module that is provided to ``LayerChoice`` as a candidate op should be decorated.
+
+Three steps are needed to use the graph-based execution engine:
+
+1. Remove ``@nni.retiarii.model_wrapper`` if your model has it.
+2. Set ``config.execution_engine = 'base'`` in ``RetiariiExeConfig``. The default value of ``execution_engine`` is ``'py'``, which selects the pure-python execution engine.
+3. Add ``@basic_unit`` where necessary, following the guidelines above.
+
+For exporting top models, the graph-based execution engine supports exporting their source code by running ``exp.export_top_models(formatter='code')``.
+
+CGO Execution Engine
+--------------------
+
+The CGO execution engine performs cross-model optimizations on top of the graph-based execution engine. It will be `released in v2.4 <https://github.com/microsoft/nni/issues/3813>`__.
diff --git a/docs/en_US/NAS/ExplorationStrategies.rst b/docs/en_US/NAS/ExplorationStrategies.rst
new file mode 100644
index 0000000000..e312832010
--- /dev/null
+++ b/docs/en_US/NAS/ExplorationStrategies.rst
@@ -0,0 +1,35 @@
+Exploration Strategies for Multi-trial NAS
+==========================================
+
+Usage of Exploration Strategy
+-----------------------------
+
+To use an exploration strategy, users simply instantiate an exploration strategy and pass the instantiated object to ``RetiariiExperiment``. Below is a simple example.
+
+.. code-block:: python
+
+  import nni.retiarii.strategy as strategy
+
+  exploration_strategy = strategy.Random(dedup=True)  # dedup=False if deduplication is not wanted
+
+Supported Exploration Strategies
+--------------------------------
+
+NNI provides the following exploration strategies for multi-trial NAS. Users can also `customize new exploration strategies <./WriteStrategy.rst>`__.
+
+.. list-table::
+   :header-rows: 1
+   :widths: auto
+
+   * - Name
+     - Brief Introduction of Algorithm
+   * - `Random Strategy <./ApiReference.rst#nni.retiarii.strategy.Random>`__
+     - Randomly samples new model(s) from the user-defined model space. (``nni.retiarii.strategy.Random``)
+   * - `Grid Search <./ApiReference.rst#nni.retiarii.strategy.GridSearch>`__
+     - Samples new model(s) from the user-defined model space using grid search. (``nni.retiarii.strategy.GridSearch``)
+   * - `Regularized Evolution <./ApiReference.rst#nni.retiarii.strategy.RegularizedEvolution>`__
+     - Generates new model(s) from previously generated models using the `regularized evolution algorithm <https://arxiv.org/abs/1802.01548>`__. (``nni.retiarii.strategy.RegularizedEvolution``)
+   * - `TPE Strategy <./ApiReference.rst#nni.retiarii.strategy.TPEStrategy>`__
+     - Samples new model(s) from the user-defined model space using the `TPE algorithm <https://papers.nips.cc/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf>`__. (``nni.retiarii.strategy.TPEStrategy``)
+   * - `RL Strategy <./ApiReference.rst#nni.retiarii.strategy.PolicyBasedRL>`__
+     - Uses the `PPO algorithm <https://arxiv.org/abs/1707.06347>`__ to sample new model(s) from the user-defined model space. (``nni.retiarii.strategy.PolicyBasedRL``)
\ No newline at end of file
diff --git a/docs/en_US/NAS/FBNet.rst b/docs/en_US/NAS/FBNet.rst
index 1fb18f4165..8e92b1cd4b 100644
--- a/docs/en_US/NAS/FBNet.rst
+++ b/docs/en_US/NAS/FBNet.rst
@@ -1,6 +1,8 @@
 FBNet
 ======
 
+.. note:: This one-shot NAS is still implemented under NNI NAS 1.0, and will `be migrated to Retiarii framework in v2.4 <https://github.com/microsoft/nni/issues/3814>`__.
+
 For the mobile application of facial landmark, based on the basic architecture of PFLD model, we have applied the FBNet (Block-wise DNAS) to design an concise model with the trade-off between latency and accuracy. References are listed as below:
 
 
@@ -148,4 +150,4 @@ The checkpoints of pre-trained supernet and subnet are offered as below:
 
 * `Supernet <https://drive.google.com/file/d/1TCuWKq8u4_BQ84BWbHSCZ45N3JGB9kFJ/view?usp=sharing>`__
 * `Subnet <https://drive.google.com/file/d/160rkuwB7y7qlBZNM3W_T53cb6MQIYHIE/view?usp=sharing>`__
-* `ONNX model <https://drive.google.com/file/d/1s-v-aOiMv0cqBspPVF3vSGujTbn_T_Uo/view?usp=sharing>`__
+* `ONNX model <https://drive.google.com/file/d/1s-v-aOiMv0cqBspPVF3vSGujTbn_T_Uo/view?usp=sharing>`__
\ No newline at end of file
diff --git a/docs/en_US/NAS/ModelEvaluators.rst b/docs/en_US/NAS/ModelEvaluators.rst
new file mode 100644
index 0000000000..5e241004a6
--- /dev/null
+++ b/docs/en_US/NAS/ModelEvaluators.rst
@@ -0,0 +1,43 @@
+Model Evaluators
+================
+
+A model evaluator is for training and validating each generated model.
+
+Usage of Model Evaluator
+------------------------
+
+In multi-trial NAS, a sampled model should be executable on a remote machine or a training platform (e.g., AzureML, OpenPAI). Thus, both the model and its model evaluator must be serialized correctly. To let NNI serialize the model evaluator correctly, users should apply ``serialize`` to some of their functions and objects.
+
+.. _serializer:
+
+`serialize <./ApiReference.rst#utilities>`__ enables re-instantiation of the model evaluator in another process or machine. It works by recording the initialization parameters of the user-instantiated evaluator.
+
+The evaluator-related APIs provided by Retiarii, for example ``pl.Classification`` and ``pl.DataLoader``, already support serialization, so there is no need to apply ``serialize`` to them. In the following case, however, users should apply ``serialize`` manually.
+
+If the initialization parameters of the evaluator APIs (e.g., ``pl.Classification``, ``pl.DataLoader``) are not primitive types (e.g., ``int``, ``string``), ``serialize`` should be applied to them. If the initialization parameters of those parameters are not primitive types either, ``serialize`` should be applied to them as well. In short, ``serialize`` should be applied recursively where necessary.
+
+Below is an example in which ``transforms.Compose``, ``transforms.Normalize``, and ``MNIST`` are serialized manually using ``serialize``. ``serialize`` takes a class ``cls`` as its first argument, and its following arguments are the arguments for initializing this class. ``serialize`` is not applied to ``pl.Classification`` because, as an API provided by NNI, it is already serializable.
+
+.. code-block:: python
+
+  import nni.retiarii.evaluator.pytorch.lightning as pl
+  from nni.retiarii import serialize
+  from torchvision import transforms
+  from torchvision.datasets import MNIST
+
+  # serialize(cls, *args, **kwargs): pass the class itself, followed by its init arguments
+  transform = serialize(transforms.Compose, [serialize(transforms.ToTensor), serialize(transforms.Normalize, (0.1307,), (0.3081,))])
+  train_dataset = serialize(MNIST, root='data/mnist', train=True, download=True, transform=transform)
+  test_dataset = serialize(MNIST, root='data/mnist', train=False, download=True, transform=transform)
+  evaluator = pl.Classification(train_dataloader=pl.DataLoader(train_dataset, batch_size=100),
+                                val_dataloaders=pl.DataLoader(test_dataset, batch_size=100),
+                                max_epochs=10)
+
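+The record-and-replay idea behind ``serialize`` can be illustrated with a small, self-contained sketch. ``Traced`` and ``rebuild`` below are hypothetical names, not NNI APIs, and NNI's real ``serialize`` may store and restore objects differently. A ``Traced`` object stores a class and its arguments (possibly other ``Traced`` objects) as plain data that can be shipped to another process and re-instantiated there, which is why nested non-primitive arguments must be recorded too.
+
+.. code-block:: python
+
+  import importlib
+  import json
+  from fractions import Fraction
+  from typing import Any
+
+
+  class Traced:
+      """Hypothetical: records ``cls(*args, **kwargs)`` instead of building it eagerly."""
+
+      def __init__(self, cls, *args, **kwargs):
+          self.cls, self.args, self.kwargs = cls, args, kwargs
+
+      def to_dict(self) -> dict:
+          return {
+              '__cls__': f'{self.cls.__module__}.{self.cls.__qualname__}',
+              'args': [_dump(a) for a in self.args],
+              'kwargs': {k: _dump(v) for k, v in self.kwargs.items()},
+          }
+
+
+  def _dump(value: Any) -> Any:
+      # Recurse into nested traced objects and sequences; primitives pass through.
+      if isinstance(value, Traced):
+          return value.to_dict()
+      if isinstance(value, (list, tuple)):
+          return [_dump(v) for v in value]
+      return value
+
+
+  def rebuild(data: Any) -> Any:
+      """Re-instantiate objects from their plain-data form, innermost first."""
+      if isinstance(data, dict) and '__cls__' in data:
+          module_name, _, cls_name = data['__cls__'].rpartition('.')
+          cls = getattr(importlib.import_module(module_name), cls_name)
+          args = [rebuild(a) for a in data['args']]
+          kwargs = {k: rebuild(v) for k, v in data['kwargs'].items()}
+          return cls(*args, **kwargs)
+      if isinstance(data, list):
+          return [rebuild(v) for v in data]
+      return data
+
+
+  # A nested object: both inner arguments are themselves non-primitive.
+  config = Traced(Fraction, Traced(Fraction, 1, 2), Traced(Fraction, 1, 3))
+  payload = json.dumps(config.to_dict())  # plain JSON, safe to send elsewhere
+  restored = rebuild(json.loads(payload))
+  print(restored)  # 3/2
+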
+Supported Model Evaluators
+--------------------------
+
+NNI provides some commonly used model evaluators for users' convenience. If these model evaluators do not meet users' requirements, users can customize new model evaluators by following the tutorial `here <./WriteTrainer.rst>`__.
+
+..  autoclass:: nni.retiarii.evaluator.pytorch.lightning.Classification
+    :noindex:
+
+..  autoclass:: nni.retiarii.evaluator.pytorch.lightning.Regression
+    :noindex:
diff --git a/docs/en_US/NAS/MutationPrimitives.rst b/docs/en_US/NAS/MutationPrimitives.rst
new file mode 100644
index 0000000000..77a1be8606
--- /dev/null
+++ b/docs/en_US/NAS/MutationPrimitives.rst
@@ -0,0 +1,41 @@
+Mutation Primitives
+===================
+
+To let users easily express a model space within their PyTorch/TensorFlow model, NNI provides some inline mutation APIs, as shown below.
+
+* `nn.LayerChoice <./ApiReference.rst#nni.retiarii.nn.pytorch.LayerChoice>`__. It allows users to put several candidate operations (e.g., PyTorch modules) in a layer, and one of them is chosen in each explored model.
+
+  .. code-block:: python
+
+    # import nni.retiarii.nn.pytorch as nn
+    # declared in `__init__` method
+    self.layer = nn.LayerChoice([
+      ops.PoolBN('max', channels, 3, stride, 1),
+      ops.SepConv(channels, channels, 3, stride, 1),
+      nn.Identity()
+    ])
+    # invoked in `forward` method
+    out = self.layer(x)
+
+* `nn.InputChoice <./ApiReference.rst#nni.retiarii.nn.pytorch.InputChoice>`__. It is mainly for choosing (or trying) different connections. It takes several tensors and chooses ``n_chosen`` tensors from them.
+
+  .. code-block:: python
+
+    # import nni.retiarii.nn.pytorch as nn
+    # declared in `__init__` method
+    self.input_switch = nn.InputChoice(n_chosen=1)
+    # invoked in `forward` method, choose one from the three
+    out = self.input_switch([tensor1, tensor2, tensor3])
+
+* `nn.ValueChoice <./ApiReference.rst#nni.retiarii.nn.pytorch.ValueChoice>`__. It is for choosing one value from some candidate values. It can only be used as an input argument of basic units, that is, modules in ``nni.retiarii.nn.pytorch`` and user-defined modules decorated with ``@basic_unit``.
+
+  .. code-block:: python
+
+    # import nni.retiarii.nn.pytorch as nn
+    # used in `__init__` method
+    self.conv = nn.Conv2d(XX, XX, kernel_size=nn.ValueChoice([1, 3, 5]))
+    self.op = MyOp(nn.ValueChoice([0, 1]), nn.ValueChoice([-1, 1]))
+
+* `nn.Repeat <./ApiReference.rst#nni.retiarii.nn.pytorch.Repeat>`__. It repeats a block a variable number of times.
+
+* `nn.Cell <./ApiReference.rst#nni.retiarii.nn.pytorch.Cell>`__. `This cell structure is widely used in NAS literature <https://arxiv.org/abs/1611.01578>`__. Specifically, the cell consists of multiple "nodes". Each node is a sum of multiple operators. Each operator is chosen from user-specified candidates and takes one input from the previous nodes and the predecessors, where predecessors are the inputs of the cell. The output of the cell is the concatenation of some of the nodes in the cell (currently all the nodes).
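+
+The cell data flow described above can be sketched in plain Python, using lists of floats in place of tensors and fixed choices in place of sampled ones. This only illustrates the computation; it is not how ``nn.Cell`` is implemented, and the ``(operator, input_index)`` encoding of a node is an assumption made for the example.
+
+.. code-block:: python
+
+  from typing import Callable, List, Sequence, Tuple
+
+  Op = Callable[[List[float]], List[float]]
+
+
+  def run_cell(predecessors: Sequence[List[float]],
+               node_specs: Sequence[Sequence[Tuple[Op, int]]]) -> List[float]:
+      """Evaluate one sampled cell (hypothetical encoding).
+
+      Each node is a list of (operator, input_index) pairs. ``input_index``
+      points into the states list: predecessors first, then earlier nodes.
+      """
+      states = list(predecessors)
+      nodes = []
+      for spec in node_specs:
+          outputs = [op(states[idx]) for op, idx in spec]
+          # A node is the element-wise sum of its operators' outputs.
+          node = [sum(values) for values in zip(*outputs)]
+          states.append(node)
+          nodes.append(node)
+      # The cell output concatenates the nodes (here: all of them).
+      return [x for node in nodes for x in node]
+
+
+  def identity(x: List[float]) -> List[float]:
+      return list(x)
+
+
+  def double(x: List[float]) -> List[float]:
+      return [2 * v for v in x]
+
+
+  out = run_cell(
+      predecessors=[[1.0, 2.0], [3.0, 4.0]],
+      node_specs=[
+          [(identity, 0), (double, 1)],  # node 0 = in0 + 2 * in1
+          [(double, 2), (identity, 0)],  # node 1 = 2 * node0 + in0
+      ],
+  )
+  print(out)  # [7.0, 10.0, 15.0, 22.0]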
\ No newline at end of file
diff --git a/docs/en_US/NAS/Mutators.rst b/docs/en_US/NAS/Mutators.rst
new file mode 100644
index 0000000000..3e02f89d46
--- /dev/null
+++ b/docs/en_US/NAS/Mutators.rst
@@ -0,0 +1,64 @@
+Express Mutations with Mutators
+===============================
+
+Besides the inline mutation APIs demonstrated `here <./MutationPrimitives.rst>`__, NNI provides a more general approach to express a model space, i.e., *Mutator*, to cover more complex model spaces. Those inline mutation APIs are also implemented with mutator in the underlying system, which can be seen as a special case of model mutation.
+
+.. note:: Mutator and inline mutation APIs cannot be used together.
+
+A mutator is a piece of logic to express how to mutate a given model. Users are free to write their own mutators. Then a model space is expressed with a base model and a list of mutators. A model in the model space is sampled by applying the mutators on the base model one after another. An example is shown below.
+
+.. code-block:: python
+
+  applied_mutators = []
+  applied_mutators.append(BlockMutator('mutable_0'))
+  applied_mutators.append(BlockMutator('mutable_1'))
+
+``BlockMutator`` is defined by users to express how to mutate the base model.
+
+Write a mutator
+---------------
+
+A user-defined mutator should inherit from the ``Mutator`` class and implement its mutation logic in the member function ``mutate``.
+
+.. code-block:: python
+
+  from typing import List
+
+  from nni.retiarii import Mutator
+
+  class BlockMutator(Mutator):
+      def __init__(self, target: str, candidates: List):
+          super().__init__()
+          self.target = target
+          self.candidate_op_list = candidates
+
+      def mutate(self, model):
+          # find all nodes tagged with the target label in the graph IR
+          nodes = model.get_nodes_by_label(self.target)
+          for node in nodes:
+              # self.choice marks a decision point and returns one candidate
+              chosen_op = self.choice(self.candidate_op_list)
+              node.update_operation(chosen_op.type, chosen_op.params)
+
+The input of ``mutate`` is the graph IR (Intermediate Representation) of the base model (please refer to `here <./ApiReference.rst>`__ for the format and APIs of the IR). Users can mutate the graph using the graph's member functions (e.g., ``get_nodes_by_label``, ``update_operation``). Mutation operations can be combined with the API ``self.choice`` to express a set of possible mutations. In the above example, the node's operation can be changed to any operation from ``candidate_op_list``.
+
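+To make the "base model plus a list of mutators" idea concrete, the sketch below mimics the shape of that workflow in plain Python. ``Node``, ``Graph``, ``ToyMutator``, and ``sample_model`` are hypothetical stand-ins, not NNI classes; only the method names ``get_nodes_by_label``, ``update_operation``, and ``choice`` mirror those mentioned above. Here ``choice`` simply picks uniformly at random, whereas in NNI the exploration strategy decides.
+
+.. code-block:: python
+
+  import copy
+  import random
+  from typing import Dict, List
+
+
+  class Node:
+      """Hypothetical stand-in for a graph IR node."""
+
+      def __init__(self, name: str, label: str, op_type: str, params: Dict):
+          self.name, self.label = name, label
+          self.op_type, self.params = op_type, params
+
+      def update_operation(self, op_type: str, params: Dict) -> None:
+          self.op_type, self.params = op_type, params
+
+
+  class Graph:
+      """Hypothetical stand-in for the graph IR of a model."""
+
+      def __init__(self, nodes: List[Node]):
+          self.nodes = nodes
+
+      def get_nodes_by_label(self, label: str) -> List[Node]:
+          return [n for n in self.nodes if n.label == label]
+
+
+  class ToyMutator:
+      """Edits the graph in ``mutate`` and calls ``choice`` at each decision point."""
+
+      def __init__(self, target: str, candidates: List[Dict], rng: random.Random):
+          self.target, self.candidates, self.rng = target, candidates, rng
+
+      def choice(self, options: List):
+          # Toy sampler: uniform pick (a real strategy would decide here).
+          return self.rng.choice(options)
+
+      def mutate(self, model: Graph) -> None:
+          for node in model.get_nodes_by_label(self.target):
+              chosen = self.choice(self.candidates)
+              node.update_operation(chosen['type'], chosen['params'])
+
+
+  def sample_model(base: Graph, mutators: List[ToyMutator]) -> Graph:
+      model = copy.deepcopy(base)  # never modify the base model itself
+      for mutator in mutators:  # apply the mutators one after another
+          mutator.mutate(model)
+      return model
+
+
+  base = Graph([Node('stem', 'fixed', 'conv', {'k': 3}),
+                Node('b0', 'mutable_0', 'placeholder', {}),
+                Node('b1', 'mutable_1', 'placeholder', {})])
+  candidates = [{'type': 'conv', 'params': {'k': 3}},
+                {'type': 'conv', 'params': {'k': 5}},
+                {'type': 'maxpool', 'params': {'k': 3}}]
+  rng = random.Random(0)
+  mutators = [ToyMutator('mutable_0', candidates, rng), ToyMutator('mutable_1', candidates, rng)]
+  sampled = sample_model(base, mutators)
+  print([(n.name, n.op_type, n.params) for n in sampled.nodes])
+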
+Use a placeholder to make mutation easier: ``nn.Placeholder``. If you want to mutate a subgraph or node of your model, you can define a placeholder in the model to represent that subgraph or node, and then use a mutator to turn the placeholder into real modules.
+
+.. code-block:: python
+
+  ph = nn.Placeholder(
+    label='mutable_0',
+    kernel_size_options=[1, 3, 5],
+    n_layer_options=[1, 2, 3, 4],
+    exp_ratio=exp_ratio,
+    stride=stride
+  )
+
+``label`` is used by the mutator to identify this placeholder. The other parameters carry the information required by the mutator. They can be accessed from ``node.operation.parameters`` as a dict, which can include any information users want to pass to their mutator. The complete example code can be found in :githublink:`Mnasnet base model <examples/nas/multi-trial/mnasnet/base_mnasnet.py>`.
+
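+As an illustration of how a mutator might consume those placeholder parameters, the sketch below expands a parameter dict shaped like the ``nn.Placeholder`` call above into a concrete list of layer specs. ``expand_placeholder``, the layer-spec format, the ``exp_ratio``/``stride`` values, and the rule that only the first layer is strided (a MobileNet-style convention) are all assumptions for the example, not the Mnasnet example's actual code.
+
+.. code-block:: python
+
+  import random
+  from typing import Callable, Dict, List, Sequence
+
+
+  def expand_placeholder(parameters: Dict, choose: Callable[[Sequence], object]) -> List[Dict]:
+      """Hypothetical: turn placeholder parameters into concrete layer specs."""
+      kernel_size = choose(parameters['kernel_size_options'])
+      n_layer = choose(parameters['n_layer_options'])
+      layers = []
+      for i in range(n_layer):
+          layers.append({
+              'type': 'inverted_residual',  # assumed block type
+              'kernel_size': kernel_size,
+              'exp_ratio': parameters['exp_ratio'],
+              # assumption: only the first layer of the block applies the stride
+              'stride': parameters['stride'] if i == 0 else 1,
+          })
+      return layers
+
+
+  # Mirrors the keys passed to nn.Placeholder above; the values are made up.
+  params = {'label': 'mutable_0', 'kernel_size_options': [1, 3, 5],
+            'n_layer_options': [1, 2, 3, 4], 'exp_ratio': 6, 'stride': 2}
+  rng = random.Random(0)
+  block = expand_placeholder(params, rng.choice)
+  print(len(block), block[0])
+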
+Starting an experiment is almost the same as using inline mutation APIs. The only difference is that the applied mutators should be passed to ``RetiariiExperiment``. Below is a simple example.
+
+.. code-block:: python
+
+  exp = RetiariiExperiment(base_model, trainer, applied_mutators, simple_strategy)
+  exp_config = RetiariiExeConfig('local')
+  exp_config.experiment_name = 'mnasnet_search'
+  exp_config.trial_concurrency = 2
+  exp_config.max_trial_number = 10
+  exp_config.training_service.use_active_gpu = False
+  exp.run(exp_config, 8081)
diff --git a/docs/en_US/NAS/NasGuide.rst b/docs/en_US/NAS/NasGuide.rst
deleted file mode 100644
index 45475c686a..0000000000
--- a/docs/en_US/NAS/NasGuide.rst
+++ /dev/null
@@ -1,88 +0,0 @@
-One-shot NAS algorithms
-=======================
-
-Besides `classic NAS algorithms <./ClassicNas.rst>`__\ , users also apply more advanced one-shot NAS algorithms to find better models from a search space. There are lots of related works about one-shot NAS algorithms, such as `SMASH <https://arxiv.org/abs/1708.05344>`__\ , `ENAS <https://arxiv.org/abs/1802.03268>`__\ , `DARTS <https://arxiv.org/abs/1808.05377>`__\ , `FBNet <https://arxiv.org/abs/1812.03443>`__\ , `ProxylessNAS <https://arxiv.org/abs/1812.00332>`__\ , `SPOS <https://arxiv.org/abs/1904.00420>`__\ , `Single-Path NAS <https://arxiv.org/abs/1904.02877>`__\ ,  `Understanding One-shot <http://proceedings.mlr.press/v80/bender18a>`__ and `GDAS <https://arxiv.org/abs/1910.04465>`__. One-shot NAS algorithms usually build a supernet containing every candidate in the search space as its subnetwork, and in each step, a subnetwork or combination of several subnetworks is trained.
-
-Currently, several one-shot NAS methods are supported on NNI. For example, ``DartsTrainer``\ , which uses SGD to train architecture weights and model weights iteratively, and ``ENASTrainer``\ , which `uses a controller to train the model <https://arxiv.org/abs/1802.03268>`__. New and more efficient NAS trainers keep emerging in research community and some will be implemented in future releases of NNI.
-
-Search with One-shot NAS Algorithms
------------------------------------
-
-Each one-shot NAS algorithm implements a trainer, for which users can find usage details in the description of each algorithm. Here is a simple example, demonstrating how users can use ``EnasTrainer``.
-
-.. code-block:: python
-
-   # this is exactly same as traditional model training
-   model = Net()
-   dataset_train = CIFAR10(root="./data", train=True, download=True, transform=train_transform)
-   dataset_valid = CIFAR10(root="./data", train=False, download=True, transform=valid_transform)
-   criterion = nn.CrossEntropyLoss()
-   optimizer = torch.optim.SGD(model.parameters(), 0.05, momentum=0.9, weight_decay=1.0E-4)
-
-   # use NAS here
-   def top1_accuracy(output, target):
-       # this is the function that computes the reward, as required by ENAS algorithm
-       batch_size = target.size(0)
-       _, predicted = torch.max(output.data, 1)
-       return (predicted == target).sum().item() / batch_size
-
-   def metrics_fn(output, target):
-       # metrics function receives output and target and computes a dict of metrics
-       return {"acc1": top1_accuracy(output, target)}
-
-   from nni.algorithms.nas.pytorch import enas
-   trainer = enas.EnasTrainer(model,
-                              loss=criterion,
-                              metrics=metrics_fn,
-                              reward_function=top1_accuracy,
-                              optimizer=optimizer,
-                              batch_size=128
-                              num_epochs=10,  # 10 epochs
-                              dataset_train=dataset_train,
-                              dataset_valid=dataset_valid,
-                              log_frequency=10)  # print log every 10 steps
-   trainer.train()  # training
-   trainer.export(file="model_dir/final_architecture.json")  # export the final architecture to file
-
-``model`` is the one with `user defined search space <./WriteSearchSpace.rst>`__. Then users should prepare training data and model evaluation metrics. To search from the defined search space, a one-shot algorithm is instantiated, called trainer (e.g., EnasTrainer). The trainer exposes a few arguments that you can customize. For example, the loss function, the metrics function, the optimizer, and the datasets. These should satisfy most usage requirements and we do our best to make sure our built-in trainers work on as many models, tasks, and datasets as possible.
-
-**Note that** when using one-shot NAS algorithms, there is no need to start an NNI experiment. Users can directly run this Python script (i.e., ``train.py``\ ) through ``python3 train.py`` without ``nnictl``. After training, users can export the best one of the found models through ``trainer.export()``.
-
-Each trainer in NNI has its targeted scenario and usage. Some trainers have the assumption that the task is a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps). Most trainers do not have support for distributed training: they won't wrap your model with ``DataParallel`` or ``DistributedDataParallel`` to do that. So after a few tryouts, if you want to actually use the trainers on your very customized applications, you might need to `customize your trainer <./Advanced.rst#extend-the-ability-of-one-shot-trainers>`__.
-
-Furthermore, one-shot NAS can be visualized with our NAS UI. `See more details. <./Visualization.rst>`__
-
-Retrain with Exported Architecture
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-After the search phase, it's time to train the found architecture. Unlike many open-source NAS algorithms who write a whole new model specifically for retraining. We found that the search model and retraining model are usually very similar, and therefore you can construct your final model with the exact same model code. For example
-
-.. code-block:: python
-
-   model = Net()
-   apply_fixed_architecture(model, "model_dir/final_architecture.json")
-
-The JSON is simply a mapping from mutable keys to choices. Choices can be expressed in:
-
-
-* A string: select the candidate with corresponding name.
-* A number: select the candidate with corresponding index.
-* A list of string: select the candidates with corresponding names.
-* A list of number: select the candidates with corresponding indices.
-* A list of boolean values: a multi-hot array.
-
-For example,
-
-.. code-block:: json
-
-   {
-       "LayerChoice1": "conv5x5",
-       "LayerChoice2": 6,
-       "InputChoice3": ["layer1", "layer3"],
-       "InputChoice4": [1, 2],
-       "InputChoice5": [false, true, false, false, true]
-   }
-
-After applying, the model is then fixed and ready for final training. The model works as a single model, and unused parameters and modules are pruned.
-
-Also, refer to `DARTS <./DARTS.rst>`__ for code exemplifying retraining.
diff --git a/docs/en_US/NAS/NasReference.rst b/docs/en_US/NAS/NasReference.rst
deleted file mode 100644
index 6df2be425b..0000000000
--- a/docs/en_US/NAS/NasReference.rst
+++ /dev/null
@@ -1,99 +0,0 @@
-NAS Reference
-=============
-
-.. contents::
-
-Mutables
---------
-
-..  autoclass:: nni.nas.pytorch.mutables.Mutable
-    :members:
-
-..  autoclass:: nni.nas.pytorch.mutables.LayerChoice
-    :members:
-
-..  autoclass:: nni.nas.pytorch.mutables.InputChoice
-    :members:
-
-..  autoclass:: nni.nas.pytorch.mutables.MutableScope
-    :members:
-
-Utilities
-^^^^^^^^^
-
-..  autofunction:: nni.nas.pytorch.utils.global_mutable_counting
-
-Mutators
---------
-
-..  autoclass:: nni.nas.pytorch.base_mutator.BaseMutator
-    :members:
-
-..  autoclass:: nni.nas.pytorch.mutator.Mutator
-    :members:
-
-Random Mutator
-^^^^^^^^^^^^^^
-
-..  autoclass:: nni.algorithms.nas.pytorch.random.RandomMutator
-    :members:
-
-Utilities
-^^^^^^^^^
-
-..  autoclass:: nni.nas.pytorch.utils.StructuredMutableTreeNode
-    :members:
-
-Trainers
---------
-
-Trainer
-^^^^^^^
-
-..  autoclass:: nni.nas.pytorch.base_trainer.BaseTrainer
-    :members:
-
-..  autoclass:: nni.nas.pytorch.trainer.Trainer
-    :members:
-
-Retrain
-^^^^^^^
-
-..  autofunction:: nni.nas.pytorch.fixed.apply_fixed_architecture
-
-..  autoclass:: nni.nas.pytorch.fixed.FixedArchitecture
-    :members:
-
-Distributed NAS
-^^^^^^^^^^^^^^^
-
-..  autofunction:: nni.algorithms.nas.pytorch.classic_nas.get_and_apply_next_architecture
-
-..  autoclass:: nni.algorithms.nas.pytorch.classic_nas.mutator.ClassicMutator
-    :members:
-
-Callbacks
-^^^^^^^^^
-
-..  autoclass:: nni.nas.pytorch.callbacks.Callback
-    :members:
-
-..  autoclass:: nni.nas.pytorch.callbacks.LRSchedulerCallback
-    :members:
-
-..  autoclass:: nni.nas.pytorch.callbacks.ArchitectureCheckpoint
-    :members:
-
-..  autoclass:: nni.nas.pytorch.callbacks.ModelCheckpoint
-    :members:
-
-Utilities
-^^^^^^^^^
-
-..  autoclass:: nni.nas.pytorch.utils.AverageMeterGroup
-    :members:
-
-..  autoclass:: nni.nas.pytorch.utils.AverageMeter
-    :members:
-
-..  autofunction:: nni.nas.pytorch.utils.to_device
diff --git a/docs/en_US/NAS/OneshotTrainer.rst b/docs/en_US/NAS/OneshotTrainer.rst
new file mode 100644
index 0000000000..76baa36b92
--- /dev/null
+++ b/docs/en_US/NAS/OneshotTrainer.rst
@@ -0,0 +1,37 @@
+One-shot NAS
+============
+
+Before reading this tutorial, we highly recommend that you first go through the tutorial on how to `define a model space <./QuickStart.rst#define-your-model-space>`__.
+
+Model Search with One-shot Trainer
+----------------------------------
+
+With a defined model space, users can explore the space in two ways. One is to use an exploration strategy together with a single-architecture evaluator, as demonstrated `here <./QuickStart.rst#explore-the-defined-model-space>`__. The other is to use a one-shot trainer, which consumes much less computational resource than the first approach. This tutorial focuses on the one-shot approach. Its principle is to combine all the models in a model space into one big model (usually called a super-model or super-graph), and then to handle search, training, and testing by training and evaluating this big model.
+
+We list the supported one-shot trainers here:
+
+* DARTS trainer
+* ENAS trainer
+* ProxylessNAS trainer
+* Single-path (random) trainer
+
+See the `API reference <./ApiReference.rst>`__ for detailed usage. Here, we show an example of using the DARTS trainer manually.
+
+.. code-block:: python
+
+  from nni.retiarii.oneshot.pytorch import DartsTrainer
+  trainer = DartsTrainer(
+      model=model,
+      loss=criterion,
+      metrics=lambda output, target: accuracy(output, target, topk=(1,)),
+      optimizer=optim,
+      num_epochs=args.epochs,
+      dataset=dataset_train,
+      batch_size=args.batch_size,
+      log_frequency=args.log_frequency,
+      unrolled=args.unrolled
+  )
+  trainer.fit()
+  final_architecture = trainer.export()
+
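+DARTS-style one-shot search replaces each discrete choice with a softmax-weighted mixture of all candidates and, after training, keeps the highest-weighted candidate. The scalar sketch below illustrates only that relaxation and selection step under simplifying assumptions (floats instead of tensors, fixed ``alpha`` instead of the bilevel optimization DARTS performs). It is not ``DartsTrainer``'s implementation, and ``mixed_op`` and ``export_choice`` are hypothetical names.
+
+.. code-block:: python
+
+  import math
+  from typing import Callable, Dict, List
+
+
+  def softmax(xs: List[float]) -> List[float]:
+      m = max(xs)  # subtract the max for numerical stability
+      exps = [math.exp(x - m) for x in xs]
+      total = sum(exps)
+      return [e / total for e in exps]
+
+
+  def mixed_op(x: float, ops: Dict[str, Callable[[float], float]], alpha: List[float]) -> float:
+      """Super-model forward: weight every candidate by softmax(alpha)."""
+      weights = softmax(alpha)
+      return sum(w * op(x) for w, op in zip(weights, ops.values()))
+
+
+  def export_choice(ops: Dict[str, Callable[[float], float]], alpha: List[float]) -> str:
+      """Discretize after search: keep the candidate with the largest weight."""
+      names = list(ops)
+      return names[max(range(len(alpha)), key=lambda i: alpha[i])]
+
+
+  ops = {'identity': lambda x: x, 'double': lambda x: 2 * x, 'zero': lambda x: 0.0}
+  alpha = [0.1, 1.5, -0.3]  # architecture parameters (learned in a real search)
+  print(mixed_op(1.0, ops, alpha))
+  print(export_choice(ops, alpha))  # double
+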
+**Format of the exported architecture.** TBD.
diff --git a/docs/en_US/NAS/Overview.rst b/docs/en_US/NAS/Overview.rst
index a8d20a2e7d..3b177231a6 100644
--- a/docs/en_US/NAS/Overview.rst
+++ b/docs/en_US/NAS/Overview.rst
@@ -1,116 +1,82 @@
-Neural Architecture Search (NAS) on NNI
+Retiarii for Neural Architecture Search
 =======================================
 
+.. Note:: NNI's latest NAS support is entirely based on the Retiarii framework. Users who are still on the `early version using NNI NAS v1.0 <https://nni.readthedocs.io/en/v2.2/nas.html>`__ should migrate their work to Retiarii as soon as possible.
+
 .. contents::
 
+Motivation
+----------
+
+Automatic neural architecture search is playing an increasingly important role in finding better models. Recent research has proven the feasibility of automatic NAS and has led to models that beat many manually designed and tuned models. Representative works include `NASNet <https://arxiv.org/abs/1707.07012>`__\ , `ENAS <https://arxiv.org/abs/1802.03268>`__\ , `DARTS <https://arxiv.org/abs/1806.09055>`__\ , `Network Morphism <https://arxiv.org/abs/1806.10282>`__\ , and `Evolution <https://arxiv.org/abs/1703.01041>`__. In addition, new innovations continue to emerge.
+
+However, it is hard to apply existing NAS work to the development of common DNN models. Therefore, we designed `Retiarii <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__, a novel NAS/HPO framework, and implemented it in NNI. It helps users easily construct a model space (also called a search space or tuning space) and apply existing NAS algorithms. The framework also facilitates NAS innovation and can be used to design new NAS algorithms.
+
 Overview
 --------
 
-Automatic neural architecture search is taking an increasingly important role in finding better models. Recent research has proved the feasibility of automatic NAS and has lead to models that beat many manually designed and tuned models. Some representative works are `NASNet <https://arxiv.org/abs/1707.07012>`__\ , `ENAS <https://arxiv.org/abs/1802.03268>`__\ , `DARTS <https://arxiv.org/abs/1806.09055>`__\ , `Network Morphism <https://arxiv.org/abs/1806.10282>`__\ , and `Evolution <https://arxiv.org/abs/1703.01041>`__. Further, new innovations keep emerging.
-
-However, it takes a great effort to implement NAS algorithms, and it's hard to reuse the code base of existing algorithms for new ones. To facilitate NAS innovations (e.g., the design and implementation of new NAS models, the comparison of different NAS models side-by-side, etc.), an easy-to-use and flexible programming interface is crucial.
+There are three key characteristics of the Retiarii framework:
 
-With this motivation, our ambition is to provide a unified architecture in NNI, accelerate innovations on NAS, and apply state-of-the-art algorithms to real-world problems faster.
+* Simple APIs are provided for defining a model search space within PyTorch/TensorFlow models.
+* SOTA NAS algorithms are built-in to be used for exploring model search space.
+* System-level optimizations are implemented for speeding up the exploration.
 
-With the unified interface, there are two different modes for architecture search. `One <#supported-one-shot-nas-algorithms>`__ is the so-called one-shot NAS, where a super-net is built based on a search space and one-shot training is used to generate a good-performing child model. `The other <#supported-classic-nas-algorithms>`__ is the traditional search-based approach, where each child model within the search space runs as an independent trial. We call it classic NAS.
+There are two types of model space exploration approaches: **Multi-trial NAS** and **One-shot NAS**. Multi-trial NAS trains each sampled model in the model space independently, while One-shot NAS samples models from a super-model. After constructing the model space, users can use either approach to explore it.
 
-NNI also provides dedicated `visualization tool <#nas-visualization>`__ for users to check the status of the neural architecture search process.
 
-Supported Classic NAS Algorithms
---------------------------------
+Multi-trial NAS
+---------------
 
-The procedure of classic NAS algorithms is similar to hyper-parameter tuning, users use ``nnictl`` to start experiments and each model runs as a trial. The difference is that search space file is automatically generated from user model (with search space in it) by running ``nnictl ss_gen``. The following table listed supported tuning algorihtms for classic NAS mode. More algorihtms will be supported in future release.
+Multi-trial NAS means each model sampled from the model space is trained independently. A typical multi-trial NAS is `NASNet <https://arxiv.org/abs/1707.07012>`__. The algorithm that samples models from the model space is called an exploration strategy. NNI supports the following exploration strategies for multi-trial NAS.
 
 .. list-table::
    :header-rows: 1
    :widths: auto
 
-   * - Name
+   * - Exploration Strategy Name
      - Brief Introduction of Algorithm
-   * - :githublink:`Random Search <examples/tuners/random_nas_tuner>`
-     - Randomly pick a model from search space
-   * - `PPO Tuner <../Tuner/BuiltinTuner.rst#PPO-Tuner>`__
-     - PPO Tuner is a Reinforcement Learning tuner based on PPO algorithm. `Reference Paper <https://arxiv.org/abs/1707.06347>`__
+   * - Random Strategy
+     - Randomly samples new model(s) from the user-defined model space. (``nni.retiarii.strategy.Random``)
+   * - Grid Search
+     - Samples new model(s) from the user-defined model space using grid search. (``nni.retiarii.strategy.GridSearch``)
+   * - Regularized Evolution
+     - Generates new model(s) from previously generated models using the `regularized evolution algorithm <https://arxiv.org/abs/1802.01548>`__. (``nni.retiarii.strategy.RegularizedEvolution``)
+   * - TPE Strategy
+     - Samples new model(s) from the user-defined model space using the `TPE algorithm <https://papers.nips.cc/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf>`__. (``nni.retiarii.strategy.TPEStrategy``)
+   * - RL Strategy
+     - Uses the `PPO algorithm <https://arxiv.org/abs/1707.06347>`__ to sample new model(s) from the user-defined model space. (``nni.retiarii.strategy.PolicyBasedRL``)
 
 
-Please refer to `here <ClassicNas.rst>`__ for the usage of classic NAS algorithms.
+Please refer to `here <./multi_trial_nas.rst>`__ for detailed usage of multi-trial NAS.
 
-Supported One-shot NAS Algorithms
----------------------------------
+One-shot NAS
+------------
 
-NNI currently supports the one-shot NAS algorithms listed below and is adding more. Users can reproduce an algorithm or use it on their own dataset. We also encourage users to implement other algorithms with `NNI API <#use-nni-api>`__\ , to benefit more people.
+One-shot NAS means building the model space into a super-model, training the super-model with weight sharing, and then sampling models from the super-model to find the best one. `DARTS <https://arxiv.org/abs/1806.09055>`__ is a typical one-shot NAS.
+Below are the supported one-shot NAS algorithms. More one-shot NAS algorithms will be supported soon.
 
 .. list-table::
    :header-rows: 1
    :widths: auto
 
-   * - Name
+   * - One-shot Algorithm Name
      - Brief Introduction of Algorithm
    * - `ENAS <ENAS.rst>`__
      - `Efficient Neural Architecture Search via Parameter Sharing <https://arxiv.org/abs/1802.03268>`__. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. It uses parameter sharing between child models to achieve fast speed and excellent performance.
    * - `DARTS <DARTS.rst>`__
      - `DARTS: Differentiable Architecture Search <https://arxiv.org/abs/1806.09055>`__ introduces a novel algorithm for differentiable network architecture search on bilevel optimization.
-   * - `P-DARTS <PDARTS.rst>`__
-     - `Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation <https://arxiv.org/abs/1904.12760>`__ is based on DARTS. It introduces an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure.
    * - `SPOS <SPOS.rst>`__
      - `Single Path One-Shot Neural Architecture Search with Uniform Sampling <https://arxiv.org/abs/1904.00420>`__ constructs a simplified supernet trained with a uniform path sampling method and applies an evolutionary algorithm to efficiently search for the best-performing architectures.
-   * - `CDARTS <CDARTS.rst>`__
-     - `Cyclic Differentiable Architecture Search <https://arxiv.org/pdf/2006.10724.pdf>`__ builds a cyclic feedback mechanism between the search and evaluation networks. It introduces a cyclic differentiable architecture search framework which integrates the two networks into a unified architecture.
    * - `ProxylessNAS <Proxylessnas.rst>`__
      - `ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware <https://arxiv.org/abs/1812.00332>`__. It removes proxy, directly learns the architectures for large-scale target tasks and target hardware platforms.
-   * - `FBNet <FBNet.rst>`__
-     - `FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search <https://arxiv.org/abs/1812.03443>`__. It is a block-wise differentiable neural network architecture search method with the hardware-aware constraint.
-   * - `TextNAS <TextNAS.rst>`__
-     - `TextNAS: A Neural Architecture Search Space tailored for Text Representation <https://arxiv.org/pdf/1912.10729.pdf>`__. It is a neural architecture search algorithm tailored for text representation.
-   * - `Cream <Cream.rst>`__
-     - `Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search <https://papers.nips.cc/paper/2020/file/d072677d210ac4c03ba046120f0802ec-Paper.pdf>`__. It is a new NAS algorithm distilling prioritized paths in search space, without using evolutionary algorithms. Achieving competitive performance on ImageNet, especially for small models (e.g. <200 M FLOPs).
-
-
-One-shot algorithms run **standalone without nnictl**. NNI supports both PyTorch and Tensorflow 2.X.
-
-Here are some common dependencies to run the examples. PyTorch needs to be above 1.2 to use ``BoolTensor``.
-
-
-* tensorboard
-* PyTorch 1.2+
-* git
-
-Please refer to `here <NasGuide.rst>`__ for the usage of one-shot NAS algorithms.
-
-One-shot NAS can be visualized with our visualization tool. Learn more details `here <./Visualization.rst>`__.
 
-Search Space Zoo
-----------------
-
-NNI provides some predefined search space which can be easily reused. By stacking the extracted cells, user can quickly reproduce those NAS models.
-
-Search Space Zoo contains the following NAS cells:
-
-
-* `DartsCell <./SearchSpaceZoo.rst#DartsCell>`__
-* `ENAS micro <./SearchSpaceZoo.rst#ENASMicroLayer>`__
-* `ENAS macro <./SearchSpaceZoo.rst#ENASMacroLayer>`__
-* `NAS Bench 201 <./SearchSpaceZoo.rst#nas-bench-201>`__
-
-Using NNI API to Write Your Search Space
-----------------------------------------
-
-The programming interface of designing and searching a model is often demanded in two scenarios.
-
-
-#. When designing a neural network, there may be multiple operation choices on a layer, sub-model, or connection, and it's undetermined which one or combination performs best. So, it needs an easy way to express the candidate layers or sub-models.
-#. When applying NAS on a neural network, it needs a unified way to express the search space of architectures, so that it doesn't need to update trial code for different search algorithms.
-
-For using NNI NAS, we suggest users to first go through `the tutorial of NAS API for building search space <./WriteSearchSpace.rst>`__.
-
-NAS Visualization
------------------
-
-To help users track the process and status of how the model is searched under specified search space, we developed a visualization tool. It visualizes search space as a super-net and shows importance of subnets and layers/operations, as well as how the importance changes along with the search process. Please refer to `the document of NAS visualization <./Visualization.rst>`__ for how to use it.
+Please refer to `here <one_shot_nas.rst>`__ for detailed usage of one-shot NAS algorithms.
 
 Reference and Feedback
 ----------------------
 
-
-* To `report a bug <https://github.com/microsoft/nni/issues/new?template=bug-report.rst>`__ for this feature in GitHub;
-* To `file a feature or improvement request <https://github.com/microsoft/nni/issues/new?template=enhancement.rst>`__ for this feature in GitHub.
+* `Quick Start <./QuickStart.rst>`__;
+* `Construct Your Model Space <./construct_space.rst>`__;
+* `Retiarii: A Deep Learning Exploratory-Training Framework <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__;
+* To `report a bug <https://github.com/microsoft/nni/issues/new?template=bug-report.rst>`__ for this feature in GitHub;
+* To `file a feature or improvement request <https://github.com/microsoft/nni/issues/new?template=enhancement.rst>`__ for this feature in GitHub.
diff --git a/docs/en_US/NAS/PDARTS.rst b/docs/en_US/NAS/PDARTS.rst
deleted file mode 100644
index 4884ce33b7..0000000000
--- a/docs/en_US/NAS/PDARTS.rst
+++ /dev/null
@@ -1,20 +0,0 @@
-P-DARTS
-=======
-
-Examples
---------
-
-:githublink:`Example code <examples/nas/legacy/pdarts>`
-
-.. code-block:: bash
-
-   # In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
-   git clone https://github.com/Microsoft/nni.git
-
-   # search the best architecture
-   cd examples/nas/legacy/pdarts
-   python3 search.py
-
-   # train the best architecture, it's the same progress as darts.
-   cd ../darts
-   python3 retrain.py --arc-checkpoint ../pdarts/checkpoints/epoch_2.json
diff --git a/docs/en_US/NAS/Proxylessnas.rst b/docs/en_US/NAS/Proxylessnas.rst
index 2a1c8980d1..c6771ec302 100644
--- a/docs/en_US/NAS/Proxylessnas.rst
+++ b/docs/en_US/NAS/Proxylessnas.rst
@@ -9,7 +9,7 @@ The paper `ProxylessNAS: Direct Neural Architecture Search on Target Task and Ha
 Usage
 -----
 
-To use ProxylessNAS training/searching approach, users need to specify search space in their model using `NNI NAS interface <NasGuide.rst>`__\ , e.g., ``LayerChoice``\ , ``InputChoice``. After defining and instantiating the model, the following work can be leaved to ProxylessNasTrainer by instantiating the trainer and passing the model to it.
+To use the ProxylessNAS training/searching approach, users need to specify the search space in their model using the `NNI NAS interface <./MutationPrimitives.rst>`__\ , e.g., ``LayerChoice``\ , ``InputChoice``. After defining and instantiating the model, the rest of the work can be left to ProxylessNasTrainer: instantiate the trainer and pass the model to it.
 
 .. code-block:: python
 
diff --git a/docs/en_US/NAS/retiarii/Tutorial.rst b/docs/en_US/NAS/QuickStart.rst
similarity index 64%
rename from docs/en_US/NAS/retiarii/Tutorial.rst
rename to docs/en_US/NAS/QuickStart.rst
index 3c1af8e838..2aa8492036 100644
--- a/docs/en_US/NAS/retiarii/Tutorial.rst
+++ b/docs/en_US/NAS/QuickStart.rst
@@ -1,24 +1,17 @@
-Neural Architecture Search with Retiarii (Alpha)
-================================================
+Quick Start of Retiarii on NNI
+==============================
 
-*This is a pre-release, its interfaces may subject to minor changes. The roadmap of this feature is: experimental in V2.0 -> alpha version in V2.1 -> beta version in V2.2 -> official release in V2.3. Feel free to give us your comments and suggestions.*
-
-`Retiarii <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__ is a new framework to support neural architecture search and hyper-parameter tuning. It allows users to express various search space with high flexibility, to reuse many SOTA search algorithms, and to leverage system level optimizations to speed up the search process. This framework provides the following new user experiences.
-
-* Search space can be expressed directly in user model code. A tuning space can be expressed during defining a model.
-* Neural architecture candidates and hyper-parameter candidates are more friendly supported in an experiment.
-* The experiment can be launched directly from python code.
-
-.. Note:: `Our previous NAS framework <../Overview.rst>`__ is still supported for now, but will be migrated to Retiarii framework in V2.3.
 
 .. contents::
 
-There are mainly two crucial components for a neural architecture search task, namely,
+In this quick start tutorial, we use multi-trial NAS as an example to show how to construct and explore a model space. There are mainly three crucial components for a neural architecture search task, namely,
 
 * Model search space that defines the set of models to explore.
 * A proper strategy as the method to explore this search space.
 * A model evaluator that reports the performance of a given model.
 
+The one-shot NAS tutorial can be found `here <./OneshotTrainer.rst>`__.
+
 .. note:: Currently, PyTorch is the only supported framework by Retiarii, and we have only tested with **PyTorch 1.6 and 1.7**. This documentation assumes PyTorch context but it should also apply to other frameworks, that is in our future plan.
 
 Define your Model Space
@@ -37,8 +30,8 @@ Below is a very simple example of defining a base model, it is almost the same a
 
   import torch.nn.functional as F
   import nni.retiarii.nn.pytorch as nn
+  from nni.retiarii import model_wrapper
 
-  @basic_unit
   class BasicBlock(nn.Module):
     def __init__(self, const):
       self.const = const
@@ -53,6 +46,7 @@ Below is a very simple example of defining a base model, it is almost the same a
     def forward(self, x):
       return self.pool(self.conv(x))
 
+  @model_wrapper      # this decorator should be put on the outermost PyTorch module
   class Model(nn.Module):
     def __init__(self):
       super().__init__()
@@ -61,10 +55,6 @@ Below is a very simple example of defining a base model, it is almost the same a
     def forward(self, x):
       return F.relu(self.convpool(self.mymodule(x)))
 
-The above example also shows how to use ``@basic_unit``. ``@basic_unit`` is decorated on a user-defined module to tell Retiarii that there will be no mutation within this module, Retiarii can treat it as a basic unit (i.e., as a blackbox). It is useful when (1) users want to mutate the initialization parameters of this module, or (2) Retiarii fails to parse this module due to complex control flow (e.g., ``for``, ``while``). More detailed description of ``@basic_unit`` can be found `here <./Advanced.rst>`__.
-
-Users can refer to :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>` and :githublink:`Mnasnet base model <examples/nas/multi-trial/mnasnet/base_mnasnet.py>` for more complicated examples.
-
 Define Model Mutations
 ^^^^^^^^^^^^^^^^^^^^^^
 
@@ -72,7 +62,7 @@ A base model is only one concrete model not a model space. We provide APIs and p
 
 We provide some APIs as shown below for users to easily express possible mutations after defining a base model. The APIs can be used just like PyTorch module. This approach is also called inline mutations.
 
-* ``nn.LayerChoice``. It allows users to put several candidate operations (e.g., PyTorch modules), one of them is chosen in each explored model. Note that if the candidate is a user-defined module, it should be decorated as a `basic unit <./Advanced.rst>`__ with ``@basic_unit``. In the following example, ``ops.PoolBN`` and ``ops.SepConv`` should be decorated.
+* ``nn.LayerChoice``. It allows users to put several candidate operations (e.g., PyTorch modules), one of which is chosen in each explored model.
 
   .. code-block:: python
 
@@ -114,27 +104,21 @@ All the APIs have an optional argument called ``label``, mutations with the same
         nn.Linear(nn.ValueChoice([32, 64, 128], label='hidden_dim'), 3)
     )
 
-Detailed API description and usage can be found `here <./ApiReference.rst>`__\. Example of using these APIs can be found in :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>`. We are actively enriching the set of inline mutations, to make it easier to express a new search space.
-
-If the inline mutation APIs are not enough for your scenario, you can refer to `defining model space using mutators <./Advanced.rst#express-mutations-with-mutators>`__ to write more complex model spaces.
+Detailed API description and usage can be found `here <./ApiReference.rst>`__\. Example of using these APIs can be found in :githublink:`Darts base model <test/retiarii_test/darts/darts_model.py>`. We are actively enriching the set of inline mutation APIs to make it easier to express new search spaces. Please refer to `here <./construct_space.rst>`__ for more tutorials on how to express complex model spaces.
 
 Explore the Defined Model Space
 -------------------------------
 
 There are basically two exploration approaches: (1) search by evaluating each sampled model independently and (2) one-shot weight-sharing based search. We demonstrate the first approach below in this tutorial. Users can refer to `here <./OneshotTrainer.rst>`__ for the second approach.
 
-Users can choose a proper search strategy to explore the model space, and use a chosen or user-defined model evaluator to evaluate the performance of each sampled model.
+Users can choose a proper exploration strategy to explore the model space, and use a chosen or user-defined model evaluator to evaluate the performance of each sampled model.
 
-Choose a search strategy
+Pick a search strategy
 ^^^^^^^^^^^^^^^^^^^^^^^^
 
-Retiarii currently supports the following search strategies:
-
-* Grid search: enumerate all the possible models defined in the space.
-* Random: randomly pick the models from search space.
-* Regularized evolution: a genetic algorithm that explores the space based on inheritance and mutation.
+Retiarii supports many `exploration strategies <./ExplorationStrategies.rst>`__.
 
-Choose (i.e., instantiate) a search strategy is very easy. An example is as follows,
+Choosing (i.e., instantiating) an exploration strategy is simple, as shown below.
 
 .. code-block:: python
 
@@ -142,14 +126,10 @@ Choose (i.e., instantiate) a search strategy is very easy. An example is as foll
 
   search_strategy = strategy.Random(dedup=True)  # dedup=False if deduplication is not wanted
 
-Detailed descriptions and usages of available strategies can be found `here <./ApiReference.rst>`__ .
-
-Choose or write a model evaluator
+Pick or write a model evaluator
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-In the NAS process, the search strategy repeatedly generates new models. A model evaluator is for training and validating each generated model. The obtained performance of a generated model is collected and sent to search strategy for generating better models.
-
-The model evaluator should correctly identify the use case of the model and the optimization goal. For example, on a classification task, an <input, label> dataset is needed, the loss function could be cross entropy and the optimized metric could be accuracy. On a regression task, the optimized metric could be mean-squared-error.
+In the NAS process, the exploration strategy repeatedly generates new models. A model evaluator trains and validates each generated model. The resulting performance of each generated model is collected and sent to the exploration strategy, which uses it to generate better models.
 
 In the context of PyTorch, Retiarii has provided two built-in model evaluators, designed for simple use cases: classification and regression. These two evaluators are built upon the awesome library PyTorch-Lightning.
 
@@ -172,7 +152,7 @@ As the model evaluator is running in another process (possibly in some remote ma
 
 Detailed descriptions and usages of model evaluators can be found `here <./ApiReference.rst>`__ .
 
-If the built-in model evaluators do not meet your requirement, or you already wrote the training code and just want to use it, you can follow `the guide to write a new evaluator <./WriteTrainer.rst>`__ .
+If the built-in model evaluators do not meet your requirements, or you have already written training code and just want to reuse it, you can follow `the guide to write a new model evaluator <./WriteTrainer.rst>`__.
 
 .. note:: In case you want to run the model evaluator locally for debug purpose, you can directly run the evaluator via ``evaluator._execute(Net)`` (note that it has to be ``Net``, not ``Net()``). However, this API is currently internal and subject to change.
 
@@ -195,11 +175,19 @@ After all the above are prepared, it is time to start an experiment to do the mo
   exp_config.training_service.use_active_gpu = False
   exp.run(exp_config, 8081)
 
-The complete code of a simple MNIST example can be found :githublink:`here <examples/nas/multi-trial/mnist/search.py>`.
-
-**Local Debug Mode**: When running an experiment, it is easy to get some trivial errors in trial code, such as shape mismatch, undefined variable. To quickly fix these kinds of errors, we provide local debug mode which locally applies mutators once and runs only that generated model. To use local debug mode, users can simply invoke the API `debug_mutated_model(base_model, trainer, applied_mutators)`.
+The complete code of a simple MNIST example can be found :githublink:`here <examples/nas/multi-trial/mnist/search.py>`. Besides the ``local`` training service, users can also run a Retiarii experiment on `different training services <../training_services.rst>`__.
 
 Visualize the Experiment
 ------------------------
 
-Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment. For example, open ``localhost::8081`` in your browser, 8081 is the port that you set in ``exp.run``. Please refer to `here <../../Tutorial/WebUI.rst>`__ for details.
\ No newline at end of file
+Users can visualize their experiment in the same way as a normal hyper-parameter tuning experiment. For example, open ``localhost:8081`` in your browser, where 8081 is the port that you set in ``exp.run``. Please refer to `here <../../Tutorial/WebUI.rst>`__ for details.
+
+Export Top Models
+-----------------
+
+After the exploration is done, users can export the top models using ``export_top_models``.
+
+.. code-block:: python
+
+  for model_code in exp.export_top_models(formatter='dict'):
+    print(model_code)
diff --git a/docs/en_US/NAS/SPOS.rst b/docs/en_US/NAS/SPOS.rst
index 7dd040ff91..9c08eba469 100644
--- a/docs/en_US/NAS/SPOS.rst
+++ b/docs/en_US/NAS/SPOS.rst
@@ -98,14 +98,8 @@ Reference
 PyTorch
 ^^^^^^^
 
-..  autoclass:: nni.algorithms.nas.pytorch.spos.SPOSEvolution
-    :members:
-
-..  autoclass:: nni.algorithms.nas.pytorch.spos.SPOSSupernetTrainer
-    :members:
-
-..  autoclass:: nni.algorithms.nas.pytorch.spos.SPOSSupernetTrainingMutator
-    :members:
+..  autoclass:: nni.retiarii.oneshot.pytorch.SinglePathTrainer
+    :noindex:
 
 Known Limitations
 -----------------
diff --git a/docs/en_US/NAS/SearchSpaceZoo.rst b/docs/en_US/NAS/SearchSpaceZoo.rst
deleted file mode 100644
index 6117e71d81..0000000000
--- a/docs/en_US/NAS/SearchSpaceZoo.rst
+++ /dev/null
@@ -1,281 +0,0 @@
-.. role:: raw-html(raw)
-   :format: html
-
-
-Search Space Zoo
-================
-
-DartsCell
----------
-
-DartsCell is extracted from :githublink:`CNN model <examples/nas/oneshot/darts>`. A DartsCell is a directed acyclic graph containing an ordered sequence of N nodes and each node stands for a latent representation (e.g. feature map in a convolutional network). Directed edges from Node 1 to Node 2 are associated with some operations that transform Node 1 and the result is stored on Node 2. The `Candidate operators <#predefined-operations-darts>`__ between nodes is predefined and unchangeable. One edge represents an operation that chosen from the predefined ones to be applied to the starting node of the edge. One cell contains two input nodes, a single output node, and other ``n_node`` nodes. The input nodes are defined as the cell outputs in the previous two layers. The output of the cell is obtained by applying a reduction operation (e.g. concatenation) to all the intermediate nodes. To make the search space continuous, the categorical choice of a particular operation is relaxed to a softmax over all possible operations. By adjusting the weight of softmax on every node, the operation with the highest probability is chosen to be part of the final structure. A CNN model can be formed by stacking several cells together, which builds a search space. Note that, in DARTS paper all cells in the model share the same structure.
-
-One structure in the Darts search space is shown below. Note that, NNI merges the last one of the four intermediate nodes and the output node.
-
-
-.. image:: ../../img/NAS_Darts_cell.svg
-   :target: ../../img/NAS_Darts_cell.svg
-   :alt: 
-
-
-The predefined operators are shown `here <#predefined-operations-darts>`__.
-
-..  autoclass:: nni.nas.pytorch.search_space_zoo.DartsCell
-    :members:
-
-Example code
-^^^^^^^^^^^^
-
-:githublink:`example code <examples/nas/search_space_zoo/darts_example.py>`
-
-.. code-block:: bash
-
-   git clone https://github.com/Microsoft/nni.git
-   cd nni/examples/nas/search_space_zoo
-   # search the best structure
-   python3 darts_example.py
-
-:raw-html:`<a name="predefined-operations-darts"></a>`
-
-Candidate operators
-^^^^^^^^^^^^^^^^^^^
-
-All supported operators for Darts are listed below.
-
-
-* 
-  MaxPool / AvgPool
-
-
-  * MaxPool: Call ``torch.nn.MaxPool2d``. This operation applies a 2D max pooling over all input channels. Its parameters ``kernel_size=3`` and ``padding=1`` are fixed. The pooling result will pass through a BatchNorm2d then return as the result.
-  * 
-    AvgPool: Call ``torch.nn.AvgPool2d``. This operation applies a 2D average pooling over all input channels. Its parameters ``kernel_size=3`` and ``padding=1`` are fixed. The pooling result will pass through a BatchNorm2d then return as the result.
-
-    MaxPool / AvgPool with ``kernel_size=3`` and ``padding=1`` followed by BatchNorm2d
-
-..  autoclass:: nni.nas.pytorch.search_space_zoo.darts_ops.PoolBN
-
-* 
-  SkipConnect
-
-    There is no operation between two nodes. Call ``torch.nn.Identity`` to forward what it gets to the output.
-
-* 
-  Zero operation
-
-    There is no connection between two nodes.
-
-* 
-  DilConv3x3 / DilConv5x5
-
-    :raw-html:`<a name="DilConv"></a>`\ DilConv3x3: (Dilated) depthwise separable Conv. It's a 3x3 depthwise convolution with ``C_in`` groups, followed by a 1x1 pointwise convolution. It reduces the amount of parameters. Input is first passed through relu, then DilConv and finally batchNorm2d. **Note that the operation is not Dilated Convolution, but we follow the convention in NAS papers to name it DilConv.** 3x3 DilConv has parameters ``kernel_size=3``\ , ``padding=1`` and 5x5 DilConv has parameters ``kernel_size=5``\ , ``padding=4``.
-
-  ..  autoclass:: nni.nas.pytorch.search_space_zoo.darts_ops.DilConv
-
-* 
-  SepConv3x3 / SepConv5x5
-
-    Composed of two DilConvs with fixed ``kernel_size=3``\ , ``padding=1`` or ``kernel_size=5``\ , ``padding=2`` sequentially.
-
-  ..  autoclass:: nni.nas.pytorch.search_space_zoo.darts_ops.SepConv
-
-ENASMicroLayer
---------------
-
-This layer is extracted from the model designed :githublink:`here <examples/nas/oneshot/enas>`. A model contains several blocks that share the same architecture. A block is made up of some normal layers and reduction layers, ``ENASMicroLayer`` is a unified implementation of the two types of layers. The only difference between the two layers is that reduction layers apply all operations with ``stride=2``.
-
-ENAS Micro employs a DAG with N nodes in one cell, where the nodes represent local computations, and the edges represent the flow of information between the N nodes. One cell contains two input nodes and a single output node. The following nodes choose two previous nodes as input and apply two operations from `predefined ones <#predefined-operations-enas>`__ then add them as the output of this node. For example, Node 4 chooses Node 1 and Node 3 as inputs then applies ``MaxPool`` and ``AvgPool`` on the inputs respectively, then adds and sums them as the output of Node 4. Nodes that are not served as input for any other node are viewed as the output of the layer. If there are multiple output nodes, the model will calculate the average of these nodes as the layer output.
-
-The ENAS micro search space is shown below.
-
-
-.. image:: ../../img/NAS_ENAS_micro.svg
-   :target: ../../img/NAS_ENAS_micro.svg
-   :alt: 
- 
-
-The predefined operators can be seen `here <#predefined-operations-enas>`__.
-
-..  autoclass:: nni.nas.pytorch.search_space_zoo.ENASMicroLayer
-    :members:
-
-The Reduction Layer is made up of two Conv operations followed by BatchNorm, each of them will output ``C_out//2`` channels and concat them in channels as the output. The Convolution has ``kernel_size=1`` and ``stride=2``\ , and they perform alternate sampling on the input to reduce the resolution without loss of information. This layer is wrapped in ``ENASMicroLayer``.
-
-Example code
-^^^^^^^^^^^^
-
-:githublink:`example code <examples/nas/search_space_zoo/enas_micro_example.py>`
-
-.. code-block:: bash
-
-   git clone https://github.com/Microsoft/nni.git
-   cd nni/examples/nas/search_space_zoo
-   # search the best cell structure
-   python3 enas_micro_example.py
-
-:raw-html:`<a name="predefined-operations-enas"></a>`
-
-Candidate operators
-^^^^^^^^^^^^^^^^^^^
-
-All supported operators for ENAS micro search are listed below.
-
-
-* 
-  MaxPool / AvgPool
-
-
-  * MaxPool: Call ``torch.nn.MaxPool2d``. This operation applies a 2D max pooling over all input channels followed by BatchNorm2d. Its parameters are fixed to ``kernel_size=3``\ , ``stride=1`` and ``padding=1``.
-  * AvgPool: Call ``torch.nn.AvgPool2d``. This operation applies a 2D average pooling over all input channels followed by BatchNorm2d. Its parameters are fixed to ``kernel_size=3``\ , ``stride=1`` and ``padding=1``.
-
-..  autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.Pool
-
-* 
-  SepConv
-
-
-  * SepConvBN3x3: ReLU followed by a `DilConv <#DilConv>`__ and BatchNorm. Convolution parameters are ``kernel_size=3``\ , ``stride=1`` and ``padding=1``.
-  * 
-    SepConvBN5x5: Do the same operation as the previous one but it has different kernel sizes and paddings, which is set to 5 and 2 respectively.
-
-..  autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.SepConvBN
-
-* 
-  SkipConnect
-
-    Call ``torch.nn.Identity`` to connect directly to the next cell.
-
-ENASMacroLayer
---------------
-
-In Macro search, the controller makes two decisions for each layer: i) the `operation <#macro-operations>`__ to perform on the result of the previous layer, ii) which the previous layer to connect to for SkipConnects. ENAS uses a controller to design the whole model architecture instead of one of its components. The output of operations is going to concat with the tensor of the chosen layer for SkipConnect. NNI provides `predefined operators <#macro-operations>`__ for macro search, which are listed in `Candidate operators <#macro-operations>`__.
-
-Part of one structure in the ENAS macro search space is shown below.
-
-
-.. image:: ../../img/NAS_ENAS_macro.svg
-   :target: ../../img/NAS_ENAS_macro.svg
-   :alt: 
-
-
-..  autoclass:: nni.nas.pytorch.search_space_zoo.ENASMacroLayer
-    :members:
-
-To describe the whole search space, NNI provides a model, which is built by stacking the layers.
-
-..  autoclass:: nni.nas.pytorch.search_space_zoo.ENASMacroGeneralModel
-    :members:
-
-Example code
-^^^^^^^^^^^^
-
-:githublink:`example code <examples/nas/search_space_zoo/enas_macro_example.py>`
-
-.. code-block:: bash
-
-   git clone https://github.com/Microsoft/nni.git
-   cd nni/examples/nas/search_space_zoo
-   # search the best cell structure
-   python3 enas_macro_example.py
-
-:raw-html:`<a name="macro-operations"></a>`
-
-Candidate operators
-^^^^^^^^^^^^^^^^^^^
-
-All supported operators for ENAS macro search are listed below.
-
-
-* 
-  ConvBranch
-
-    All input first passes into a StdConv, which is made up of a 1x1Conv followed by BatchNorm2d and ReLU. Then the intermediate result goes through one of the operations listed below. The final result is calculated through a BatchNorm2d and ReLU as post-procedure.
-
-
-  * Separable Conv3x3: If ``separable=True``\ , the cell will use `SepConv <#DilConv>`__ instead of normal Conv operation. SepConv's ``kernel_size=3``\ , ``stride=1`` and ``padding=1``.
-  * Separable Conv5x5: SepConv's ``kernel_size=5``\ , ``stride=1`` and ``padding=2``.
-  * Normal Conv3x3: If ``separable=False``\ , the cell will use a normal Conv operations with ``kernel_size=3``\ , ``stride=1`` and ``padding=1``.
-  * 
-    Normal Conv5x5: Conv's ``kernel_size=5``\ , ``stride=1`` and ``padding=2``.
-
-..  autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.ConvBranch
-
-* 
-  PoolBranch
-
-    All input first passes into a StdConv, which is made up of a 1x1Conv followed by BatchNorm2d and ReLU. Then the intermediate goes through pooling operation followed by BatchNorm.
-
-
-  * AvgPool: Call ``torch.nn.AvgPool2d``. This operation applies a 2D average pooling over all input channels. Its parameters are fixed to ``kernel_size=3``\ , ``stride=1`` and ``padding=1``.
-  * 
-    MaxPool: Call ``torch.nn.MaxPool2d``. This operation applies a 2D max pooling over all input channels. Its parameters are fixed to ``kernel_size=3``\ , ``stride=1`` and ``padding=1``.
-
-..  autoclass:: nni.nas.pytorch.search_space_zoo.enas_ops.PoolBranch
-
-NAS-Bench-201
--------------
-
-NAS Bench 201 defines a unified search space, which is algorithm agnostic. The predefined skeleton consists of a stack of cells that share the same architecture. Every cell contains four nodes and a DAG is formed by connecting edges among them, where the node represents the sum of feature maps and the edge stands for an operation transforming a tensor from the source node to the target node. The predefined candidate operators can be found in `Candidate operators <#nas-bench-201-reference>`__.
-
-The search space of NAS Bench 201 is shown below.
-
-
-.. image:: ../../img/NAS_Bench_201.svg
-   :target: ../../img/NAS_Bench_201.svg
-   :alt: 
-
-
-..  autoclass:: nni.nas.pytorch.nasbench201.NASBench201Cell
-    :members:
-
-Example code
-^^^^^^^^^^^^
-
-:githublink:`example code <examples/nas/search_space_zoo/nas_bench_201.py>`
-
-.. code-block:: bash
-
-   # for structure searching
-   git clone https://github.com/Microsoft/nni.git
-   cd nni/examples/nas/search_space_zoo
-   python3 nas_bench_201.py
-
-:raw-html:`<a name="nas-bench-201-reference"></a>`
-
-Candidate operators
-^^^^^^^^^^^^^^^^^^^
-
-All supported operators for NAS Bench 201 are listed below.
-
-
-* 
-  AvgPool
-
-  If the number of input channels is not equal to the number of output channels, the input will first pass through a ``ReLUConvBN`` layer with ``kernel_size=1``\ , ``stride=1``\ , ``padding=0``\ , and ``dilation=0``.
-  Call ``torch.nn.AvgPool2d``. This operation applies a 2D average pooling over all input channels followed by BatchNorm2d. Its parameters are fixed to ``kernel_size=3`` and ``padding=1``.
-
-..  autoclass:: nni.nas.pytorch.nasbench201.nasbench201_ops.Pooling
-    :members:
-
-* 
-  Conv
-
-
-  * Conv1x1: Consist of a sequence of ReLU, ``nn.Cinv2d`` and BatchNorm. The Conv operation's parameter is fixed to ``kernal_size=1``\ , ``padding=0``\ , and ``dilation=1``.
-  * Conv3x3: Consist of a sequence of ReLU, ``nn.Cinv2d`` and BatchNorm. The Conv operation's parameter is fixed to ``kernal_size=3``\ , ``padding=1``\ , and ``dilation=1``.
-
-..  autoclass:: nni.nas.pytorch.nasbench201.nasbench201_ops.ReLUConvBN
-    :members:
-
-* 
-  SkipConnect
-
-  Call ``torch.nn.Identity`` to connect directly to the next cell.
-
-* 
-  Zeroize
-
-  Generate zero tensors indicating there is no connection from the source node to the target node.
-
-..  autoclass:: nni.nas.pytorch.nasbench201.nasbench201_ops.Zero
-    :members:
diff --git a/docs/en_US/NAS/TextNAS.rst b/docs/en_US/NAS/TextNAS.rst
deleted file mode 100644
index 341ddbbf62..0000000000
--- a/docs/en_US/NAS/TextNAS.rst
+++ /dev/null
@@ -1,94 +0,0 @@
-TextNAS
-=======
-
-Introduction
-------------
-
-This is the implementation of the TextNAS algorithm proposed in the paper `TextNAS: A Neural Architecture Search Space tailored for Text Representation <https://arxiv.org/pdf/1912.10729.pdf>`__. TextNAS is a neural architecture search algorithm tailored for text representation, more specifically, TextNAS is based on a novel search space consists of operators widely adopted to solve various NLP tasks, and TextNAS also supports multi-path ensemble within a single network to balance the width and depth of the architecture. 
-
-The search space of TextNAS contains: 
-
-.. code-block:: bash
-
-   * 1-D convolutional operator with filter size 1, 3, 5, 7 
-   * recurrent operator (bi-directional GRU) 
-   * self-attention operator
-   * pooling operator (max/average)
-
-
-Following the ENAS algorithm, TextNAS also utilizes parameter sharing to accelerate the search speed and adopts a reinforcement-learning controller for the architecture sampling and generation. Please refer to the paper for more details of TextNAS.
-
-Preparation
------------
-
-Prepare the word vectors and SST dataset, and organize them in data directory as shown below:
-
-.. code-block:: bash
-
-   textnas
-   ├── data
-   │   ├── sst
-   │   │   └── trees
-   │   │       ├── dev.txt
-   │   │       ├── test.txt
-   │   │       └── train.txt
-   │   └── glove.840B.300d.txt
-   ├── dataloader.py
-   ├── model.py
-   ├── ops.py
-   ├── README.md
-   ├── search.py
-   └── utils.py
-
-The following link might be helpful for finding and downloading the corresponding dataset:
-
-
-* `GloVe: Global Vectors for Word Representation <https://nlp.stanford.edu/projects/glove/>`__
-
-  * `glove.840B.300d.txt <http://nlp.stanford.edu/data/glove.840B.300d.zip>`__
-
-* `Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank <https://nlp.stanford.edu/sentiment/>`__
-
-  * `trainDevTestTrees_PTB.zip <https://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip>`__
-
-Examples
---------
-
-Search Space
-^^^^^^^^^^^^
-
-:githublink:`Example code <examples/nas/legacy/textnas>`
-
-.. code-block:: bash
-
-   # Clone the NNI repository if you have not done so yet; otherwise skip this line.
-   git clone https://github.com/Microsoft/nni.git
-
-   # enter the example folder
-   cd examples/nas/legacy/textnas
-
-   # search the best architecture (run `python3 search.py -h` to view more options)
-   python3 search.py
-
-After each search epoch, 10 sampled architectures are tested directly. Their performance is expected to reach 40% to 42% after 10 epochs.
-
-By default, 20 sampled architectures are exported into the ``checkpoints`` directory for the next step.
-
-Retrain
-^^^^^^^
-
-.. code-block:: bash
-
-   # Clone the NNI repository if you have not done so yet; otherwise skip this line.
-   git clone https://github.com/Microsoft/nni.git
-
-   # enter the example folder
-   cd examples/nas/legacy/textnas
-
-   # default to retrain on sst-2
-   sh run_retrain.sh
-
-Reference
----------
-
-TextNAS directly uses EnasTrainer; please refer to `ENAS <./ENAS.rst>`__ for the trainer APIs.
diff --git a/docs/en_US/NAS/Visualization.rst b/docs/en_US/NAS/Visualization.rst
deleted file mode 100644
index 9588b2cc41..0000000000
--- a/docs/en_US/NAS/Visualization.rst
+++ /dev/null
@@ -1,86 +0,0 @@
-NAS Visualization (Experimental)
-================================
-
-Built-in Trainers Support
--------------------------
-
-Currently, only ENAS and DARTS support visualization. The examples of `ENAS <./ENAS.rst>`__ and `DARTS <./DARTS.rst>`__ demonstrate how to enable visualization in your code, namely, by adding this before ``trainer.train()``\ :
-
-.. code-block:: python
-
-   trainer.enable_visualization()
-
-This will create a directory ``logs/<current_time_stamp>`` in your working folder, in which you will find two files ``graph.json`` and ``log``.
-
-You don't have to wait until your program finishes to launch NAS UI, but these two files must already have been created. Launch NAS UI with:
-
-.. code-block:: bash
-
-   nnictl webui nas --logdir logs/<current_time_stamp> --port <port>
-
-Visualize a Customized Trainer
-------------------------------
-
-If you are interested in how to customize a trainer, please read this `doc <./Advanced.rst#extend-the-ability-of-one-shot-trainers>`__.
-
-You need to make two modifications to an existing trainer to enable visualization:
-
-
-#. Export your graph before training:
-
-   .. code-block:: python
-
-      import json
-
-      vis_graph = self.mutator.graph(inputs)
-      # `inputs` is a dummy input to your model. For example, torch.randn((1, 3, 32, 32)).cuda()
-      # If your model has multiple inputs, it should be a tuple.
-      with open("/path/to/your/logdir/graph.json", "w") as f:
-          json.dump(vis_graph, f)
-
-
-#. Log the choices you've made. You can do it once per epoch, once per mini-batch, or at whatever frequency you'd like.
-
-   .. code-block:: python
-
-      def __init__(self):
-          # ...
-          self.status_writer = open("/path/to/your/logdir/log", "w")  # create a writer
-
-      def train(self):
-          # ...
-          print(json.dumps(self.mutator.status()), file=self.status_writer, flush=True)  # dump a record of status
-
-If your customized trainer inherits ``Trainer``, we provide ``enable_visualization()`` and ``_write_graph_status()`` for convenience. All you need to do is call ``trainer.enable_visualization()`` before training starts, and ``trainer._write_graph_status()`` each time you want to log. Remember that both of these APIs are experimental and subject to change in the future.
-
-Last but not least, invoke NAS UI with:
-
-.. code-block:: bash
-
-   nnictl webui nas --logdir /path/to/your/logdir
-
-NAS UI Preview
---------------
-
-
-.. image:: ../../img/nasui-1.png
-   :target: ../../img/nasui-1.png
-   :alt: 
-
-
-
-.. image:: ../../img/nasui-2.png
-   :target: ../../img/nasui-2.png
-   :alt: 
-
-
-Limitations
------------
-
-
-* NAS visualization only works with PyTorch >= 1.4. It has been tested on PyTorch 1.3.1, where it does not work.
-* We rely on PyTorch support for tensorboard for graph export, which relies on ``torch.jit``. It will not work if your model doesn't support ``jit``.
-* There are known performance issues when loading a moderate-size graph with many op choices (like DARTS search space).
-
-Feedback
---------
-
-NAS UI is currently experimental. We welcome your feedback. `Here <https://github.com/microsoft/nni/pull/2085>`__ we have listed all the planned to-do items for NAS UI. Feel free to comment (or `submit a new issue <https://github.com/microsoft/nni/issues/new?template=enhancement.rst>`__\ ) if you have other suggestions.
diff --git a/docs/en_US/NAS/retiarii/OneshotTrainer.rst b/docs/en_US/NAS/WriteOneshot.rst
similarity index 61%
rename from docs/en_US/NAS/retiarii/OneshotTrainer.rst
rename to docs/en_US/NAS/WriteOneshot.rst
index ab6ef33d82..c190099f3e 100644
--- a/docs/en_US/NAS/retiarii/OneshotTrainer.rst
+++ b/docs/en_US/NAS/WriteOneshot.rst
@@ -1,45 +1,5 @@
-One-shot Experiments on Retiarii
-================================
-
-Before reading this tutorial, we highly recommend that you first go through the tutorial on how to `define a model space <./Tutorial.rst#define-your-model-space>`__.
-
-Model Search with One-shot Trainer
-----------------------------------
-
-With a defined model space, users can explore the space in two ways. One is using a strategy and a single-arch evaluator as demonstrated `here <./Tutorial.rst#explore-the-defined-model-space>`__. The other is using a one-shot trainer, which consumes much less computational resources than the first. This tutorial focuses on the one-shot approach. The principle of the one-shot approach is to combine all the models in a model space into one big model (usually called a super-model or super-graph). Search, training, and testing are all carried out by training and evaluating this big model.
-
-We list the supported one-shot trainers here:
-
-* DARTS trainer
-* ENAS trainer
-* ProxylessNAS trainer
-* Single-path (random) trainer
-
-See `API reference <./ApiReference.rst>`__ for detailed usage. Here, we show an example of using the DARTS trainer manually.
-
-.. code-block:: python
-
-  from nni.retiarii.oneshot.pytorch import DartsTrainer
-  trainer = DartsTrainer(
-      model=model,
-      loss=criterion,
-      metrics=lambda output, target: accuracy(output, target, topk=(1,)),
-      optimizer=optim,
-      num_epochs=args.epochs,
-      dataset=dataset_train,
-      batch_size=args.batch_size,
-      log_frequency=args.log_frequency,
-      unrolled=args.unrolled
-  )
-  trainer.fit()
-  final_architecture = trainer.export()
-
-**Format of the exported architecture.** TBD.
-
-One-shot experiments can be visualized with NAS UI; please refer to `here <../Visualization.rst>`__ for usage guidance. Note that NAS visualization is under intensive development.
-
 Customize a New One-shot Trainer
---------------------------------
+================================
 
 One-shot trainers should inherit ``nni.retiarii.oneshot.BaseOneShotTrainer``, and need to implement ``fit()`` (used to conduct the fitting and searching process) and ``export()`` method (used to return the searched best architecture).
 
diff --git a/docs/en_US/NAS/WriteSearchSpace.rst b/docs/en_US/NAS/WriteSearchSpace.rst
deleted file mode 100644
index 6d47d6d803..0000000000
--- a/docs/en_US/NAS/WriteSearchSpace.rst
+++ /dev/null
@@ -1,67 +0,0 @@
-Write A Search Space
-====================
-
-Generally, a search space describes candidate architectures from which users want to find the best one. Different search algorithms, whether classic NAS or one-shot NAS, can be applied to the search space. NNI provides APIs to unify the expression of neural architecture search spaces.
-
-A search space can be built on a base model. This is also a common practice when a user wants to apply NAS on an existing model. Take `MNIST on PyTorch <https://github.com/pytorch/examples/blob/master/mnist/main.py>`__ as an example. Note that NNI provides the same APIs for expressing search space on PyTorch and TensorFlow.
-
-.. code-block:: python
-
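-   import torch.nn as nn
-   import torch.nn.functional as F
-   from nni.nas.pytorch import mutables
-
-   class Net(nn.Module):
-       def __init__(self):
-           super(Net, self).__init__()
-           self.conv1 = mutables.LayerChoice([
-               nn.Conv2d(1, 32, 3, 1),
-               nn.Conv2d(1, 32, 5, 1, padding=1)  # padding keeps the output size equal to the 3x3 branch
-           ])  # try 3x3 kernel and 5x5 kernel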
-           self.conv2 = nn.Conv2d(32, 64, 3, 1)
-           self.dropout1 = nn.Dropout2d(0.25)
-           self.dropout2 = nn.Dropout2d(0.5)
-           self.fc1 = nn.Linear(9216, 128)
-           self.fc2 = nn.Linear(128, 10)
-
-       def forward(self, x):
-           x = self.conv1(x)
-           x = F.relu(x)
-           # ... same as original ...
-           return output
-
-The example above adds an option of choosing conv5x5 at conv1. The modification is as simple as declaring a ``LayerChoice`` with the original conv3x3 and a new conv5x5 as its parameter. That's it! You don't have to modify the forward function in any way. You can imagine conv1 as any other module without NAS.
-
-What about choosing connections? This can be done using ``InputChoice``. To allow for a skip connection in the MNIST example, we add another layer called conv3. In the following example, a possible connection from conv2 is added to the output of conv3.
-
-.. code-block:: python
-
-   import torch.nn as nn
-   import torch.nn.functional as F
-   from nni.nas.pytorch import mutables
-
-   class Net(nn.Module):
-       def __init__(self):
-           # ... same ...
-           self.conv2 = nn.Conv2d(32, 64, 3, 1)
-           self.conv3 = nn.Conv2d(64, 64, 1, 1)
-           # declaring that there is exactly one candidate to choose from
-           # search strategy will choose one or None
-           self.skipcon = mutables.InputChoice(n_candidates=1)
-           # ... same ...
-
-       def forward(self, x):
-           x = self.conv1(x)
-           x = F.relu(x)
-           x = self.conv2(x)
-           x0 = self.skipcon([x])  # choose one or none from [x]
-           x = self.conv3(x)
-           if x0 is not None:  # skipconnection is open
-               x += x0
-           x = F.max_pool2d(x, 2)
-           # ... same ...
-           return output
-
-Input choice can be thought of as a callable module that receives a list of tensors and outputs the concatenation/sum/mean of some of them (sum by default), or ``None`` if none is selected. Like layer choices, input choices should be initialized in ``__init__`` and called in ``forward``. This is to allow search algorithms to identify these choices and do necessary preparations.
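-
-The input-choice semantics described above can be illustrated with a small, framework-free sketch. It is not NNI's implementation: ``reduce_chosen`` is a hypothetical helper, and plain Python numbers stand in for tensors so that the example runs without PyTorch.
-
-.. code-block:: python
-
-   from typing import List, Optional, Sequence
-
-
-   def reduce_chosen(candidates: Sequence[float], chosen: List[int],
-                     reduction: str = "sum") -> Optional[object]:
-       """Hypothetical helper mimicking how an input choice combines its candidates.
-
-       ``chosen`` holds the indices picked by the search strategy.
-       """
-       picked = [candidates[i] for i in chosen]
-       if not picked:
-           return None  # nothing selected, e.g., the skip connection is closed
-       if reduction == "sum":
-           return sum(picked)
-       if reduction == "mean":
-           return sum(picked) / len(picked)
-       if reduction == "concat":
-           return list(picked)  # a list stands in for tensor concatenation
-       raise ValueError(f"unknown reduction: {reduction}")
-
-
-   # One candidate, as in the skip-connection example: the result is either x or None.
-   print(reduce_chosen([2.0], chosen=[0]))  # 2.0
-   print(reduce_chosen([2.0], chosen=[]))   # None
-   print(reduce_chosen([1.0, 3.0], chosen=[0, 1], reduction="mean"))  # 2.0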
-
-``LayerChoice`` and ``InputChoice`` are both **mutables**. Mutable means "changeable". As opposed to traditional deep learning layers/modules, which have fixed operation types once defined, a model with mutables essentially represents a set of possible models.
-
-Users can specify a **key** for each mutable. By default, NNI assigns a globally unique key for you, but if users want to share choices (for example, there are two ``LayerChoice``\ s with the same candidate operations and you want them to have the same choice, i.e., if the first one chooses the i-th op, the second one also chooses the i-th op), they can give them the same key. The key marks the identity of this choice and will be used in the dumped checkpoint. So if you want to increase the readability of your exported architecture, manually assigning keys to each mutable would be a good idea. For advanced usage of mutables (e.g., ``LayerChoice`` and ``InputChoice``\ ), see `Mutables <./NasReference.rst>`__.
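-
-The effect of sharing a key can be sketched without NNI. ``SharedKeySampler`` below is hypothetical and models only the bookkeeping, assuming that choices sharing a key also share the same candidate list: a decision is made once per key and reused by every choice carrying that key.
-
-.. code-block:: python
-
-   import random
-   from typing import Dict, List
-
-
-   class SharedKeySampler:
-       """Hypothetical sampler: one decision per key, reused by all choices sharing it."""
-
-       def __init__(self, seed: int = 0) -> None:
-           self._rng = random.Random(seed)
-           self._decisions: Dict[str, int] = {}  # key -> chosen candidate index
-
-       def choose(self, key: str, candidates: List[str]) -> str:
-           if key not in self._decisions:
-               self._decisions[key] = self._rng.randrange(len(candidates))
-           return candidates[self._decisions[key]]
-
-
-   ops = ["conv3x3", "conv5x5", "maxpool"]
-   sampler = SharedKeySampler(seed=42)
-   first = sampler.choose("cell_op", ops)
-   second = sampler.choose("cell_op", ops)  # same key -> same index -> same op
-   other = sampler.choose("another_op", ops)  # a different key is decided independently
-   assert first == second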
-
-With search space defined, the next step is searching for the best model from it. Please refer to `classic NAS algorithms <./ClassicNas.rst>`__ and `one-shot NAS algorithms <./NasGuide.rst>`__ for how to search from your defined search space.
diff --git a/docs/en_US/NAS/retiarii/WriteStrategy.rst b/docs/en_US/NAS/WriteStrategy.rst
similarity index 69%
rename from docs/en_US/NAS/retiarii/WriteStrategy.rst
rename to docs/en_US/NAS/WriteStrategy.rst
index 6f354b3b85..e324385f67 100644
--- a/docs/en_US/NAS/retiarii/WriteStrategy.rst
+++ b/docs/en_US/NAS/WriteStrategy.rst
@@ -1,7 +1,7 @@
-Customize A New Strategy
-========================
+Customize Exploration Strategy
+==============================
 
-To write a new strategy, you should inherit the base strategy class ``BaseStrategy``, then implement the member function ``run``. This member function takes ``base_model`` and ``applied_mutators`` as its input arguments. It can simply apply the user specified mutators in ``applied_mutators`` onto ``base_model`` to generate a new model. When a mutator is applied, it should be bound with a sampler (e.g., ``RandomSampler``). Every sampler implements the ``choice`` function which chooses value(s) from candidate values. The ``choice`` functions invoked in mutators are executed with the sampler.
+To create a new exploration strategy, users can customize one following the interface provided by NNI. Specifically, users should inherit the base strategy class ``BaseStrategy`` and implement the member function ``run``. This member function takes ``base_model`` and ``applied_mutators`` as its input arguments. It can simply apply the user-specified mutators in ``applied_mutators`` onto ``base_model`` to generate a new model. When a mutator is applied, it should be bound to a sampler (e.g., ``RandomSampler``). Every sampler implements the ``choice`` function, which chooses value(s) from candidate values. The ``choice`` functions invoked in mutators are executed with the sampler.
 
 Below is a very simple random strategy, which makes the choices completely random.
 
diff --git a/docs/en_US/NAS/retiarii/WriteTrainer.rst b/docs/en_US/NAS/WriteTrainer.rst
similarity index 100%
rename from docs/en_US/NAS/retiarii/WriteTrainer.rst
rename to docs/en_US/NAS/WriteTrainer.rst
diff --git a/docs/en_US/NAS/construct_space.rst b/docs/en_US/NAS/construct_space.rst
new file mode 100644
index 0000000000..b32489d4a7
--- /dev/null
+++ b/docs/en_US/NAS/construct_space.rst
@@ -0,0 +1,11 @@
+#####################
+Construct Model Space
+#####################
+
+NNI provides powerful APIs for users to easily express a model space (or search space). First, users can use mutation primitives (e.g., ValueChoice, LayerChoice) to inline a space in their model. Second, NNI provides a simple interface for users to customize new mutators for expressing more complicated model spaces. In most cases, the mutation primitives are enough to express users' model spaces.
+
+..  toctree::
+    :maxdepth: 1
+
+    Mutation Primitives <MutationPrimitives>
+    Customize Mutators <Mutators>
\ No newline at end of file
diff --git a/docs/en_US/NAS/multi_trial_nas.rst b/docs/en_US/NAS/multi_trial_nas.rst
new file mode 100644
index 0000000000..9e77216fb8
--- /dev/null
+++ b/docs/en_US/NAS/multi_trial_nas.rst
@@ -0,0 +1,13 @@
+Multi-trial NAS
+===============
+
+In multi-trial NAS, users need a model evaluator to evaluate the performance of each sampled model, and an exploration strategy to sample models from a defined model space. Users can use the model evaluators provided by NNI or write their own, and simply choose an exploration strategy. Advanced users can also customize a new exploration strategy. For a simple example of how to run a multi-trial NAS experiment, please refer to `Quick Start <./QuickStart.rst>`__.
+
+..  toctree::
+    :maxdepth: 1
+
+    Model Evaluators <ModelEvaluators>
+    Customize Model Evaluator <WriteTrainer>
+    Exploration Strategies <ExplorationStrategies>
+    Customize Exploration Strategies <WriteStrategy>
+    Execution Engines <ExecutionEngines>
diff --git a/docs/en_US/NAS/one_shot_nas.rst b/docs/en_US/NAS/one_shot_nas.rst
index c0efeb282c..e2ea4c8619 100644
--- a/docs/en_US/NAS/one_shot_nas.rst
+++ b/docs/en_US/NAS/one_shot_nas.rst
@@ -1,19 +1,16 @@
-One-shot NAS Algorithms
-=======================
+One-shot NAS
+============
 
-One-shot NAS algorithms leverage weight sharing among models in neural architecture search space to train a supernet, and use this supernet to guide the selection of better models. This type of algorithm greatly reduces computational resources compared to independently training each model from scratch (which we call "Classic NAS"). NNI supports many popular one-shot NAS algorithms, listed below.
+One-shot NAS algorithms leverage weight sharing among models in neural architecture search space to train a supernet, and use this supernet to guide the selection of better models. This type of algorithm greatly reduces computational resources compared to independently training each model from scratch (which we call "Multi-trial NAS"). NNI supports many popular one-shot NAS algorithms, listed below.
 
 
 ..  toctree::
     :maxdepth: 1
 
-    Quick Start <NasGuide>
+    Run One-shot NAS <OneshotTrainer>
     ENAS <ENAS>
     DARTS <DARTS>
-    P-DARTS <PDARTS>
     SPOS <SPOS>
-    CDARTS <CDARTS>
     ProxylessNAS <Proxylessnas>
     FBNet <FBNet>
-    TextNAS <TextNAS>
-    Cream <Cream>
+    Customize one-shot NAS <WriteOneshot>
diff --git a/docs/en_US/NAS/retiarii/Advanced.rst b/docs/en_US/NAS/retiarii/Advanced.rst
deleted file mode 100644
index 85ac32514b..0000000000
--- a/docs/en_US/NAS/retiarii/Advanced.rst
+++ /dev/null
@@ -1,116 +0,0 @@
-Advanced Tutorial
-=================
-
-Pure-python execution engine (experimental)
--------------------------------------------
-
-If you are experiencing issues with TorchScript, or with the model code generated by Retiarii, there is another execution engine called the Pure-python execution engine, which doesn't need the code-graph conversion. This should generally not affect models and strategies, but customized mutations might not be supported.
-
-This will become the default execution engine in a future version of Retiarii.
-
-Three steps are needed to enable this engine now.
-
-1. Add ``@nni.retiarii.model_wrapper`` decorator outside the whole PyTorch model.
-2. Add ``config.execution_engine = 'py'`` to ``RetiariiExeConfig``.
-3. If you need to export top models, formatter needs to be set to ``dict``. Exporting ``code`` won't work with this engine.
-
-.. note:: You should always use ``super().__init__()`` instead of ``super(MyNetwork, self).__init__()`` in the PyTorch model, because the latter has issues with the model wrapper.
-
-``@basic_unit`` and ``serializer``
-----------------------------------
-
-.. _serializer:
-
-``@basic_unit`` and ``serialize`` can be viewed as some kind of serializer. They are designed for making the whole model (including training) serializable to be executed on another process or machine.
-
-**@basic_unit** annotates that a module is a basic unit, i.e., Retiarii does not need to understand the details of this module. The effect is that it prevents Retiarii from parsing this module. To understand this, we first briefly explain how Retiarii works: it converts a user-defined model to a graph representation (called graph IR) using `TorchScript <https://pytorch.org/docs/stable/jit.html>`__, and each instantiated module in the model is converted to a subgraph. Then mutations are applied to the graph to generate new graphs. Each new graph is then converted back to PyTorch code and executed. ``@basic_unit`` means the module will not be converted to a subgraph; instead, it is converted to a single graph node as a basic unit. That is, the module will not be unfolded anymore. When the module is not unfolded, mutations on the initialization parameters of this module become easier.
-
-``@basic_unit`` is usually used in the following cases:
-
-* When users want to tune the initialization parameters of a module using ``ValueChoice``, the module should be decorated with ``@basic_unit``. For example, in ``self.conv = MyConv(kernel_size=nn.ValueChoice([1, 3, 5]))``, ``MyConv`` should be decorated.
-
-* When a module cannot be successfully parsed to a subgraph, decorate the module with ``@basic_unit``. The parse failure could be due to complex control flow. Currently Retiarii does not support ad-hoc loops; if there is an ad-hoc loop in a module's ``forward``, the module should be decorated with ``@basic_unit``. For example, the following ``MyModule`` should be decorated.
-
-  .. code-block:: python
-
-    import nni.retiarii.nn.pytorch as nn
-    from nni.retiarii import basic_unit
-
-    @basic_unit
-    class MyModule(nn.Module):
-      def __init__(self):
-        ...
-      def forward(self, x):
-        for i in range(10): # <- adhoc loop
-          ...
-
-* Some inline mutation APIs require the modules they handle to be decorated with ``@basic_unit``. For example, a user-defined module that is provided to ``LayerChoice`` as a candidate op should be decorated.
-
-**serialize** is mainly used for serializing model training logic. It enables re-instantiation of the model evaluator in another process or machine. Re-instantiation is necessary because, most of the time, the model and evaluator need to be sent to training services. ``serialize`` is implemented by recording the initialization parameters of the user-instantiated evaluator.
-
-The evaluator-related APIs provided by Retiarii, for example ``pl.Classification`` and ``pl.DataLoader``, already support serialization, so there is no need to apply ``serialize`` on them. In the following case, users should apply ``serialize`` manually.
-
-If the initialization parameters of the evaluator APIs (e.g., ``pl.Classification``, ``pl.DataLoader``) are not primitive types (e.g., ``int``, ``string``), they should be wrapped with ``serialize``. If the initialization parameters of those objects are in turn not primitive types, ``serialize`` should be applied to them as well. In short, ``serialize`` should be applied recursively where necessary.
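-
-The idea of recording initialization parameters recursively can be sketched in plain Python. ``record_init`` and ``to_config`` are hypothetical names used only for illustration; they are not Retiarii's ``serialize``, and the sketch only shows why nested non-primitive arguments must be recorded as well.
-
-.. code-block:: python
-
-  from typing import Any
-
-  PRIMITIVES = (int, float, str, bool, type(None))
-
-
-  def record_init(cls):
-      """Hypothetical class decorator that remembers constructor arguments."""
-      original_init = cls.__init__
-
-      def __init__(self, *args, **kwargs):
-          self._init_args = (args, kwargs)
-          original_init(self, *args, **kwargs)
-
-      cls.__init__ = __init__
-      return cls
-
-
-  def to_config(obj: Any) -> Any:
-      """Convert a recorded object into nested primitives, recursing into its arguments."""
-      if isinstance(obj, PRIMITIVES):
-          return obj
-      if isinstance(obj, (list, tuple)):
-          return [to_config(item) for item in obj]
-      if hasattr(obj, "_init_args"):
-          args, kwargs = obj._init_args
-          return {
-              "type": type(obj).__name__,
-              "args": [to_config(a) for a in args],
-              "kwargs": {k: to_config(v) for k, v in kwargs.items()},
-          }
-      raise TypeError(f"{type(obj).__name__} was not recorded and cannot be re-created")
-
-
-  @record_init
-  class Dataset:  # hypothetical stand-in for a user dataset
-      def __init__(self, root: str) -> None:
-          self.root = root
-
-
-  @record_init
-  class Loader:  # hypothetical stand-in for a data loader
-      def __init__(self, dataset: Dataset, batch_size: int = 32) -> None:
-          self.dataset = dataset
-          self.batch_size = batch_size
-
-
-  print(to_config(Loader(Dataset("data/mnist"), batch_size=64)))
-
-If ``Dataset`` were not decorated, ``to_config`` would raise ``TypeError`` on it, which mirrors why ``serialize`` has to be applied to nested non-primitive arguments too.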
-
-
-Express Mutations with Mutators
--------------------------------
-
-Besides the inline mutations demonstrated `here <./Tutorial.rst>`__, Retiarii provides a more general approach to express a model space: *Mutator*. Inline mutation APIs are also implemented with mutators and can be seen as a special case of model mutation.
-
-.. note:: Mutator and inline mutation APIs cannot be used together.
-
-A mutator is a piece of logic to express how to mutate a given model. Users are free to write their own mutators. Then a model space is expressed with a base model and a list of mutators. A model in the model space is sampled by applying the mutators on the base model one after another. An example is shown below.
-
-.. code-block:: python
-
-  applied_mutators = []
-  applied_mutators.append(BlockMutator('mutable_0'))
-  applied_mutators.append(BlockMutator('mutable_1'))
-
-``BlockMutator`` is defined by users to express how to mutate the base model. 
-
-Write a mutator
-^^^^^^^^^^^^^^^
-
-A user-defined mutator should inherit the ``Mutator`` class and implement the mutation logic in the member function ``mutate``.
-
-.. code-block:: python
-
-  from typing import List
-
-  from nni.retiarii import Mutator
-
-  class BlockMutator(Mutator):
-      def __init__(self, target: str, candidates: List):
-          super(BlockMutator, self).__init__()
-          self.target = target
-          self.candidate_op_list = candidates
-
-      def mutate(self, model):
-          nodes = model.get_nodes_by_label(self.target)
-          for node in nodes:
-              chosen_op = self.choice(self.candidate_op_list)
-              node.update_operation(chosen_op.type, chosen_op.params)
-
-The input of ``mutate`` is graph IR (Intermediate Representation) of the base model (please refer to `here <./ApiReference.rst>`__ for the format and APIs of the IR), users can mutate the graph using the graph's member functions (e.g., ``get_nodes_by_label``, ``update_operation``). The mutation operations can be combined with the API ``self.choice``, in order to express a set of possible mutations. In the above example, the node's operation can be changed to any operation from ``candidate_op_list``.
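-
-The relationship between a base model, a list of mutators, and a sampler can be sketched in plain Python. Every class here (``Node``, ``Model``, ``RandomSampler``, ``OpMutator``) and the helper ``sample_model`` are hypothetical, simplified stand-ins for Retiarii's graph IR and mutator machinery; the sketch only shows that a sampled model is produced by applying the mutators one after another to a copy of the base model, each mutator calling ``choice`` on its bound sampler.
-
-.. code-block:: python
-
-  import copy
-  import random
-  from dataclasses import dataclass, field
-  from typing import List, Optional
-
-
-  @dataclass
-  class Node:
-      label: str
-      op: str
-
-
-  @dataclass
-  class Model:
-      nodes: List[Node] = field(default_factory=list)
-
-      def get_nodes_by_label(self, label: str) -> List[Node]:
-          return [n for n in self.nodes if n.label == label]
-
-
-  class RandomSampler:
-      def __init__(self, seed: int = 0) -> None:
-          self.rng = random.Random(seed)
-
-      def choice(self, candidates: List[str]) -> str:
-          return self.rng.choice(candidates)
-
-
-  class OpMutator:
-      """Sets the operation of every node labelled ``target`` to a sampled candidate."""
-
-      def __init__(self, target: str, candidates: List[str]) -> None:
-          self.target = target
-          self.candidates = candidates
-          self.sampler: Optional[RandomSampler] = None  # bound before mutate() runs
-
-      def bind_sampler(self, sampler: RandomSampler) -> "OpMutator":
-          self.sampler = sampler
-          return self
-
-      def mutate(self, model: Model) -> None:
-          for node in model.get_nodes_by_label(self.target):
-              node.op = self.sampler.choice(self.candidates)
-
-
-  def sample_model(base: Model, mutators: List[OpMutator], sampler: RandomSampler) -> Model:
-      model = copy.deepcopy(base)  # the base model itself is left untouched
-      for mutator in mutators:     # mutators are applied one after another
-          mutator.bind_sampler(sampler).mutate(model)
-      return model
-
-
-  base = Model([Node("mutable_0", "placeholder"), Node("mutable_1", "placeholder")])
-  mutators = [OpMutator("mutable_0", ["conv3x3", "conv5x5"]),
-              OpMutator("mutable_1", ["maxpool", "avgpool"])]
-  print(sample_model(base, mutators, RandomSampler(seed=1)))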
-
-Use a placeholder to make mutation easier: ``nn.Placeholder``. If you want to mutate a subgraph or node of your model, you can define a placeholder in the model to represent the subgraph or node. Then, use a mutator to turn this placeholder into real modules.
-
-.. code-block:: python
-
-  ph = nn.Placeholder(
-    label='mutable_0',
-    kernel_size_options=[1, 3, 5],
-    n_layer_options=[1, 2, 3, 4],
-    exp_ratio=exp_ratio,
-    stride=stride
-  )
-
-``label`` is used by the mutator to identify this placeholder. The other parameters carry the information required by the mutator. They can be accessed from ``node.operation.parameters`` as a dict, which can include any information users want to pass to their mutator. The complete example code can be found in :githublink:`Mnasnet base model <examples/nas/multi-trial/mnasnet/base_mnasnet.py>`.
-
-Starting an experiment is almost the same as using inline mutation APIs. The only difference is that the applied mutators should be passed to ``RetiariiExperiment``. Below is a simple example.
-
-.. code-block:: python
-
-  exp = RetiariiExperiment(base_model, trainer, applied_mutators, simple_strategy)
-  exp_config = RetiariiExeConfig('local')
-  exp_config.experiment_name = 'mnasnet_search'
-  exp_config.trial_concurrency = 2
-  exp_config.max_trial_number = 10
-  exp_config.training_service.use_active_gpu = False
-  exp.run(exp_config, 8081)
diff --git a/docs/en_US/NAS/retiarii/retiarii_index.rst b/docs/en_US/NAS/retiarii/retiarii_index.rst
deleted file mode 100644
index 73ea9c9aa0..0000000000
--- a/docs/en_US/NAS/retiarii/retiarii_index.rst
+++ /dev/null
@@ -1,21 +0,0 @@
-#################
-Retiarii Overview
-#################
-
-`Retiarii <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__ is a deep learning framework that supports the exploratory training on a neural network model space, rather than on a single neural network model. 
-
-Exploratory training with Retiarii allows users to express various search spaces for **Neural Architecture Search** and **Hyper-Parameter Tuning** with high flexibility.
-
-As with the previous NAS and HPO support, the new framework continues to allow users to reuse SOTA search algorithms and to leverage system-level optimizations to speed up the search process.
-
-Follow the instructions below to start your journey with Retiarii.
-
-..  toctree::
-    :maxdepth: 2
-
-    Quick Start <Tutorial>
-    Write a Model Evaluator <WriteTrainer>
-    One-shot NAS <OneshotTrainer>
-    Advanced Tutorial <Advanced>
-    Customize a New Strategy <WriteStrategy>
-    Retiarii APIs <ApiReference>
diff --git a/docs/en_US/Tuner/BuiltinTuner.rst b/docs/en_US/Tuner/BuiltinTuner.rst
index 0925bae104..41cf13fa85 100644
--- a/docs/en_US/Tuner/BuiltinTuner.rst
+++ b/docs/en_US/Tuner/BuiltinTuner.rst
@@ -41,8 +41,6 @@ Currently, we support the following algorithms:
      - BOHB is a follow-up work to Hyperband. It targets the weakness of Hyperband that new configurations are generated randomly without leveraging finished trials. For the name BOHB, HB means Hyperband, BO means Bayesian Optimization. BOHB leverages finished trials by building multiple TPE models, a proportion of new configurations are generated through these models. `Reference Paper <https://arxiv.org/abs/1807.01774>`__
    * - `GP Tuner <#GPTuner>`__
      - Gaussian Process Tuner is a sequential model-based optimization (SMBO) approach with Gaussian Process as the surrogate. `Reference Paper <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__\ , `Github Repo <https://github.com/fmfn/BayesianOptimization>`__
-   * - `PPO Tuner <#PPOTuner>`__
-     - PPO Tuner is a Reinforcement Learning tuner based on PPO algorithm. `Reference Paper <https://arxiv.org/abs/1707.06347>`__
    * - `PBT Tuner <#PBTTuner>`__
      - PBT Tuner is a simple asynchronous optimization algorithm which effectively utilizes a fixed computational budget to jointly optimize a population of models and their hyperparameters to maximize performance. `Reference Paper <https://arxiv.org/abs/1711.09846v1>`__
    * - `DNGO Tuner <#DNGOTuner>`__
diff --git a/docs/en_US/Tuner/PPOTuner.rst b/docs/en_US/Tuner/PPOTuner.rst
deleted file mode 100644
index ce769e1d3b..0000000000
--- a/docs/en_US/Tuner/PPOTuner.rst
+++ /dev/null
@@ -1,34 +0,0 @@
-PPO Tuner on NNI
-================
-
-PPOTuner
---------
-
-This is a tuner geared for NNI's Neural Architecture Search (NAS) interface. It uses the `PPO algorithm <https://arxiv.org/abs/1707.06347>`__. The implementation inherits the main logic of the ppo2 OpenAI implementation `here <https://github.com/openai/baselines/tree/master/baselines/ppo2>`__ and is adapted for the NAS scenario.
-
-We successfully tuned the mnist-nas example and obtained the following result:
-
-.. Note:: We are refactoring this example to the latest NAS interface and will publish the example code after the refactor.
-
-.. image:: ../../img/ppo_mnist.png
-   :target: ../../img/ppo_mnist.png
-   :alt: 
-
-
-We also tuned :githublink:`the macro search space for image classification in the enas paper <examples/nas/legacy/classic_nas>` (with a limited epoch number for each trial, i.e., 8 epochs), which is implemented using the NAS interface and tuned with PPOTuner. Here is Figure 7 from the `enas paper <https://arxiv.org/pdf/1802.03268.pdf>`__, showing what the search space looks like:
-
-
-.. image:: ../../img/enas_search_space.png
-   :target: ../../img/enas_search_space.png
-   :alt: 
-
-
-The figure above shows the chosen architecture. Each square is a layer whose operation was chosen from 6 options. Each dashed line is a skip connection; each square layer can choose 0 or 1 skip connections, taking the output from a previous layer. **Note that**\ , in the original macro search space, each square layer could choose any number of skip connections, while in our implementation, it is only allowed to choose 0 or 1.
-
-The results are shown in the figure below (see the experimental config :githublink:`here <examples/nas/legacy/classic_nas/config_ppo.yml>`):
-
-
-.. image:: ../../img/ppo_cifar10.png
-   :target: ../../img/ppo_cifar10.png
-   :alt: 
-
diff --git a/docs/en_US/builtin_tuner.rst b/docs/en_US/builtin_tuner.rst
index 1e6fd36a73..6282beca58 100644
--- a/docs/en_US/builtin_tuner.rst
+++ b/docs/en_US/builtin_tuner.rst
@@ -20,5 +20,4 @@ Tuner receives metrics from `Trial` to evaluate the performance of a specific pa
     Network Morphism <Tuner/NetworkmorphismTuner>
     Hyperband <Tuner/HyperbandAdvisor>
     BOHB <Tuner/BohbAdvisor>
-    PPO Tuner <Tuner/PPOTuner>
     PBT Tuner <Tuner/PBTTuner>
diff --git a/docs/en_US/nas.rst b/docs/en_US/nas.rst
index 24adf34ef2..ed8edfce56 100644
--- a/docs/en_US/nas.rst
+++ b/docs/en_US/nas.rst
@@ -1,6 +1,6 @@
-##########################
-Neural Architecture Search
-##########################
+#############################################
+Retiarii for Neural Architecture Search (NAS)
+#############################################
 
 Automatic neural architecture search is taking an increasingly important role in finding better models.
 Recent research works have proved the feasibility of automatic NAS, and also found some models that could beat manually tuned models.
@@ -10,20 +10,24 @@ However, it takes great efforts to implement NAS algorithms, and it is hard to r
 To facilitate NAS innovations (e.g., design and implement new NAS models, compare different NAS models side-by-side),
 an easy-to-use and flexible programming interface is crucial.
 
-Therefore, we provide a unified interface for NAS,
-to accelerate innovations on NAS, and apply state-of-art algorithms on real world problems faster.
-For details, please refer to the following tutorials:
+Thus, we design `Retiarii <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__. It is a deep learning framework that supports the exploratory training on a neural network model space, rather than on a single neural network model.
+Exploratory training with Retiarii allows users to express various search spaces for *Neural Architecture Search* and *Hyper-Parameter Tuning* with high flexibility.
+
+Some frequently used terms in this document:
+
+* *Model search space*: a set of models from which the best model is explored/searched. We sometimes call it *search space* or *model space* for short.
+* *Exploration strategy*: the algorithm used to explore a model search space.
+* *Model evaluator*: the component used to train a model and evaluate its performance.
+
+Follow the instructions below to start your journey with Retiarii.
 
 ..  toctree::
     :maxdepth: 2
 
     Overview <NAS/Overview>
-    Write A Search Space <NAS/WriteSearchSpace>
-    Classic NAS <NAS/ClassicNas>
+    Quick Start <NAS/QuickStart>
+    Construct Model Space <NAS/construct_space>
+    Multi-trial NAS <NAS/multi_trial_nas>
     One-shot NAS <NAS/one_shot_nas>
-    Retiarii NAS (Alpha) <NAS/retiarii/retiarii_index>
-    Customize a NAS Algorithm <NAS/Advanced>
-    NAS Visualization <NAS/Visualization>
-    Search Space Zoo <NAS/SearchSpaceZoo>
     NAS Benchmarks <NAS/Benchmarks>
-    API Reference <NAS/NasReference>
+    NAS API References <NAS/ApiReference>
diff --git a/docs/en_US/sdk_reference.rst b/docs/en_US/sdk_reference.rst
index c615f86ab8..1e13a1ff1f 100644
--- a/docs/en_US/sdk_reference.rst
+++ b/docs/en_US/sdk_reference.rst
@@ -7,5 +7,5 @@ Python API Reference
     :maxdepth: 1
 
     Auto Tune <autotune_ref>
-    NAS <NAS/NasReference>
+    NAS <NAS/ApiReference>
     Compression <Compression/CompressionReference>
\ No newline at end of file
diff --git a/examples/nas/multi-trial/mnasnet/search.py b/examples/nas/multi-trial/mnasnet/search.py
index f9f3074478..79c6cd3161 100644
--- a/examples/nas/multi-trial/mnasnet/search.py
+++ b/examples/nas/multi-trial/mnasnet/search.py
@@ -53,5 +53,6 @@
     exp_config.trial_concurrency = 2
     exp_config.max_trial_number = 10
     exp_config.training_service.use_active_gpu = False
+    exp_config.execution_engine = 'base'
 
     exp.run(exp_config, 8097)
diff --git a/examples/nas/multi-trial/mnist/search.py b/examples/nas/multi-trial/mnist/search.py
index 8b4c4b7e2e..b3c37b04e0 100644
--- a/examples/nas/multi-trial/mnist/search.py
+++ b/examples/nas/multi-trial/mnist/search.py
@@ -10,8 +10,8 @@
 from torchvision import transforms
 from torchvision.datasets import MNIST
 
-# uncomment this for python execution engine
-# @model_wrapper
+# comment out the following line for the graph-based execution engine
+@model_wrapper
 class Net(nn.Module):
     def __init__(self, hidden_size):
         super().__init__()
@@ -43,10 +43,6 @@ def forward(self, x):
                                 val_dataloaders=pl.DataLoader(test_dataset, batch_size=100),
                                 max_epochs=2)
 
-    # uncomment the following two lines to debug a generated model
-    #debug_mutated_model(base_model, trainer, [])
-    #exit(0)
-
     simple_strategy = strategy.Random()
 
     exp = RetiariiExperiment(base_model, trainer, [], simple_strategy)
@@ -56,11 +52,11 @@ def forward(self, x):
     exp_config.trial_concurrency = 2
     exp_config.max_trial_number = 2
     exp_config.training_service.use_active_gpu = False
-    export_formatter = 'code'
+    export_formatter = 'dict'
 
-    # uncomment this for python execution engine
-    # exp_config.execution_engine = 'py'
-    # export_formatter = 'dict'
+    # uncomment the following two lines for the graph-based execution engine
+    # exp_config.execution_engine = 'base'
+    # export_formatter = 'code'
 
     exp.run(exp_config, 8081 + random.randint(0, 100))
     print('Final model:')
diff --git a/nni/retiarii/evaluator/pytorch/lightning.py b/nni/retiarii/evaluator/pytorch/lightning.py
index d316ce857a..4399844ac6 100644
--- a/nni/retiarii/evaluator/pytorch/lightning.py
+++ b/nni/retiarii/evaluator/pytorch/lightning.py
@@ -202,7 +202,7 @@ class Classification(Lightning):
         If the ``lightning_module`` has a predefined val_dataloaders method this will be skipped.
     trainer_kwargs : dict
         Optional keyword arguments passed to trainer. See
-        `Lightning documentation <https://pytorch-lightning.readthedocs.io/en/stable/trainer.html>`__ for details.
+        `Lightning documentation <https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html>`__ for details.
     """
 
     def __init__(self, criterion: nn.Module = nn.CrossEntropyLoss,
@@ -250,7 +250,7 @@ class Regression(Lightning):
         If the ``lightning_module`` has a predefined val_dataloaders method this will be skipped.
     trainer_kwargs : dict
         Optional keyword arguments passed to trainer. See
-        `Lightning documentation <https://pytorch-lightning.readthedocs.io/en/stable/trainer.html>`__ for details.
+        `Lightning documentation <https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html>`__ for details.
     """
 
     def __init__(self, criterion: nn.Module = nn.MSELoss,
diff --git a/nni/retiarii/experiment/pytorch.py b/nni/retiarii/experiment/pytorch.py
index 00fd9200ca..0780439cff 100644
--- a/nni/retiarii/experiment/pytorch.py
+++ b/nni/retiarii/experiment/pytorch.py
@@ -56,14 +56,14 @@ class RetiariiExeConfig(ConfigBase):
     experiment_working_directory: PathLike = '~/nni-experiments'
     # remove configuration of tuner/assessor/advisor
     training_service: TrainingServiceConfig
-    execution_engine: str = 'base'
+    execution_engine: str = 'py'
 
     def __init__(self, training_service_platform: Optional[str] = None, **kwargs):
         super().__init__(**kwargs)
         if training_service_platform is not None:
             assert 'training_service' not in kwargs
             self.training_service = util.training_service_config_factory(platform = training_service_platform)
-        self.__dict__['trial_command'] = 'python3 -m nni.retiarii.trial_entry base'
+        self.__dict__['trial_command'] = 'python3 -m nni.retiarii.trial_entry py'
 
     def __setattr__(self, key, value):
         fixed_attrs = {'search_space': '',
@@ -314,7 +314,7 @@ def stop(self) -> None:
         self._dispatcher_thread = None
         _logger.info('Experiment stopped')
 
-    def export_top_models(self, top_k: int = 1, optimize_mode: str = 'maximize', formatter: str = 'code') -> Any:
+    def export_top_models(self, top_k: int = 1, optimize_mode: str = 'maximize', formatter: str = 'dict') -> Any:
         """
         Export several top performing models.
 
diff --git a/nni/retiarii/serializer.py b/nni/retiarii/serializer.py
index e0c2a26115..efa78243bd 100644
--- a/nni/retiarii/serializer.py
+++ b/nni/retiarii/serializer.py
@@ -139,6 +139,7 @@ def serialize(cls, *args, **kwargs):
     To create an serializable instance inline without decorator. For example,
 
     .. code-block:: python
+
         self.op = serialize(MyCustomOp, hidden_units=128)
     """
     return serialize_cls(cls)(*args, **kwargs)