Dev weight sharing (#568)

* add pycharm project files to .gitignore list * update pylintrc to conform vscode settings * fix RemoteMachineMode for wrong trainingServicePlatform * simple weight sharing * update gitignore file * change tuner codedir to relative path * add python cache files to gitignore list * move extract scalar reward logic from dispatcher to tuner * update tuner code corresponding to last commit * update doc for receive_trial_result api change * add numpy to package whitelist of pylint * distinguish param value from return reward for tuner.extract_scalar_reward * update pylintrc * add comments to dispatcher.handle_report_metric_data * update install for mac support * fix root mode bug on Makefile * Quick fix bug: nnictl port value error (#245) * fix port bug * Dev exp stop more (#221) * Exp stop refactor (#161) * Update RemoteMachineMode.md (#63) * Remove unused classes for SQuAD QA example. * Remove more unused functions for SQuAD QA example. * Fix default dataset config. * Add Makefile README (#64) * update document (#92) * Edit readme.md * updated a word * Update GetStarted.md * Update GetStarted.md * refact readme, getstarted and write your trial md. * Update README.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Fix nnictl bugs and add new feature (#75) * fix nnictl bug * fix nnictl create bug * add experiment status logic * add more information for nnictl * fix Evolution Tuner bug * refactor code * fix code in updater.py * fix nnictl --help * fix classArgs bug * update check response.status_code logic * remove Buffer warning (#100) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md * Update README.md * Add support for debugging mode * fix setup.py (#115) * Add DAG model configuration format for SQuAD example. * Explain config format for SQuAD QA model. * Add more detailed introduction about the evolution algorithm. 
* Fix install.sh add add trial log path (#109) * fix nnictl bug * fix nnictl create bug * add experiment status logic * add more information for nnictl * fix Evolution Tuner bug * refactor code * fix code in updater.py * fix nnictl --help * fix classArgs bug * update check response.status_code logic * show trial log path * update document * fix install.sh * set default vallue for maxTrialNum and maxExecDuration * fix nnictl * Dev smac (#116) * support package install (#91) * fix nnictl bug * support package install * update * update package install logic * Fix package install issue (#95) * fix nnictl bug * fix pakcage install * support SMAC as a tuner on nni (#81) * update doc * update doc * update doc * update hyperopt installation * update doc * update doc * update description in setup.py * update setup.py * modify encoding * encoding * add encoding * remove pymc3 * update doc * update builtin tuner spec * support smac in sdk, fix logging issue * support smac tuner * add optimize_mode * update config in nnictl * add __init__.py * update smac * update import path * update setup.py: remove entry_point * update rest server validation * fix bug in nnictl launcher * support classArgs: optimize_mode * quick fix bug * test travis * add dependency * add dependency * add dependency * add dependency * create smac python package * fix trivial points * optimize import of tuners, modify nnictl accordingly * fix bug: incorrect algorithm_name * trivial refactor * for debug * support virtual * update doc of SMAC * update smac requirements * update requirements * change debug mode * update doc * update doc * refactor based on comments * fix comments * modify example config path to relative path and increase maxTrialNum (#94) * modify example config path to relative path and increase maxTrialNum * add document * support conda (#90) (#110) * support install from venv and travis CI * support install from venv and travis CI * support install from venv and travis CI * support conda * 
support conda * modify example config path to relative path and increase maxTrialNum * undo messy commit * undo messy commit * Support pip install as root (#77) * Typo on #58 (#122) * PAI Training Service implementation (#128) * PAI Training service implementation **1. Implement PAITrainingService **2. Add trial-keeper python module, and modify setup.py to install the module **3. Add PAItrainingService rest server to collect metrics from PAI container. * fix datastore for multiple final result (#129) * Update NNI v0.2 release notes (#132) Update NNI v0.2 release notes * Update setup.py Makefile and documents (#130) * update makefile and setup.py * update makefile and setup.py * update document * update document * Update Makefile no travis * update doc * update doc * fix convert from ss to pcs (#133) * Fix bugs about webui (#131) * Fix webui bugs * Fix tslint * webui logpath and document (#135) * Add webui document and logpath as a href * fix tslint * fix comments by Chengmin * Pai training service bug fix and enhancement (#136) * Add NNI installation scripts * Update pai script, update NNI_out_dir * Update NNI dir in nni sdk local.py * Create .nni folder in nni sdk local.py * Add check before creating .nni folder * Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT * Improve annotation (#138) * Improve annotation * Minor bugfix * Selectively install through pip (#139) Selectively install through pip * update setup.py * fix paiTrainingService bugs (#137) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * Add documentation for NNI PAI mode experiment (#141) * Add documentation for NNI PAI mode * Fix typo based on PR comments * Exit with subprocess return code of trial keeper * Remove additional exit code * Fix typo 
based on PR comments * update doc for smac tuner (#140) * Revert "Selectively install through pip (#139)" due to potential pip install issue (#142) * Revert "Selectively install through pip (#139)" This reverts commit 1d17483. * Add exit code of subprocess for trial_keeper * Update README, add link to PAImode doc * Merge branch V0.2 to Master (#143) * webui logpath and document (#135) * Add webui document and logpath as a href * fix tslint * fix comments by Chengmin * Pai training service bug fix and enhancement (#136) * Add NNI installation scripts * Update pai script, update NNI_out_dir * Update NNI dir in nni sdk local.py * Create .nni folder in nni sdk local.py * Add check before creating .nni folder * Fix typo for PAI_INSTALL_NNI_SHELL_FORMAT * Improve annotation (#138) * Improve annotation * Minor bugfix * Selectively install through pip (#139) Selectively install through pip * update setup.py * fix paiTrainingService bugs (#137) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * Add documentation for NNI PAI mode experiment (#141) * Add documentation for NNI PAI mode * Fix typo based on PR comments * Exit with subprocess return code of trial keeper * Remove additional exit code * Fix typo based on PR comments * update doc for smac tuner (#140) * Revert "Selectively install through pip (#139)" due to potential pip install issue (#142) * Revert "Selectively install through pip (#139)" This reverts commit 1d17483. 
* Add exit code of subprocess for trial_keeper * Update README, add link to PAImode doc * fix bug (#147) * Refactor nnictl and add config_pai.yml (#144) * fix nnictl bug * add hdfs host validation * fix bugs * fix dockerfile * fix install.sh * update install.sh * fix dockerfile * Set timeout for HDFSUtility exists function * remove unused TODO * fix sdk * add optional for outputDir and dataDir * refactor dockerfile.base * Remove unused import in hdfsclientUtility * add config_pai.yml * refactor nnictl create logic and add colorful print * fix nnictl stop logic * add annotation for config_pai.yml * add document for start experiment * fix config.yml * fix document * Fix trial keeper wrongly exit issue (#152) * Fix trial keeper bug, use actual exitcode to exit rather than 1 * Fix bug of table sort (#145) * Update doc for PAIMode and v0.2 release notes (#153) * Update v0.2 documentation regards to release note and PAI training service * Update document to describe NNI docker image * fix antd (#159) * refactor experiment stopping logic * support change concurrency * remove trialJobs.ts * trivial changes * fix bugs * fix bug * support updating maxTrialNum * Modify IT scripts for supporting multiple experiments * Update ci (#175) * Update RemoteMachineMode.md (#63) * Remove unused classes for SQuAD QA example. * Remove more unused functions for SQuAD QA example. * Fix default dataset config. * Add Makefile README (#64) * update document (#92) * Edit readme.md * updated a word * Update GetStarted.md * Update GetStarted.md * refact readme, getstarted and write your trial md. 
* Update README.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Update WriteYourTrial.md * Fix nnictl bugs and add new feature (#75) * fix nnictl bug * fix nnictl create bug * add experiment status logic * add more information for nnictl * fix Evolution Tuner bug * refactor code * fix code in updater.py * fix nnictl --help * fix classArgs bug * update check response.status_code logic * remove Buffer warning (#100) * update readme in ga_squad * update readme * fix typo * Update README.md * Update README.md * Update README.md * Add support for debugging mode * modify CI cuz of refracting exp stop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * update CI for expstop * file saving * fix issues from code merge * remove $(INSTALL_PREFIX)/nni/nni_manager before install * fix indent * fix merge issue * socket close * update port * fix merge error * modify ci logic in nnimanager * fix ci * fix bug * change suspended to done * update ci (#229) * update ci * update ci * update ci (#232) * update ci * update ci * update azure-pipelines * update azure-pipelines * update ci (#233) * update ci * update ci * update azure-pipelines * update azure-pipelines * update azure-pipelines * run.py (#238) * Nnupdate ci (#239) * run.py * test ci * Nnupdate ci (#240) * run.py * test ci * test ci * Udci (#241) * run.py * test ci * test ci * test ci * update ci (#242) * run.py * test ci * test ci * test ci * update ci * revert install.sh (#244) * run.py * test ci * test ci * test ci * update ci * revert install.sh * add comments * remove assert * trivial change * trivial change * update Makefile (#246) * update Makefile * update Makefile * quick fix for ci (#248) * add update trialNum and fix bugs (#261) * Add builtin tuner to CI (#247) * update Makefile * update Makefile * add builtin-tuner test * add 
builtin-tuner test * refractor ci * update azure.yml * add built-in tuner test * fix bugs * Doc refactor (#258) * doc refactor * image name refactor * Refactor nnictl to support listing stopped experiments. (#256) Refactor nnictl to support listing stopped experiments. * Show experiment parameters more beautifully (#262) * fix error on example of RemoteMachineMode (#269) * add pycharm project files to .gitignore list * update pylintrc to conform vscode settings * fix RemoteMachineMode for wrong trainingServicePlatform * Update docker file to use latest nni release (#263) * fix bug about execDuration and endTime (#270) * fix bug about execDuration and endTime * modify time interval to 30 seconds * refactor based on Gems's suggestion * for triggering ci * Refactor dockerfile (#264) * refactor Dockerfile * Support nnictl tensorboard (#268) support tensorboard * Sdk update (#272) * Rename get_parameters to get_next_parameter * annotations add get_next_parameter * updates * updates * updates * updates * updates * add experiment log path to experiment profile (#276) * refactor extract reward from dict by tuner * update Makefile for mac support, wait for aka.ms support * refix Makefile for colorful echo * unversion config.yml with machine information * sync graph.py between tuners & trial of ga_squad * sync graph.py between tuners & trial of ga_squad * copy weight shared ga_squad under weight_sharing folder * mv ga_squad code back to master * simple tuner & trial ready * Fix nnictl multiThread option * weight sharing with async dispatcher simple example ready * update for ga_squad * fix bug * modify multihead attention name * add min_layer_num to Graph * fix bug * update share id calc * fix bug * add save logging * fix ga_squad tuner bug * sync bug fix for ga_squad tuner * fix same hash_id bug * add lock to simple tuner in weight sharing * Add readme to simple weight sharing * update * update * add paper link * update * reformat with autopep8 * add documentation for 
weight sharing * test for weight sharing * delete irrelevant files * move details of weight sharing in to code comments
microsoft · Jan 7, 2019 · 50758b9
1 parent c265903
Showing 26 changed files with 3,060 additions and 9 deletions.
diff --git a/docs/AdvancedNAS.md b/docs/AdvancedNAS.md
@@ -0,0 +1,71 @@
+# Tutorial for Advanced Neural Architecture Search
+Many NAS algorithms currently leverage **weight sharing** among trials to accelerate their training process. For example, [ENAS][1] delivers a 1000x efficiency gain with '_parameter sharing between child models_', compared with the earlier [NASNet][2] algorithm. Other NAS algorithms such as [DARTS][3], [Network Morphism][4], and [Evolution][5] also leverage, or have the potential to leverage, weight sharing.
+
+This is a tutorial on how to enable weight sharing in NNI. 
+
+## Weight Sharing among trials
+Currently we recommend sharing weights through NFS (Network File System), which supports sharing files across machines and is lightweight and (relatively) efficient. We also welcome contributions from the community on more efficient techniques.
+
+### NFS Setup
+In NFS, files are physically stored on a server machine, and trials on the client machine can read/write those files in the same way that they access local files.
+
+#### Install NFS on server machine
+First, install NFS server:
+```bash
+sudo apt-get install nfs-kernel-server
+```
+Suppose `/tmp/nni/shared` is used as the physical storage, then run:
+```bash
+sudo mkdir -p /tmp/nni/shared
+echo "/tmp/nni/shared *(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
+sudo service nfs-kernel-server restart
+```
+You can check whether the directory above was successfully exported by NFS with `sudo showmount -e localhost`.
+
+#### Install NFS on client machine
+First, install NFS client:
+```bash
+sudo apt-get install nfs-common
+```
+Then create the mount point and mount the shared directory:
+```bash
+sudo mkdir -p /mnt/nfs/nni/
+sudo mount -t nfs 10.10.10.10:/tmp/nni/shared /mnt/nfs/nni
+```
+where `10.10.10.10` should be replaced by the real IP of the NFS server machine.
+
+### Weight Sharing through NFS file
+With NFS set up, trial code can share model weights by saving and loading files. For example, in TensorFlow:
+```python
+# save models
+saver = tf.train.Saver()
+saver.save(sess, os.path.join(params['save_path'], 'model.ckpt'))
+# load models
+tf.train.init_from_checkpoint(params['restore_path'], {'/': '/'})  # map each checkpoint variable to the same-named variable
+```
+where `'save_path'` and `'restore_path'` in the hyper-parameters can be managed by the tuner.
+
+## Asynchronous Dispatcher Mode for trial dependency control
+Weight sharing lets trials on different machines exchange weights, and in most cases **read after write** consistency must be assured. After all, a child model should not load the parent model before the parent trial finishes training. To deal with this, users can enable **asynchronous dispatcher mode** with `multiThread: true` in `config.yml` in NNI, where the dispatcher assigns a tuner thread each time a `NEW_TRIAL` request comes in, and the tuner thread can decide when to submit a new trial by blocking and unblocking itself. For example:
+```python
+    def generate_parameters(self, parameter_id):
+        self.thread_lock.acquire()
+        indiv = self.new_individual()  # hypothetical helper: builds the configuration for a new trial
+        self.events[parameter_id] = threading.Event()
+        self.thread_lock.release()
+        if indiv.parent_id is not None:
+            self.events[indiv.parent_id].wait()
+
+    def receive_trial_result(self, parameter_id, parameters, reward):
+        self.thread_lock.acquire()
+        # code for processing trial results
+        self.thread_lock.release()
+        self.events[parameter_id].set()
+```
+
+
+[1]: https://arxiv.org/abs/1802.03268
+[2]: https://arxiv.org/abs/1707.07012
+[3]: https://arxiv.org/abs/1806.09055
+[4]: https://arxiv.org/abs/1806.10282
+[5]: https://arxiv.org/abs/1703.01041 
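Note on the dispatcher example in `docs/AdvancedNAS.md`: the blocking pattern can be tried outside NNI with the standard library alone. The sketch below is only an illustration under stated assumptions. `DependencyGate`, `register`, `wait_for`, `mark_done`, and `demo` are hypothetical names that are not part of this commit or of the NNI API. It keeps one `threading.Event` per trial, so a child thread waits until the parent's result has been reported.

```python
import threading


class DependencyGate:
    """Hypothetical helper: one Event per trial, so a child can wait for its parent."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = {}

    def register(self, parameter_id):
        # Called when a trial's parameters are generated.
        with self._lock:
            self._events[parameter_id] = threading.Event()

    def wait_for(self, parent_id, timeout=None):
        # Block until the parent's result has arrived; trials without a parent pass through.
        if parent_id is None:
            return True
        with self._lock:
            event = self._events[parent_id]
        return event.wait(timeout)

    def mark_done(self, parameter_id):
        # Called when a trial's result is received; releases every waiting child.
        with self._lock:
            event = self._events[parameter_id]
        event.set()


def demo():
    gate = DependencyGate()
    order = []
    gate.register(0)  # parent trial
    gate.register(1)  # child trial

    def child():
        gate.wait_for(0)  # read-after-write: wait for the parent to finish
        order.append('child submitted')

    worker = threading.Thread(target=child)
    worker.start()
    order.append('parent finished')
    gate.mark_done(0)
    worker.join()
    return order
```

As in the documented example, the lock is released before waiting, so a blocked child does not prevent other tuner threads from registering trials or reporting results.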
diff --git a/examples/trials/ga_squad/trial.py b/examples/trials/ga_squad/trial.py
@@ -338,7 +338,7 @@ def train_with_graph(graph, qp_pairs, dev_qp_pairs):
                 answers = generate_predict_json(
                     position1, position2, ids, contexts)
                 if save_path is not None:
-                    with open(save_path + 'epoch%d.prediction' % epoch, 'w') as file:
+                    with open(os.path.join(save_path, 'epoch%d.prediction' % epoch), 'w') as file:
                         json.dump(answers, file)
                 else:
                     answers = json.dumps(answers)
@@ -359,8 +359,8 @@ def train_with_graph(graph, qp_pairs, dev_qp_pairs):
                     bestacc = acc
 
                     if save_path is not None:
-                        saver.save(sess, save_path + 'epoch%d.model' % epoch)
-                        with open(save_path + 'epoch%d.score' % epoch, 'wb') as file:
+                        saver.save(sess, os.path.join(save_path, 'epoch%d.model' % epoch))
+                        with open(os.path.join(save_path, 'epoch%d.score' % epoch), 'wb') as file:
                             pickle.dump(
                                 (position1, position2, ids, contexts), file)
                 logger.debug('epoch %d acc %g bestacc %g' %

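Note on the `trial.py` change: plain string concatenation only yields a valid file path when `save_path` already ends with a separator, which is presumably why the calls move to `os.path.join`. A minimal sketch of the difference follows. `prediction_path_concat` and `prediction_path_join` are hypothetical names used for illustration, and `posixpath` (the Linux flavour of `os.path`) is used so the result does not depend on the platform.

```python
import posixpath  # os.path resolves to this module on Linux


def prediction_path_concat(save_path, epoch):
    # Old behaviour: correct only if save_path ends with '/'.
    return save_path + 'epoch%d.prediction' % epoch


def prediction_path_join(save_path, epoch):
    # New behaviour: a separator is inserted when it is missing.
    return posixpath.join(save_path, 'epoch%d.prediction' % epoch)
```

With a directory such as the `save_dir_root` value from `config_remote.yml`, the concatenated form fuses the directory name and the file name, while the joined form places the file inside the directory.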
diff --git a/examples/trials/weight_sharing/ga_squad/attention.py b/examples/trials/weight_sharing/ga_squad/attention.py
@@ -0,0 +1,171 @@
+# Copyright (c) Microsoft Corporation
+# All rights reserved.
+#
+# MIT License
+#
+# Permission is hereby granted, free of charge,
+# to any person obtaining a copy of this software and associated
+# documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and
+# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+# The above copyright notice and this permission notice shall be included
+# in all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
+# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
+# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+import math
+
+import tensorflow as tf
+from tensorflow.python.ops.rnn_cell_impl import RNNCell
+
+
+def _get_variable(variable_dict, name, shape, initializer=None, dtype=tf.float32):
+    if name not in variable_dict:
+        variable_dict[name] = tf.get_variable(
+            name=name, shape=shape, initializer=initializer, dtype=dtype)
+    return variable_dict[name]
+
+
+class DotAttention:
+    '''
+    DotAttention
+    '''
+
+    def __init__(self, name,
+                 hidden_dim,
+                 is_vanilla=True,
+                 is_identity_transform=False,
+                 need_padding=False):
+        self._name = '/'.join([name, 'dot_att'])
+        self._hidden_dim = hidden_dim
+        self._is_identity_transform = is_identity_transform
+        self._need_padding = need_padding
+        self._is_vanilla = is_vanilla
+        self._var = {}
+
+    @property
+    def is_identity_transform(self):
+        return self._is_identity_transform
+
+    @property
+    def is_vanilla(self):
+        return self._is_vanilla
+
+    @property
+    def need_padding(self):
+        return self._need_padding
+
+    @property
+    def hidden_dim(self):
+        return self._hidden_dim
+
+    @property
+    def name(self):
+        return self._name
+
+    @property
+    def var(self):
+        return self._var
+
+    def _get_var(self, name, shape, initializer=None):
+        with tf.variable_scope(self.name):
+            return _get_variable(self.var, name, shape, initializer)
+
+    def _define_params(self, src_dim, tgt_dim):
+        hidden_dim = self.hidden_dim
+        self._get_var('W', [src_dim, hidden_dim])
+        if not self.is_vanilla:
+            self._get_var('V', [src_dim, hidden_dim])
+            if self.need_padding:
+                self._get_var('V_s', [src_dim, src_dim])
+                self._get_var('V_t', [tgt_dim, tgt_dim])
+            if not self.is_identity_transform:
+                self._get_var('T', [tgt_dim, src_dim])
+        self._get_var('U', [tgt_dim, hidden_dim])
+        self._get_var('b', [1, hidden_dim])
+        self._get_var('v', [hidden_dim, 1])
+
+    def get_pre_compute(self, s):
+        '''
+        :param s: [src_sequence, batch_size, src_dim]
+        :return: [src_sequence, batch_size, hidden_dim]
+        '''
+        hidden_dim = self.hidden_dim
+        src_dim = s.get_shape().as_list()[-1]
+        assert src_dim is not None, 'src dim must be defined'
+        W = self._get_var('W', shape=[src_dim, hidden_dim])
+        b = self._get_var('b', shape=[1, hidden_dim])
+        return tf.tensordot(s, W, [[2], [0]]) + b
+
+    def get_prob(self, src, tgt, mask, pre_compute, return_logits=False):
+        '''
+        :param src: [src_sequence_length, batch_size, src_dim]
+        :param tgt: [batch_size, tgt_dim] or [tgt_sequence_length, batch_size, tgt_dim]
+        :param mask: [src_sequence_length, batch_size]\
+             or [tgt_sequence_length, src_sequence_length, batch_size]
+        :param pre_compute: [src_sequence_length, batch_size, hidden_dim]
+        :return: [src_sequence_length, batch_size]\
+             or [tgt_sequence_length, src_sequence_length, batch_size]
+        '''
+        s_shape = src.get_shape().as_list()
+        h_shape = tgt.get_shape().as_list()
+        src_dim = s_shape[-1]
+        tgt_dim = h_shape[-1]
+        assert src_dim is not None, 'src dimension must be defined'
+        assert tgt_dim is not None, 'tgt dimension must be defined'
+
+        self._define_params(src_dim, tgt_dim)
+
+        if len(h_shape) == 2:
+            tgt = tf.expand_dims(tgt, 0)
+        if pre_compute is None:
+            pre_compute = self.get_pre_compute(src)
+
+        buf0 = pre_compute
+        buf1 = tf.tensordot(tgt, self.var['U'], axes=[[2], [0]])
+        buf2 = tf.tanh(tf.expand_dims(buf0, 0) + tf.expand_dims(buf1, 1))
+
+        if not self.is_vanilla:
+            xh1 = tgt
+            xh2 = tgt
+            s1 = src
+            if self.need_padding:
+                xh1 = tf.tensordot(xh1, self.var['V_t'], 1)
+                xh2 = tf.tensordot(xh2, self.var['V_t'], 1)
+                s1 = tf.tensordot(s1, self.var['V_s'], 1)
+            if not self.is_identity_transform:
+                xh1 = tf.tensordot(xh1, self.var['T'], 1)
+                xh2 = tf.tensordot(xh2, self.var['T'], 1)
+            buf3 = tf.expand_dims(s1, 0) * tf.expand_dims(xh1, 1)
+            buf3 = tf.tanh(tf.tensordot(buf3, self.var['V'], axes=[[3], [0]]))
+            buf = tf.reshape(tf.tanh(buf2 + buf3), shape=tf.shape(buf3))
+        else:
+            buf = buf2
+        v = self.var['v']
+        e = tf.tensordot(buf, v, [[3], [0]])
+        e = tf.squeeze(e, axis=[3])
+        tmp = tf.reshape(e + (mask - 1) * 10000.0, shape=tf.shape(e))
+        prob = tf.nn.softmax(tmp, 1)
+        if len(h_shape) == 2:
+            prob = tf.squeeze(prob, axis=[0])
+            tmp = tf.squeeze(tmp, axis=[0])
+        if return_logits:
+            return prob, tmp
+        return prob
+
+    def get_att(self, s, prob):
+        '''
+        :param s: [src_sequence_length, batch_size, src_dim]
+        :param prob: [src_sequence_length, batch_size]\
+            or [tgt_sequence_length, src_sequence_length, batch_size]
+        :return: [batch_size, src_dim] or [tgt_sequence_length, batch_size, src_dim]
+        '''
+        buf = s * tf.expand_dims(prob, axis=-1)
+        att = tf.reduce_sum(buf, axis=-3)
+        return att
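Note on `DotAttention.get_prob` and `get_att`: masked positions are suppressed by adding `(mask - 1) * 10000.0` to the scores before the softmax, and the attended vector is the probability-weighted sum over source positions. The sketch below reproduces that arithmetic for a single batch element in pure Python. It is an illustration rather than the TensorFlow code path, and `masked_softmax` and `attend` are hypothetical names.

```python
import math


def masked_softmax(scores, mask, penalty=10000.0):
    # mask is 1.0 for real positions and 0.0 for padding, as in get_prob.
    shifted = [s + (m - 1.0) * penalty for s, m in zip(scores, mask)]
    peak = max(shifted)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def attend(values, probs):
    # values: one vector per source position; returns their weighted sum, as in get_att.
    dim = len(values[0])
    return [sum(p * v[d] for p, v in zip(probs, values)) for d in range(dim)]
```

A padded position receives a score roughly 10000 lower than any real position, so its probability underflows to zero and it contributes nothing to the weighted sum.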
diff --git a/examples/trials/weight_sharing/ga_squad/config_remote.yml b/examples/trials/weight_sharing/ga_squad/config_remote.yml
@@ -0,0 +1,31 @@
+authorName: default
+experimentName: ga_squad_weight_sharing
+trialConcurrency: 2
+maxExecDuration: 1h
+maxTrialNum: 200
+#choice: local, remote, pai
+trainingServicePlatform: remote
+#choice: true, false
+useAnnotation: false
+multiThread: true
+tuner:
+  codeDir: ../../../tuners/weight_sharing/ga_customer_tuner
+  classFileName: customer_tuner.py 
+  className: CustomerTuner
+  classArgs:
+    optimize_mode: maximize
+    population_size: 32
+    save_dir_root: /mnt/nfs/nni/ga_squad
+trial:
+  command: python3 trial.py --input_file /mnt/nfs/nni/train-v1.1.json --dev_file /mnt/nfs/nni/dev-v1.1.json --max_epoch 1 --embedding_file /mnt/nfs/nni/glove.6B.300d.txt
+  codeDir: .
+  gpuNum: 1
+machineList:
+  - ip: remote-ip-0
+    port: 8022
+    username: root 
+    passwd: screencast
+  - ip: remote-ip-1
+    port: 8022
+    username: root 
+    passwd: screencast