Skip to content
This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Support monitor mode when creating or resuming a new experiment #1933

Merged
merged 31 commits into from
Jan 8, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
704b50e
Merge pull request #200 from microsoft/master
SparkSnail Aug 6, 2019
5b0034e
Merge pull request #204 from microsoft/master
SparkSnail Aug 20, 2019
8fe2588
Merge pull request #205 from microsoft/master
SparkSnail Aug 30, 2019
9fae194
Merge pull request #206 from microsoft/master
SparkSnail Sep 16, 2019
c785655
Merge pull request #207 from microsoft/master
SparkSnail Oct 21, 2019
2f5272c
Merge pull request #208 from microsoft/master
SparkSnail Oct 24, 2019
1892bc2
Merge pull request #209 from microsoft/master
SparkSnail Oct 28, 2019
7c1ab11
Merge pull request #210 from microsoft/master
SparkSnail Oct 28, 2019
8c203f3
Merge pull request #211 from microsoft/master
SparkSnail Oct 31, 2019
d7a62f6
check pylint for nni_cmd
SparkSnail Oct 31, 2019
e259d10
fix id error
SparkSnail Oct 31, 2019
4997295
Merge pull request #212 from microsoft/master
SparkSnail Nov 3, 2019
c037a7c
Merge pull request #213 from microsoft/master
SparkSnail Nov 10, 2019
7620e7c
Merge pull request #214 from microsoft/master
SparkSnail Nov 14, 2019
d16dbe9
Merge pull request #215 from microsoft/master
SparkSnail Nov 19, 2019
9ce751d
Merge pull request #216 from microsoft/master
SparkSnail Nov 21, 2019
a0846f2
Merge pull request #217 from microsoft/master
SparkSnail Nov 22, 2019
cd3a912
Merge pull request #218 from microsoft/master
SparkSnail Nov 27, 2019
32efaa3
Merge pull request #219 from microsoft/master
SparkSnail Dec 10, 2019
543239c
Merge pull request #220 from microsoft/master
SparkSnail Dec 12, 2019
36e6e35
Merge pull request #221 from microsoft/master
SparkSnail Dec 19, 2019
f9ee589
Merge pull request #222 from microsoft/master
SparkSnail Dec 24, 2019
b9a7a95
Merge pull request #223 from microsoft/master
SparkSnail Dec 25, 2019
1a5c017
Merge pull request #224 from microsoft/master
SparkSnail Jan 6, 2020
b3c6ea4
init
SparkSnail Jan 7, 2020
667cf96
fix pylint
SparkSnail Jan 8, 2020
392460a
Merge pull request #225 from microsoft/master
SparkSnail Jan 8, 2020
69ab1d1
Merge branch 'master' of https://github.com/SparkSnail/nni into dev-n…
SparkSnail Jan 8, 2020
d992e2e
fix comments
SparkSnail Jan 8, 2020
5756894
change command to --watch
SparkSnail Jan 8, 2020
80d41db
add bash-completion for --watch command
SparkSnail Jan 8, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/en_US/Tutorial/Nnictl.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ nnictl support commands:
|--config, -c| True| |YAML configure file of the experiment|
|--port, -p|False| |the port of restful server|
|--debug, -d|False||set debug mode|
|--watch, -w|False||set watch mode|

* Examples

Expand Down Expand Up @@ -97,6 +98,7 @@ Debug mode will disable version check function in Trialkeeper.
|id| True| |The id of the experiment you want to resume|
|--port, -p| False| |Rest port of the experiment you want to resume|
|--debug, -d|False||set debug mode|
|--watch, -w|False||set watch mode|

* Example

Expand Down
4 changes: 2 additions & 2 deletions tools/bash-completion
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# list of commands/arguments
__nnictl_cmds="create resume view update stop trial experiment platform import export webui config log package tensorboard top"
__nnictl_create_cmds="--config --port --debug"
__nnictl_resume_cmds="--port --debug"
__nnictl_create_cmds="--config --port --debug --watch"
__nnictl_resume_cmds="--port --debug --watch"
__nnictl_view_cmds="--port"
__nnictl_update_cmds="searchspace concurrency duration trialnum"
__nnictl_update_searchspace_cmds="--filename"
Expand Down
6 changes: 4 additions & 2 deletions tools/nni_cmd/launcher.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
detect_port, get_user, get_python_dir
from .constants import NNICTL_HOME_DIR, ERROR_INFO, REST_TIME_OUT, EXPERIMENT_SUCCESS_INFO, LOG_HEADER, PACKAGE_REQUIREMENTS
from .command_utils import check_output_command, kill_command
from .nnictl_utils import update_experiment
from .nnictl_utils import update_experiment, set_monitor

def get_log_path(config_file_name):
'''generate stdout and stderr log path'''
Expand Down Expand Up @@ -493,6 +493,8 @@ def launch_experiment(args, experiment_config, mode, config_file_name, experimen
experiment_config['experimentName'])

print_normal(EXPERIMENT_SUCCESS_INFO % (experiment_id, ' '.join(web_ui_url_list)))
if args.watch:
set_monitor(True, 3, args.port, rest_process.pid)

def create_experiment(args):
'''start a new experiment'''
Expand All @@ -506,8 +508,8 @@ def create_experiment(args):
validate_all_content(experiment_config, config_path)

nni_config.set_config('experimentConfig', experiment_config)
launch_experiment(args, experiment_config, 'new', config_file_name)
nni_config.set_config('restServerPort', args.port)
launch_experiment(args, experiment_config, 'new', config_file_name)

def manage_stopped_experiment(args, mode):
'''view a stopped experiment'''
Expand Down
2 changes: 2 additions & 0 deletions tools/nni_cmd/nnictl.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,13 +51,15 @@ def parse_args():
parser_start.add_argument('--config', '-c', required=True, dest='config', help='the path of yaml config file')
parser_start.add_argument('--port', '-p', default=DEFAULT_REST_PORT, dest='port', help='the port of restful server')
parser_start.add_argument('--debug', '-d', action='store_true', help=' set debug mode')
parser_start.add_argument('--watch', '-w', action='store_true', help=' set watch mode')
parser_start.set_defaults(func=create_experiment)

# parse resume command
parser_resume = subparsers.add_parser('resume', help='resume a new experiment')
parser_resume.add_argument('id', nargs='?', help='The id of the experiment you want to resume')
parser_resume.add_argument('--port', '-p', default=DEFAULT_REST_PORT, dest='port', help='the port of restful server')
parser_resume.add_argument('--debug', '-d', action='store_true', help=' set debug mode')
parser_resume.add_argument('--watch', '-w', action='store_true', help=' set watch mode')
parser_resume.set_defaults(func=resume_experiment)

# parse view command
Expand Down
36 changes: 29 additions & 7 deletions tools/nni_cmd/nnictl_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@

import csv
import os
import sys
import json
import time
import re
Expand Down Expand Up @@ -623,23 +624,44 @@ def show_experiment_info():
content[index].get('endTime'), content[index].get('status')))
print(TRIAL_MONITOR_TAIL)

def monitor_experiment(args):
'''monitor the experiment'''
if args.time <= 0:
print_error('please input a positive integer as time interval, the unit is second.')
exit(1)
def set_monitor(auto_exit, time_interval, port=None, pid=None):
'''set the experiment monitor engine'''
while True:
try:
os.system('clear')
if sys.platform == 'win32':
os.system('cls')
else:
os.system('clear')
update_experiment()
show_experiment_info()
time.sleep(args.time)
if auto_exit:
status = get_experiment_status(port)
if status in ['DONE', 'ERROR', 'STOPPED']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can print dispatcher and nnimanager log here (if the status is error). Because if user is running it in a container, when the program exit, the container is destroyed too. There is no way to retrieve the error info. Another option is to disable auto_exit in case --debug is set.

Copy link
Contributor Author

@SparkSnail SparkSnail Jan 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can print dispatcher and nnimanager log here (if the status is error). Because if user is running it in a container, when the program exit, the container is destroyed too. There is no way to retrieve the error info. Another option is to disable auto_exit in case --debug is set.

nniManager.log content maybe too long, maybe it's not suitable to show these content in screen. Users can mount NNI's logDir in container to their local path, the logDir contains log files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the case is not for screen. For screen user, they will never seek for foreground. For container users, they don't care. Furthermore, I don't think os.system("clear") has any effect if they are using container. Printing a lot is expected.

print_normal('Experiment status is {0}.'.format(status))
print_normal('Stopping experiment...')
kill_command(pid)
print_normal('Stop experiment success.')
exit(0)
time.sleep(time_interval)
except KeyboardInterrupt:
if auto_exit:
print_normal('Stopping experiment...')
kill_command(pid)
print_normal('Stop experiment success.')
else:
print_normal('Exiting...')
exit(0)
except Exception as exception:
print_error(exception)
exit(1)

def monitor_experiment(args):
'''monitor the experiment'''
if args.time <= 0:
print_error('please input a positive integer as time interval, the unit is second.')
exit(1)
set_monitor(False, args.time)

def export_trials_data(args):
'''export experiment metadata to csv
'''
Expand Down