-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Conversation
nni/experiment/config/convert.py
Outdated
_logger = logging.getLogger(__name__) | ||
|
||
|
||
def to_old_yaml(config: ExperimentConfig, skip_nnictl: bool = False) -> Dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to_v1_yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay.
@@ -228,7 +228,7 @@ def __repr__(self): | |||
.format(self.trialJobId, self.status, self.hyperParameters, self.logPath, | |||
self.startTime, self.endTime, self.finalMetricData, self.stderrPath) | |||
|
|||
class Experiment: | |||
class ExternalExperiment: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is external experiment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means experiment created by other process.
You can suggest a better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what scenario is it?
nni/tools/nnictl/launcher.py
Outdated
@@ -85,6 +85,7 @@ def start_rest_server(port, platform, mode, experiment_id, foreground=False, log | |||
log_header = LOG_HEADER % str(time_now) | |||
stdout_file.write(log_header) | |||
stderr_file.write(log_header) | |||
print('## [nnictl] cmds:', cmds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this debug message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops
data['trial'] = { | ||
'command': data.pop('trialCommand'), | ||
'codeDir': data.pop('trialCodeDirectory'), | ||
'gpuNum': data.pop('trialGpuNumber', '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the default value for gpuNum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be None
, set in nni/experiment/config/common.py
.
The change is in experiment config PR.
'reuse': ts['reuseMode'] | ||
} | ||
|
||
return data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens to other training platforms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be in experiment config PR. This PR bases on a draft snapshot. Same below.
data['remoteConfig'] = {'reuse': ts['reuseMode']} | ||
data['machineList'] = [] | ||
for machine in ts['machineList']: | ||
machine = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to have no effect.
@@ -0,0 +1,27 @@ | |||
# FIXME: For demonstration only. It should not be here | |||
|
|||
from pathlib import Path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we releasing this feature as experimental or recommended way to launch an experiment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Experimental I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ultmaster what is the difference? why experimental is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest full IT and UT pipeline if we are recommending this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should add IT and UT, but could in a follow up pr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to the code completeness, the feature should be experimental. still suggest to add IT and UT in this release
please fix pipeline errors |
optional = any([ | ||
type_name.startswith('Optional['), | ||
type_name.startswith('Union[') and 'NoneType' in type_name, | ||
type_name == 'Any' | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
directly parse type using string operation seems hacky...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generic classes are not designed for attribute accessing.
if experiment_config.get('logCollection'): | ||
request_data['logCollection'] = experiment_config.get('logCollection') | ||
request_data['clusterMetaData'] = [] | ||
if experiment_config['trainingServicePlatform'] == 'local': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just reminding this place is a bit strange, also in launcher.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More detail?
|
||
def camel_case(key: str) -> str: | ||
words = key.split('_') | ||
return words[0] + ''.join(word.title() for word in words[1:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if _xxx_xxx
, or this will not happen? Do we need remove the prefix _
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this will not happen because there is no equivalent in camelCase.
nni/experiment/experiment.py
Outdated
while True: | ||
time.sleep(10) | ||
status = self.get_status() | ||
if status in ['ERROR', 'STOPPED', 'NO_MORE_TRIAL']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NO_MORE_TRIAL
means currSubmittedTrialNum >= experimentProfile.params.maxTrialNum
and there are unfinished jobs. We should not return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, got it.
nni/runtime/protocol.py
Outdated
@@ -32,8 +32,7 @@ class CommandType(Enum): | |||
_in_file = open(3, 'rb') | |||
_out_file = open(4, 'wb') | |||
except OSError: | |||
_msg = 'IPC pipeline not exists, maybe you are importing tuner/assessor from trial code?' | |||
logging.getLogger(__name__).warning(_msg) | |||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do not need log here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems the warning causes more trouble then benefit.
It doesn't introduce real problems to import the module in wrong place.
setup.py
Outdated
@@ -74,7 +74,8 @@ | |||
'pkginfo', | |||
'websockets', | |||
'filelock', | |||
'prettytable' | |||
'prettytable', | |||
'dataclasses ; python_version < "3.7"' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we support python_version >= 3.7
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a standard library in 3.7+.
'preCommand': machine['trialPrepareCommand'] | ||
} | ||
|
||
elif ts['platform'] == 'pai': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about other platforms? like kubeflow, aml etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in PR #3138
This PR is not about config schema. It merely includes a snapshot version so it can run end-to-end.
{'key': 'aml_config', 'value': experiment_config['amlConfig']}) | ||
request_data['clusterMetaData'].append( | ||
{'key': 'trial_config', 'value': experiment_config['trial']}) | ||
return request_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
miss 'adl', 'dlts' here.
@@ -21,9 +20,6 @@ | |||
os.makedirs(_outputdir) | |||
|
|||
_nni_platform = trial_env_vars.NNI_PLATFORM | |||
if _nni_platform == 'local': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove trial.log in local mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logging system is refactored. Its initialized when nni
(top-level package) get imported.
nni/runtime/protocol.py
Outdated
@@ -32,8 +32,7 @@ class CommandType(Enum): | |||
_in_file = open(3, 'rb') | |||
_out_file = open(4, 'wb') | |||
except OSError: | |||
_msg = 'IPC pipeline not exists, maybe you are importing tuner/assessor from trial code?' | |||
logging.getLogger(__name__).warning(_msg) | |||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove warning info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It causes more problem than benefit. Importing the package in wrong place won't cause real problem.
nni/runtime/protocol.py
Outdated
@@ -32,8 +32,7 @@ class CommandType(Enum): | |||
_in_file = open(3, 'rb') | |||
_out_file = open(4, 'wb') | |||
except OSError: | |||
_msg = 'IPC pipeline not exists, maybe you are importing tuner/assessor from trial code?' | |||
logging.getLogger(__name__).warning(_msg) | |||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove this log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It causes many problems (I have fixed relative issue multiple times) and has little real benefit.
It's more a debug message in nni's early stage, when I had no confidence about the correctness of ipc modules.
I don't know where should I place the example. Please give suggestion.