Skip to content
This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Create experiment from Python code #3111

Merged
merged 24 commits into from
Dec 9, 2020
Merged

Conversation

liuzhe-lz
Copy link
Contributor

@liuzhe-lz liuzhe-lz commented Nov 20, 2020

I don't know where should I place the example. Please give suggestion.

@liuzhe-lz liuzhe-lz marked this pull request as ready for review November 26, 2020 23:47
@liuzhe-lz liuzhe-lz mentioned this pull request Nov 30, 2020
77 tasks
@QuanluZhang QuanluZhang removed the request for review from chicm-ms December 4, 2020 09:09
_logger = logging.getLogger(__name__)


def to_old_yaml(config: ExperimentConfig, skip_nnictl: bool = False) -> Dict[str, Any]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest to_v1_yaml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay.

@@ -228,7 +228,7 @@ def __repr__(self):
.format(self.trialJobId, self.status, self.hyperParameters, self.logPath,
self.startTime, self.endTime, self.finalMetricData, self.stderrPath)

class Experiment:
class ExternalExperiment:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is external experiment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means experiment created by other process.
You can suggest a better name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what scenario is it?

@@ -85,6 +85,7 @@ def start_rest_server(port, platform, mode, experiment_id, foreground=False, log
log_header = LOG_HEADER % str(time_now)
stdout_file.write(log_header)
stderr_file.write(log_header)
print('## [nnictl] cmds:', cmds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this debug message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops

data['trial'] = {
'command': data.pop('trialCommand'),
'codeDir': data.pop('trialCodeDirectory'),
'gpuNum': data.pop('trialGpuNumber', '')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the default value for gpuNum?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be None, set in nni/experiment/config/common.py.
The change is in experiment config PR.

'reuse': ts['reuseMode']
}

return data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens to other training platforms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be in experiment config PR. This PR bases on a draft snapshot. Same below.

data['remoteConfig'] = {'reuse': ts['reuseMode']}
data['machineList'] = []
for machine in ts['machineList']:
machine = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to have no effect.

@@ -0,0 +1,27 @@
# FIXME: For demonstration only. It should not be here

from pathlib import Path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we releasing this feature as experimental or recommended way to launch an experiment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Experimental I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ultmaster what is the difference? why experimental is better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest full IT and UT pipeline if we are recommending this feature.

Copy link
Contributor

@QuanluZhang QuanluZhang Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should add IT and UT, but could in a follow up pr

Copy link
Contributor

@QuanluZhang QuanluZhang Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to the code completeness, the feature should be experimental. still suggest to add IT and UT in this release

@QuanluZhang
Copy link
Contributor

please fix pipeline errors

Comment on lines +125 to +129
optional = any([
type_name.startswith('Optional['),
type_name.startswith('Union[') and 'NoneType' in type_name,
type_name == 'Any'
])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

directly parse type using string operation seems hacky...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic classes are not designed for attribute accessing.

if experiment_config.get('logCollection'):
request_data['logCollection'] = experiment_config.get('logCollection')
request_data['clusterMetaData'] = []
if experiment_config['trainingServicePlatform'] == 'local':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just reminding this place is a bit strange, also in launcher.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More detail?


def camel_case(key: str) -> str:
words = key.split('_')
return words[0] + ''.join(word.title() for word in words[1:])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if _xxx_xxx, or this will not happen? Do we need remove the prefix _?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this will not happen because there is no equivalent in camelCase.

while True:
time.sleep(10)
status = self.get_status()
if status in ['ERROR', 'STOPPED', 'NO_MORE_TRIAL']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NO_MORE_TRIAL means currSubmittedTrialNum >= experimentProfile.params.maxTrialNum and there are unfinished jobs. We should not return.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, got it.

@@ -32,8 +32,7 @@ class CommandType(Enum):
_in_file = open(3, 'rb')
_out_file = open(4, 'wb')
except OSError:
_msg = 'IPC pipeline not exists, maybe you are importing tuner/assessor from trial code?'
logging.getLogger(__name__).warning(_msg)
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not need log here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems the warning causes more trouble then benefit.
It doesn't introduce real problems to import the module in wrong place.

setup.py Outdated
@@ -74,7 +74,8 @@
'pkginfo',
'websockets',
'filelock',
'prettytable'
'prettytable',
'dataclasses ; python_version < "3.7"'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we support python_version >= 3.7 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a standard library in 3.7+.

'preCommand': machine['trialPrepareCommand']
}

elif ts['platform'] == 'pai':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about other platforms? like kubeflow, aml etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in PR #3138
This PR is not about config schema. It merely includes a snapshot version so it can run end-to-end.

{'key': 'aml_config', 'value': experiment_config['amlConfig']})
request_data['clusterMetaData'].append(
{'key': 'trial_config', 'value': experiment_config['trial']})
return request_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

miss 'adl', 'dlts' here.

@@ -21,9 +20,6 @@
os.makedirs(_outputdir)

_nni_platform = trial_env_vars.NNI_PLATFORM
if _nni_platform == 'local':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove trial.log in local mode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logging system is refactored. Its initialized when nni (top-level package) get imported.

@@ -32,8 +32,7 @@ class CommandType(Enum):
_in_file = open(3, 'rb')
_out_file = open(4, 'wb')
except OSError:
_msg = 'IPC pipeline not exists, maybe you are importing tuner/assessor from trial code?'
logging.getLogger(__name__).warning(_msg)
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove warning info?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It causes more problem than benefit. Importing the package in wrong place won't cause real problem.

@@ -32,8 +32,7 @@ class CommandType(Enum):
_in_file = open(3, 'rb')
_out_file = open(4, 'wb')
except OSError:
_msg = 'IPC pipeline not exists, maybe you are importing tuner/assessor from trial code?'
logging.getLogger(__name__).warning(_msg)
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this log?

Copy link
Contributor Author

@liuzhe-lz liuzhe-lz Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It causes many problems (I have fixed relative issue multiple times) and has little real benefit.
It's more a debug message in nni's early stage, when I had no confidence about the correctness of ipc modules.

@liuzhe-lz liuzhe-lz merged commit 765206c into microsoft:master Dec 9, 2020
@liuzhe-lz liuzhe-lz deleted the exp branch March 17, 2021 22:37
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants