Align `nni.experiment` tuner behavior with nnictl #3419

liuzhe-lz · 2021-03-04T07:53:58Z

This PR also refactors the NNI manager so that it can be shut down by a REST request.

QuanluZhang · 2021-03-04T08:16:04Z

nni/experiment/config/common.py

+    'tuner_gpu_indices': lambda value: [int(idx) for idx in value.split(',')] if isinstance(value, str) else value,
+    'tuner': lambda config: None if config.name == '_none_' else config,
+    'assessor': lambda config: None if config.name == '_none_' else config,
+    'advisor': lambda config: None if config.name == '_none_' else config,


Where is '_none_' assigned to name, and in which case?

In Experiment.__init__(). The purpose is to let users omit calling AlgorithmConfig() explicitly.
AlgorithmConfig with name '_none_' is designed to be equivalent to None, just as Path('/home') is equivalent to str('/home/'). This means that everywhere tuner can be None, it can also be AlgorithmConfig(name='_none_').
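To make the equivalence concrete, here is a minimal sketch of the canonicalization idea. It is not NNI's actual code: `AlgorithmConfig` below is a hypothetical stand-in dataclass, and `CANONICAL_RULES`, `canonicalize`, and `_drop_placeholder` are names invented for illustration. Unlike the lambdas in the diff, the helper also accepts a plain None, which is an assumption on my side.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class AlgorithmConfig:
    # Hypothetical stand-in for the real config class; only `name` matters here.
    name: str = '_none_'


def _drop_placeholder(config: Optional[AlgorithmConfig]) -> Optional[AlgorithmConfig]:
    # Treat the '_none_' placeholder exactly like a missing algorithm.
    # Accepting a plain None as well is an assumption, not taken from the diff.
    if config is None or config.name == '_none_':
        return None
    return config


# Per-field normalization rules, mirroring the shape of the diff above.
CANONICAL_RULES: Dict[str, Callable[[Any], Any]] = {
    'tuner_gpu_indices': lambda value: [int(idx) for idx in value.split(',')] if isinstance(value, str) else value,
    'tuner': _drop_placeholder,
    'assessor': _drop_placeholder,
}


def canonicalize(fields: Dict[str, Any]) -> Dict[str, Any]:
    # Apply the matching rule to each field; fields without a rule pass through.
    return {key: CANONICAL_RULES.get(key, lambda v: v)(value) for key, value in fields.items()}


result = canonicalize({
    'tuner': AlgorithmConfig(),                       # placeholder, becomes None
    'assessor': AlgorithmConfig(name='Medianstop'),   # real algorithm, kept as-is
    'tuner_gpu_indices': '0,1',                       # string form, becomes a list of ints
})
print(result)
```

With rules like these, code downstream of canonicalization only has to check for None and never for the placeholder name.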

QuanluZhang · 2021-03-04T11:06:40Z

nni/experiment/launcher.py

@@ -97,7 +86,8 @@ def _start_rest_server(config: ExperimentConfig, port: int, debug: bool, experim
        from subprocess import CREATE_NEW_PROCESS_GROUP
        proc = Popen(cmd, cwd=node_dir, creationflags=CREATE_NEW_PROCESS_GROUP)
    else:
-        proc = Popen(cmd, cwd=node_dir)
+        import os
+        proc = Popen(cmd, cwd=node_dir, preexec_fn=os.setpgrp)


Why was preexec_fn not set in the previous version?

Because it wasn't needed.
This is about terminating subprocesses. Previously the tuner was not a subprocess.
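For readers unfamiliar with the flag, the sketch below shows what `preexec_fn=os.setpgrp` does on POSIX: the child becomes the leader of its own process group, so the group can be signalled as a unit. The sleeping child command is a hypothetical stand-in, and sending the signal with `os.killpg` is my assumption about how such a group would be used, not something this PR's diff shows.

```python
import os
import signal
import sys
from subprocess import Popen

# Stand-in child command (hypothetical); the PR launches the NNI manager instead.
cmd = [sys.executable, '-c', 'import time; time.sleep(60)']

# POSIX only: os.setpgrp runs in the child before exec, making it a group leader.
proc = Popen(cmd, preexec_fn=os.setpgrp)

# The child's process group id equals its own pid and differs from ours.
child_pgid = os.getpgid(proc.pid)
in_own_group = child_pgid == proc.pid and child_pgid != os.getpgrp()

# Signal the whole group, so anything the child spawned is terminated too.
os.killpg(child_pgid, signal.SIGTERM)
exit_code = proc.wait()
```

A side effect of the separate group is that a Ctrl-C typed in the terminal is delivered only to the foreground group, so the child no longer receives it directly.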

QuanluZhang · 2021-03-04T11:16:25Z

ts/nni_manager/main.ts

-        const ds: DataStore = component.get(DataStore);
-        await ds.close();
-        const restServer: NNIRestServer = component.get(NNIRestServer);
-        await restServer.stop();


It is a little strange that these components are created in this file but stopped in nnimanager.ts.

Maybe. But I don't want to do a huge refactor at this point.

QuanluZhang · 2021-03-04T23:34:50Z

nni/experiment/config/convert.py

@@ -14,7 +14,7 @@


 def to_v1_yaml(config: ExperimentConfig, skip_nnictl: bool = False) -> Dict[str, Any]:
-    config.validate(skip_nnictl)
+    config.validate(False)


skip_nnictl is not used in this function. What does skip_nnictl mean?

This function is likely to be removed soon. I'm too lazy to make it elegant.

J-shang · 2021-03-05T02:33:27Z

ts/nni_manager/rest_server/restHandler.ts

+        router.delete('/experiment', (req: Request, res: Response) => {
+            this.nniManager.stopExperimentTopHalf().then(() => {
+                res.send();
+                this.nniManager.stopExperimentBottomHalf();


Maybe we could wait for the whole experiment to stop and then call res.send()? That would avoid problems if, for example, the dispatcher gets an independent process in the future. Otherwise we would need to put the cleanup() of such independent processes in stopExperimentTopHalf(). Anyway, it is fine in this version.

Because killing trials is too time-consuming. I'm afraid users (like Quanlu) will kill the process with a second Ctrl-C if we make them wait that long.
In theory, if a cleanup routine should block the requester, it goes into the top half; if the routine takes so long that it harms the user experience, it goes into the bottom half.
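The split can be sketched in a few lines of asyncio. This is only an illustration of the pattern, not the TypeScript implementation in this PR: `Manager`, `stop_top_half`, `stop_bottom_half`, and `handle_delete_experiment` are hypothetical names, and appending to `events` stands in for real work and for res.send().

```python
import asyncio
from typing import List

events: List[str] = []


class Manager:
    # Hypothetical stand-in for the NNI manager; the method names are illustrative.
    async def stop_top_half(self) -> None:
        # Quick cleanup the requester must wait for.
        events.append('top half done')

    async def stop_bottom_half(self) -> None:
        # Slow cleanup (e.g. killing trials) that runs after the reply is sent.
        await asyncio.sleep(0.05)
        events.append('bottom half done')


async def handle_delete_experiment(manager: Manager) -> "asyncio.Task[None]":
    await manager.stop_top_half()
    events.append('response sent')  # stands in for res.send()
    # Schedule the slow part in the background and return without awaiting it.
    return asyncio.create_task(manager.stop_bottom_half())


async def main() -> None:
    background = await handle_delete_experiment(Manager())
    # A real server would keep running; here we wait so the demo can finish.
    await background


asyncio.run(main())
print(events)
```

The requester is unblocked as soon as the top half finishes, while the bottom half still runs to completion in the background.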

ultmaster · 2021-03-05T06:48:50Z

ts/nni_manager/core/nnimanager.ts

+        await this.stopExperimentBottomHalf();
+    }
+
+    public async stopExperimentTopHalf(): Promise<void> {


The API name sounds very internal. Can we think of a more formal name?

The name comes from Linux interrupt handlers. Top half means the work must be completed before handing control back to the caller, while bottom half means it can be handled later in the background.

align Experiment config with nnictl

29bd9d4

liuzhe-lz requested review from J-shang and QuanluZhang March 4, 2021 07:54

QuanluZhang reviewed Mar 4, 2021

fix lint

f818e35

QuanluZhang reviewed Mar 4, 2021

QuanluZhang requested review from SparkSnail and ultmaster March 4, 2021 11:20

QuanluZhang reviewed Mar 4, 2021

J-shang approved these changes Mar 5, 2021

ultmaster reviewed Mar 5, 2021

QuanluZhang approved these changes Mar 5, 2021

J-shang closed this Mar 5, 2021

J-shang reopened this Mar 5, 2021

J-shang added 2 commits March 5, 2021 12:21

update doc

4cf9112

fix lint

ccde5df

QuanluZhang self-requested a review March 6, 2021 13:40

try fix ut

933f4a3

J-shang merged commit bc55eec into microsoft:master Mar 8, 2021

liuzhe-lz deleted the exp-tuner branch March 17, 2021 22:34
