This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Add preCommand option to support configuring experimental environment by user #2875

Merged
merged 6 commits into from
Sep 21, 2020
76 changes: 76 additions & 0 deletions docs/en_US/TrainingService/RemoteMachineMode.md
Expand Up @@ -107,3 +107,79 @@ Files in `codeDir` will be uploaded to remote machines automatically. You can ru
```bash
nnictl create --config examples/trials/mnist-annotation/config_remote.yml
```

### Configure python environment

By default, commands and scripts are executed in the default environment on the remote machine. If the remote machine has multiple Python virtual environments and you want to run experiments in a specific one, use __preCommand__ to select that environment.

Take `examples/trials/mnist-tfv2` as an example. Below is the content of `examples/trials/mnist-tfv2/config_remote.yml`:

```yaml
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: ${replace_to_your_remote_machine_ip}
    username: ${replace_to_your_remote_machine_username}
    sshKeyPath: ${replace_to_your_remote_machine_sshKeyPath}
    # Pre-command will be executed before the remote machine executes other commands.
    # Below is an example of specifying python environment.
    # If you want to execute multiple commands, please use "&&" to connect them.
    # preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate
    # preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name}
    preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH
```

The __preCommand__ is executed before the remote machine executes any other command, so you can configure the Python environment path like this:

```yaml
# Linux remote machine
preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH
# Windows remote machine
preCommand: set path=${replace_to_python_environment_path_in_your_remote_machine};%path%
```

Or, to activate a `virtualenv` environment:

```yaml
# Linux remote machine
preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate
# Windows remote machine
preCommand: ${replace_to_absolute_path_recommended_here}\\scripts\\activate
```

Or, to activate a `conda` environment:

```yaml
# Linux remote machine
preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name}
# Windows remote machine
preCommand: call activate ${replace_to_conda_env_name}
```

To execute multiple commands, connect them with `&&`:

```yaml
preCommand: command1 && command2 && command3
```

__Note__: Because __preCommand__ is executed before every other command, it is strongly recommended not to set a __preCommand__ that changes the system, e.g. `mkdir` or `touch`.
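
To make the note above concrete, here is a minimal sketch of how a pre-command is prefixed to each command on a Linux remote machine. It mirrors the `${preCommand} && ${command}` composition added in this PR; the helper name `withPreCommand`, the example path, and the sample commands are hypothetical and only serve as illustration.

```typescript
// Hypothetical helper mirroring the Linux composition rule in this PR:
// the pre-command is joined to the real command with "&&".
function withPreCommand(preCommand: string | undefined, command: string | undefined): string | undefined {
    if (command === undefined || command === '' || preCommand === undefined || preCommand === '') {
        // Nothing to prefix, or nothing to run: return the command unchanged.
        return command;
    }
    return `${preCommand} && ${command}`;
}

// Assumed example values, not taken from any real configuration.
const examplePreCommand = 'export PATH=/opt/venv/bin:$PATH';
const trialCommands = ['python3 --version', 'python3 mnist.py'];

// Every command receives the same prefix, so a pre-command with side effects
// (for example `mkdir`) would be repeated for each command that is executed.
const composed = trialCommands.map((c) => withPreCommand(examplePreCommand, c));
console.log(composed);
```

Because the prefix is repeated for every command, a pre-command is best limited to repeatable steps such as setting `PATH` or activating an environment.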
this doc is very clear!

15 changes: 15 additions & 0 deletions docs/en_US/Tutorial/ExperimentConfig.md
Expand Up @@ -58,6 +58,7 @@ This document describes the rules to write the config file, and provides some ex
- [gpuIndices](#gpuindices-3)
- [maxTrialNumPerGpu](#maxtrialnumpergpu-1)
- [useActiveGpu](#useactivegpu-1)
- [preCommand](#precommand)
+ [kubeflowConfig](#kubeflowconfig)
- [operator](#operator)
- [storage](#storage)
Expand Down Expand Up @@ -583,6 +584,14 @@ Optional. Bool. Default: false.

Used to specify whether to use a GPU that is already running another process. By default, NNI uses a GPU only if it has no other active process. If __useActiveGpu__ is set to true, NNI will use the GPU regardless of other processes. This field is not applicable for NNI on Windows.

#### preCommand

Optional. String.

Specifies the pre-command that is executed before the remote machine executes other commands. Users can configure the experiment environment on the remote machine by setting __preCommand__. If multiple commands need to be executed, connect them with `&&`, such as `preCommand: command1 && command2 && ...`.

__Note__: Because __preCommand__ is executed before every other command, it is strongly recommended not to set a __preCommand__ that changes the system, e.g. `mkdir` or `touch`.
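
As an illustration of the `&&` rule above, the following sketch builds a single __preCommand__ string from several steps. The helper `joinPreCommands` and the example steps are hypothetical and not part of NNI. With `&&`, a later step runs only if the previous one succeeded.

```typescript
// Hypothetical helper: build one preCommand string from several steps.
// Blank steps are dropped so the result never contains an empty "&&" segment.
function joinPreCommands(steps: string[]): string {
    return steps
        .map((step) => step.trim())
        .filter((step) => step !== '')
        .join(' && ');
}

// Assumed example steps for a Linux remote machine.
const joinedPreCommand = joinPreCommands([
    'source /opt/venv/bin/activate',
    '',
    'export TEST_PRE_COMMAND=test_pre_command',
]);
console.log(`preCommand: ${joinedPreCommand}`);
```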

### kubeflowConfig

#### operator
Expand Down Expand Up @@ -795,6 +804,12 @@ If run trial jobs in remote machine, users could specify the remote machine info
username: test
sshKeyPath: /nni/sshkey
passphrase: qwert
# Pre-command will be executed before the remote machine executes other commands.
# Below is an example of specifying python environment.
# If you want to execute multiple commands, please use "&&" to connect them.
# preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate
# preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name}
preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH
```

### PAI mode
32 changes: 32 additions & 0 deletions examples/trials/mnist-tfv2/config_remote.yml
@@ -0,0 +1,32 @@
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: ${replace_to_your_remote_machine_ip}
    username: ${replace_to_your_remote_machine_username}
    sshKeyPath: ${replace_to_your_remote_machine_sshKeyPath}
    # Pre-command will be executed before the remote machine executes other commands.
    # Below is an example of specifying python environment.
    # If you want to execute multiple commands, please use "&&" to connect them.
    # preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate
    # preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name}
    preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH
3 changes: 2 additions & 1 deletion src/nni_manager/rest_server/restValidationSchemas.ts
Expand Up @@ -17,7 +17,8 @@ export namespace ValidationSchemas {
passphrase: joi.string(),
gpuIndices: joi.string(),
maxTrialNumPerGpu: joi.number(),
- useActiveGpu: joi.boolean()
+ useActiveGpu: joi.boolean(),
+ preCommand: joi.string()
})),
local_config: joi.object({ // eslint-disable-line @typescript-eslint/camelcase
gpuIndices: joi.string(),
Expand Up @@ -123,11 +123,19 @@ class LinuxCommands extends OsCommands {
        if (isFile) {
            command = `bash '${script}'`;
        } else {
-           script = script.replace('"', '\\"');
+           script = script.replace(/"/g, '\\"');
            command = `bash -c "${script}"`;
        }
        return command;
    }

    public addPreCommand(preCommand: string | undefined, command: string | undefined): string | undefined{
        if (command === undefined || command === '' || preCommand === undefined || preCommand === ''){
            return command;
        } else {
            return `${preCommand} && ${command}`;
        }
    }
}

export { LinuxCommands };
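
The quote-escaping change in `executeScript` above is easy to miss. In JavaScript and TypeScript, `String.prototype.replace` with a string pattern replaces only the first match, while a regular expression with the `g` flag replaces every match. The sketch below shows the difference; the sample script string is a made-up example.

```typescript
// Made-up script containing several double quotes.
const rawScript = 'echo "a" && echo "b"';

// String pattern: only the first double quote is escaped.
const firstOnly = rawScript.replace('"', '\\"');

// Global regular expression: every double quote is escaped.
const allQuotes = rawScript.replace(/"/g, '\\"');

console.log(firstOnly); // echo \"a" && echo "b"
console.log(allQuotes); // echo \"a\" && echo \"b\"
```

Only the second form is safe to embed in `bash -c "..."` when the script contains more than one double quote.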
Expand Up @@ -46,7 +46,7 @@ class WindowsCommands extends OsCommands {
}

    public generateGpuStatsScript(scriptFolder: string): string {
-       return `powershell -command $env:METRIC_OUTPUT_DIR='${scriptFolder}';$app = Start-Process -FilePath python -NoNewWindow -passthru -ArgumentList '-m nni_gpu_tool.gpu_metrics_collector' -RedirectStandardOutput ${scriptFolder}\\scriptstdout -RedirectStandardError ${scriptFolder}\\scriptstderr;Write $PID ^| Out-File ${scriptFolder}\\pid -NoNewline -encoding utf8;wait-process $app.ID`;
+       return `powershell -command $env:Path=If($env:prePath){$env:prePath}Else{$env:Path};$env:METRIC_OUTPUT_DIR='${scriptFolder}';$app = Start-Process -FilePath python -NoNewWindow -passthru -ArgumentList '-m nni_gpu_tool.gpu_metrics_collector' -RedirectStandardOutput ${scriptFolder}\\scriptstdout -RedirectStandardError ${scriptFolder}\\scriptstderr;Write $PID ^| Out-File ${scriptFolder}\\pid -NoNewline -encoding utf8;wait-process $app.ID`;
    }

public createFolder(folderName: string, sharedFolder: boolean = false): string {
Expand Down Expand Up @@ -122,6 +122,14 @@ class WindowsCommands extends OsCommands {
        const command = `${script}`;
        return command;
    }

    public addPreCommand(preCommand: string | undefined, command: string | undefined): string | undefined{
        if (command === undefined || command === '' || preCommand === undefined || preCommand === ''){
            return command;
        } else {
            return `${preCommand} && set prePath=%path% && ${command}`;
        }
    }
}

export { WindowsCommands };
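
The Windows variant differs from the Linux one in a single step: after the pre-command it stores the current `%path%` in `prePath`, which the PowerShell GPU-metrics command in this diff reads back through `$env:prePath`. The sketch below places the two compositions side by side. The function `composeForOs`, the type `RemoteOs`, and the sample values are hypothetical and are not part of the PR.

```typescript
// Hypothetical side-by-side view of the two composition rules in this PR.
type RemoteOs = 'linux' | 'windows';

function composeForOs(os: RemoteOs, preCommand: string | undefined, command: string | undefined): string | undefined {
    if (!command || !preCommand) {
        // Missing or empty pre-command or command: return the command unchanged.
        return command;
    }
    return os === 'windows'
        // Windows: also remember the PATH produced by the pre-command.
        ? `${preCommand} && set prePath=%path% && ${command}`
        // Linux: plain "&&" chaining.
        : `${preCommand} && ${command}`;
}

// Assumed example values.
const linuxLine = composeForOs('linux', 'export PATH=/opt/venv/bin:$PATH', 'python3 trial.py');
const windowsLine = composeForOs('windows', 'set path=C:\\venv\\Scripts;%path%', 'python trial.py');
console.log(linuxLine);
console.log(windowsLine);
```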
Expand Up @@ -28,6 +28,7 @@ abstract class OsCommands {
public abstract killChildProcesses(pidFileName: string, killSelf: boolean): string;
public abstract extractFile(tarFileName: string, targetFolder: string): string;
public abstract executeScript(script: string, isFile: boolean): string;
public abstract addPreCommand(preCommand: string | undefined, command: string | undefined): string | undefined;

public joinPath(...paths: string[]): string {
let dir: string = paths.filter((path: any) => path !== '').join(this.pathSpliter);
Expand Up @@ -23,6 +23,7 @@ export class RemoteMachineMeta {
//TODO: initialize variable in constructor
public occupiedGpuIndexMap?: Map<number, number>;
public readonly useActiveGpu?: boolean = false;
public readonly preCommand?: string;
}

/**
Expand Up @@ -32,6 +32,7 @@ class ShellExecutor {
private tempPath: string = "";
private isWindows: boolean = false;
private channelDefaultOutputs: string[] = [];
private preCommand: string | undefined;

constructor() {
this.log = getLogger();
Expand All @@ -47,6 +48,7 @@ class ShellExecutor {
username: rmMeta.username,
tryKeyboard: true,
};
this.preCommand = rmMeta.preCommand;
this.name = `${rmMeta.username}@${rmMeta.ip}:${rmMeta.port}`;
if (rmMeta.passwd !== undefined) {
connectConfig.password = rmMeta.passwd;
Expand Down Expand Up @@ -349,6 +351,9 @@ class ShellExecutor {
let exitCode: number;

const commandIndex = randomInt(10000);
if(this.osCommands !== undefined){
command = this.osCommands.addPreCommand(this.preCommand, command);
}
this.log.debug(`remoteExeCommand(${commandIndex}): [${command}]`);

// Windows always uses shell, and it needs to disable to get it works.
Expand Up @@ -36,6 +36,7 @@ async function getRemoteFileContentLoop(executor: ShellExecutor): Promise<void>

describe('ShellExecutor test', () => {
let skip: boolean = false;
let isWindows: boolean;
let rmMeta: any;
try {
rmMeta = JSON.parse(fs.readFileSync('../../.vscode/rminfo.json', 'utf8'));
Expand Down Expand Up @@ -86,4 +87,28 @@ describe('ShellExecutor test', () => {
await getRemoteFileContentLoop(executor);
await executor.close();
});

it('Test preCommand-1', async () => {
if (skip) {
return;
}
const executor: ShellExecutor = new ShellExecutor();
await executor.initialize(rmMeta);
const result = await executor.executeScript("ver", false, false);
isWindows = result.exitCode == 0 && result.stdout.search("Windows") > -1;
await executor.close();
});

it('Test preCommand-2', async () => {
if (skip) {
return;
}
const executor: ShellExecutor = new ShellExecutor();
rmMeta.preCommand = isWindows ? "set TEST_PRE_COMMAND=test_pre_command" : "export TEST_PRE_COMMAND=test_pre_command";
await executor.initialize(rmMeta);
const command = isWindows ? "python -c \"import os; print(os.environ.get(\'TEST_PRE_COMMAND\'))\"" : "python3 -c \"import os; print(os.environ.get(\'TEST_PRE_COMMAND\'))\"";
const result = (await executor.executeScript(command, false, false)).stdout.replace(/[\ +\r\n]/g, "");
chai.expect(result).eq("test_pre_command");
await executor.close();
});
});
Expand Up @@ -25,8 +25,8 @@ describe('Unit Test for RemoteMachineTrainingService', () => {
Default/.vscode/rminfo.json, whose content looks like:
{
    "ip": "10.172.121.40",
-   "user": "user1",
-   "password": "mypassword"
+   "username": "user1",
+   "passwd": "mypassword"
}
*/
let skip: boolean = false;
6 changes: 4 additions & 2 deletions tools/nni_cmd/config_schema.py
Expand Up @@ -382,7 +382,8 @@ def validate(self, data):
Optional('passphrase'): setType('passphrase', str),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
Optional('maxTrialNumPerGpu'): setType('maxTrialNumPerGpu', int),
- Optional('useActiveGpu'): setType('useActiveGpu', bool)
+ Optional('useActiveGpu'): setType('useActiveGpu', bool),
+ Optional('preCommand'): setType('preCommand', str)
},
{
'ip': setType('ip', str),
Expand All @@ -391,7 +392,8 @@ def validate(self, data):
'passwd': setType('passwd', str),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
Optional('maxTrialNumPerGpu'): setType('maxTrialNumPerGpu', int),
- Optional('useActiveGpu'): setType('useActiveGpu', bool)
+ Optional('useActiveGpu'): setType('useActiveGpu', bool),
+ Optional('preCommand'): setType('preCommand', str)
})]
}
