sharedstorage support remote umount and fix bug #3456
@@ -248,17 +257,20 @@ export class RemoteEnvironmentService extends EnvironmentService {
}
this.environmentExecutorManagerMap.set(environment.id, executorManager);
const executor = await this.getExecutor(environment.id);
let remoteWorkingRoot: string;
remoteWorkingRoot -> remoteWorkingRootDir
Modified it.
executor.joinPath(executor.getRemoteExperimentRootDir(getExperimentId()),
'envs', environment.id)
remoteWorkingRoot = executor.getRemoteExperimentRootDir(getExperimentId());
environment.runnerWorkingFolder = executor.joinPath(remoteWorkingRoot, 'envs', environment.id);
This duplicates line 263.
Fixed it.
}
environment.command = `cd ${environment.runnerWorkingFolder} && \
environment.command = `cd ${remoteWorkingRoot} && \
Why change to remoteWorkingRoot?
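Because ${environment.runnerWorkingFolder} is ./exp_id/envs/<env_id>, and trialDispatcher.ts already sets ${environment.command}=`mkdir -p envs/${envId} && cd envs/${envId} && ${environment.command}`. Starting from runnerWorkingFolder would therefore end up in ./exp_id/envs/<env_id>/envs/<env_id>, so the command now changes directory to ./exp_id instead.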
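To make the nesting concrete, here is a minimal sketch of how the two pieces compose, assuming the dispatcher prefix is exactly the `mkdir -p envs/${envId} && cd envs/${envId}` string quoted above. The helpers `withDispatcherPrefix` and `effectiveWorkingDir`, and the example paths, are hypothetical illustrations, not NNI code.

```typescript
import * as path from "path";

// Hypothetical helper: the prefix this thread attributes to the trial dispatcher.
function withDispatcherPrefix(envId: string, command: string): string {
  return `mkdir -p envs/${envId} && cd envs/${envId} && ${command}`;
}

// Hypothetical helper: the directory a runner ends up in after `cd startDir`
// followed by the dispatcher's relative `cd envs/<envId>`.
function effectiveWorkingDir(startDir: string, envId: string): string {
  return path.posix.join(startDir, "envs", envId);
}

// Assumed example values, not real NNI paths.
const expRoot = "/tmp/nni-experiments/exp_id";
const envId = "abc12";
const runnerWorkingFolder = path.posix.join(expRoot, "envs", envId);

// Starting from runnerWorkingFolder nests envs/<envId> twice:
console.log(effectiveWorkingDir(runnerWorkingFolder, envId));
// → /tmp/nni-experiments/exp_id/envs/abc12/envs/abc12

// Starting from the experiment root lands in the intended folder:
console.log(effectiveWorkingDir(expRoot, envId));
// → /tmp/nni-experiments/exp_id/envs/abc12

// The full command the runner would execute when starting from the root:
console.log(`cd ${expRoot} && ${withDispatcherPrefix(envId, "sh run.sh")}`);
```

Because the dispatcher's `cd` is relative, the starting directory must be the parent of `envs/`, which is why the command changes directory to the experiment root rather than to runnerWorkingFolder.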
The command also contains sh ../install_nni.sh; if you only change the root folder of the script, it cannot find the install_nni.sh file. I have another PR to fix this, #3472; perhaps we could merge the logic.
@@ -348,6 +348,11 @@ class TrialDispatcher implements TrainingService {
for (const commandChannel of this.commandChannelSet) {
await commandChannel.stop();
}
if (this.useSharedStorage) {
this.log.info(`stopping shared storage...`)
component.get<SharedStorageService>(SharedStorageService).cleanUp();
await?
Yes, added it.
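The `await` matters if `cleanUp()` returns a promise, as the reviewer's question implies: without it, the stop path carries on (and the process may exit) before the remote umount finishes, and a failure surfaces as an unhandled rejection instead of reaching the caller. A minimal sketch, with a hypothetical `FakeSharedStorage` standing in for `SharedStorageService`:

```typescript
// Hypothetical stand-in for a shared storage service whose cleanUp() is async
// (for example, it runs a remote umount and waits for the result).
class FakeSharedStorage {
  public cleanedUp = false;

  async cleanUp(): Promise<void> {
    // Simulate the latency of a remote umount.
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    this.cleanedUp = true;
  }
}

async function stopWithoutAwait(storage: FakeSharedStorage): Promise<boolean> {
  storage.cleanUp(); // floating promise: returns before cleanup has finished
  return storage.cleanedUp;
}

async function stopWithAwait(storage: FakeSharedStorage): Promise<boolean> {
  await storage.cleanUp(); // waits; a rejection would propagate to the caller
  return storage.cleanedUp;
}

async function main(): Promise<void> {
  console.log(await stopWithoutAwait(new FakeSharedStorage())); // false
  console.log(await stopWithAwait(new FakeSharedStorage())); // true
}

main();
```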
throw new Error(result.stderr);
}
} catch (error) {
const errorMessage: string = `${this.storageType} Shared Storage: Mount ${this.nfsServer}:${this.exportedDirectory} to ${this.localMountPoint} failed, error is ${error}`;
The error message is not correct.
Fixed it.
throw new Error(result.stderr);
}
} catch (error) {
const errorMessage: string = `${this.storageType} Shared Storage: get account key failed, error is ${error}`;
Why does the message contain "get account key"?
It was a mistake, thank you. Fixed it.
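Both wrong messages in this review look like a catch block copied from another operation. One way to make that harder, sketched here as an assumption rather than what the PR does, is to pass the operation name into a single message builder. `sharedStorageErrorMessage` and `SharedStorageOp` are hypothetical names.

```typescript
// Hypothetical set of operations whose failures get reported.
type SharedStorageOp = "mount" | "umount" | "get account key";

// Hypothetical helper: the operation name is a required argument, so each
// catch block has to state which step actually failed.
function sharedStorageErrorMessage(
  storageType: string,
  op: SharedStorageOp,
  error: unknown,
  target?: string,
): string {
  const what = target === undefined ? op : `${op} ${target}`;
  return `${storageType} Shared Storage: ${what} failed, error is ${error}`;
}

console.log(sharedStorageErrorMessage("NFS", "umount", new Error("device is busy"), "/mnt/nfs"));
// → NFS Shared Storage: umount /mnt/nfs failed, error is Error: device is busy
```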
@@ -124,6 +124,12 @@ class LinuxCommands extends OsCommands {
command = `bash '${script}'`;
} else {
script = script.replace(/"/g, '\\"');
This looks like it is resolving escape characters, but I can't understand what exactly it is doing.
Please describe what kind of script you are trying to escape or unescape. If possible, we should avoid handling escaping manually.
This is meant to handle the case where bash -c wraps a nested echo "some command \\"something\\"". This happens in the shared storage mount command.
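A minimal sketch of the reviewer's point, assuming the script ends up inside `bash -c "..."`. Inside double quotes bash still treats backslash, `"`, `$` and backtick specially, so replacing only `"` (as the diff does) breaks once the script already contains backslashes, as in the nested `echo` case. Single quotes avoid per-character escaping except for `'` itself. `escapeForDoubleQuotes` and `singleQuote` are hypothetical helpers, not part of NNI's `OsCommands`.

```typescript
// Escape every character bash interprets inside double quotes:
// backslash, double quote, dollar sign and backtick.
function escapeForDoubleQuotes(script: string): string {
  return script.replace(/[\\"$`]/g, "\\$&");
}

// Alternative: wrap in single quotes, where bash interprets nothing, and
// turn each embedded ' into '\'' (close quote, escaped quote, reopen quote).
function singleQuote(script: string): string {
  return `'${script.replace(/'/g, "'\\''")}'`;
}

// The nested case from this thread: the script itself contains \" sequences.
const script = 'echo "some command \\"something\\""';
console.log(`bash -c "${escapeForDoubleQuotes(script)}"`);
console.log(`bash -c ${singleQuote(script)}`);
```

The single-quote form is usually easier to get right, since only one character ever needs special handling.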