This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Support 3rd-party training service #3662

Merged 12 commits on May 27, 2021

Conversation

liuzhe-lz
Contributor

Doc and example (template) will be added later.

@liuzhe-lz liuzhe-lz requested review from SparkSnail and ultmaster May 21, 2021 10:55
@ultmaster ultmaster linked an issue May 22, 2021 that may be closed by this pull request
    } else {
        configPath = path.join(os.homedir(), '.config/nni', fileName);
    }
    return fs.readFileSync(configPath).toString();
Contributor

This is not consistent with the logic in nni.runtime.config. What will happen in the case of conda?

Contributor Author

Oops, I forgot.
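The concern about conda can be illustrated with a small sketch of environment-aware directory resolution. This is not the logic from nni.runtime.config, which is not shown in this thread: the function name `resolveConfigDir`, the `HostInfo` shape, the `NNI_CONFIG_DIR` override, and the precedence order are all assumptions made for illustration. The point is only that the Node.js side and the Python side need to agree on one set of rules.

```typescript
// All precedence rules below are assumptions for illustration; the real
// rules live in nni.runtime.config and are not shown in this thread.
interface HostInfo {
    env: Record<string, string | undefined>;
    platform: string;   // e.g. 'linux' or 'win32'
    homeDir: string;
}

function resolveConfigDir(host: HostInfo): string {
    // 1. Hypothetical explicit override.
    const override = host.env['NNI_CONFIG_DIR'];
    if (override) {
        return override;
    }
    // 2. Inside an activated conda or virtualenv environment, keep the
    //    config next to that environment rather than in the home directory.
    const prefix = host.env['CONDA_PREFIX'] ?? host.env['VIRTUAL_ENV'];
    if (prefix) {
        return `${prefix}/nni`;
    }
    // 3. Otherwise fall back to a per-user default.
    const appData = host.env['APPDATA'];
    if (host.platform === 'win32' && appData) {
        return `${appData}\\nni`;
    }
    return `${host.homeDir}/.config/nni`;
}
```

Taking the environment as a parameter instead of reading `process.env` directly keeps the sketch easy to test; real code would more likely use `path.join` and `os.homedir()`.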


export async function getCustomEnvironmentServiceConfig(name: string): Promise<CustomEnvironmentServiceConfig | null> {
    const configJson = await readConfigFile('training_services.json');
    const config = JSON.parse(configJson);
Contributor

Should this be wrapped in try/catch to handle a format error?

Contributor Author

The file is programmatically generated by NNI, and this function is only invoked when the user asks for a custom training service.
So I prefer to just crash if the JSON file is broken.
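The trade-off in this exchange can be sketched as follows. This is a minimal illustration, not the code from this PR: `parseServiceRegistry`, `lookupService`, and the `ServiceEntry` shape are hypothetical, and the sketch assumes the file is a JSON object mapping service names to entries. It keeps the "just crash" behavior but attaches the file name to the error so a broken file is easier to diagnose.

```typescript
// Hypothetical shape of one entry in training_services.json (an assumption,
// not taken from the PR).
interface ServiceEntry {
    nodeModulePath: string;
}

// Parse the registry text. Malformed JSON still throws, but the error now
// names the file that could not be parsed.
function parseServiceRegistry(text: string, fileName: string): Record<string, ServiceEntry> {
    try {
        return JSON.parse(text) as Record<string, ServiceEntry>;
    } catch (error) {
        const reason = error instanceof Error ? error.message : String(error);
        throw new Error(`Failed to parse ${fileName}: ${reason}`);
    }
}

// Look up one service by name; null means "not registered".
function lookupService(text: string, fileName: string, name: string): ServiceEntry | null {
    const registry = parseServiceRegistry(text, fileName);
    const entry = Object.prototype.hasOwnProperty.call(registry, name) ? registry[name] : undefined;
    return entry ?? null;
}
```

Rethrowing rather than returning null on a parse failure keeps "file is corrupt" distinct from "service is not registered", which is the distinction the author's reply relies on.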

@@ -32,9 +31,9 @@ export class OpenPaiEnvironmentService extends EnvironmentService {
     private experimentId: string;
     private config: FlattenOpenpaiConfig;
 
-    constructor(config: ExperimentConfig) {
+    constructor(_experimentRootDir: string, experimentId: string, config: ExperimentConfig) {
Contributor

Why pass rootDir and expId through the constructor?

Contributor Author

Because they are stored in "module scope objects", which custom training services cannot access.
We must either (1) provide a stateful API function to get the experiment ID, or (2) pass the experiment ID as a parameter.
According to last week's meeting, most people prefer (2).
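Option (2) can be sketched roughly as below. The types here are simplified, hypothetical stand-ins rather than the real NNI classes, and `DemoEnvironmentService`, `ServiceConstructor`, and `createService` are illustrative names. The sketch only shows the idea from the reply: every service, built-in or third-party, is constructed with the same three arguments, so it never needs to read module-scope state from the host.

```typescript
// Simplified, hypothetical stand-ins for the real NNI types (assumptions).
interface ExperimentConfig {
    trainingService: { platform: string };
}

abstract class EnvironmentService {
    abstract describe(): string;
}

// A custom service receives everything it needs as constructor arguments.
class DemoEnvironmentService extends EnvironmentService {
    private readonly experimentRootDir: string;
    private readonly experimentId: string;
    private readonly config: ExperimentConfig;

    constructor(experimentRootDir: string, experimentId: string, config: ExperimentConfig) {
        super();
        this.experimentRootDir = experimentRootDir;
        this.experimentId = experimentId;
        this.config = config;
    }

    describe(): string {
        return `${this.config.trainingService.platform}:${this.experimentId}@${this.experimentRootDir}`;
    }
}

// The host builds every service through one uniform constructor signature.
type ServiceConstructor = new (rootDir: string, id: string, config: ExperimentConfig) => EnvironmentService;

function createService(
    ctor: ServiceConstructor,
    rootDir: string,
    id: string,
    config: ExperimentConfig
): EnvironmentService {
    return new ctor(rootDir, id, config);
}
```

A service that does not need one of the arguments can still accept and ignore it, which is presumably why the diff above names the first parameter `_experimentRootDir`.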

print_error('Bad experiment config class')
return

try:
Contributor

What if a user registers an existing training service name, such as 'local' or 'aml'?

Contributor Author

Oh that's a problem, I'll push a fix.
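One way such a guard could look is sketched below. It is not the fix that was pushed to this PR (the code under review here is Python, and the fix itself is not shown): `BUILTIN_SERVICES`, `registerService`, and the in-memory `Map` registry are hypothetical, and the built-in list contains only the two names mentioned in the question.

```typescript
// Built-in names are an assumption for illustration; only 'local' and 'aml'
// are mentioned in the review thread.
const BUILTIN_SERVICES: ReadonlySet<string> = new Set(['local', 'aml']);

// Register a custom training service, refusing names that would shadow a
// built-in service or a previously registered custom one.
function registerService(registry: Map<string, string>, name: string, modulePath: string): void {
    if (BUILTIN_SERVICES.has(name)) {
        throw new Error(`"${name}" is a built-in training service and cannot be overridden`);
    }
    if (registry.has(name)) {
        throw new Error(`Training service "${name}" is already registered`);
    }
    registry.set(name, modulePath);
}
```

Whether re-registering an existing custom name should be an error or an update is a design choice; the sketch rejects it so that the caller has to unregister explicitly.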

@liuzhe-lz liuzhe-lz merged commit 277e63f into microsoft:master May 27, 2021
@liuzhe-lz liuzhe-lz deleted the custom-ts branch May 27, 2021 07:08
Development

Successfully merging this pull request may close these issues.

Improve support for custom training service
3 participants