Skip to content
This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

support display trial log on local mode #2718

Merged
merged 34 commits into from
Aug 12, 2020
Merged

support display trial log on local mode #2718

merged 34 commits into from
Aug 12, 2020

Conversation

JunweiSUN
Copy link
Contributor

No description provided.

@scarlett2018 scarlett2018 mentioned this pull request Jul 24, 2020
66 tasks
@Lijiaoa
Copy link
Contributor

Lijiaoa commented Jul 28, 2020

Approve for webui changes.

@JunweiSUN JunweiSUN changed the title support display trial log on local mode on backend support display trial log on local mode Jul 30, 2020
@@ -325,6 +325,10 @@ class NNIManager implements Manager {
// FIXME: unit test
}

public async getTrialLog(trialJobId: string, logType: LogType): Promise<string> {
return this.trainingService.getTrialLog(trialJobId, logType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be better to catch MethodNotImplementedError here, then return log message like XXX training service does not support retrieving log.

Copy link
Contributor Author

@JunweiSUN JunweiSUN Jul 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you explain how did you fix this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the latest commit, I revert this to the original version (i.e., not handle the error here) since the front end will check the training service now. The error here could also be an file operation error, so I let the restHander to catch the error throwed here.

} else {
throw new Error('unexpected log type');
}
return fs.promises.readFile(logPath, 'utf8');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this readFile() function need to catch exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exception should be catched by getTrialLog in restHandler.ts

@@ -8,6 +8,8 @@
*/
type TrialJobStatus = 'UNKNOWN' | 'WAITING' | 'RUNNING' | 'SUCCEEDED' | 'FAILED' | 'USER_CANCELED' | 'SYS_CANCELED' | 'EARLY_STOPPED';

type LogType = 'TRIAL_LOG' | 'TRIAL_STDERR';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does TRIAL_LOG means stdout log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, corresponding to trial.log.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we discussed and decided that it should be ERROR_MESSAGE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to TRIAL_LOG and TRIAL_ERROR

localTrainingService.run()
await delay(5000);
//chai.expect(jobDetail.status).to.be.equals('RUNNING');
chai.expect(await localTrainingService.getTrialLog(jobDetail.id, 'TRIAL_LOG')).to.be.a('string');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not compare with 'This is trial log'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part need refactor, since the submitted jobs are not actually "running". The run function doesn't take effect. After sync with liuzhe, it may be better to add this test in IT. Maybe I should arrange an meeting to discuss issues about this pr.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in line 91, you already run the training service, why it does not take effect? @liuzhe-lz @chicm-ms

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add a generate_logfile function in mockedTrial.py to generate mock logfiles. However, these files seems not generated after run().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

fs.rmdirSync(path.join(rootDir, 'trials'))

await localTrainingService.cancelTrialJob(jobDetail.id);
}).timeout(20000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is testing training service, do we need to also test nnimanager or restserver?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This UT for getTrialLog looks OK for me, we only want to test getTrialLog here, we do not care about other parts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -72,6 +74,38 @@ describe('Unit Test for LocalTrainingService', () => {
chai.expect(jobDetail.status).to.be.equals('USER_CANCELED');
}).timeout(20000);

it('Get trial log', async () => {
// set meta data
const trialConfig: string = `{\"command\":\"python3 mockedTrial.py\", \"codeDir\":\"${localCodeDir}\",\"gpuNum\":0}`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether this mockedTrial.py will cause some conflicts regarding log content. It may generate log which make the log file content is not as expected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about change the command to sleep 5 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. now use the original trialConifig in line 23

@ultmaster ultmaster merged commit 10c177c into microsoft:master Aug 12, 2020
LovPe pushed a commit to LovPe/nni that referenced this pull request Aug 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants