This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

quick fix unknown trial report to resume experiment #3096

Merged
merged 3 commits on Nov 25, 2020

Conversation

J-shang
Contributor

@J-shang J-shang commented Nov 17, 2020

Note that this quick fix may still cause some errors. The fundamental solution is to send the trialId or the parameter together with the metric in ReportMetricData, or to let the dispatcher generate the trialId.

This involves four JobRestServer implementations: dlts, kubernetes, pai, and remote. Each needs to check whether the TrialJobId exists before emitting a metric event.

This also adds a trial job ID filtering function in nnimanager.
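
The existence check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `MetricFilter`, `register_trial`, `on_metric`, and the `forward` callback are hypothetical names, and the metric is assumed to be a dict that carries a `trial_job_id` key.

```python
import logging
from typing import Callable, Dict, List, Set

_logger = logging.getLogger(__name__)


class MetricFilter:
    """Hypothetical helper: forwards only metrics whose trial job is known."""

    def __init__(self, forward: Callable[[Dict], None]) -> None:
        self._known_trial_job_ids: Set[str] = set()
        self._forward = forward

    def register_trial(self, trial_job_id: str) -> None:
        # Remember every trial job created in the current run.
        self._known_trial_job_ids.add(trial_job_id)

    def on_metric(self, metric: Dict) -> bool:
        # Assumption: the metric carries the id of the trial job that produced it.
        trial_job_id = metric.get('trial_job_id')
        if trial_job_id in self._known_trial_job_ids:
            self._forward(metric)
            return True
        # Unknown trial job (for example, one left over from before a resume): drop it.
        _logger.warning('Received metric for non-existent trial job: %s', trial_job_id)
        return False


forwarded: List[Dict] = []
metric_filter = MetricFilter(forwarded.append)
metric_filter.register_trial('abc')
accepted = metric_filter.on_metric({'trial_job_id': 'abc', 'value': 0.9})
rejected = metric_filter.on_metric({'trial_job_id': 'stale', 'value': 0.1})
```

In this sketch only the metric from the registered trial reaches `forwarded`; the stale one is logged and dropped instead of being passed on to the dispatcher.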

    self.tuner.receive_trial_result(id_, _trial_params[id_], value, customized=customized,
                                    trial_job_id=data.get('trial_job_id'))
else:
    _logger.info('Find unknown job parameter id %s, maybe something goes wrong.', _trial_params[id_])
Contributor

better to use warning

    self.tuner.receive_trial_result(id_, _trial_params[id_], value, customized=customized,
                                    trial_job_id=data.get('trial_job_id'))
else:
    _logger.info('Find unknown job parameter id %s, maybe something goes wrong.', _trial_params[id_])
    return
Contributor

this return is useless

    }
    this.dispatcher.sendCommand(REPORT_METRIC_DATA, metric.data);
} else {
    this.log.error(`NNIManager received non-existent trial job metrics: ${metric}`);
Contributor

also warning might be better here

Contributor Author

fixed them
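
For reference, the review suggestions on the Python branch could look roughly like the sketch below. It is not the merged code: `handle_metric` and its `receive` callback are hypothetical stand-ins, and the guard `id_ in trial_params` is an assumption, since the quoted diff does not show the condition. Under that assumption, the sketch also logs `id_` itself, because looking up a missing key in the else branch would raise `KeyError`.

```python
import logging
from typing import Any, Callable, Dict, List, Tuple

_logger = logging.getLogger(__name__)


def handle_metric(trial_params: Dict[int, Any], id_: int, value: Any,
                  receive: Callable[[int, Any, Any], None]) -> None:
    """Hypothetical stand-in for the dispatcher branch under review."""
    if id_ in trial_params:  # assumed guard; not visible in the quoted diff
        receive(id_, trial_params[id_], value)
    else:
        # Warning level, as suggested in review. Log the id itself:
        # trial_params[id_] would raise KeyError for an unknown id.
        _logger.warning('Find unknown job parameter id %s, maybe something goes wrong.', id_)
    # No trailing `return` is needed: the function ends here either way.


received: List[Tuple[int, Any, Any]] = []
params = {1: {'lr': 0.1}}
handle_metric(params, 1, 0.9, lambda i, p, v: received.append((i, p, v)))
handle_metric(params, 2, 0.5, lambda i, p, v: received.append((i, p, v)))
```

Only the result for the known parameter id 1 is passed to `receive`; the report for id 2 produces a warning and is otherwise ignored.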

@liuzhe-lz liuzhe-lz merged commit 2c5d89a into microsoft:master Nov 25, 2020
@J-shang J-shang deleted the fix-unknown branch December 15, 2020 02:31
@J-shang J-shang restored the fix-unknown branch December 15, 2020 02:31
@J-shang J-shang deleted the fix-unknown branch December 15, 2020 02:31
3 participants