This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

execDuration does not increase when TUNER_NO_MORE_TRIAL #2758

Closed
prismformore opened this issue Aug 1, 2020 · 4 comments · Fixed by #3043

prismformore commented Aug 1, 2020

Environment:

  • NNI version: v1.7
  • NNI mode (local|remote|pai): local
  • Client OS: Ubuntu 16.04
  • Server OS (for remote mode only):
  • Python version: 3.6
  • PyTorch/TensorFlow version: PyTorch 1.6
  • Is conda/virtualenv/venv used?: Yes
  • Is running in Docker?: Yes

Log message:
$ nnictl experiment show
returns:
"execDuration": 1,
......

What issue did you meet, and what is expected?:
When using BOHB, execDuration does not increase when status=='TUNER_NO_MORE_TRIAL', because it only increases when status == 'RUNNING', as shown here:

if (this.status.status === 'RUNNING') {

However, I think the time consumption of waiting for the trials to finish should also be considered. The trials ARE running.
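The change suggested above could look roughly like the following. This is only a minimal sketch, not NNI's actual code or the fix that was eventually merged: the `ExperimentStatus` type, the `COUNTED_STATUSES` set, and the `shouldCountDuration` and `tick` helpers are hypothetical names introduced for illustration. Only the status strings come from this issue, and `'DONE'` stands in for any terminal status.

```typescript
// Status names quoted in this issue; 'DONE' is a placeholder for a terminal status.
type ExperimentStatus = 'RUNNING' | 'TUNER_NO_MORE_TRIAL' | 'NO_MORE_TRIAL' | 'DONE';

// Assumption: trials may still be executing in any of these statuses,
// so the time spent in them should count toward execDuration.
const COUNTED_STATUSES: ReadonlySet<ExperimentStatus> = new Set<ExperimentStatus>([
    'RUNNING',
    'TUNER_NO_MORE_TRIAL',
    'NO_MORE_TRIAL',
]);

// Replaces a check like `status === 'RUNNING'` with a set membership test.
function shouldCountDuration(status: ExperimentStatus): boolean {
    return COUNTED_STATUSES.has(status);
}

// One step of a periodic timer: returns the updated duration in seconds.
function tick(execDuration: number, status: ExperimentStatus, elapsedSeconds: number = 1): number {
    return shouldCountDuration(status) ? execDuration + elapsedSeconds : execDuration;
}
```

With this sketch, `tick(1, 'TUNER_NO_MORE_TRIAL')` advances the counter, while `tick(1, 'DONE')` leaves it unchanged.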

As mentioned in this issue,

Simply put, it goes through the process of: [generate n1 parameters] -> [get n1 parameters' metrics to update BO model] (indicated by TUNER_NO_MORE_TRIAL) -> [generate n2 parameters based on new model] -> [get n2 parameters' metrics] .....

The 'TUNER_NO_MORE_TRIAL' phase is expected to last a long time: most of the experiment time is spent training and evaluating the trials, and during that time the status is 'TUNER_NO_MORE_TRIAL'.
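To make the size of the gap concrete, here is a small illustrative calculation. The timeline values are invented for the example and are not measurements from a real experiment; `Phase` and `countedDuration` are hypothetical helpers, not part of NNI.

```typescript
// One stretch of time the experiment spends in a given status.
interface Phase {
    status: string;
    seconds: number;
}

// Sums the seconds of all phases whose status is in the `counted` set.
function countedDuration(timeline: readonly Phase[], counted: ReadonlySet<string>): number {
    return timeline
        .filter((phase) => counted.has(phase.status))
        .reduce((total, phase) => total + phase.seconds, 0);
}

// Made-up BOHB-like timeline: short parameter generation, long waits for trial metrics.
const timeline: Phase[] = [
    { status: 'RUNNING', seconds: 5 },
    { status: 'TUNER_NO_MORE_TRIAL', seconds: 120 },
    { status: 'RUNNING', seconds: 5 },
    { status: 'TUNER_NO_MORE_TRIAL', seconds: 90 },
];

// Duration when only 'RUNNING' is counted, as described in this issue.
const runningOnly = countedDuration(timeline, new Set(['RUNNING']));

// Duration when the waiting statuses are counted as well.
const withWaiting = countedDuration(
    timeline,
    new Set(['RUNNING', 'TUNER_NO_MORE_TRIAL', 'NO_MORE_TRIAL']),
);

console.log(runningOnly, withWaiting); // prints "10 220"
```

In this made-up timeline, counting only 'RUNNING' reports 10 seconds out of 220 seconds of actual wall-clock time.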

How to reproduce it?:
Use BOHB with limited computation resources.
$ nnictl experiment show

Additional information:

@prismformore
Author

And it is probably the same for the 'NO_MORE_TRIAL' status.

@prismformore
Author

Has anybody noticed this issue? I think this problem is quite critical.

@liuzhe-lz
Contributor

Ooops, don't know why I missed this issue. We will investigate this problem soon.

J-shang self-assigned this on Oct 28, 2020
J-shang mentioned this issue on Oct 28, 2020
@J-shang
Contributor

J-shang commented Nov 1, 2020

Thank you very much for raising this issue. The good news is that this problem will be fixed in version 2.0 (#3043), and this issue will be closed. If you have any further questions, please keep in touch with us; we will confirm and resolve them as soon as possible.
