Skip to content
This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Reporting multiple metrics and then resuming crashes SMAC tuner #2140

Closed
arvoelke opened this issue Mar 9, 2020 · 1 comment · Fixed by #2143
Closed

Reporting multiple metrics and then resuming crashes SMAC tuner #2140

arvoelke opened this issue Mar 9, 2020 · 1 comment · Fixed by #2143
Assignees
Labels

Comments

@arvoelke
Copy link

arvoelke commented Mar 9, 2020

Short summary about the issue/question:

Reporting a dictionary of metrics, stopping the experiment, and then resuming it, will crash the SMAC tuner.

How to reproduce it:

Run an experiment that uses the SMAC tuner and contains calls of the form: nni.report_intermediate_result({'default': loss, 'other_metric': some_other_value}). Let it run for at least one trial. Run nnictl stop <experiment> and then nnictl resume <experiment>.

Contents of dispatcher.log:

[03/09/2020, 03:51:05 PM] INFO (nni.msg_dispatcher_base/MainThread) Start dispatcher
[03/09/2020, 03:51:05 PM] INFO (nni.tuner/MainThread) Load checkpoint ignored by tuner, checkpoint path: /home/arvoelke/nni/experiments/NdfeH2R1/checkpoint
[03/09/2020, 03:51:05 PM] INFO (nni.assessor/MainThread) Load checkpoint ignored by assessor, checkpoint path: /home/arvoelke/nni/experiments/NdfeH2R1/checkpoint
[03/09/2020, 03:51:06 PM] INFO (smac_AutoML/Thread-1) update search space in SMAC.
[03/09/2020, 03:51:06 PM] INFO (smac_AutoML/Thread-1) SMAC call: /home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/__main__.py --exp_params eyJhdXRob3JOYW1lIjoiYXJ2b2Vsa2UiLCJleHBlcmltZW50TmFtZSI6IkxNVSIsInRyaWFsQ29uY3VycmVuY3kiOjEsIm1heEV4ZWNEdXJhdGlvbiI6ODYzMTM2MDAsIm1heFRyaWFsTnVtIjo5OTk5OSwidHJhaW5pbmdTZXJ2aWNlUGxhdGZvcm0iOiJsb2NhbCIsInR1bmVyIjp7ImJ1aWx0aW5UdW5lck5hbWUiOiJTTUFDIiwiY2xhc3NBcmdzIjp7Im9wdGltaXplX21vZGUiOiJtaW5pbWl6ZSJ9LCJjaGVja3BvaW50RGlyIjoiL2hvbWUvYXJ2b2Vsa2Uvbm5pL2V4cGVyaW1lbnRzL05kZmVIMlIxL2NoZWNrcG9pbnQifSwiYXNzZXNzb3IiOnsiY29kZURpciI6Ii9ob21lL2Fydm9lbGtlL2dpdC9ubmktcHl0b3JjaC1rYWxkaS9jb25maWcvLi4iLCJjbGFzc0ZpbGVOYW1lIjoiYXNzZXNzb3IucHkiLCJjbGFzc05hbWUiOiJGaXhlZE1lZGlhbnN0b3BBc3Nlc3NvciIsImNsYXNzQXJncyI6eyJvcHRpbWl6ZV9tb2RlIjoibWluaW1pemUiLCJzdGFydF9zdGVwIjoyfSwiY2hlY2twb2ludERpciI6Ii9ob21lL2Fydm9lbGtlL25uaS9leHBlcmltZW50cy9OZGZlSDJSMS9jaGVja3BvaW50In0sInZlcnNpb25DaGVjayI6dHJ1ZSwiY2x1c3Rlck1ldGFEYXRhIjpbeyJrZXkiOiJjb2RlRGlyIiwidmFsdWUiOiIvaG9tZS9hcnZvZWxrZS9naXQvbm5pLXB5dG9yY2gta2FsZGkvY29uZmlnLy4uIn0seyJrZXkiOiJjb21tYW5kIiwidmFsdWUiOiJweXRob24gcnVuX25uaS5weSBMTVUifV19
[03/09/2020, 03:51:06 PM] PRINT INFO:   Reading scenario file: scenario.txt

[03/09/2020, 03:51:06 PM] PRINT INFO:   Output to smac3-output_2020-03-09_15:51:06_571519

[03/09/2020, 03:51:06 PM] PRINT INFO:   Importing data, current processing progress 0 / 109

[03/09/2020, 03:51:06 PM] PRINT ERROR:  unsupported operand type(s) for +: 'float' and 'collections.OrderedDict'
Traceback (most recent call last):
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher_base.py", line 90, in command_queue_worker
    self.process_command(command, data)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher_base.py", line 149, in process_command
    command_handlers[command](data)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/msg_dispatcher.py", line 118, in handle_import_data
    self.tuner.import_data(data)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/nni/smac_tuner/smac_tuner.py", line 332, in import_data
    self.smbo_solver.nni_smac_receive_first_run(config, _value)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/smac/optimizer/smbo.py", line 175, in nni_smac_receive_first_run
    cost=reward, time=-1, status=StatusType.SUCCESS)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/smac/runhistory/runhistory.py", line 168, in add
    self._add(k, v, status, origin)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/smac/runhistory/runhistory.py", line 197, in _add
    self.incremental_update_cost(self.ids_config[k.config_id], v.cost)
  File "/home/arvoelke/anaconda3/envs/*/lib/python3.7/site-packages/smac/runhistory/runhistory.py", line 256, in incremental_update_cost
    (old_cost * n_runs) + cost) / (n_runs + 1)
TypeError: unsupported operand type(s) for +: 'float' and 'collections.OrderedDict'

[03/09/2020, 03:51:11 PM] PRINT INFO:   Dispatcher exiting...

[03/09/2020, 03:51:14 PM] PRINT INFO:   Terminated by NNI manager

nni Environment:

  • nni version: master
  • nni mode(local|pai|remote): local
  • OS: Ubuntu 18.04
  • python version Python 3.7.6:
  • is conda or virtualenv used?: conda
  • is running in docker?: no

need to update document(yes/no): no

Anything else we need to know:

Location of call that triggers the error:

self.smbo_solver.nni_smac_receive_first_run(config, _value)

Related to #2117 / #2121.

@QuanluZhang
Copy link
Contributor

@arvoelke thanks for reporting this bug. This is because smac tuner does not correctly deal with final metric in dict. I will fix it soon.

@QuanluZhang QuanluZhang linked a pull request Mar 11, 2020 that will close this issue
@scarlett2018 scarlett2018 added HPO bug Something isn't working labels Apr 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants