verdi daemon status command fails #2485

ltalirz · 2019-02-15T13:56:50Z

This is on provenance_redesign:

From time to time, I get an error when running verdi daemon status

$ verdi daemon status
Profile: test_qb
Traceback (most recent call last):
  File "/Users/leopold/Applications/miniconda3/envs/aiida_rmq/bin/verdi", line 10, in <module>
    sys.exit(verdi())
  File "/Users/leopold/Applications/miniconda3/envs/aiida_rmq/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/leopold/Applications/miniconda3/envs/aiida_rmq/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/leopold/Applications/miniconda3/envs/aiida_rmq/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/leopold/Applications/miniconda3/envs/aiida_rmq/lib/python2.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/leopold/Applications/miniconda3/envs/aiida_rmq/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/leopold/Applications/miniconda3/envs/aiida_rmq/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/leopold/Personal/Postdoc-MARVEL/repos/aiida/aiida_rmq/aiida/cmdline/commands/cmd_daemon.py", line 82, in status
    result = get_daemon_status(client)
  File "/Users/leopold/Personal/Postdoc-MARVEL/repos/aiida/aiida_rmq/aiida/cmdline/utils/daemon.py", line 72, in get_daemon_status
    worker_row = [worker_pid, worker_info['mem'], worker_info['cpu'], format_local_time(worker_info['create_time'])]
TypeError: string indices must be integers, not str

ltalirz · 2019-05-22T04:23:57Z

Haven't seen this in a while. Closing now - happy to reopen if this resurfaces.

ConradJohnston · 2019-12-17T11:19:27Z

Hi @ltalirz ,
I'm seeing this with a fresh install of the develop branch.

ltalirz · 2019-12-17T12:33:55Z

Hi @ConradJohnston - thanks for the report.
Would you mind printing the content of worker_info (and perhaps even worker_response)?

ConradJohnston · 2019-12-17T12:35:34Z

Some more info:

I can replicate the fault quite reliably if I run the command in quick succession like this:
verdi daemon status ; verdi daemon status

It seems to be due to this line:

aiida-core/aiida/cmdline/utils/daemon.py

Line 66 in 999ae3a

worker_response = client.get_worker_info()

EDIT:
In response to Leo's comment:

This is the content of worker_response when it works:
{'status': 'ok', 'time': 1576585659.221961, 'name': 'aiida-production', 'info': {'4990': {'mem_info1': '37M', 'mem_info2': '4G', 'cpu': 0.0, 'mem': 0.231, 'ctime': '0:00.48', 'pid': 4990, 'username': 'cjohnson', 'nice': 0, 'create_time': 1576585658.730482, 'age': 0.48801684379577637, 'cmdline': 'python3.6', 'children': [], 'started': 1576585658.7299762, 'wid': 1}}, 'id': '4e1d768a522a44b59f85039806f9af14'}

and when it fails:
{'status': 'ok', 'time': 1576585660.2456262, 'name': 'aiida-production', 'info': {'4990': 'No such process (stopped?)'}, 'id': '148af3087f9347fb98ef3e58985e6e84'}
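To make the failure mode concrete, here is a minimal sketch that indexes the two payloads above the same way the line in the traceback does. The payloads are trimmed to the fields used, and `first_worker_mem` is a hypothetical helper written for illustration, not a function in aiida-core.

```python
# Payloads taken from the two responses above, trimmed to the fields used here.
ok_response = {
    'status': 'ok',
    'info': {'4990': {'mem': 0.231, 'cpu': 0.0, 'create_time': 1576585658.730482}},
}
bad_response = {
    'status': 'ok',
    'info': {'4990': 'No such process (stopped?)'},
}


def first_worker_mem(response):
    """Hypothetical helper: read 'mem' for the first worker, as the failing line does."""
    worker_pid, worker_info = next(iter(response['info'].items()))
    # When worker_info is a str, indexing it with 'mem' raises TypeError.
    return worker_info['mem']


print(first_worker_mem(ok_response))

try:
    first_worker_mem(bad_response)
except TypeError as exc:
    print('TypeError:', exc)
```

The status field is 'ok' in both cases, so it cannot be used to tell the two responses apart; only the type of the per-worker entry differs.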

ltalirz · 2019-12-17T12:51:32Z

@sphuber as discovered by Conrad, worker_response['info'] contains an error message when it doesn't find the worker.
Could you perhaps provide some guidance on where this should be fixed?

sphuber · 2019-12-17T13:04:22Z

I cannot reproduce the behavior myself, even when calling the command twice in a row and even for a very busy daemon. However, this is on a powerful server. I take it this problem is transient, @ConradJohnston, and the command works when issued again some time after it failed? It seems that when the command is called in quick succession, the circus daemon process sometimes fails to poll one or more of the daemon workers. I guess there will always be a possibility of this, so we should simply add error handling code in the get_daemon_status function. I will make a PR.
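As a rough illustration of the kind of error handling meant here, the sketch below builds one table row per worker and falls back to placeholders when the per-worker entry is not a dictionary. The function name `build_worker_rows`, the `'-'` placeholders, and the timestamp format are assumptions made for this example; this is not the code from the eventual PR.

```python
from datetime import datetime


def build_worker_rows(worker_response):
    """Hypothetical helper: one row per worker, tolerating workers that could not be polled."""
    rows = []
    for worker_pid, worker_info in worker_response['info'].items():
        if isinstance(worker_info, dict):
            # Normal case: the entry carries the worker statistics.
            created = datetime.fromtimestamp(worker_info['create_time'])
            rows.append([
                worker_pid,
                worker_info['mem'],
                worker_info['cpu'],
                created.strftime('%Y-%m-%d %H:%M:%S'),
            ])
        else:
            # The entry is a message such as 'No such process (stopped?)',
            # so there are no statistics to show for this worker.
            rows.append([worker_pid, '-', '-', '-'])
    return rows


polled = {'info': {'4990': {'mem': 0.231, 'cpu': 0.0, 'create_time': 1576585658.730482}}}
unreachable = {'info': {'4990': 'No such process (stopped?)'}}

print(build_worker_rows(polled))
print(build_worker_rows(unreachable))
```

Checking the type of the entry, rather than catching the TypeError, keeps the transient case explicit and avoids hiding unrelated bugs in the row-building code.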

ConradJohnston · 2019-12-17T14:37:52Z

@sphuber - It's indeed transient. I cannot always reproduce it, even when using a loop to hammer the DB, while at other times it simply happens. There does seem to be some sort of performance issue occurring with my fresh Postgres installation though, which I suppose this is a symptom of.

sphuber · 2019-12-17T14:39:27Z

This should have nothing to do with the database, it does not touch it at all.

ConradJohnston · 2019-12-17T15:21:45Z

@sphuber Hmm, I'm experiencing this quite frequently even without issuing the commands in succession. Your PR gives some relief - but what is the underlying cause of this problem? I haven't seen this behaviour for other installations on other machines.

ltalirz closed this as completed May 22, 2019

ltalirz reopened this Dec 17, 2019

ltalirz closed this as completed Dec 17, 2019

ltalirz reopened this Dec 17, 2019

sphuber self-assigned this Dec 17, 2019

sphuber added priority/nice-to-have topic/daemon topic/verdi type/bug labels Dec 17, 2019

sphuber added this to the v1.1.0 milestone Dec 17, 2019

sphuber mentioned this issue Dec 17, 2019

Deal with unreachable daemon worker in get_daemon_status #3683

Merged

sphuber closed this as completed in #3683 Dec 17, 2019
