Unrecognised job #5978

Closed
mbercx opened this issue Apr 23, 2023 · 2 comments

@mbercx
Member

mbercx commented Apr 23, 2023

Describe the bug

Not sure if this is a bug, but when running with run_get_node I get warnings of the form "Unrecognized job_state '?' for job id 15748":

04/23/2023 08:08:09 AM <13933> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [197|PwBaseWorkChain|run_process]: launching PwCalculation<202> iteration #1
04/23/2023 08:08:15 AM <13933> aiida.scheduler.direct: [WARNING] Unrecognized job_state '?' for job id 15748
04/23/2023 08:08:26 AM <13933> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [197|PwBaseWorkChain|results]: work chain completed after 1 iterations
04/23/2023 08:08:26 AM <13933> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [197|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned

Note that these don't show up in the report:

(mlist) mbercx@Marniks-Mac-mini code % verdi process report 197
2023-04-23 08:08:09 [32 | REPORT]: [197|PwBaseWorkChain|run_process]: launching PwCalculation<202> iteration #1
2023-04-23 08:08:26 [33 | REPORT]: [197|PwBaseWorkChain|results]: work chain completed after 1 iterations
2023-04-23 08:08:26 [34 | REPORT]: [197|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned

I'm using my M1 with an updated version of RabbitMQ and the "consumer_timeout" solution. Could be related to that?

Steps to reproduce

Install a fresh environment with the versions below and run the PwBaseWorkChain using run_get_node().

Your environment

  • Operating system [e.g. Linux]: macOS Monterey
  • Python version [e.g. 3.7.1]: Python 3.9.16
  • aiida-core version [e.g. 1.2.1]: v2.3.0
  • RabbitMQ: 3.11.13
  • Postgres: psql (PostgreSQL) 14.7 (Homebrew)
@sphuber
Contributor

sphuber commented Apr 23, 2023

I don't think this has anything to do with RabbitMQ. This should be a message from the Scheduler plugin, emitted while it parses the output it gets when requesting the status of calcjobs. Are you using the DirectScheduler by any chance? I think this is a duplicate of #3107. Given that both Chris and Leopold were using Macs at the time, I think this has to do with the DirectScheduler on macOS.

The plugin executes ps -xo pid,stat,user,time JOB_ID to get the status. Try running that manually to see the raw output. Apparently the job state contains ? instead of any of the known ones. You can start investigating there why that may be the case. I don't have access to a Mac (un)fortunately.
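
To make the failure mode concrete, here is a minimal sketch of how a state column from ps-style output could be mapped to a job state, and how a value such as ? falls through to a warning. This is not aiida-core's actual parser: the KNOWN_STATES table, the parse_ps_output function, and the sample text (including the user name and the second job) are hypothetical and only illustrate the idea.

```python
# Hypothetical mapping from the first letter of the STAT column to a coarse
# job state. An illustration only, not the table used by aiida-core.
KNOWN_STATES = {
    "R": "running",    # runnable
    "S": "running",    # interruptible sleep
    "D": "running",    # uninterruptible wait
    "I": "running",    # idle
    "T": "suspended",  # stopped
    "Z": "done",       # zombie
}


def parse_ps_output(raw):
    """Return (pid, state) pairs parsed from `pid,stat,user,time` style text."""
    results = []
    for line in raw.splitlines():
        fields = line.split()
        # Skip the header row and any blank or malformed lines.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        pid, stat = fields[0], fields[1]
        state = KNOWN_STATES.get(stat[0])
        if state is None:
            # Same shape as the warning quoted in this issue.
            print(f"Unrecognized job_state '{stat}' for job id {pid}")
            state = "undetermined"
        results.append((int(pid), state))
    return results


# Made-up sample text; real output should be captured by running ps manually.
sample = """  PID STAT USER      TIME
15748 ?    someuser 0:00.00
15750 Ss   someuser 0:01.23
"""

parsed = parse_ps_output(sample)
```

Feeding the manually captured output of the ps command into a function like this should show quickly whether the STAT column itself holds ?, or whether the columns are shifted so that a different field is being read as the state.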

@mbercx
Member Author

mbercx commented Apr 23, 2023

Right, another issue I just quickly opened without doing my due diligence 😓. Will close and discuss with @chrisjsewell at some point, but yeah it's not a priority since it's just a bit of extra noise.

@mbercx mbercx closed this as completed Apr 23, 2023