New daemon 2) running as daemon process #1055

Closed
muhrin opened this issue Jan 17, 2018 · 7 comments

@muhrin
Contributor

muhrin commented Jan 17, 2018

The daemon from step 1 (see #1054) needs to be set up to run as an OS daemon service, either using celeryd or something else.

@sphuber
Contributor

sphuber commented Jan 17, 2018

This will also involve, address and potentially solve other issues:

@muhrin
Contributor Author

muhrin commented Jan 24, 2018

I can think of a couple of ways to go:

  1. Keep celeryd and make each celery worker just launch a Runner
  2. Rip out celeryd, use another library to handle the daemonisation, and use multiprocessing to instantiate multiple Runners

Because of the heavyweight nature of celeryd, and the issues we already have with recent versions requiring a broker other than SQLA (see requirements file), I would favour ripping it out.
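Option 2 could be sketched roughly as follows, using only the standard library. Here `run_runner` is a hypothetical stand-in for whatever starts a Runner's event loop; it is not an AiiDA function, and the daemonisation step itself is left out.

```python
import multiprocessing
import time


def run_runner(index):
    """Hypothetical placeholder for one Runner's main loop (not AiiDA API)."""
    while True:
        # A real Runner would pick up and advance processes here.
        time.sleep(1.0)


def make_runner_processes(count):
    """Create, but do not start, one child process per Runner."""
    return [
        multiprocessing.Process(
            target=run_runner, args=(index,), name="runner-{}".format(index)
        )
        for index in range(count)
    ]


def start_all(processes):
    """Start every child process."""
    for process in processes:
        process.start()


def stop_all(processes, timeout=5.0):
    """Ask every running child to stop, then wait for each one to exit."""
    for process in processes:
        if process.is_alive():
            process.terminate()  # sends SIGTERM on POSIX
    for process in processes:
        if process.pid is not None:  # only started processes can be joined
            process.join(timeout)
```

When run as a script, `start_all(make_runner_processes(4))` should sit under an `if __name__ == "__main__":` guard so that it also works with the spawn start method.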

@muhrin
Contributor Author

muhrin commented Jan 25, 2018

@DropD is currently looking into daemonize as a potential daemoniser library

@muhrin
Contributor Author

muhrin commented Jan 26, 2018

@DropD will have a look at circus. Things to check:

  • Does logging to file work, including setting the AiiDA log level?
  • Is there log rotation support?
  • How are child process kills handled? Is there a signal we can catch to exit gracefully?
  • Check how restarting of crashed processes works - is there exponential backoff?
  • Is the API capable of giving us a verdi daemon status-like report?
  • Does the API support start/stop hooks (without using their CLI tools)?
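For the question about catching a signal to exit gracefully, a minimal sketch of what a child process could do on its side is shown below. `ShutdownFlag` and `run_until_shutdown` are hypothetical names for illustration, not part of circus or AiiDA. Note that SIGKILL cannot be caught, so this only helps if the supervisor sends a catchable signal such as SIGTERM first.

```python
import signal


class ShutdownFlag:
    """Records that a termination signal was received (hypothetical helper)."""

    def __init__(self):
        self.requested = False

    def install(self, signums=(signal.SIGTERM, signal.SIGINT)):
        """Register the handler; must be called from the main thread."""
        for signum in signums:
            signal.signal(signum, self.handle)

    def handle(self, signum, frame):
        # Only set a flag here; the main loop does the actual clean-up.
        self.requested = True


def run_until_shutdown(flag, step):
    """Call step() repeatedly until a shutdown was requested.

    Returns the number of completed steps.
    """
    steps = 0
    while not flag.requested:
        step()
        steps += 1
    return steps
```

A worker would call `flag.install()` once at start-up and finish its current unit of work before leaving the loop.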

@DropD
Contributor

DropD commented Jan 26, 2018

Restarting:

  • Default behaviour is to always restart immediately (implied in docs, manually verified by sending sigkill multiple times and watching restarts happen).
  • There is a built-in plugin that watches for runners that restart often (tunable frequency) and suspends them for a user-set amount of time. If exponential backoff is required, the plugin might be easy to modify accordingly.
    The plugin, which is called "Flapping", has yet to be tested.
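For reference, the exponential backoff mentioned above boils down to a delay that grows geometrically with the number of consecutive restarts, up to a cap. The sketch below is generic and does not use or reflect the Flapping plugin's API; the function name and default values are assumptions chosen for illustration.

```python
def backoff_delay(attempt, base=1.0, factor=2.0, cap=60.0):
    """Seconds to wait before restart number `attempt` (0-based).

    The delay is base * factor**attempt, limited to `cap`.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return min(cap, base * factor ** attempt)
```

A supervisor would reset `attempt` to zero once a runner has stayed up for long enough.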

Logging:

  • Python logging does its own thing, so aiida logs still get written exactly the same regardless (not tested)
  • the circus daemon uses the Python logger too and can rotate logs
  • redirecting stdout / stderr into file only works if starting from a config file, not for dynamically added watchers.
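Independently of circus, log rotation and a configurable level are available from Python's standard `logging` module. The following is a minimal sketch; the logger name `daemon_sketch` and the default sizes are assumptions for illustration and do not reflect how AiiDA configures its logging.

```python
import logging
import logging.handlers


def configure_rotating_log(path, level=logging.INFO,
                           max_bytes=1000000, backup_count=5):
    """Attach a size-based rotating file handler to a logger and return it."""
    logger = logging.getLogger("daemon_sketch")  # hypothetical logger name
    logger.setLevel(level)
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Once the file reaches about `max_bytes`, it is renamed to `<path>.1` and a fresh file is opened; at most `backup_count` old files are kept.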

@muhrin
Contributor Author

muhrin commented Jan 29, 2018

@DropD good work. I'm happy without exponential backoff for now.
Let me know if you get a chance to check the other issues above and then maybe you could get a working proof of concept - I'd be very happy to have a look

sphuber added a commit to sphuber/aiida-core that referenced this issue Feb 20, 2018
To have the daemon test script run on Travis, we had to change
the JobCalculations to go properly through the Process level.
However, that means that the old daemon, currently launched by
verdi daemon start, which launches a celery worker, won't be
able to run them at all.

Instead we temporarily replace the celery worker with a subprocess
call to `verdi devel run_daemon` which will run the new daemon
in the background. Note that this is not actually daemonized, but
just runs a DaemonRunner in a separate process. For the tests this
should work for the time being, until issue aiidateam#1055 is fixed, which
will implement a properly daemonized version of the daemon runner.

@sphuber
Contributor

sphuber commented Mar 7, 2018

Fixed in #1217

sphuber closed this as completed Mar 7, 2018