Cactus pipeline hangs when attempting to send TERMINATE signal to ktserver process under Singularity 3
#60
Comments
See my previous comments regarding this issue: #57 (comment), repeated here for clarity: I am now simply trying to run the example workflow on our interactive node, and it gets "stuck" on the
I presume the code is somehow stuck in this loop: https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/src/cactus/pipeline/ktserverControl.py#L104
I can issue the command
and I get the response:
with no error, which makes me think there is no communication error, but that somehow the parameter isn't being recognized properly?
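If the loop linked above polls until the server process has exited (an assumption; its body is not reproduced in this thread), it will spin forever when the stop signal never arrives. A minimal Python sketch of a bounded wait using only `subprocess.Popen`. `stop_service` is a hypothetical helper, not code from ktserverControl.py. It sends SIGTERM, waits a grace period, then escalates to SIGKILL:

```python
import signal
import subprocess


def stop_service(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Ask a process to stop, waiting at most `grace` seconds before force-killing it.

    Hypothetical helper for illustration; returns the process's return code.
    """
    proc.terminate()  # sends SIGTERM on POSIX systems
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be caught or ignored
        return proc.wait()


if __name__ == "__main__":
    # A cooperative process exits on SIGTERM; a negative return code is the signal number.
    polite = subprocess.Popen(["sleep", "30"])
    print(stop_service(polite) == -signal.SIGTERM)

    # A process that ignores SIGTERM (the ignored disposition survives exec).
    # It prints "ready" only after the trap is installed, so SIGTERM cannot arrive early.
    stubborn = subprocess.Popen(
        ["sh", "-c", 'trap "" TERM; echo ready; exec sleep 30'],
        stdout=subprocess.PIPE, text=True)
    stubborn.stdout.readline()
    print(stop_service(stubborn, grace=1.0) == -signal.SIGKILL)
    stubborn.stdout.close()
```

This only bounds the wait. If a wrapper shell sits between the caller and the service, even SIGKILL ends only the wrapper, and the service is left running.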
I'm fairly sure this issue is due to a problem sending the TERMINATE signal to the ktserver process under Singularity 3 (see cactus/src/cactus/pipeline/ktserverControl.py, line 120, at commit d9039e7).
I have identified a workaround for now, but would like to discuss a more robust solution going forward. The workaround is to use a Singularity definition file to build the image, replacing the default
The command to build the image was:
This appears to be fixed in Singularity 3.1.0 release candidates.
I am using Singularity 3.2.1 and still have the issue of the ktserver process not being terminated. I also tried your workaround of building a container with the definition file you suggested, and the behaviour is the same.
Perhaps you could share some of the log output? It may be an unrelated issue.
Sure.
And from the log of the particular job:
General version info:
A colleague and I discovered (in the context of a different application) that Singularity 3.2 wasn't propagating signals to processes within the container. We contacted the Singularity developers on Slack, and the issue was fixed in Singularity 3.3 (sylabs/singularity@f7b429f).
When running the example data interactively (no job manager) on our CentOS 6 system, I run into a situation where the SavePrimaryDB step (among others) hangs. The problem is that the DB service is not terminated as it should be, because the SIGINT signal isn't passed through the default Singularity 3 runscript. Under Singularity 2, the runscript used an exec command, but under Singularity 3 it uses an eval command. As a result, signals are not passed down to the /opt/cactus/wrapper.sh script.
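The difference between an exec-style and an eval-style wrapper can be reproduced outside Singularity. A minimal Python sketch, assuming a POSIX `sh`. The two wrapper strings are hypothetical stand-ins, not the actual Singularity runscripts. It sends SIGTERM rather than SIGINT because non-interactive shells can treat SIGINT specially. In both cases, though, the shell does not forward a signal sent to its own PID to its child.

```python
import os
import signal
import subprocess

# Hypothetical stand-in for the containerised service: a shell that prints its
# own PID and then exec's `sleep`, so the printed PID belongs to `sleep`.
TARGET = "sh -c 'echo $$; exec sleep 30'"

# Wrapper that runs the target as a child and waits for it; the trailing ':'
# stops sh from replacing itself with the last command on its own.
CHILD_WRAPPER = 'eval "$1"; :'

# Wrapper that replaces itself with the target, so no shell sits in between.
EXEC_WRAPPER = 'eval "exec $1"'


def service_survives(wrapper: str) -> bool:
    """Send SIGTERM to the wrapper's PID only and report whether the service lives on."""
    with subprocess.Popen(["sh", "-c", wrapper, "sh", TARGET],
                          stdout=subprocess.PIPE, text=True) as proc:
        service_pid = int(proc.stdout.readline())
        os.kill(proc.pid, signal.SIGTERM)  # the wrapper's PID, not its process group
        proc.wait(timeout=10)
        try:
            os.kill(service_pid, 0)  # signal 0 only checks that the PID exists
        except ProcessLookupError:
            return False
        os.kill(service_pid, signal.SIGKILL)  # clean up the orphaned service
        return True


if __name__ == "__main__":
    print("child wrapper, service still running:", service_survives(CHILD_WRAPPER))
    print("exec wrapper, service still running:", service_survives(EXEC_WRAPPER))
```

On a typical Linux system, the child-style wrapper dies and leaves `sleep` running as an orphan. With the exec-style wrapper, the signal lands on the service itself, which is why replacing the runscript's `eval` with `exec` works around the hang.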