Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow running processes in such a way that stop signals are preserved #3461

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rabidaudio
Copy link
Contributor

One-off processes, including convox run, convox exec, and timers don't pass stop signals to processes properly.

For exec and non-detached runs, this makes sense, as Convox runs a sleep {timeout} command in the container and then attaches a secondary process to it (a la docker exec). But for timers and detached runs, SIGTERM is never received by the process. The problem stems from the use of ["sh", "-c", command], as sh -c does not propagate signals to apps.

This violates 12 factor as processes aren't given the opportunity to run any cleanup tasks. While apps should also be robust against the occasional SIGTERM, Convox should do it's best to allow more graceful stoppage.

Our use-case revolves around spinning up ETL pipelines on regular intervals, but really any worker process use-case that that wants to use Convox's autoscaling features to create processes on-demand will benefit from this.

This change allows you to disable the shell wrapper for detached runs. Simply removing sh -c would likely be a breaking change for users who have come to expect their commands to be run in a shell, so instead I put the behavior behind a flag. I called is flag "bare" but I don't love the name, perhaps [--no]-shell, --shell=false or similar would be better. Alternatively, we could use a rack parameter, or we could make this the default behavior going forward with a version check. Looking for guidance on this.

convox run --detach app python test.py # current behavior, passes ['sh', '-c', 'python test.py'] to container
convox run --detach --bare app python test.py # passes ['python', 'test.py'] directly to container

I only added support to the aws provider, as the k8s provider (and local by extension) already behave this way.

Note this MR does not fix the same issue for timers, as I wasn't sure of a good backwards-compatible solution.

By the way, detached runs don't actually support the timeout option. This could be resolved by using the unix timeout command. I could add that to this MR also.

@rabidaudio
Copy link
Contributor Author

I wasn't totally correct when I said sh -c doesn't propagate signals. Rather, it doesn't when run as a docker command. Take this example:

import signal
import sys
import time
import os

signals = [
  signal.SIGINT, signal.SIGTERM,
  signal.SIGQUIT, signal.SIGTRAP, signal.SIGUSR1,
  signal.SIGHUP, signal.SIGSEGV, signal.SIGUSR2,
  signal.SIGILL,
]

def handler(signum, frame):
  s = [s for s in signals if s.value == signum]
  print(f"{os.getpid()} got signal: {s}")
  sys.exit(0)

def main():
  for s in signals:
    signal.signal(s, handler)
  print("running....")
  while True:
    time.sleep(5)

if __name__ == "__main__":
  main()
FROM python:3
ENV PYTHONUNBUFFERED=true
COPY test.py test.py
CMD ["python3", "test.py"]

sh -c works as expected:

$ sh -c 'python3 test.py' &
[1] 15280
running....
$ kill 15280
15280 got shutdown: 15
[1]+  Done                    sh -c 'python3 test.py'

... and without sh -c, docker stops as expected:

$ docker build -t signal-test .
$ docker run -d -t signal-test python3 test.py
e174b1b026812cfe8c10223de854210cd183b8de0bc36cbc24f3cd356dfc27a4
$ docker stop e174b1b026812cfe8c10223de854210cd183b8de0bc36cbc24f3cd356dfc27a4
e174b1b026812cfe8c10223de854210cd183b8de0bc36cbc24f3cd356dfc27a4
$ docker logs e174b1b026812cfe8c10223de854210cd183b8de0bc36cbc24f3cd356dfc27a4
running....
1 got shutdown: 15

...but running sh -c with docker does not:

$ docker run -d -t signal-test sh -c 'python3 test.py'
e605023a84f3215c77bb31619b5a2879e867d389c781c84b38b61f134b717095
$ docker stop e605023a84f3215c77bb31619b5a2879e867d389c781c84b38b61f134b717095
# hangs for 30 seconds waiting for process to exit due to SIGTERM, then sends SIGKILL
e605023a84f3215c77bb31619b5a2879e867d389c781c84b38b61f134b717095 
$ docker logs e605023a84f3215c77bb31619b5a2879e867d389c781c84b38b61f134b717095
running....

@rabidaudio
Copy link
Contributor Author

Okay I feel a little silly now. The problem is on Docker Linux environments sh is sim-linked to dash, which does not propagate signals. bash happily will though, so a much easier solution is:

RUN ln -f $(which bash) $(which sh)

This then works for timers too.

@Twsouza
Copy link
Contributor

Twsouza commented Jul 25, 2022

Okay I feel a little silly now. The problem is on Docker Linux environments sh is sim-linked to dash, which does not propagate signals. bash happily will though, so a much easier solution is:

RUN ln -f $(which bash) $(which sh)

This then works for timers too.

@rabidaudio would that have to be done in each app's Dockerfile?

@rabidaudio
Copy link
Contributor Author

@Twsouza yes, we put this at the beginning of all of our dockerfiles now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants