Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic container healthcheck #2625

Merged
merged 7 commits into from
Jun 7, 2022

Conversation

casperklein
Copy link
Member

@casperklein casperklein commented Jun 6, 2022

Description

This PR adds a basic healthcheck for DMS. The container is considered healthy, when postfix is running and listening on port tcp/25.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (non-breaking change that does improve existing functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (README.md or the documentation under docs/)
  • If necessary I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@casperklein casperklein added kind/improvement Improve an existing feature, configuration file or the documentation area/configuration (file) labels Jun 6, 2022
@casperklein casperklein added this to the v11.1.0 milestone Jun 6, 2022
@casperklein casperklein requested a review from a team June 6, 2022 10:51
@casperklein casperklein self-assigned this Jun 6, 2022
@georglauterbach
Copy link
Member

georglauterbach commented Jun 6, 2022

Nice! Didn‘t know Compose has now received this feature as well.

How does Compose handle the check though? Is it only executed once? Is there some initial timeout? I have set up checks as well (though for Kubernetes) that I will share later today here as well so we can have a look as well (for the future, as a point of reference, not necessarily for this PR).

@casperklein
Copy link
Member Author

The check is run first after $START_PERIOD, then every $INTERVAL.

These are the default values:

    healthcheck:
      start_period: 30s
      interval: 30s
      timeout: 30s
      retries: 3

For more information see the docs 😉

@georglauterbach
Copy link
Member

Alright, I see.


Here are my probes:

livenessProbe:
  exec:
    command:
      - /bin/bash
      - -c
      - supervisorctl status postfix | grep -q 'RUNNING'
  initialDelaySeconds: 0
  periodSeconds: 300
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3

readinessProbe:
  exec:
    command:
      - /bin/bash
      - -c
      - supervisorctl status postfix | grep -q 'RUNNING'
  initialDelaySeconds: 15
  periodSeconds: 300
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3

startupProbe:
  exec:
    command: [/bin/bash, /tmp/docker-mailserver/startup_probe.sh]
  initialDelaySeconds: 15
  periodSeconds: 5
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 5

I admit I like the usage of the ss command more than my current version that utilizes supervisorctl.

I would, however, really like to see the "long" options for ss being, which makes it clear for everyone which flags are used and why.

docker-compose.yml Outdated Show resolved Hide resolved
georglauterbach
georglauterbach previously approved these changes Jun 6, 2022
@georglauterbach
Copy link
Member

georglauterbach commented Jun 6, 2022

For the future, I have devised a more elaborate script we could put in the docs for those that want to do more elaborate health checks:

#! /bin/bash

set -eEu -o pipefail
shopt -s inherit_errexit

if ! supervisorctl status postfix | grep -q 'RUNNING'
then
  printf 'Postfix is not running\n' >/tmp/dms/lr-probe.error-cause.txt
  exit 1
fi

STATUS_FILE='/tmp/dms/lr-probe.ss-current-state.txt'
ss --tcp --ipv4 --listening --numeric >"${STATUS_FILE}"

GREP_PATTENS=(
  '0\.0\.0\.0:25'      # Postfix transfer
  '0\.0\.0\.0:143'     # Dovecot IMAP
  '0\.0\.0\.0:465'     # Postfix ESMTP explicit
  '0\.0\.0\.0:587'     # Postfix ESMTP implicit
  '0\.0\.0\.0:993'     # Dovecot IMAPS
  '127\.0\.0\.1:8891'  # OpenDKIM
  '127\.0\.0\.1:8893'  # OpenDMARC
  '127\.0\.0\.1:10023' # Postgrey
  '127\.0\.0\.1:10024' # Amavis
)

for PATTERN in "${GREP_PATTENS[@]}"
do
  if ! grep -qE "LISTEN.*${PATTERN}" "${STATUS_FILE}"
  then
    printf "Pattern '%s' was not found\n" "${PATTERN}" \
      >/tmp/dms.lr-probe.error-cause.txt
    exit 2
  fi
done

@casperklein
Copy link
Member Author

Why not making it future proof and remove --ipv4?

I started with something similar, but decided to go without --ipv4, that's why the healthcheck only looks for :smtp and not 0.0.0.0:25

@georglauterbach
Copy link
Member

Why not making it future proof and remove --ipv4?

I started with something similar, but decided to go without --ipv4, that's why the healthcheck only looks for :smtp and not 0.0.0.0:25

This script is for my setup only, we'd need to adjust the script shown above anyway :)

@casperklein
Copy link
Member Author

casperklein commented Jun 6, 2022

@polarathene The check-for-changes test is failing. I doubt, this is due to changes of this PR. Also, locally the tests pass. Any idea?

@polarathene
Copy link
Member

I'll look over this PR and discussion after I've had some rest, it's late here 😅

@polarathene The check-for-changes test is failing.

Were you referring to this one?

not ok 141 checking relay hosts: default mapping is added from env vars for new virtual user entry in 11430ms
# (from function `run_until_success_or_timeout' in file test/test_helper/common.bash, line 67,
#  in test file test/mail_with_relays.bats, line 52)
#   `run_until_success_or_timeout 10 docker exec mail_with_relays grep -e domain2.tld /etc/postfix/relayhost_map' failed
# Timed out on command: docker exec mail_with_relays grep -e domain2.tld /etc/postfix/relayhost_map

I didn't see any other test failures in the log. This one was the setup.sh -c bug that's now merged to master, so it shouldn't be an issue.

@casperklein
Copy link
Member Author

casperklein commented Jun 6, 2022

Look here: https://github.com/docker-mailserver/docker-mailserver/runs/6755505004?check_suite_focus=true#step:8:347

Edit: I just resolved the conflicts with your PTRACE commit. Tests are running again now. Let's see what happens 😎
Edit2: Tests pass now.

@casperklein casperklein enabled auto-merge (squash) June 6, 2022 14:22
@polarathene
Copy link
Member

For the future, I have devised a more elaborate script we could put in the docs for those that want to do more elaborate health checks

Looks like a good addition to docs 👍

@casperklein casperklein merged commit 72650d4 into docker-mailserver:master Jun 7, 2022
@casperklein casperklein deleted the healthcheck branch June 14, 2022 23:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/configuration (file) kind/improvement Improve an existing feature, configuration file or the documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants