bootstrap-containers: prevent bootstrap containers from restarting #1508

Merged

etungsten

Issue number:
Fixes #1488

Description of changes:

Author: Erikson Tung <etung@amazon.com>
Date:   Tue Apr 20 11:53:30 2021 -0700

    bootstrap-containers: prevent bootstrap containers from restarting
    
    bootstrap-containers@ units creates a sentinel file when they first
     run. If the sentinel file exists, the unit will be skipped over.
    
    This will prevent non-essential bootstrap-containers from being
    restarted by systemd when multi-user.target is reached.
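
The sentinel mechanism described in the commit message can be sketched in plain shell. This is an illustration only, not the unit file from this change: `run_once`, `SENTINEL_DIR`, and the per-container sentinel file name are hypothetical, and only the `/run/bootstrap-containers/` directory comes from the diff. The sketch writes the sentinel before the command runs, so a failed first attempt is still skipped on the next activation.

```shell
# Hypothetical stand-in for "skip the unit if its sentinel file exists".
# The directory mirrors /run/bootstrap-containers/ from the unit file;
# the file name used for each container is an assumption.
SENTINEL_DIR="${SENTINEL_DIR:-/run/bootstrap-containers}"

run_once() {
  name="$1"
  shift
  sentinel="$SENTINEL_DIR/$name"

  # Start condition: a sentinel means this container already ran this boot.
  if [ -e "$sentinel" ]; then
    echo "skipped $name"
    return 0
  fi

  # Mark the run first, then execute the actual command.
  mkdir -p "$SENTINEL_DIR"
  : > "$sentinel"
  "$@"
}
```

Calling `run_once test some-command` twice runs the command once; the second call prints `skipped test` and returns success even if the first attempt failed.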

Testing done:

I launched an instance with the following in my userdata:

# Invalid bootstrap-container
[settings.bootstrap-containers.test]
source = "docker.io/bad_source"
essential = false
mode = "always"

# Valid bootstrap-container that prints bear emojis to the journal
[settings.bootstrap-containers.bear]
source="my-bootstrap-container-image"
mode="once"
user-data="ypXCt82h4bSlwrfKlA=="
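
The `user-data` value for `bear` is base64-encoded. As a quick check (assuming a `base64` binary that accepts `-d`, as in GNU coreutils; `decode_user_data` is just a hypothetical wrapper), it can be decoded like this:

```shell
# Decode a base64 user-data string such as the one in the settings above.
decode_user_data() {
  printf '%s' "$1" | base64 -d
}

# Prints the decoded payload of the 'bear' container's user-data.
decode_user_data 'ypXCt82h4bSlwrfKlA=='
```

The decoded bytes are the bear face that the container prints to the journal.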

The instance comes up fine. The host reaches multi-user.target successfully.
All target units are active at the end. The bootstrap container unit statuses are as expected.

bootstrap-containers@bear:

systemctl status bootstrap-containers@bear
● bootstrap-containers@bear.service - bootstrap container bear
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/bootstrap-containers@.service; disabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/bootstrap-containers@bear.service.d
             └─overrides.conf
     Active: inactive (dead)

Apr 21 00:26:03  host-ctr[3367]: time="2021-04-21T00:26:03Z" level=info msg="Container does not exist, proceeding to create it" ctr-id=boot.bear
Apr 21 00:26:03  host-ctr[3367]: time="2021-04-21T00:26:03Z" level=info msg="container task does not exist, proceeding to create it" container-id=boot.bear
Apr 21 00:26:04  host-ctr[3367]: ʕ·͡ᴥ·ʔʕ·͡ᴥ·ʔtime="2021-04-21T00:26:04Z" level=info msg="successfully started container task"
Apr 21 00:28:04 host-ctr[3367]: time="2021-04-21T00:28:04Z" level=info msg="container task exited" code=0
Apr 21 00:28:04 bootstrap-containers[3529]: 00:28:04 [INFO] bootstrap-containers started
Apr 21 00:28:04 bootstrap-containers[3529]: 00:28:04 [INFO] Mode for 'bear' is 'once'
Apr 21 00:28:04 bootstrap-containers[3529]: 00:28:04 [INFO] Turning off container 'bear'
Apr 21 00:28:04 systemd[1]: Finished bootstrap container bear.
Apr 21 00:28:04 systemd[1]: bootstrap-containers@bear.service: Succeeded.
Apr 21 00:28:04 systemd[1]: Stopped bootstrap container bear.

bootstrap-containers@test:

bash-5.0# systemctl status bootstrap-containers@test 
● bootstrap-containers@test.service - bootstrap container test
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/bootstrap-containers@.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/bootstrap-containers@test.service.d
             └─overrides.conf
     Active: failed (Result: exit-code) since Wed 2021-04-21 00:26:54 UTC; 6min ago
  Condition: start condition failed at Wed 2021-04-21 00:28:04 UTC; 5min ago
   Main PID: 3369 (code=exited, status=1/FAILURE)

Apr 21 00:26:08 host-ctr[3369]: time="2021-04-21T00:26:08Z" level=warning msg="failed to pull image. waiting 6.677s before retrying..." error="failed to resolve reference \"docker.io/bad_source\": object required"
Apr 21 00:26:14 host-ctr[3369]: time="2021-04-21T00:26:14Z" level=warning msg="failed to pull image. waiting 9.141s before retrying..." error="failed to resolve reference \"docker.io/bad_source\": object required"
Apr 21 00:26:23 host-ctr[3369]: time="2021-04-21T00:26:23Z" level=warning msg="failed to pull image. waiting 11.463s before retrying..." error="failed to resolve reference \"docker.io/bad_source\": object required"
Apr 21 00:26:35 host-ctr[3369]: time="2021-04-21T00:26:35Z" level=warning msg="failed to pull image. waiting 19.196s before retrying..." error="failed to resolve reference \"docker.io/bad_source\": object required"
Apr 21 00:26:54 host-ctr[3369]: time="2021-04-21T00:26:54Z" level=error msg="retries exhausted: failed to resolve reference \"docker.io/bad_source\": object required" ref=docker.io/bad_source
Apr 21 00:26:54 host-ctr[3369]: time="2021-04-21T00:26:54Z" level=fatal msg="retries exhausted: failed to resolve reference \"docker.io/bad_source\": object required"
Apr 21 00:26:54 systemd[1]: bootstrap-containers@test.service: Main process exited, code=exited, status=1/FAILURE
Apr 21 00:26:54 systemd[1]: bootstrap-containers@test.service: Failed with result 'exit-code'.
Apr 21 00:26:54 systemd[1]: Failed to start bootstrap container test.
Apr 21 00:28:04 systemd[1]: Condition check resulted in bootstrap container test being skipped.

The invalid bootstrap container is started only once and is then skipped when activate-multi-user.service runs.

journal:

bash-5.0# journalctl -o cat  \
  -u preconfigured.target  \
  -u configured.target \
  -u multi-user.target \
  -u bootstrap-containers@test \
  -u bootstrap-containers@bear
Reached target Bottlerocket initial configuration complete.
Starting bootstrap container bear...
Starting bootstrap container test...
time="2021-04-21T00:26:02Z" level=info msg="pulling with Amazon ECR Resolver" ref="ecr.aws/arn:aws:ecr:us-west-2:722737851570:repository/my-bootstrap-container:latest"
time="2021-04-21T00:26:02Z" level=warning msg="failed to pull image. waiting 5.053s before retrying..." error="failed to resolve reference \"docker.io/bad_source\": object required"
time="2021-04-21T00:26:03Z" level=info msg="pulled image successfully" img="ecr.aws/arn:aws:ecr:us-west-2:722737851570:repository/my-bootstrap-container:latest"
time="2021-04-21T00:26:03Z" level=info msg="unpacking image..." img="ecr.aws/arn:aws:ecr:us-west-2:722737851570:repository/my-bootstrap-container:latest"
time="2021-04-21T00:26:03Z" level=info msg="tagging image" img="722737851570.dkr.ecr.us-west-2.amazonaws.com/my-bootstrap-container:latest"
time="2021-04-21T00:26:03Z" level=info msg="Container does not exist, proceeding to create it" ctr-id=boot.bear
time="2021-04-21T00:26:03Z" level=info msg="container task does not exist, proceeding to create it" container-id=boot.bear
ʕ·͡ᴥ·ʔʕ·͡ᴥ·ʔtime="2021-04-21T00:26:04Z" level=info msg="successfully started container task"
time="2021-04-21T00:26:08Z" level=warning msg="failed to pull image. waiting 6.677s before retrying..." error="failed to resolve reference \"docker.io/bad_source\": object required"
time="2021-04-21T00:26:14Z" level=warning msg="failed to pull image. waiting 9.141s before retrying..." error="failed to resolve reference \"docker.io/bad_source\": object required"
time="2021-04-21T00:26:23Z" level=warning msg="failed to pull image. waiting 11.463s before retrying..." error="failed to resolve reference \"docker.io/bad_source\": object required"
time="2021-04-21T00:26:35Z" level=warning msg="failed to pull image. waiting 19.196s before retrying..." error="failed to resolve reference \"docker.io/bad_source\": object required"
time="2021-04-21T00:26:54Z" level=error msg="retries exhausted: failed to resolve reference \"docker.io/bad_source\": object required" ref=docker.io/bad_source
time="2021-04-21T00:26:54Z" level=fatal msg="retries exhausted: failed to resolve reference \"docker.io/bad_source\": object required"
bootstrap-containers@test.service: Main process exited, code=exited, status=1/FAILURE
bootstrap-containers@test.service: Failed with result 'exit-code'.
Failed to start bootstrap container test.
time="2021-04-21T00:28:04Z" level=info msg="container task exited" code=0
00:28:04 [INFO] bootstrap-containers started
00:28:04 [INFO] Mode for 'bear' is 'once'
00:28:04 [INFO] Turning off container 'bear'
Finished bootstrap container bear.
Reached target Bottlerocket final configuration complete.
bootstrap-containers@bear.service: Succeeded.
Stopped bootstrap container bear.
Condition check resulted in bootstrap container test being skipped.
Reached target Multi-User System.
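
One way to check this result without reading the whole journal is to count start attempts and condition skips per container. The helper below is a hypothetical sketch (`summarize_unit` is not part of Bottlerocket); it matches the exact message text shown in the `journalctl -o cat` output above and reads that output on stdin.

```shell
# Count "Starting ..." and "Condition check ... skipped" lines for one
# bootstrap container in `journalctl -o cat` output read from stdin.
summarize_unit() {
  awk -v n="$1" '
    $0 == ("Starting bootstrap container " n "...") { started++ }
    $0 == ("Condition check resulted in bootstrap container " n " being skipped.") { skipped++ }
    END { printf "started=%d skipped=%d\n", started, skipped }
  '
}
```

Piping `journalctl -o cat -u bootstrap-containers@test` into `summarize_unit test` should report one start and one skip for the journal above.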

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@webern

webern commented Apr 21, 2021

Aren't some bootstrap containers intended to run on every boot? Would this prevent that behavior?

@bcressey bcressey left a comment

👢


[Service]
Type=oneshot
EnvironmentFile=/etc/bootstrap-containers/%i.env
# Create a sentinel file to mark that we've ran
ExecStart=/usr/bin/mkdir -p /run/bootstrap-containers/

nit: I have a slight preference to do this mkdir via bootstrap-containers-tmpfiles.conf so we don't repeat the work for N bootstrap containers.

@@ -7,10 +7,17 @@ Wants=host-containers.service
# started by systemd
RefuseManualStart=true
RefuseManualStop=true
# If a sentinel file exists for this bootstrap container, it means we should skip
# since we've ran this bootstrap container already.

nit: wording

Suggested change
- # since we've ran this bootstrap container already.
+ # since we've run this bootstrap container already.

@bcressey

> Aren't some bootstrap containers intended to run on every boot? Would this prevent that behavior?

The tracking file is in /run which is a tmpfs, so it is cleared out across boots.
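
To double-check that on a given host, the filesystem type of `/run` can be read from the mount table. A small sketch, assuming a Linux-style `/proc/mounts`; `fs_type_of` is a hypothetical helper, and its optional second argument exists only so it can be pointed at a saved copy of the table.

```shell
# Print the filesystem type mounted at the given mount point.
# Reads /proc/mounts by default; the last matching entry wins, since
# later mounts shadow earlier ones. Exits non-zero if nothing matches.
fs_type_of() {
  awk -v mp="$1" '
    $2 == mp { t = $3 }
    END { if (t != "") print t; else exit 1 }
  ' "${2:-/proc/mounts}"
}
```

If the sentinel directory lives on a tmpfs as described, `fs_type_of /run` should print `tmpfs`.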

@arnaldo2792

Nit: In your commit message:

bootstrap-containers@ units creates

Should be "create".

@etungsten etungsten force-pushed the bootstrap-container-inits branch from fd9b40b to 6075474 Compare April 21, 2021 17:24
@etungsten

etungsten commented Apr 21, 2021

Push above addresses comments by @arnaldo2792 and @bcressey

Tested things and they still work as expected.

bootstrap-containers@ units create a sentinel file when they first
 run. If the sentinel file exists, the unit will be skipped over.

This will prevent non-essential bootstrap-containers from being
restarted by systemd when multi-user.target is reached.
@etungsten etungsten force-pushed the bootstrap-container-inits branch from 6075474 to 2f1530a Compare April 21, 2021 17:27
@etungsten

Push above fixes another grammar error. English is hard.

@arnaldo2792 arnaldo2792 left a comment

@zmrow zmrow left a comment

🤶

@etungsten etungsten merged commit 0b7b3f8 into bottlerocket-os:develop Apr 22, 2021
@etungsten etungsten deleted the bootstrap-container-inits branch April 22, 2021 17:10
Successfully merging this pull request may close these issues.

Not essential bootstrap containers shouldn't be started more than once if they fail
5 participants