Fix containerd install for fcos #8107
Welcome @mafn! |
Hi @mafn. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cc @oomichi
@@ -1,5 +1,5 @@
 ---
-runc_bin_dir: /usr/bin/
+runc_bin_dir: "{{ bin_dir }}"
Why do we need to change runc_bin_dir here?
The default value of bin_dir is /usr/local/bin, not /usr/bin.
This can affect existing environments.
A proper solution for this would be to add the per-OS vars format to this role (see https://github.com/kubernetes-sigs/kubespray/blob/master/roles/container-engine/docker/tasks/main.yml#L14-L31). It is a pattern we duplicate far too often in this code base; in the end it might be a good idea to move all of this per-OS family/major/minor-version code to the bootstrap-os role, or to a similar role, so that all OS-version-specific variables are set in one place.
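For readers unfamiliar with the per-OS vars pattern mentioned above, a minimal sketch follows. It is not copied from the linked docker role: the task name, the candidate file names, and the ../vars path are assumptions chosen for illustration. The idea is that the first vars file matching the host's distribution is loaded, with a generic fallback.

```yaml
# Hypothetical sketch: load the first matching per-OS vars file for this role.
# File names and the ../vars path are assumptions, not the role's real layout.
- name: containerd | Gather OS-specific variables
  ansible.builtin.include_vars: "{{ item }}"
  with_first_found:
    - files:
        # Most specific candidate first, generic fallback last
        - "{{ ansible_distribution | lower }}-{{ ansible_distribution_major_version }}.yml"
        - "{{ ansible_distribution | lower }}.yml"
        - "{{ ansible_os_family | lower }}.yml"
        - defaults.yml
      paths:
        - ../vars
      # Do not fail the play if none of the candidate files exist
      skip: true
```

With such a layout, a distribution-specific vars file could set runc_bin_dir differently from the generic default without touching the tasks themselves.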
The proper per-OS solution is actually what already happens with bin_dir, and almost everything that is downloaded is put there. I'm not sure downloading stuff to /usr/bin would be considered good practice.
runc_bin_dir is only used in that role, so there are no other dependencies on the variable. AFAICS there are no actual dependencies on the location of runc as long as it is in the PATH (and, for fcos, on its position in PATH, as we are not actually removing runc from the base image), but to be sure I'm testing the upgrade from 10c30ea to this PR on a test cluster to see if there is any breaking change.
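One way to check which runc wins the PATH lookup on a host is sketched below. This is an illustrative pair of tasks, not part of the PR; the task names and the runc_lookup variable are hypothetical. Note that the PATH seen by an Ansible shell task may differ from the PATH of the containerd service, so the result is only an indication.

```yaml
# Hypothetical check: report which runc binary is found first on PATH.
- name: Show which runc is resolved first on PATH
  ansible.builtin.shell: command -v runc
  register: runc_lookup
  changed_when: false   # read-only check, never reports a change
  failed_when: false    # a missing runc is reported below, not treated as an error

- name: Report the resolved runc path
  ansible.builtin.debug:
    msg: "runc resolves to {{ runc_lookup.stdout | default('not found', true) }}"
```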
The upgrade went smoothly for both upgrade-cluster.yml and cluster.yml.
The only problem with the upgrade would be orphaned files in /usr/bin if run after ea8e2fc or if using kata-containers. That could be fixed by adding a cleanup step for the files in /usr/bin.
To be clear, the purpose of the change was to keep location compatibility with deployments coming from OS packages, but changing containerd_bin_dir and runc_bin_dir to bin_dir does make overall sense from a clean deployment point of view.
Since you already tested and identified the corner case above, may I suggest you add the necessary fixes to this PR?
I looked at it again. Kata Containers already used /usr/local/bin, so there is nothing to do there. For the other case, I added a task to remove the binaries in /usr/bin (with ignore_errors: true for the case of a read-only /usr/bin, as in fcos).
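A cleanup task of the kind described could look roughly like the sketch below. This is not the task from the PR: the task name, the list of binaries, and the when condition are assumptions for illustration, and it presumes the files in /usr/bin were placed there by an earlier download-based run rather than by an OS package.

```yaml
# Hypothetical cleanup sketch; the binary list is illustrative only.
- name: containerd | Remove orphaned binaries from /usr/bin
  ansible.builtin.file:
    path: "/usr/bin/{{ item }}"
    state: absent
  loop:
    - runc
    - containerd
  # Skip the cleanup when /usr/bin is the configured install location
  when: bin_dir != "/usr/bin"
  # /usr/bin is read-only on Fedora CoreOS, so removal may fail there
  ignore_errors: true
```

Using ignore_errors keeps the play going on hosts where the removal cannot succeed, at the cost of also hiding unrelated failures of this task.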
84d67d2 to 2583df6
2583df6 to 86000f0
86000f0 to 2594047
/ok-to-test
Thanks for fixing this @mafn! /lgtm
@mafn Thank you for the PR 🙇
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cristicalin, floryut, mafn The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
It seems that this has broken remove-node playbook:
the error extracted for better visibility:
this is caused by containerd-shim still running since before the upgrade to 2.18 and through the change of path to runc and calling runc like this: I probably should put this into a new ticket... |
* Fix containerd install for fcos * rm orphaned runc and containerd binaries
What type of PR is this?
/kind bug
What this PR does / why we need it:
Allow deploying containerd by download for Fedora CoreOS
Which issue(s) this PR fixes:
Fixes #8106
Special notes for your reviewer:
Does this PR introduce a user-facing change?: