
pod launched by unexpected CNI when the health checking of the agent fails and multus.conf is lost #3758

Merged 1 commit into spidernet-io:main from charts/multus_uninstall on Jul 31, 2024

Conversation

@cyclinder (Collaborator) commented Jul 24, 2024

Thanks for contributing!

What type of PR is this?

  • release/bug

What this PR does / why we need it:

The preStop hook of the agent's container cleans up the Multus CNI config (00-multus.conf). If the container restarts but the pod does not, the config file is not re-generated, and Multus stops working.

This PR moves multus-cni from an init container to a regular container and makes it the first container in the list; see the sketch below.
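
A quick way to sanity-check the new layout (the namespace and DaemonSet name here are assumptions based on the demo output below; adjust for your install):

# Sketch only: confirm multus-cni now runs as a regular container rather than
# an init container. Namespace and resource name are assumed, not taken from
# the chart.
kubectl -n kube-system get ds spiderpool-agent \
  -o jsonpath='{.spec.template.spec.initContainers[*].name}'
# expect: no multus-cni entry here anymore
kubectl -n kube-system get ds spiderpool-agent \
  -o jsonpath='{.spec.template.spec.containers[*].name}'
# expect: multus-cni listed among the regular containers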

Which issue(s) this PR fixes:

Fixes #3755

Special notes for your reviewer:

  1. After restarting the multus-cni container, 00-multus.conf is re-generated:
root@spider-worker:/# crictl ps
CONTAINER           IMAGE               CREATED              STATE               NAME                     ATTEMPT             POD ID              POD
a4d9d4af2fe69       d958cfbb222dd       About a minute ago   Running             spiderpool-agent         1                   4ee679bf58463       spiderpool-agent-tzv5j
d831dc97340f0       c0e8690ae66a1       2 minutes ago        Running             multus-cni               2                   4ee679bf58463       spiderpool-agent-tzv5j
8d75c8c8afcc8       0cdd32d9ecad4       6 minutes ago        Running             spiderpool-controller    0                   63e231c0e00a4       spiderpool-controller-6c96859774-xfk8k
1cb443c9e2275       ded66453eb630       24 hours ago         Running             calico-node              0                   07b5a657f52d2       calico-node-99ns6
c6b519fbbc373       ce18e076e9d4b       24 hours ago         Running             local-path-provisioner   0                   84e082ef2ce9e       local-path-provisioner-6f8956fb48-rrjxj
de187f755a123       cbb01a7bd410d       24 hours ago         Running             coredns                  0                   5ccf34350d4d2       coredns-76f75df574-vrvvw
c3bd1ec34ff92       cbb01a7bd410d       24 hours ago         Running             coredns                  0                   0d54dd25c240c       coredns-76f75df574-qxjwr
70065ffc12219       fa4dee78049db       24 hours ago         Running             kube-proxy               0                   d2acd3475056d       kube-proxy-tzcfr
root@spider-worker:/# ls -l /etc/cni/net.d/
total 16
-rw-r--r-- 1 root root  461 Jul 25 08:11 00-multus.conf
-rw-r--r-- 1 root root  752 Jul 24 07:58 10-calico.conflist
-rw------- 1 root root 2737 Jul 25 01:42 calico-kubeconfig
drwxr-xr-x 2 root root 4096 Jul 25 08:11 multus.d
root@spider-worker:/# crictl stop d831dc97340f0
d831dc97340f0
root@spider-worker:/# ls -l /etc/cni/net.d/
total 16
-rw-r--r-- 1 root root  461 Jul 25 08:14 00-multus.conf
-rw-r--r-- 1 root root  752 Jul 24 07:58 10-calico.conflist
-rw------- 1 root root 2737 Jul 25 01:42 calico-kubeconfig
drwxr-xr-x 2 root root 4096 Jul 25 08:14 multus.d
  2. After restarting the agent's container, 00-multus.conf is unchanged. This can be checked the same way as point 1:
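
The container ID below is the agent container from the crictl ps output above; it will differ per node.

crictl stop a4d9d4af2fe69
ls -l /etc/cni/net.d/00-multus.conf
# expect: the modification time of 00-multus.conf stays the same after the
# agent container comes back up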

@cyclinder cyclinder added release/bug cherrypick-release-v0.8 Cherry-pick the PR to branch release-v0.8. cherrypick-release-v0.9 cherrypick-release-v1.0 Cherry-pick the PR to branch release-v1.0. labels Jul 24, 2024
@weizhoublue (Collaborator) commented:

Currently there are two containers in the pod, so all debugging commands in the CI should change to kubectl logs spiderpool-agent -c xxx to get the expected log.
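
For example (the pod name is taken from the demo output above; the namespace is an assumption):

# Each container's log must now be selected explicitly with -c.
kubectl -n kube-system logs spiderpool-agent-tzv5j -c spiderpool-agent
kubectl -n kube-system logs spiderpool-agent-tzv5j -c multus-cni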

{{- if .Values.multus.multusCNI.extraVolumes }}
{{- include "tplvalues.render" ( dict "value" .Values.multus.multusCNI.extraVolumeMounts "context" $ ) | nindent 12 }}
{{- end }}
{{- end }}
- name: {{ .Values.spiderpoolAgent.name | trunc 63 | trimSuffix "-" }}
@weizhoublue (Collaborator) commented Jul 25, 2024:

Does it make a difference to kubectl logs if this container is shifted to be the first container?

@cyclinder (Collaborator, Author) replied:

Yes, we should make the agent's container the first container.

@cyclinder cyclinder force-pushed the charts/multus_uninstall branch 2 times, most recently from a709cea to a0a4247, on July 25, 2024 08:50
codecov bot commented Jul 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.16%. Comparing base (1138e66) to head (6dae80f).


@@            Coverage Diff             @@
##             main    #3758      +/-   ##
==========================================
- Coverage   81.68%   81.16%   -0.53%     
==========================================
  Files          50       50              
  Lines        4391     4391              
==========================================
- Hits         3587     3564      -23     
- Misses        643      670      +27     
+ Partials      161      157       -4     
Flag        Coverage Δ
unittests   81.16% <ø> (-0.53%) ⬇️

Flags with carried forward coverage won't be shown.

see 1 file with indirect coverage changes

@weizhoublue weizhoublue changed the title charts: avoiding unexpect loss of 00-multus.conf on node pod run in unexpected CNI when the health checking of the agent fails and multus.conf is lost Jul 26, 2024
@weizhoublue (Collaborator) commented:

@ty-dc pay attention to https://github.com/spidernet-io/spiderpool/actions/runs/10094313282/job/27912257643?pr=3758

weizhoublue previously approved these changes Jul 26, 2024
test/Makefile Outdated
@@ -382,6 +383,15 @@ uninstall_spiderpool:
@echo -e "\033[35m [helm uninstall spiderpool] \033[0m"
helm uninstall $(RELEASE_NAME) --wait --debug -n $(RELEASE_NAMESPACE) \
--kubeconfig $(E2E_KUBECONFIG) || { KIND_CLUSTER_NAME=$(E2E_CLUSTER_NAME) ./scripts/debugEnv.sh $(E2E_KUBECONFIG) "detail" ; exit 1 ; } ; \
NODE_LIST=` kind get nodes --name $(E2E_CLUSTER_NAME) `; \
@cyclinder (Collaborator, Author) replied:

OK, I will rebase this after #3716 is merged.

@weizhoublue (Collaborator) replied:

I mean these lines could be removed.

@cyclinder (Collaborator, Author) replied:

Done.

Signed-off-by: cyclinder <qifeng.guo@daocloud.io>
@weizhoublue weizhoublue changed the title pod run in unexpected CNI when the health checking of the agent fails and multus.conf is lost pod launched by unexpected CNI when the health checking of the agent fails and multus.conf is lost Jul 31, 2024
@weizhoublue weizhoublue merged commit cfae10b into spidernet-io:main Jul 31, 2024
56 checks passed
github-actions bot pushed a commit that referenced this pull request Jul 31, 2024
 pod launched by unexpected CNI when the health checking of the agent fails and multus.conf is lost

Signed-off-by: robot <tao.yang@daocloud.io>
Successfully merging this pull request may close these issues.

After restarting spiderpool-agent, the Multus configuration is lost and the IP address changes.