Machine Health Check controller has excessive trace logging #9418

Closed
cnmcavoy opened this issue Sep 13, 2023 · 1 comment · Fixed by #9419
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@cnmcavoy
Contributor

What steps did you take and what happened?

When the log level is set to 3 or higher on the CAPI controller manager and a machine health check is configured, the machine health check controller logs its reconciliation for each target. However, it accumulates the targets it reconciles in the logger values, which results in extremely long log statements. In larger clusters (> 100 nodes) this becomes excessive and makes it hard to debug or even read the machine health check logs.

An example log message from one of our clusters (this is only 1 log line, but it's 13k chars!):

I0913 17:26:13.952478       1 machinehealthcheck_targets.go:290] "Health checking target" controller="machinehealthcheck" controllerGroup="cluster.x-k8s.io" controllerKind="MachineHealthCheck" MachineHealthCheck="capi-awscmh3/awscmh3-worker-node-healthcheck" namespace="capi-awscmh3" name="awscmh3-worker-node-healthcheck" reconcileID=aa7f688e-28d9-4b0e-9ae6-e3c257952d20 Cluster="capi-awscmh3/awscmh3" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/iqldaemon-rad-worker-m6i-az2-84d65ffd86-99t2v/ip-10-163-209-44.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/worker-private-m6-mixed-az3-77c8d9568d-x7hh6/ip-10-163-237-182.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/imhotep-worker-i4i-az2-75695b8996-4dpbd/ip-10-163-209-169.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az3-5db4f47986-xwn5m/ip-10-163-233-251.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/worker-private-m6-mixed-az2-5d5bd685d-mnmdp/ip-10-163-214-20.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az1-74bc999bdc-r7hgg/ip-10-163-205-43.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/imhotep-worker-i4i-az1-6585566dd-lh7vk/ip-10-163-202-126.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az2-57886594fb-jgk97/ip-10-163-220-193.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az2-57886594fb-wg2vt/ip-10-163-215-60.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/gpu-worker-private-az2-bd9d7bfc6-2wrgk/ip-10-163-222-150.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/gpu-worker-private-g4dn-az2-6f854c86bf-zdmqf/ip-10-163-216-128.us-east-2.compute.internal" 
Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az2-77ffdc94c7-2c82z/ip-10-163-211-62.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az1-7d5d7b8d4-p6rv6/ip-10-163-196-80.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az1-7d5d7b8d4-cbclc/ip-10-163-206-66.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/worker-private-m6-mixed-az3-77c8d9568d-l2t59/ip-10-163-238-201.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az3-5db4f47986-xxjsx/ip-10-163-225-52.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/imhotep-worker-i4i-az1-6585566dd-wt99v/ip-10-163-203-31.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az3-66c6dfb85f-c2mbq/ip-10-163-226-97.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/worker-private-m6-mixed-az1-644c9cbfd4-75z8k/ip-10-163-199-120.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/worker-private-m6-mixed-az3-77c8d9568d-7r9jn/ip-10-163-230-217.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/jobsearch-backend-us-rad-worker-private-az3-76cbdbb7cb-g4484/ip-10-163-227-74.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/imhotep-worker-i4i-az1-6585566dd-cg86q/ip-10-163-200-120.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/worker-private-m6-mixed-az2-5d5bd685d-jwjwn/ip-10-163-223-215.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/worker-private-m6-mixed-az2-5d5bd685d-qzmc4/ip-10-163-217-194.us-east-2.compute.internal" 
Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az3-5db4f47986-w2npl/ip-10-163-229-213.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az1-74bc999bdc-qmq5l/ip-10-163-198-31.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az3-66c6dfb85f-wg7kt/ip-10-163-226-25.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az1-74bc999bdc-grxqs/ip-10-163-200-146.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az2-57886594fb-qjh4j/ip-10-163-209-158.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az1-74bc999bdc-f654g/ip-10-163-193-149.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az2-57886594fb-ld7db/ip-10-163-221-44.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az2-77ffdc94c7-t96bx/ip-10-163-212-218.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/hub-worker-az3-85c9d5d987-xts2q/ip-10-163-227-203.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/imhotep-worker-i4i-az2-75695b8996-57pl6/ip-10-163-222-61.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/waldo-global-rad-worker-private-az3-84f4546857-sjq55/ip-10-163-239-24.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az3-5db4f47986-7zmsk/ip-10-163-224-145.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az1-7d5d7b8d4-pmzxv/ip-10-163-192-20.us-east-2.compute.internal" 
Target="capi-awscmh3/awscmh3-worker-node-healthcheck/jobsearch-backend-us-rad-worker-private-az3-76cbdbb7cb-lzwdj/ip-10-163-228-29.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az1-7d5d7b8d4-4xnxh/ip-10-163-193-208.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az2-57886594fb-b6lh6/ip-10-163-221-214.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az2-77ffdc94c7-r88hr/ip-10-163-209-93.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/jobsearch-backend-global-rad-worker-private-az1-7f7786cfd-dbnfm/ip-10-163-204-176.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az3-5db4f47986-2zsxj/ip-10-163-234-13.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az2-77ffdc94c7-6jwjd/ip-10-163-215-99.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az3-66c6dfb85f-pdlfp/ip-10-163-232-13.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az1-7d5d7b8d4-qb4kj/ip-10-163-197-215.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az1-7d5d7b8d4-h6cj4/ip-10-163-205-153.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/waldo-global-rad-worker-private-az2-6dcbf8dfc7-dxrt2/ip-10-163-223-170.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/jobsearch-backend-us-rad-worker-private-az2-7c889b8784-bp85t/ip-10-163-216-137.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/jobsearch-backend-us-rad-worker-private-az1-7c4cdb6fd8-9j4hr/ip-10-163-207-188.us-east-2.compute.internal" 
Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az1-74bc999bdc-qr24c/ip-10-163-203-18.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/worker-private-m6-mixed-az3-77c8d9568d-lchqt/ip-10-163-225-157.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az3-5db4f47986-vdj92/ip-10-163-238-239.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/imhotep-worker-i4i-az1-6585566dd-8r2tj/ip-10-163-194-217.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/imhotep-worker-i4i-az3-7ffb4fc55d-jvsh2/ip-10-163-228-237.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/waldo-global-rad-worker-private-az3-84f4546857-8llml/ip-10-163-237-140.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az2-77ffdc94c7-fpzqh/ip-10-163-209-21.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az2-77ffdc94c7-jcprp/ip-10-163-215-33.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/jobsearch-backend-us-rad-worker-private-az1-7c4cdb6fd8-5kgdw/ip-10-163-199-199.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az2-57886594fb-vmhjv/ip-10-163-209-47.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az3-5db4f47986-99snz/ip-10-163-229-148.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/worker-private-m6-mixed-az2-5d5bd685d-bhbnn/ip-10-163-208-163.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/imhotep-worker-i4i-az3-7ffb4fc55d-pr5qr/ip-10-163-231-65.us-east-2.compute.internal" 
Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az2-57886594fb-c8wcr/ip-10-163-212-244.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az2-57886594fb-g6jn9/ip-10-163-221-253.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/jobsearch-backend-global-rad-worker-private-az2-597bb4cf79tf8jd/ip-10-163-215-160.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/jobsearch-backend-us-rad-worker-private-az3-76cbdbb7cb-n58q5/ip-10-163-231-55.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az3-5db4f47986-f72pn/ip-10-163-229-89.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az1-74bc999bdc-dhfkz/ip-10-163-197-229.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/waldo-global-rad-worker-private-az3-84f4546857-h2rcl/ip-10-163-230-218.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/imhotep-worker-i4i-az3-7ffb4fc55d-nlkfx/ip-10-163-226-82.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/worker-private-m6-mixed-az2-5d5bd685d-kds5z/ip-10-163-215-105.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az2-57886594fb-qb272/ip-10-163-221-187.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/imhotep-worker-i4i-az1-6585566dd-t9fsv/ip-10-163-201-182.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az3-5db4f47986-h2xft/ip-10-163-236-89.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/jobsearch-backend-global-rad-worker-private-az3-657bb7fd798fk54/ip-10-163-224-68.us-east-2.compute.internal" 
Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az1-74bc999bdc-gnqbh/ip-10-163-204-225.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/worker-private-m6-mixed-az2-5d5bd685d-zs5pz/ip-10-163-214-210.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/jobsearch-backend-us-rad-worker-private-az2-7c889b8784-pvbb2/ip-10-163-222-66.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az1-74bc999bdc-tb68b/ip-10-163-194-169.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az3-66c6dfb85f-hpxws/ip-10-163-239-57.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az1-7d5d7b8d4-pspgd/ip-10-163-204-243.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/worker-private-m6-mixed-az1-644c9cbfd4-lbf4q/ip-10-163-207-161.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az1-7d5d7b8d4-5k9nz/ip-10-163-201-49.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/worker-private-m6-mixed-az3-77c8d9568d-6fz86/ip-10-163-231-33.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az3-5db4f47986-zdgtb/ip-10-163-233-199.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az1-74bc999bdc-6l6c8/ip-10-163-195-247.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/agg-crawler-only-worker-private-az3-66c6dfb85f-v49vl/ip-10-163-225-171.us-east-2.compute.internal" Target="capi-awscmh3/awscmh3-worker-node-healthcheck/jobsearch-backend-us-rad-worker-private-az2-7c889b8784-vkt7f/ip-10-163-208-73.us-east-2.compute.internal" 
Target="capi-awscmh3/awscmh3-worker-node-healthcheck/rad-worker-private-m6-mixed-az3-5db4f47986-srszd/ip-10-163-227-212.us-east-2.compute.internal" 

What did you expect to happen?

The machine health check controller reconciliation should only annotate its logger values with the target whose health it is currently checking, not an accumulation of all previously checked machines.

Cluster API version

1.5.1

Kubernetes version

1.24.14

Anything else you would like to add?

No response

Label(s) to be applied

/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 13, 2023
@killianmuldoon
Contributor

/triage accepted

This definitely looks excessive.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 13, 2023