All of the following applies first and foremost to the
NodeTopologyMatch plugin, but it can easily be generalized.
Troubleshooting the behaviour of the components almost always
requires increasing the log level to peek into the flow.
This is because, by default, components run with concise
logging to avoid log churn and spam.
But increasing the log level very often requires a configuration
change and a component restart, which is nontrivial to do.
Moreover, restarting alters the cluster state, which can make
troubleshooting harder, or just longer, while we reproduce
that state.
Furthermore, the log level is a global setting or, at best, a
per-module one. On the other hand, we are often interested in
the behaviour of a specific flow, possibly across components,
rather than the behaviour of some components.
Increasing the log level thus creates unnecessary and uninteresting
logs, which can actually make it harder to track the flow.
It would thus be nice to have the option to dynamically enable
detailed logging per-flow across components, avoiding component restart.
I'm not aware of any logging package, including klog, implementing
this feature, but we can get a pretty close approximation with this PR.
We add a new annotation: pods can opt in to the new behaviour by
including it. This allows extra verbosity only in selected
cases (pods).
We add a utility function to tune the verbosity of the logs using
the aforementioned annotation, overriding the default verbosity level.
IOW, log messages can be emitted if *either*
- the component (/module) verbosity is high enough
OR
- a pod opts in by adding the special annotation.
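The "either/or" rule above can be sketched in Go roughly as follows. This is a minimal, self-contained illustration, not the code in this PR: the annotation key `example.io/verbose-logging`, the `componentVerbosity` variable, the `podInfo` type and the `shouldLog`/`logForPod` helpers are all hypothetical names. In a real plugin the first check would more likely be something like klog's `V(level).Enabled()`, and the pod would be a `*v1.Pod`. The sketch also assumes that the mere presence of the annotation opts the pod in, regardless of its value.

```go
package main

import (
	"fmt"
	"log"
)

// verboseAnnotation is a HYPOTHETICAL annotation key for this sketch only;
// the real key introduced by the change may differ.
const verboseAnnotation = "example.io/verbose-logging"

// componentVerbosity stands in for the component (or module) log level,
// e.g. what klog's -v flag would configure. Assumed value for illustration.
var componentVerbosity = 2

// podInfo is a minimal, hypothetical stand-in for a pod's metadata.
type podInfo struct {
	Namespace   string
	Name        string
	Annotations map[string]string
}

// shouldLog reports whether a message at the given level must be emitted:
// either the component verbosity is high enough, OR the pod opted in.
func shouldLog(pod *podInfo, level int) bool {
	if componentVerbosity >= level {
		return true
	}
	if pod == nil {
		return false
	}
	// Presence of the annotation is enough (assumption); reading a nil map is safe.
	_, ok := pod.Annotations[verboseAnnotation]
	return ok
}

// logForPod emits msg only when shouldLog allows it, tagging the message
// with the pod identity so the flow can be followed across components.
func logForPod(pod *podInfo, level int, msg string) {
	if !shouldLog(pod, level) {
		return
	}
	id := "<no pod>"
	if pod != nil {
		id = pod.Namespace + "/" + pod.Name
	}
	log.Printf("pod=%s %s", id, msg)
}

func main() {
	plain := &podInfo{Namespace: "default", Name: "plain"}
	traced := &podInfo{
		Namespace:   "default",
		Name:        "traced",
		Annotations: map[string]string{verboseAnnotation: "true"},
	}
	// Level 4 exceeds componentVerbosity (2): only the opted-in pod is logged.
	logForPod(plain, 4, "evaluating topology fit")
	logForPod(traced, 4, "evaluating topology fit")
	fmt.Println(shouldLog(plain, 4), shouldLog(traced, 4)) // false true
}
```

Keeping the decision in a single helper means call sites stay one-liners, and a global kill switch (see the caveats below) could later be added in one place.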
Finally, we consume this new feature in the NodeTopologyMatch plugin,
to both help troubleshooting and to demonstrate its usage.
CAVEATs:
- the feature cannot be turned off (it probably deserves a global
  disable flag?)
- potentially malicious actors can trigger extra logging by adding
  unnecessary annotations.
- but these actors need to be able to send pods to the scheduler,
  so they already have means to overload the component and/or cause
  extra logging. Not sure this is a practical concern; highlighting
  it for full transparency.
Signed-off-by: Francesco Romani <fromani@redhat.com>