log: noderesourcetopology: per-flow additional log #289

All of the following applies first and foremost to the NodeTopologyMatch plugin, but can be easily generalized.

Troubleshooting the behaviour of the components almost always requires increasing the log level to peek into the flow. This is because, by default, components run with concise logging to avoid log churn and spam. But increasing the log level very often requires a configuration change and a component restart, which is nontrivial to do. Moreover, restarting changes the cluster state, which can make troubleshooting harder, or just longer, while we reproduce the state. In addition, the log level is a global setting or, in the best case, a per-module one.

On the other hand, we are often interested in the behaviour of a specific flow, possibly across components, rather than in the behaviour of whole components. Increasing the log level then creates unnecessary and uninteresting logs, which can actually make the flow harder to track.

It would thus be nice to have the option to dynamically enable detailed logging per flow across components, without restarting any component. I'm not aware of any logging package, klog included, that implements this feature, but we can get a pretty close approximation with this PR.

We add a new annotation: pods can opt in to the new behaviour by including this annotation. This allows extra verbosity only in selected cases (pods). We add a utility function to tune the verbosity of the logs with the aforementioned annotation, overriding the default verbosity level. IOW, log messages are emitted if *either*
- the component (/module) verbosity is high enough, OR
- a pod opts in by adding the special annotation.

Finally, we consume this new feature in the NodeTopologyMatch plugin, both to help troubleshooting and to demonstrate its usage.

CAVEATs:
- the feature cannot be turned off (it probably deserves a global disable flag?)
- potentially malicious actors can trigger extra logging by adding unnecessary annotations.
  - but these actors need to be able to send pods to the scheduler, so they already have the means to overload the component and/or cause extra logging. Not sure this is a practical concern; highlighting it for full transparency.

Signed-off-by: Francesco Romani <fromani@redhat.com>
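
The either/or rule described in the commit message can be sketched as follows. This is only an illustration, not the code from this PR: the annotation key `example.com/log-verbosity`, the function name `Enabled`, and the idea that the annotation value carries a numeric verbosity level are all assumptions made for the example. It uses only the Go standard library; a real implementation would presumably wrap klog.

```go
package main

import "strconv"

// FlowVerbosityAnnotation is a HYPOTHETICAL annotation key, used here only
// for illustration; the key actually introduced by the PR is not shown above.
const FlowVerbosityAnnotation = "example.com/log-verbosity"

// Enabled reports whether a message at the given level should be emitted.
// It returns true if EITHER the component verbosity is high enough OR the
// pod opted in through the annotation with a high enough value.
// Assumption: the annotation value is a decimal verbosity level.
func Enabled(componentVerbosity, level int, podAnnotations map[string]string) bool {
	// Regular path: the component (or module) is verbose enough.
	if level <= componentVerbosity {
		return true
	}
	// Opt-in path: look for the per-pod override.
	// Reading from a nil map is safe in Go and yields ok == false.
	raw, ok := podAnnotations[FlowVerbosityAnnotation]
	if !ok {
		return false
	}
	podVerbosity, err := strconv.Atoi(raw)
	if err != nil {
		// Malformed values are ignored rather than treated as opt-in.
		return false
	}
	return level <= podVerbosity
}
```

A caller would check `Enabled(componentVerbosity, 6, pod.Annotations)` before emitting a level-6 message. Ignoring malformed values is a design choice of this sketch that limits how much extra logging an arbitrary annotation can trigger.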

Commits on Nov 25, 2021