[Agent] Support Node and Service autodiscovery in k8s provider #26801
Conversation
💚 Build Succeeded
LGTM. It is nice splitting the different resource watchers into their respective files.
Pinging @elastic/integrations (Team:Integrations)
Opening this for review. I tested it manually with the scenarios mentioned in this PR's description. I plan to add unit tests for the provider too, but the implementation should be ready for review now.
Note: the latest commit (43f5136) tries to improve data emission handling.
@ChrsMark thanks for working on this. Before going deep into implementation details, I would like to raise a concern: do we have plans to support discovery of multiple resources? With the current configuration it doesn't seem to be possible; there is no way to declare multiple providers of the same kind as in Beats.
I would see two options for that: one would be to split into a provider per kind of resource, the other to allow defining the kubernetes provider multiple times. Of course, with the current implementation there is also the option of running multiple agents, one for each kind of resource, but this may be a bit overkill and cumbersome for users.
That's a fair point @jsoriano, thanks for bringing this up! Especially now that the same Agent will run collectors for logs + metrics + uptime at the same time, using the same config, we should be able to provide such flexibility to our users. I think that splitting into different providers per k8s resource would make sense, but fields should be reported under the same namespace; otherwise conditions would end up looking like ${kubernetes_service.service.name} == 'kube-controller-manager', which I'm not sure is something we would like.
Another way to tackle this is to enable support for defining the k8s provider multiple times, but this most probably should happen on the controller provider's layer. @exekias any thoughts on this?
// Check if resource is service. If yes then default the scope to "cluster".
if c.Resources.Service != nil {
	if c.Scope == "node" {
		logp.L().Warnf("can not set scope to `node` when using resource `Service`. resetting scope to `cluster`")
Should this logger be namespaced? I also see `logger` widely used in agent, not sure if there is any preference here.
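A namespaced variant would be a small change; a minimal sketch, assuming the logp.NewLogger constructor used elsewhere in Beats, with an illustrative selector name:

	// Build a named logger for the provider instead of using the global logp.L().
	logger := logp.NewLogger("composable.providers.kubernetes") // selector name is an example
	logger.Warnf("can not set scope to `node` when using resource `Service`. resetting scope to `cluster`")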
type ResourceConfig struct {
	KubeConfig     string        `config:"kube_config"`
	Namespace      string        `config:"namespace"`
	SyncPeriod     time.Duration `config:"sync_period"`
	CleanupTimeout time.Duration `config:"cleanup_timeout" validate:"positive"`

	// Needed when resource is a Pod or Node
	Node string `config:"node"`
I'm wondering about use cases for resource-specific settings, do you have any in mind?
Now that I'm thinking of it again, maybe it is over-engineering to provide this option at the moment, since the base config shared by all the resources should cover the cases. Flexibility for different access or settings per resource would be nice to think about, but we can wait and see if users actually need it. So I will change it and move to a single config for all of the resources.
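For reference, a rough sketch of what that single shared config could look like (field names taken from the snippets in this PR; the exact shape is an assumption):

	// One set of settings shared by all resource watchers; the resources section
	// would only be used to turn individual watchers on or off.
	type Config struct {
		KubeConfig     string        `config:"kube_config"`
		Namespace      string        `config:"namespace"`
		SyncPeriod     time.Duration `config:"sync_period"`
		CleanupTimeout time.Duration `config:"cleanup_timeout" validate:"positive"`
		Node           string        `config:"node"`
		Scope          string        `config:"scope"`
	}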
func (c *Config) Validate() error {
	// Check if resource is service. If yes then default the scope to "cluster".
	if c.Resources.Service != nil {
		if c.Scope == "node" {
It's interesting that you can override almost all settings per resource, but not scope
See #26801 (comment).
	Node: c.Node,
}
if c.Resources.Pod == nil {
	c.Resources.Pod = baseCfg
What if the user only overrides resources.pod.namespace? Does that mean that the rest of the settings will be empty?
See #26801 (comment)
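To make the concern above concrete, a sketch (not the code in this PR) of what explicit per-field defaulting would look like if partial per-resource overrides were kept:

	// If only resources.pod.namespace is set, the unpacked Pod config has zero
	// values everywhere else, so the remaining fields must be copied from the base.
	if c.Resources.Pod == nil {
		c.Resources.Pod = baseCfg
	} else {
		if c.Resources.Pod.SyncPeriod == 0 {
			c.Resources.Pod.SyncPeriod = baseCfg.SyncPeriod
		}
		if c.Resources.Pod.CleanupTimeout == 0 {
			c.Resources.Pod.CleanupTimeout = baseCfg.CleanupTimeout
		}
	}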
func (n *node) emitRunning(node *kubernetes.Node) {
	data := generateNodeData(node)
	data.mapping["scope"] = n.scope
	if data == nil {
Can this happen taking the previous line into account?
Good catch
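For the record, the fix is just reordering so the nil check runs before the mapping is touched; a minimal sketch:

	func (n *node) emitRunning(node *kubernetes.Node) {
		data := generateNodeData(node)
		if data == nil {
			return
		}
		// Safe to annotate the mapping now that data is known to be non-nil.
		data.mapping["scope"] = n.scope
		// ... emit as before
	}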
n.emitStopped(node)
n.emitRunning(node)
In theory emitRunning should be enough here, right? This will AddOrUpdate.
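A minimal sketch of that simplification in the update handler (handler shape assumed from the watcher interface; not the code in this PR):

	// On update, a single emitRunning is enough: the consumer treats the emitted
	// mapping as an add-or-update, so the preceding emitStopped can be dropped.
	func (n *node) OnUpdate(obj interface{}) {
		node, ok := obj.(*kubernetes.Node)
		if !ok {
			return
		}
		n.emitRunning(node)
	}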
	return false
}

func getAddress(node *kubernetes.Node) string {
An explanatory comment here would help.
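Something along these lines, perhaps; a sketch with an explanatory comment (the address preference order is an assumption, and v1 refers to k8s.io/api/core/v1):

	// getAddress returns the address to use for the node in emitted events, preferring
	// the internal IP reported by the kubelet and falling back to the hostname.
	func getAddress(node *kubernetes.Node) string {
		for _, address := range node.Status.Addresses {
			if address.Type == v1.NodeInternalIP && address.Address != "" {
				return address.Address
			}
		}
		for _, address := range node.Status.Addresses {
			if address.Type == v1.NodeHostName && address.Address != "" {
				return address.Address
			}
		}
		return ""
	}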
// Pass annotations to all events so that it can be used in templating and by annotation builders.
annotations := common.MapStr{}
for k, v := range node.GetObjectMeta().GetAnnotations() {
	safemapstr.Put(annotations, k, v)
Should we be dedotting these?
I was thinking about adding this when we deal with metadata in general, but it's ok to add it now. Adding.
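A sketch of what the dedotted version could look like, assuming a simple dots-to-underscores rewrite and the annotations.dedot switch added to the config later in this PR (the cfg variable name is illustrative):

	annotations := common.MapStr{}
	for k, v := range node.GetObjectMeta().GetAnnotations() {
		if cfg.AnnotationsDedot {
			// e.g. "prometheus.io/scrape" -> "prometheus_io/scrape"
			k = strings.ReplaceAll(k, ".", "_")
		}
		safemapstr.Put(annotations, k, v)
	}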
"node": map[string]interface{}{ | ||
"uid": string(node.GetUID()), | ||
"name": node.GetName(), | ||
"labels": node.GetLabels(), |
Same for labels, should we dedot?
Same as above!
providerDataChan := make(chan providerData)
done := make(chan bool, 1)
go generateContainerData(pod, containers, containerstatuses, providerDataChan, done)
Why use a channel for this? What would you think about emitting directly from the function?
Using a channel helps to isolate the generator function so that it can be tested with unit tests, following a yield-like approach.
Understood. You could also build a mocked emitter passed as a parameter to retrieve the results in the tests, right?
Ah, good idea. Something like 6f39378?
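For context, a rough sketch of the mocked-emitter idea (names and fields are illustrative, not necessarily the ones in 6f39378): the generator takes an emit callback instead of a channel, and tests just capture what gets emitted.

	// generateContainerData calls emit once per container instead of writing to a channel.
	func generateContainerData(pod *kubernetes.Pod, containers []kubernetes.Container, emit func(providerData)) {
		for _, c := range containers {
			emit(providerData{
				uid: string(pod.GetUID()) + "-" + c.Name,
				mapping: common.MapStr{
					"container": common.MapStr{"name": c.Name, "image": c.Image},
				},
			})
		}
	}

	// In a unit test the emitter is just a closure appending to a slice:
	//   var got []providerData
	//   generateContainerData(pod, pod.Spec.Containers, func(d providerData) { got = append(got, d) })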
Comments addressed and tested locally following the updated scenarios listed in the PR's description.
This is looking good. Added a question about the future of dedotting and the possibility of using the flattened type instead.
Also I think we still need to cover some use cases that we recently polished in Beats, such as discovery of short-living pods, crashing containers, ephemeral containers and so on, but this can be done as follow-ups. elastic/e2e-testing#1090 will help to validate these use cases 😇
// Scope of the provider (cluster or node)
Scope            string `config:"scope"`
LabelsDedot      bool   `config:"labels.dedot"`
AnnotationsDedot bool   `config:"annotations.dedot"`
@exekias @ChrsMark wdyt about starting to use the flattened type in Agent instead of dedotting? There are still some trade-offs but they should eventually be addressed. Hopefully flattened is the future for this kind of field.
More context here: https://github.com/elastic/obs-dc-team/issues/461
I'm not completely aware of the pros/cons, but at a quick glance the flattened type looks better than dedotting to me, and timing-wise this sounds like a good moment to do the change.
Taking this into account, I'm inclined to leave dedotting out of this PR and investigate the experience when using flattened for these fields, any thoughts? Also, in case we end up introducing it, I would like to be opinionated here and avoid adding any config parameter for it.
I'm particularly concerned about doing things like grouping some metric by a label, which is a valid use. I'm less concerned about annotations...
I'm +1 on leaving this out for now and opening a follow-up issue to work on dedotting/flattening.
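To make the trade-off concrete (illustrative values only): dedotting rewrites the label key itself, while a flattened mapping would keep the key untouched and map the whole object as one field, which is what matters for use cases like grouping by a label.

	key := "app.kubernetes.io/name"
	dedotted := strings.ReplaceAll(key, ".", "_")                     // "app_kubernetes_io/name": the original key no longer matches
	flattened := common.MapStr{"labels": common.MapStr{key: "nginx"}} // key kept as-is under a flattened field
	_, _ = dedotted, flattened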
}

// InitDefaults initializes the default values for the config.
func (c *Config) InitDefaults() {
	c.SyncPeriod = 10 * time.Minute
	c.CleanupTimeout = 60 * time.Second
We may be hitting this issue: #20543. Not sure how we can do something now that depends on the kind of data we are collecting.
Yeap, the provider is not really aware of the inputs right now, but maybe it could be handled better in the future if we introduce a "smart" controller which enables providers according to the inputs.
p.emitContainers(pod, pod.Spec.Containers, pod.Status.ContainerStatuses)

// TODO deal with init containers stopping after initialization
p.emitContainers(pod, pod.Spec.InitContainers, pod.Status.InitContainerStatuses)
TODO: add support for ephemeral containers.
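A possible starting point for that TODO, as a sketch (assumes the Kubernetes 1.16+ API types): ephemeral containers have their own spec and status fields and a slightly different type, so they would need their own pass rather than a third emitContainers call.

	// Ephemeral containers expose Name/Image via the embedded EphemeralContainerCommon;
	// their statuses live in pod.Status.EphemeralContainerStatuses.
	for _, ec := range pod.Spec.EphemeralContainers {
		_ = ec.Name
		_ = ec.Image
		// build and emit a mapping here, similar to regular containers
	}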
Thanks! Testing should be improved for sure, and with a more complete/e2e approach. We have elastic/e2e-testing#1090 in our backlog, which I think can cover this need, and I think it would be better to implement it after we have a more complete codebase, including metadata handling too.
I think this can be merged, and we can iterate on details and specific use cases in future PRs.
Looks good, thanks for all the fixes and changes!
(cherry picked from commit 6635acb)
What does this PR do?
This PR adds more resources in the kubernetes dynamic provider.

Why is it important?
So as to support Node and Service discovery via the kubernetes dynamic provider of Agent.

Checklist
- [ ] I have made corresponding changes to the documentation
- [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc

How to test this PR locally
- Test with node resource: use the inspect command to check what the compiled configuration is (use the proper path for the elastic-agent.yml config file): ./elastic-agent -c /Users/ubuntu/Desktop/elastic-agent.yml inspect output -o default
- Test with service resource.
- Test with pod resource.
- Test with service resource and node scope: verify that the scope defaults to cluster (can not set scope to node when using resource Service. resetting scope to cluster) and that the kubernetes.scope field is populated with the cluster value.
- Test with pod resource at node scope and define node …

Related issues