[BUG] Disabling webhook breaks metrics- and health-endpoints #1376

muffl0n · 2022-04-11T10:09:24Z

When setting

bpValidatingWebhook:
  enabled: false

the webhook server is not started. Instead, the "normal" server is started, serving these endpoints:

/metrics
/v0/healthz

See

kanister/pkg/kancontroller/kancontroller.go

Lines 60 to 80 in 4e590f0

    
    if isCACertMounted() {
        go func(config *rest.Config) {
            err := handler.RunWebhookServer(config)
            if err != nil {
                log.WithError(err).Print("Failed to start validating webhook server")
                return
            }
        }(config)
    } else {
        s := handler.NewServer()
        defer func() {
            if err := s.Shutdown(ctx); err != nil {
                log.WithError(err).Print("Failed to shutdown health check server")
            }
        }()
        go func() {
            if err := s.ListenAndServe(); err != nil {
                log.WithError(err).Print("Failed to start health check server")
            }
        }()
    }

Unfortunately, disabling bpValidatingWebhook does not adjust the Service created in https://github.com/kanisterio/kanister/blob/master/helm/kanister-operator/templates/service.yaml
So disabling the validating webhook actually breaks access to /metrics via the Service.
It would also break the health checks (if there were any), because the port changes:

webhook server listens on port 9443
"normal" server listens on port 8000

To Reproduce

Install kanister with bpValidatingWebhook enabled
Access /metrics via https://kanister-kanister-operator/metrics -> metrics are served
Reinstall kanister with bpValidatingWebhook disabled
Access /metrics via https://kanister-kanister-operator/metrics -> error Connection refused

Expected behavior
The metrics should be accessible whether the webhook is enabled or disabled, and the URL they are reachable at should not change.

I would propose decoupling the webhook from the other endpoints. First, there is no need to serve the other endpoints via HTTPS. Second, because the certificate is self-signed, anyone accessing the metrics would have to configure their client to accept it. (When health checks are accessed via HTTPS, Kubernetes automatically accepts all certificates.)
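A minimal Go sketch of the decoupling proposed here, not Kanister's actual code: a plain-HTTP listener for /metrics and /v0/healthz is always planned, and a TLS listener for the webhook is added only when the webhook is enabled. The ports 8000 and 9443 come from this issue; the `listener` type, the `plan` and `statusOf` helpers, the `/validate` path, and the placeholder response bodies are assumptions made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// listener describes one server the operator would start (hypothetical type).
type listener struct {
	Addr    string
	TLS     bool
	Handler http.Handler
}

// newOpsMux serves health and metrics. The metrics body is a placeholder;
// a real operator would mount its Prometheus handler here instead.
func newOpsMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/v0/healthz", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ok")
	})
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "# placeholder metrics\n")
	})
	return mux
}

// newWebhookMux serves only the admission endpoint.
// The "/validate" path is an assumption, not taken from the issue.
func newWebhookMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/validate", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "{}")
	})
	return mux
}

// plan returns the listeners to start. The plain-HTTP listener is always
// present, so the metrics URL does not depend on the webhook setting.
func plan(webhookEnabled bool) []listener {
	ls := []listener{{Addr: ":8000", TLS: false, Handler: newOpsMux()}}
	if webhookEnabled {
		ls = append(ls, listener{Addr: ":9443", TLS: true, Handler: newWebhookMux()})
	}
	return ls
}

// statusOf sends an in-process GET to the handler and returns the status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	for _, enabled := range []bool{false, true} {
		for _, l := range plan(enabled) {
			fmt.Printf("webhook=%v addr=%s tls=%v /metrics=%d\n",
				enabled, l.Addr, l.TLS, statusOf(l.Handler, "/metrics"))
		}
	}
}
```

In this sketch the :8000 listener answers /metrics in both configurations, while the optional :9443 listener only knows the webhook path, which is the property the Service definition would rely on.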


ihcsim · 2022-04-11T16:50:16Z

Sounds like the Service resource needs to expose port 8000 when the webhook is disabled. Yeah, I would prefer the health check and metrics endpoints be decoupled from the admission endpoint. How would Prometheus even scrape the metrics tls endpoint if the server is using a self-signed cert?

muffl0n · 2022-04-11T17:23:14Z

One can set insecure_skip_verify in tls_config. But I'm not sure how to do that with a ServiceMonitor, which is the preferred way. Speaking of which: we should add one for Kanister ;)
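To illustrate what skipping verification means on the client side, here is a small Go sketch that does the equivalent with `tls.Config.InsecureSkipVerify`. It is not Prometheus configuration and not Kanister code: the `scrape` and `newTestServer` helpers and the `up 1` body are made up for the example, and the untrusted certificate comes from Go's `httptest.NewTLSServer`, which stands in for the operator's self-signed one.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// scrape fetches url and returns the response body. When skipVerify is true,
// the server certificate is not validated, which is roughly what
// insecure_skip_verify asks a scraper to do.
func scrape(url string, skipVerify bool) (string, error) {
	client := &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: skipVerify},
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// newTestServer starts a local HTTPS server whose certificate is not in the
// system trust store, standing in for an endpoint with a self-signed cert.
func newTestServer() *httptest.Server {
	return httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "up 1\n")
	}))
}

func main() {
	srv := newTestServer()
	defer srv.Close()

	// With verification on, the untrusted certificate is rejected.
	_, err := scrape(srv.URL, false)
	fmt.Println("verification on, failed:", err != nil)

	// With verification off, the scrape succeeds but the peer is unauthenticated.
	body, err := scrape(srv.URL, true)
	fmt.Printf("verification off, body=%q err=%v\n", body, err)
}
```

Skipping verification keeps the traffic encrypted but no longer authenticates the server, which is part of why serving metrics over plain HTTP or with a properly trusted certificate comes up in this thread.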

github-actions · 2022-06-13T00:11:26Z

This issue is marked as stale due to inactivity. Add a new comment to reactivate it.

ihcsim · 2022-06-13T02:35:33Z

Still relevant.

ihcsim · 2022-06-16T21:15:01Z

Fixed by #1476 and #1488.

muffl0n · 2022-06-21T19:34:07Z

Thank you for fixing this!

I took a deeper look and have these concerns:
The endpoint (protocol and port) of the health and metrics service changes when the webhook is enabled/disabled. For the user, this is rather unexpected behavior.
Say the user has configured a Prometheus ServiceMonitor to scrape the operator. After enabling/disabling the webhook, some labels of the resulting metrics change unexpectedly, which could lead to confusion.

These are the solutions I am thinking of:

1. We enable the HTTP server (for metrics and health) by default and start a second HTTPS server for the webhook when it is configured to be enabled. We would need a second Service for the webhook, but the metrics and health endpoints would never change.
2. We enable the HTTPS server by default and just add the handlers we want dynamically: metrics and health by default, and the webhook handler when enabled.
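The second option could look roughly like the Go sketch below: one mux that always carries the metrics and health handlers and registers the webhook handler only when enabled. This is an illustration under assumptions, not the operator's code; `newMux`, `statusOf`, the `/validate` path, and the response bodies are hypothetical, and the TLS setup for the single server is left out.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newMux builds the handler set for a single server. Metrics and health are
// always registered; the webhook handler is added only when enabled.
// The "/validate" path is an assumption made for this sketch.
func newMux(webhookEnabled bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/v0/healthz", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ok")
	})
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "# placeholder metrics\n")
	})
	if webhookEnabled {
		mux.HandleFunc("/validate", func(w http.ResponseWriter, _ *http.Request) {
			fmt.Fprint(w, "{}")
		})
	}
	return mux
}

// statusOf sends an in-process GET to the handler and returns the status code.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	for _, enabled := range []bool{false, true} {
		mux := newMux(enabled)
		fmt.Printf("webhook=%v /metrics=%d /v0/healthz=%d /validate=%d\n",
			enabled,
			statusOf(mux, "/metrics"),
			statusOf(mux, "/v0/healthz"),
			statusOf(mux, "/validate"))
	}
}
```

With this shape the port and protocol stay fixed, but every client, including the metrics scraper, has to deal with the server's certificate.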

I would prefer solution 1, because we really don't need HTTPS for metrics or health, and it removes the need to ignore the "invalid" certificate when scraping the metrics with Prometheus.

What do you think?

ihcsim · 2022-06-21T21:19:13Z

I think ultimately, the metrics/health endpoints should be separated from the webhook. In addition,

i. the webhook should be a required component - what's the point of having it if I can just disable it?
ii. all traffic should be TLS'd - if someone can MITM your metrics data, they can get a very good sense of all the namespaces, workloads and policies in your cluster. The need to protect /metrics is why things like kube-rbac-proxy exist.

ihcsim · 2022-06-21T21:45:56Z

Created #1500 to track webhook hardening improvements.

muffl0n added the bug label Apr 11, 2022

pavannd1 added the triage label Apr 11, 2022

ihcsim removed the triage label Apr 13, 2022

github-actions bot added the stale label Jun 13, 2022

viveksinghggits self-assigned this Jun 13, 2022

viveksinghggits mentioned this issue Jun 13, 2022

Fix kanister service port issue in case if webhook is disabled #1476

Merged

10 tasks

github-actions bot removed the stale label Jun 14, 2022

ihcsim mentioned this issue Jun 16, 2022

Separate Service Secured And Insecured Ports #1488

Merged

10 tasks

ihcsim closed this as completed Jun 16, 2022
