
NETOBSERV-396: Flp prom tls #158

Merged · 9 commits merged into netobserv:main · Sep 6, 2022

Conversation

OlivierCazade (Contributor) commented Aug 26, 2022

Adding TLS to the FLP Prometheus configuration.
This provides two options:

  • an automatic mode using OpenShift annotations
  • letting the user provide their own certificates

Breaking change: spec.flowlogsPipeline.prometheusPort is now spec.flowlogsPipeline.prometheus.port (see the sketch below).
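
For illustration, the reshaped API section could look roughly like this in Go (the struct name FLPPrometheus is hypothetical; only the json paths come from the PR description, and the TLS sub-struct is sketched further down in the review thread):

    // Sketch only: the port moves from spec.flowlogsPipeline.prometheusPort
    // to spec.flowlogsPipeline.prometheus.port, next to the new TLS options.
    type FLPPrometheus struct {
        // Port of the Prometheus scrape endpoint.
        // +kubebuilder:validation:Minimum=1
        // +kubebuilder:validation:Maximum=65535
        Port int32 `json:"port,omitempty"`

        // TLS configuration of the endpoint: disabled, OpenShift-generated
        // certificates via annotations, or user-provided certificates.
        TLS PrometheusTLS `json:"tls,omitempty"`
    }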

maximum: 65535
minimum: 1
type: integer
tlsType:
Contributor

What's the difference between tlsType: DISABLED and enable: false?

Contributor Author

Thanks, I changed it; manualTls now only contains the cert configuration.


// TLS configuration.
// +optional
ManualTLS ClientTLS `json:"manualTls"`
Member

Maybe we should not reuse ClientTLS here and instead create a new struct: I believe it is not 100% relevant when used to configure TLS on the server side (or at least it should be renamed).
As @jpinsonneau pointed out, the ClientTLS.enable setting can be confusing since we have TLSType here. Also, I believe InsecureSkipVerify is not relevant on the server side?

Contributor Author

Yes, following Julien's comment, I changed the code to use CertificateReference instead of ClientTLS.
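
For context, a minimal sketch of what a server-side certificate reference could look like, assuming the certificate is read from a secret or configmap (field names are illustrative, not necessarily the exact ones used in the PR):

    // Sketch only: a server-side reference keeps just what is needed to locate
    // a cert/key pair, and drops client-only knobs such as InsecureSkipVerify.
    type CertificateReference struct {
        // Source of the certificate: "configmap" or "secret" (assumed values).
        Type string `json:"type,omitempty"`
        // Name of the configmap or secret containing the certificate.
        Name string `json:"name,omitempty"`
        // Key of the certificate file within the configmap or secret.
        CertFile string `json:"certFile,omitempty"`
        // Key of the private key file within the configmap or secret.
        CertKey string `json:"certKey,omitempty"`
    }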

@@ -201,6 +201,37 @@ type FlowCollectorKafka struct {
TLS ClientTLS `json:"tls"`
}

const (
PrometheusTLSDiasbled = "DISABLED"
Member

nit: typo "Disabled"

Contributor Author

Fixed, thanks

@@ -508,6 +539,48 @@ func (b *builder) service(old *corev1.Service) *corev1.Service {
return newService
}

func (b *builder) promService(old *corev1.Service) *corev1.Service {
Member

I already didn't like this "if the service doesn't exist do ... else do ..." pattern much (I blame myself for that), and now it spreads in the code. It's OK for this PR, but at some point I'd like to see if we can have something more concise | readable | elegant ...

Contributor Author

I agree, but as your comment states, this comes from the immutable cluster IP field in the service.

I am curious how the kubectl CLI works when updating a service. Does it first fetch the existing service and enrich the new one? That would be strange, since it would add a lot of service-specific code to kubectl.

Member

I'm not sure either.

// Select the type of TLS configuration
// "DISABLED" (default) to not configure TLS for the endpoint, "MANUAL" to manually provide cert file and a key file,
// and "AUTO" to use Openshift auto generated certificate using annotations
// +unionDiscriminator
Member

cc @mariomac: is that how unionDiscriminator works? Looking at your PR, I would think the enum values need to match the discriminated field names; in other words, here, the enum for MANUAL would have to be ManualTLS?
Or is this pure coincidence in your code (with IPFIX/EBPF) and it is not required?

Member

PS: I couldn't find any doc about this unionDiscriminator annotation, do you have a link?

Contributor Author

According to the documentation here:

"Discriminant values should be PascalCase and should be equivalent to the camelCase field name (json tag) of one member of the union"

So having the enum value and the field name match is a recommendation; I will change the name here.

Contributor

Yeah, in the PR @jotak mentions it is done differently, following the suggestion of an API review comment, but double-checking the documentation examples, it seems the correct way is the way Olivier did it.
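
To make the convention concrete, here is a minimal sketch of a union where the discriminant value matches the member field's json tag (the PrometheusTLS name is illustrative; later in this thread MANUAL/manual is renamed to PROVIDED/provided):

    // Sketch only: the enum value "MANUAL" corresponds to the member field
    // whose json tag is "manual".
    type PrometheusTLS struct {
        // Select the type of TLS configuration.
        // +unionDiscriminator
        // +kubebuilder:validation:Enum:="DISABLED";"MANUAL";"AUTO"
        Type string `json:"type,omitempty"`

        // TLS configuration, only read when Type is "MANUAL".
        // +optional
        Manual *CertificateReference `json:"manual,omitempty"`
    }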

mariomac (Contributor) left a comment

Mostly LGTM. Just a few comments open for discussion.


// TLS configuration.
// +optional
Manual *CertificateReference `json:"manual"`
Contributor

I'd change manual to certificate.

Contributor Author

According to the recommendation, this field name should match the value in the enum discriminator.
I will also set it to provided to match your other comment.

Contributor

Could CERTIFICATE be a valid option for the TLS type field instead of MANUAL or PROVIDED?

Contributor Author

I think having only CERTIFICATE would be a bit ambiguous since the AUTO mode also uses certificates; they are just generated rather than provided by the user.

// "DISABLED" (default) to not configure TLS for the endpoint, "MANUAL" to manually provide cert file and a key file,
// and "AUTO" to use Openshift auto generated certificate using annotations
// +unionDiscriminator
// +kubebuilder:validation:Enum:="DISABLED";"MANUAL";"AUTO"
Contributor

I'd change MANUAL to PROVIDED.

Contributor Author

Changed, thanks!

@@ -510,6 +540,48 @@ func (b *builder) service(old *corev1.Service) *corev1.Service {
return newService
}

func (b *builder) promService(old *corev1.Service) *corev1.Service {
if old == nil {
Contributor

I see that later, the promService method is invoked with a non-nil object and also with an explicit nil.

Could this method be explicitly divided into two? e.g.:

func (b *builder) newPromService() *corev1.Service
func (b *builder) updatedPromService(old *corev1.Service) *corev1.Service
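
A rough sketch of what that split could look like, with the update variant carrying over the immutable ClusterIP (builder fields and the service name are taken from elsewhere in this PR or assumed, not copied from the actual code):

    // Sketch only: newPromService builds the service from scratch, while
    // updatedPromService reuses it and preserves immutable fields.
    func (b *builder) newPromService() *corev1.Service {
        return &corev1.Service{
            ObjectMeta: metav1.ObjectMeta{
                Name:      "flowlogs-pipeline-prom",
                Namespace: b.namespace, // assumed builder field
            },
            Spec: corev1.ServiceSpec{
                Selector: b.selector, // assumed builder field
                Ports: []corev1.ServicePort{{
                    Name: "prometheus",
                    Port: b.desired.Prometheus.Port, // assumed spec path
                }},
            },
        }
    }

    func (b *builder) updatedPromService(old *corev1.Service) *corev1.Service {
        newService := b.newPromService()
        // ClusterIP is immutable once assigned, so keep the existing value.
        newService.Spec.ClusterIP = old.Spec.ClusterIP
        return newService
    }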

@@ -123,8 +126,8 @@ func (r *FLPReconciler) GetServiceName(kafka *flowsv1alpha1.FlowCollectorKafka)
}

func (r *FLPReconciler) Reconcile(ctx context.Context, desired *flowsv1alpha1.FlowCollector) error {
for _, singleFlp := range r.singleReconcilers {
err := singleFlp.Reconcile(ctx, desired)
for i := 0; i < len(r.singleReconcilers); i++ {
Contributor

Maybe it's more "idiomatic":

for i := range r.singleReconcilers

but do it as you prefer.

Contributor Author

Yes, thanks
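
For reference, the index-based form suggested above would look like this (error handling sketched to match the original loop):

    // Iterating by index avoids copying each reconciler value on every iteration.
    for i := range r.singleReconcilers {
        if err := r.singleReconcilers[i].Reconcile(ctx, desired); err != nil {
            return err
        }
    }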

jotak (Member) commented Sep 1, 2022

/lgtm

Tested with this ServiceMonitor (in "AUTO" mode):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: flowlogs-pipeline
  namespace: network-observability
spec:
  endpoints:
  - interval: 15s
    port: prometheus
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  namespaceSelector:
    matchNames:
    - network-observability
  selector:
    matchLabels:
      app: flowlogs-pipeline

Prom config in CR:

    prometheus:
      port: 9102
      tls:
        type: AUTO

Two remarks:

  1. We must figure out what's required to get rid of the insecureSkipVerify: true in the ServiceMonitor and document it. I haven't dug much, but I had a host name mismatch issue without it.
  2. I just noticed we use the old "alpha" version of the cert annotation, whereas a "beta" version now exists and is recommended. We should update it everywhere we use it (see the sketch after this list).
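
Regarding the second remark, a sketch of what switching to the beta form of the serving-cert annotation would mean on the generated Service (the promService variable and the secret name are examples, not necessarily what the operator uses):

    // Sketch only: in AUTO mode the Service is annotated so that the OpenShift
    // service CA operator generates a serving certificate into the named secret.
    promService.ObjectMeta.Annotations = map[string]string{
        // old form: "service.alpha.openshift.io/serving-cert-secret-name"
        "service.beta.openshift.io/serving-cert-secret-name": "flp-prom-tls", // example secret name
    }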

@openshift-ci openshift-ci bot added the lgtm label Sep 1, 2022
jotak (Member) commented Sep 1, 2022

Quick update: I'm trying with this ServiceMonitor and it doesn't work, I don't know why:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: flowlogs-pipeline
  namespace: network-observability
spec:
  endpoints:
  - interval: 15s
    port: prometheus
    scheme: https
    tlsConfig:
#      insecureSkipVerify: true
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      serverName: flowlogs-pipeline-prom.network-observability.svc
  namespaceSelector:
    matchNames:
    - network-observability
  selector:
    matchLabels:
      app: flowlogs-pipeline

Whereas this works:

oc exec -n openshift-monitoring prometheus-k8s-0 -- curl https://flowlogs-pipeline-prom.network-observability.svc:9102/metrics --cacert /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt

Which I believe should be an equivalent call...

OlivierCazade (Contributor Author)

Was the CA cert file already present in the Prometheus pod?

jotak (Member) commented Sep 2, 2022

Yeah, I expected the curl command to fail, but it worked :)
See also how other ServiceMonitors are configured, e.g. from monitor-ovn-node:

    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      interval: 30s
      port: metrics
      scheme: https
      tlsConfig:
        caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
        serverName: ovn-kubernetes-node.openshift-ovn-kubernetes.svc

They all use the same service-ca.crt, and adapt serverName.

Amoghrd (Contributor) commented Sep 2, 2022

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Sep 2, 2022
github-actions bot commented Sep 2, 2022

New image: ["quay.io/netobserv/network-observability-operator:0785789"]. It will expire after two weeks.

OlivierCazade (Contributor Author)

I got the service monitor working with this config:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: flowlogs-pipeline
  namespace: network-observability
spec:
  endpoints:
  - interval: 15s
    port: prometheus
    scheme: https
    tlsConfig:
      # caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      ca:
        configMap:
          name: "openshift-service-ca.crt"
          key: "service-ca.crt"
      serverName: flowlogs-pipeline-prom.network-observability.svc
  namespaceSelector:
    matchNames:
    - network-observability
  selector:
    matchLabels:
      app: flowlogs-pipeline

My understanding is that user workload metrics are handled by different pods in a different namespace. Contrary to the openshift-monitoring pods, the CA cert may not already be there.

jotak (Member) commented Sep 6, 2022

/lgtm

OlivierCazade (Contributor Author)

/approve

openshift-ci bot commented Sep 6, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: OlivierCazade

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Sep 6, 2022
@openshift-merge-robot openshift-merge-robot merged commit 1a04395 into netobserv:main Sep 6, 2022
@jotak jotak added the breaking-change This pull request has breaking changes. They should be described in PR description. label Sep 7, 2022